Improving fab productivity with predictive vacuum maintenance
02/01/2003
Overview
With vacuum systems, conventional wisdom has often been "if it ain't broke, don't fix it" or some degree of "parts are expendable." Increasingly, though, these relatively mundane approaches to vacuum system maintenance in wafer fabs are working their way, unfavorably, to the operational bottom line. Today's new approach, enabled through Internet monitoring of the vacuum system, is able to predict component failures "just in time."
Traditionally, within wafer fabs, operations managers adopt one of two overarching vacuum equipment maintenance philosophies (dictated somewhat by the specific IC being fabricated and market conditions):
- "run-to-fail" — a reactive approach that replaces vacuum system components after they fail to perform to specification, or
- "replace-in-case" — a preventive approach that replaces vacuum components before they are likely to fail.
Both of these philosophies offer their own distinct advantages and disadvantages (see "The pros and cons of maintenance conventional wisdom"). Of the two, many fabs worldwide have had little choice other than to use replace-in-case, simply because it was the only option that ensured process consistency and reliability.
First stage array of Helix Technology's On-Board Cryopump.
Both run-to-fail and replace-in-case fail, however, to address two key objectives of chip manufacturers:
- to deliver consistently reliable products produced in efficient fabs (minimal unplanned downtime), and
- to produce chips at the lowest possible cost consistent with maintaining optimal quality (replace components only when they need it).
Unfortunately, these two objectives have long been mutually exclusive. Moreover, replace-in-case fabs must inventory every spare part that may at some time be required. The cost of that inventory could run into the hundreds of thousands of dollars, depending on the size of the fab supported.
Figure 1. Replacement rate reduction for predictive-based e-JIT vs. conventional preventive replacement (replace-in-case) strategy.
Predictive maintenance
Today, there is a third alternative. "Predictive maintenance" allows fab operations and maintenance managers to determine analytically when vacuum components central to smooth tool operation will experience degraded performance. This information is used to repair or replace ailing components before production is adversely affected.
Indeed, predictive maintenance lends itself to the current trend where IC customers, and in particular fabless semiconductor manufacturers, are demanding upward visibility into an IC maker's supply chain simply because they cannot tolerate surprises that affect manufacturing operations.
For many years, it has been abundantly clear that for IC manufacturers to produce products both efficiently and reliably, they had to keep vacuum components in service until the moment before they would "break," and that moment had to be recognized in advance so downtime could be planned and effectively managed.
Overall, a predictive maintenance approach has many significant advantages:
- The full useful life of key components is realized, significantly enhancing component ROI;
- Unscheduled downtime is virtually eliminated, enhancing efficiency, increasing availability, and enabling fabs to keep supply chain commitments to end-users;
- Overall productivity is increased because downtime is reduced, which cuts the manufacturing cost of chips and helps to speed time-to-market; and
- Inventory costs are cut, since the heavy outlay for "replace-in-case" parts can be trimmed significantly.
Figure 2. Unscheduled downtime prevention enabled through e-JIT-based service program.
Predictive maintenance in action
At Helix Technology, we have made predictive maintenance possible by leveraging GOLDLink Support — the ability to monitor securely a number of vacuum pump performance parameters via the Internet.
By using highly developed proprietary algorithms that monitor anomalies intelligently, rather than reacting to meaningless events, we can offer services built on predictive, just-in-time (e-JIT) replacement strategies. In addition, we track trends that are known to be potentially deleterious to the process. We have found that in many cases it is a combination of seemingly unrelated performance anomalies that indicates a real prospective problem.
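The point that a combination of individually minor anomalies can be more telling than any single alarm can be illustrated with a short sketch. This is not the GOLDLink algorithm, which is proprietary and not described in this article; the parameter names, baseline readings, and thresholds below are all hypothetical, chosen only to show the principle.

```python
from statistics import mean, stdev

# Hypothetical thresholds (assumptions for illustration only).
SINGLE_ALARM_Z = 4.0   # one parameter far out of range on its own
COMBINED_Z = 2.0       # moderate deviation that matters only in combination
MIN_COMBINED = 2       # number of moderate deviations needed to raise a flag


def z_score(history, latest):
    """Deviation of the latest reading from its history, in standard deviations."""
    return abs(latest - mean(history)) / stdev(history)


def assess_pump(baselines, latest):
    """Return 'alarm', 'watch', or 'ok' for one pump.

    baselines: dict of parameter name -> list of past readings
    latest:    dict of parameter name -> newest reading
    """
    scores = {name: z_score(baselines[name], latest[name]) for name in baselines}
    if any(z >= SINGLE_ALARM_Z for z in scores.values()):
        return "alarm"
    moderate = [name for name, z in scores.items() if z >= COMBINED_Z]
    return "watch" if len(moderate) >= MIN_COMBINED else "ok"


# Hypothetical history: second-stage temperature (K) and cooldown time (min).
baselines = {
    "second_stage_temp_k": [12.0, 12.2, 11.9, 12.1, 12.0, 11.8],
    "cooldown_min": [90, 92, 89, 91, 90, 88],
}

# Each parameter drifts only moderately, but both drift at the same time.
result = assess_pump(baselines, {"second_stage_temp_k": 12.4, "cooldown_min": 94})
print(result)
```

With these made-up readings, neither parameter is far enough from its baseline to trip the single-parameter alarm, yet the pump is flagged for attention because both have drifted together. A drift in temperature alone, with cooldown time unchanged, would not be flagged.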
Under a service agreement using e-JIT, we helped a major wafer fab in Europe realize the benefits of switching from a replace-in-case preventive pump replacement strategy to a JIT replacement strategy; the objective was to halve the cryopump replacement rate.
For the initial six-month evaluation period, we installed our monitoring capability on 120 pumps, later expanding to 133 pumps on 20 Applied Materials Endura PVD systems. During this time, our dedicated customer support engineers provided e-JIT replacement notifications and corrective action recommendations, along with delivery of the replacement component prior to failure.
For example, improvements in cryopump regeneration management can be realized by addressing the frequency, duration, and response time to regeneration-related events. Monitoring and analysis of regeneration parameters, including temperature profile, pressure sequencing, and rate of rise, help determine the corrective steps to achieve predictive maintenance objectives (i.e., maximize useful life, optimize performance, and minimize unscheduled downtime).
Regeneration improvements are enabled through predictive identification of degrading and suboptimal pump performance allowing a reduction in:
- the number of aborts through optimization of regeneration parameters, leading to a decrease in downtime,
- the time to respond to aborts through prompt notification directly to the fab technician at the time of the incident,
- the time to complete full regeneration through optimization of parameters leading to an increase in availability, and
- the frequency of regeneration through implementation of alternate regeneration sequencing, leading to an increase in availability.
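As one illustration of analyzing a regeneration parameter, the rate of rise can be estimated as the slope of pressure against time while the pump is isolated. The sketch below fits that slope by least squares and compares it with a limit. The sample readings, the pressure units, and the limit value are assumptions for illustration, not figures from the evaluation described in this article.

```python
def rate_of_rise(times_min, pressures):
    """Least-squares slope of pressure vs. time (pressure units per minute)."""
    n = len(times_min)
    if n < 2 or n != len(pressures):
        raise ValueError("need at least two matching time/pressure samples")
    t_mean = sum(times_min) / n
    p_mean = sum(pressures) / n
    num = sum((t - t_mean) * (p - p_mean) for t, p in zip(times_min, pressures))
    den = sum((t - t_mean) ** 2 for t in times_min)
    if den == 0:
        raise ValueError("time samples must not all be identical")
    return num / den


def passes_rate_of_rise(times_min, pressures, limit_per_min):
    """True when the fitted rate of rise does not exceed the given limit."""
    return rate_of_rise(times_min, pressures) <= limit_per_min


# Hypothetical readings: minutes since isolation, pressure in microns.
times = [0, 1, 2, 3, 4]
pressures = [50, 54, 58, 62, 66]
ROR_LIMIT = 10.0  # assumed pass/fail limit in microns per minute

slope = rate_of_rise(times, pressures)
ok = passes_rate_of_rise(times, pressures, ROR_LIMIT)
print(f"rate of rise: {slope:.1f} microns/min, pass: {ok}")
```

Fitting a slope across several samples, rather than differencing the first and last readings, makes the estimate less sensitive to a single noisy gauge reading.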
Of the initial installation of 120 pumps, ~20% (23 pumps) had clocked nearly 20,000 hours of service. These 23 pumps were to be tracked as a subset of the installed base, and would be replaced only if the e-JIT proprietary health monitoring algorithms determined that replacement was needed.
During the six-month evaluation on an installed base of 120 pumps, this wafer fab replaced a total of 11 pumps, compared to the 30 planned for replacement under its replace-in-case preventive strategy, for a reduction of 63%, successfully achieving the targeted reduction in replacement rate (Fig. 1).
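The reduction figure follows directly from the two replacement counts reported for the evaluation. The short calculation below reproduces it and checks it against the stated objective of halving the replacement rate; the variable names are illustrative only.

```python
planned_replacements = 30  # planned under replace-in-case (from the evaluation)
actual_replacements = 11   # actually replaced under e-JIT (from the evaluation)
target_reduction = 0.50    # stated objective: halve the replacement rate

reduction = (planned_replacements - actual_replacements) / planned_replacements

print(f"Reduction: {reduction:.0%}")                    # Reduction: 63%
print(f"Target met: {reduction >= target_reduction}")   # Target met: True
```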
Four of the 11 replacements were from the subset of 23 pumps with 20,000 hours. In addition, with e-JIT predictive capability, we were able to identify seven failing pumps with <20,000 hours usage. These pumps would not have been replaced under the previous preventive maintenance practices and would have resulted in ~28 hrs of unscheduled downtime plus re-qualification time. Because the failures were predicted in advance, the pumps were replaced during normally scheduled maintenance times, resulting in no unscheduled downtime (Fig. 2).
After a successful six-month period, the service program was extended. Data taken at nine months showed the average age of the original group of 23 pumps had increased >25%, with 19 of the original pumps still operational (Fig. 3).
Figure 3. Extension of useful pump life enabled through e-JIT-based service program.
The new maintenance wisdom
Rather than run with risks associated with run-to-fail or absorb the high cost of replace-in-case, predictive maintenance enables fabs to run an alternative course that leverages the full available working life of each vacuum component, maximizing system productivity and minimizing expenses. Responsibility for predictive maintenance can easily be outsourced to vacuum system manufacturers that offer such capability, since they are the vacuum experts with the knowledge and expertise to ensure optimal bottom-line productivity.
Is predictive maintenance an ideal solution? Since its efficacy is based upon a broad range of semiconductor fab knowledge and experience, it offers far better results than either of the other alternatives. An ideal solution, though, would monitor not only vacuum systems but also other key operational components. Companies that offer predictive maintenance capabilities believe that this broader level of predictive maintenance is just over the horizon.
Acknowledgments
GOLDLink is a registered trademark of Helix Technology Corp.
Stan Kassela received his BSME from Bucknell University. He is marketing manager for the global customer support organization of Helix Technology Corp., 9 Hampshire St., Mansfield, MA 02048; ph 508/337-5037, fax 508/337-5286.
The pros and cons of maintenance conventional wisdom
Run-to-fail
Reactive or run-to-fail maintenance allows semiconductor production equipment to be run continuously until the frequency of failures proves to be overwhelming. This philosophy has been commonly adopted for use in low-capacity, low-utilization fabs and is widely used as a cost-cutting alternative.
For the short term, run-to-fail does deliver savings on two distinct levels. First, it enables direct savings, since both scheduled downtime and maintenance labor costs are deferred. Second, it delivers material savings because replacement parts costs also are deferred.
The run-to-fail approach has some deep operational disadvantages, though. Perhaps most significant, it offers the absolute certainty of unscheduled downtime when the equipment inevitably fails without warning, sometimes at the worst possible moment. Second, all product that is in-process when the system fails is exposed to the consequences of system failure; thousands (or tens of thousands) of dollars' worth of product can become useless scrap in seconds.
In addition, unscheduled breakdowns will inevitably create disruptions throughout the supply chain. The parts required to get a system up and running again may not be immediately known, so diagnostics must be performed and parts must be ordered, delivered, and installed before the system can be requalified and put back into service. The duration of downtime will never be known until the problem has been thoroughly diagnosed and the replacement part and repair scheduled.
In the end, unscheduled downtime caused by a run-to-fail philosophy means that at some point a key manufacturing resource will not be available. Under run-to-fail, spare parts are not always in stock, leading to the diagnose-order-deliver-install-requalify sequence. The unavailability of that manufacturing resource is inevitable — and possibly worse, when the failure occurs, its duration is almost never immediately defined.
Replace-in-case
Replace-in-case is often adopted in high-utilization fabs where high productivity and no unscheduled downtime are musts. Fabs using this philosophy eliminate a good portion of the unscheduled downtime-related negatives of run-to-fail, but they also create their own set of entirely different problems.
The obvious advantage of replace-in-case is that it replaces components before they are likely to fail. The chance of a failure is decreased through an increase in the frequency of replacement. Typically used in fabs that produce high-end, high-priced ICs, replace-in-case enables fabs to continue production of valuable products with decreased risk of failure. Performance and productivity are more consistent and reliable.
Replace-in-case has several clear disadvantages, however. First, the full useful life of key vacuum components is never achieved; a certain portion of the useful life of manufacturing components is sacrificed in the name of consistent productivity. Those components still have productive life available when they are taken out of service.
Replace-in-case also does not totally prevent failures. Although scheduled replacements may be based on aggregate component reliability, by definition there will always be some level of inefficiency associated with this practice. It is also crucially important to know that replace-in-case does not detect anomalies in performance that would enable astute users to replace parts just before failure. It simply replaces parts, regardless of whether they need it.
In this scenario, replacement schedules are established and downtime is scheduled. Spares are usually kept readily available and the recipe for swapping parts is a known commodity taking a known time. Performance may be improved, but productivity is inevitably affected by the frequency of replacement. This approach is an expensive way to ensure process consistency and reliability, and to minimize maintenance disruptions.