How to recover from a cleanroom disaster
Viable options to consider after a catastrophic cleanroom contamination incident.
By Bruce Swales, Relectronic-Remech
While the likelihood of a catastrophic contamination incident in a cleanroom may be considered slight, such incidents do occur. Indeed, they occur on a seemingly regular basis and have a considerable adverse impact on cleanroom operations.
Much has been written about cleanroom contamination at a micro-level (e.g., “operator induced” contamination — the need to follow “housekeeping” procedures appropriate for the class of cleanroom), but what about at the macro-level? If a catastrophic contamination event occurs within a cleanroom, e.g., as may be caused by a fire or water/chemical ingress, how does one approach such a disaster? Does all equipment need to be replaced? Does the cleanroom infrastructure require replacement? What other acceptable and viable reinstatement options exist?
Consider the following cases, which have occurred, and are documented.
A fire started within a parts-cleaning station located adjacent to a 6,000-square-meter “ballroom”-style wafer fab cleanroom. Soot and ionic contamination from the fire spread throughout the entire cleanroom, contaminating over 100 new tools and ancillary equipment, walls, and floors.
A fire within the sub-fab area of a wafer fab spread into the ballroom cleanroom, two floors above. The fire caused (virtual) total destruction of approximately one third of the cleanroom and seriously contaminated all equipment within all five floors of the facility.
A fire within a wet etching station within a small wafer fab was constrained by a sprinkler system and did not spread. However, approximately 80 tools within the cleanroom were contaminated by potentially aggressive soot.
Fumes from a leakage of approximately 30 liters of highly corrosive acid contaminated seven tools within a wafer backgrinding and metal deposition cleanroom. Corrosion was evident on many tool metal surfaces within a brief period of hours.
Smoke and ionic contamination from a fire adjacent to an assembly and test cleanroom was ingested by the cleanroom air handling system. The air handling system distributed contaminants over approximately 150 die bonders, wire bonders and specialized ancillary equipment.
A leakage of hydrogen chloride resulted in fumes being ingested by the compressed air plant of a major fab assembly and test facility, distributing corrosive fumes to the pneumatic circuits and componentry of over 100 tools and machines.
This is just a small selection of events that have occurred in recent years within the semiconductor industry. Over the last 20 years, an estimated 125 similar incidents have occurred each year on average. The major causes cited are fire (47 percent) and fluid leakage (22 percent). Over the two-year period 1995 to 1997, the average cost of each incident was approximately US$4 million. At the extreme end, however, are the recent examples of two high-profile wafer fab fires in Taiwan, where the property loss in each case ran to hundreds of millions of US dollars. These figures do not include business interruption costs.
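As a rough order-of-magnitude check on the figures above, the short Python sketch below multiplies the estimated incident rate by the average cost per incident. It assumes, which the article does not state, that the 1995 to 1997 average cost of US$4 million can be applied to the long-run average of 125 incidents a year. The function and variable names are illustrative only.

```python
# Figures quoted in the article.
INCIDENTS_PER_YEAR = 125      # estimated long-run average
AVG_COST_USD = 4_000_000      # average cost per incident, 1995 to 1997
CAUSE_SHARE = {"fire": 0.47, "fluid leakage": 0.22}


def annual_loss_estimate(incidents: int, avg_cost: int) -> int:
    """Direct loss per year, assuming every incident costs the average (an assumption)."""
    return incidents * avg_cost


def incidents_by_cause(incidents: int, shares: dict) -> dict:
    """Approximate number of incidents per year attributed to each cited cause."""
    return {cause: incidents * share for cause, share in shares.items()}


if __name__ == "__main__":
    total = annual_loss_estimate(INCIDENTS_PER_YEAR, AVG_COST_USD)
    print(f"Estimated direct loss: US${total:,} per year")
    for cause, count in incidents_by_cause(INCIDENTS_PER_YEAR, CAUSE_SHARE).items():
        print(f"{cause}: roughly {count:.1f} incidents per year")
```

Under these assumptions the estimate comes to about US$500 million a year in direct losses, before any business interruption costs are counted.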
The two major categories of contamination at issue in these types of incidents are particulate contamination and ionic contamination.
The effects of particulate contamination are well known and understood by fab operators — such contamination can adversely impact product yield, due to direct contact of particles with wafer products during production. If particulate contamination is severe (as is normally the case after a fire), such contamination may also impact the inherent reliability of production equipment.
Ionic contamination (e.g., from chlorides, fluorides, bromides, etc., following fire) can react with wafer products, damaging layers. Further, ionic contamination can lead to corrosion occurring on metallic surfaces of equipment within the cleanroom. Corrosion on surfaces within a cleanroom cannot be tolerated. Areas of corrosion will efficiently act as “particle generators,” generating continuous particulate contamination. Corrosion on critical tool surfaces will also be likely to reduce tool accuracy, reliability and long-term operational life span.
Options following a contamination disaster in a cleanroom
What are the options available to a cleanroom operator following a major contamination disaster in a cleanroom?
Building and containment structures. Unless physical destruction of the structure has occurred, the structure will normally be able to be decontaminated and reinstated. In cases of significant particulate contamination, HEPA/ULPA filters may have to be replaced.
If metal air ducting, filter/ceiling frames and supporting hangers, etc., have been contaminated by a corrosive contaminant(s), to a degree where corrosion has occurred (or may be likely to occur), replacement of such items is usually the most prudent option.
Production equipment/tools. Generally, in regard to equipment and tools that have been contaminated, several viable reinstatement options exist for the cleanroom operator (see table, above):
Equipment replaced with new equipment.
The vendor (OEM) “repairs” the equipment.
A specialist equipment loss recovery company decontaminates and professionally “recovers” the equipment.
Equipment recovery as a viable option
Recovery of moderately to severely contaminated equipment by specialist equipment recovery companies is a viable option that prudent cleanroom operators must consider, particularly given the impact of downtime on bottom-line profits. Most contaminated or damaged equipment, including semiconductor manufacturing equipment, can be professionally recovered.
Precision cleaning techniques and standards have now been developed, which allow semiconductor wafer manufacturing tools (and other types of precision equipment) to be fully decontaminated and recovered to “pre-incident” condition, meaning:
Internationally accepted standards of cleanliness are achieved;
Equipment performs to OEM specification;
Pre-incident levels of yield are achieved;
Pre-incident levels of reliability (uptime) are achieved; and
No reduction in the operational life span of the equipment will be experienced.
Following any contamination incident, it is imperative that the situation is “stabilized” and that affected equipment is preserved from further deterioration.
Due to the ongoing effects of corrosion, time is critical. If preservation measures are not carried out, or are delayed by even a matter of days, many items of equipment will deteriorate to a degree where they may become uneconomical to recover.
The preferred equipment “first aid” preservation strategy encompasses four main activities:
1. Bulk cleaning of building internal surfaces (walls, floors, ceilings, etc.) surrounding the equipment.
2. Bulk cleaning of equipment surfaces.
3. Reduction of relative humidity (RH) levels surrounding the equipment to 40 percent or lower.
4. Applying a preservation coating to those equipment metallic surfaces that are corroding or may rapidly commence to corrode.
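The four preservation activities above can be tracked as a simple checklist. The Python sketch below is one hypothetical way to do so: the class name, field names and action wording are assumptions made for illustration, and only the 40 percent relative humidity target comes from the text.

```python
from dataclasses import dataclass

RH_TARGET_PERCENT = 40.0  # from the article: reduce RH to 40 percent or lower


@dataclass
class PreservationStatus:
    """Hypothetical record of the 'first aid' state around one item of equipment."""
    building_surfaces_cleaned: bool
    equipment_surfaces_cleaned: bool
    relative_humidity_percent: float
    coating_applied: bool


def outstanding_actions(status: PreservationStatus) -> list:
    """Return the preservation activities that still need to be carried out."""
    actions = []
    if not status.building_surfaces_cleaned:
        actions.append("bulk clean building internal surfaces")
    if not status.equipment_surfaces_cleaned:
        actions.append("bulk clean equipment surfaces")
    if status.relative_humidity_percent > RH_TARGET_PERCENT:
        actions.append("reduce relative humidity to 40 percent or lower")
    if not status.coating_applied:
        actions.append("apply preservation coating to at-risk metallic surfaces")
    return actions


if __name__ == "__main__":
    # Example reading: surfaces cleaned, but humidity is high and no coating yet.
    status = PreservationStatus(True, True, 55.0, False)
    for action in outstanding_actions(status):
        print("TODO:", action)
```

A checklist of this kind only records whether each activity has been done. It does not replace the judgment of recovery specialists on how and when to do it.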
Decontamination and recovery
These preservation and cleaning activities in no way constitute a full and effective decontamination process — they are simply procedures that are effective in mitigating ongoing deterioration prior to professional recovery work in accordance with internationally accepted standards.
To recover technical equipment, disassembly of the equipment must be carried out, and each component precision cleaned (utilizing aqueous-based cleaning agents) in order to remove contamination and corrosion. Following precision cleaning, each component must be thoroughly inspected to determine cleanliness and suitability for re-use. If corrosion damage has occurred on metallic surfaces, protective coatings (e.g., paint, electroplating/anodizing, etc.) are often required to be reapplied. Following precision cleaning of components, it is essential that further wipe testing be carried out in order to ensure that levels of residual contamination are below accepted industry standard levels.
The final stages of recovery involve re-assembly, testing and recommissioning. In the case of semiconductor production equipment, performance re-qualification and customer “buy-off” are usually required, as for new equipment.
Before recovered equipment is reinstalled in the cleanroom, it is essential that the entire cleanroom, including all facilities, be precision cleaned or replaced. If any traces of contamination remain within the facility, the risk of re-contaminating equipment and contaminating product is high.
Managing a disaster
Any reasonable action which the cleanroom operator can undertake, in order to reduce the chance of a disaster occurring, should be taken, but do not think that you will necessarily be any more immune to such an event than any other organization. Disasters should be planned for, and can be professionally managed to mitigate the impact of the disaster on the business.
Develop an incident recovery plan (IRP). The plan should identify each type of disaster which could occur, and define how the organization will react to each.
When properly developed and implemented (including ongoing training of all relevant personnel), an incident recovery plan represents a pro-active, designed-in emergency response and management program unique to the organization and covering all “foreseeable” disastrous events.
Adopt the incident recovery team (IRT) approach. There are three organizations upon whom the successful recovery from any semiconductor manufacturing facility incident is dependent — the cleanroom operator, the OEMs of involved equipment, and the qualified technical equipment recovery specialists.
Cleanroom operator: The cleanroom operator is driven by production goals. Every second that even one tool is down has significant impact on the production schedule. Therefore, the cleanroom operator will naturally want to return the tool (and, thereby, the entire facility) to full production as soon as possible. However, the cleanroom operator must rely on information provided by the involved OEMs and professional recovery specialists, in order to make appropriate and timely recovery decisions.
Original equipment manufacturers: A semiconductor cleanroom contains equipment manufactured by a variety of OEMs. The size and scope of the incident will determine how many OEMs are actually involved in the recovery effort. However, regardless of the number of OEMs involved, each OEM should be principally motivated by its customer's profitability, best interests and long-term satisfaction. Therefore, simplifying the customer's recovery from even the most insignificant production disruption should be of critical interest to each OEM. Such assistance can include making technical personnel available to support the recovery effort, expediting parts support, and even temporarily replacing tools, if appropriate.
Equipment recovery specialists: While OEM personnel fully understand the inner workings of their highly specialized equipment, few, if any, have experience with either the importance or methods of decontamination of such equipment after a major incident. Such expertise can only be provided by companies that specialize in the decontamination of high technology equipment following exposure to fire, water, chemicals, and other environmental contaminants.
The incident recovery plan details how the organization should best respond in the unlikely event that a disaster does occur. The plan is not the “be all and end all” however, and will not stand on its own without effective support and decision making:
Time is of the essence: Contaminated equipment will quickly deteriorate further if no mitigation action occurs, or if this action is unduly delayed.
Manage the recovery process: Appoint, as soon as possible, a single highly motivated and experienced person to be fully responsible for the recovery. This person should be a senior person, must be capable of comprehending the overall situation, and have sufficient authority to make rapid decisions.
Bring in all qualified consultants, advisors, OEMs, etc., (as per the IRP) as soon as possible. Involve all of these parties in the decision-making processes.
Bruce Swales is managing director, South East Asia, at Relectronic-Remech (Singapore) and has 20 years' experience in the electronics and telecommunications industries. He has gained significant experience in equipment contamination situations within semiconductor front-end and back-end cleanrooms.
Relectronic-Remech is the international leader in the recovery of technical equipment following contamination and damage events. During the past four years, the company has participated in the assessment and recovery of damaged equipment within numerous wafer fab plants in Europe, Asia, and the USA. During 1996-98, the company participated in two audits and recoveries from major semiconductor wafer fabrication plant fires in Taiwan.
References: “Solid State Technology,” February 1998; “The Incident Recovery Plan: A Key Element in Loss Mitigation,” presented at the 16th International System Safety Conference 1998, Relectronic-Remech (Singapore) Pte Ltd & Robert B. Barnes Assoc., Inc.
Two metal deposition systems undergoing recovery work following contamination.
Case study — Acid spill in a cleanroom
Incident summary: An integrated circuit assembly and test facility located in Asia experienced a spill of approximately 30 liters of type 611 acid in a Class 100 cleanroom. The source of this spill was the failure of a plastic pipe supplying the acid to a wet etching station.
Immediate action: Cleanroom personnel immediately stopped the flow of acid, turned off the air handling system and vented the cleanroom. The spill was then neutralized and all remaining liquid mopped up. They also attempted to wipe down all exposed surfaces to minimize further damage. Once the cleanroom had been sufficiently vented, it was re-sealed and the air handling system restarted, in order to control humidity.
Analysis & preservation: An incident recovery team (IRT) of three decontamination specialists conducted a thorough technical audit (including surface wipe samples to determine the extent and degree of contamination) and recommended specific preservation actions. They provided a report to cleanroom management outlining specific recovery options.
Type 611 Acid: This is a mixture of six parts Nitric acid, one part Hydrofluoric acid, and one part Acetic acid. The Hydrofluoric acid is at a 40 percent concentration. As a result, 611 acid is categorized as extremely corrosive.
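To give a sense of scale, the Python sketch below splits the approximately 30-liter spill into its components using the 6:1:1 ratio. It assumes the parts are measured by volume, which the description does not state, and the function name is illustrative only.

```python
# 6:1:1 mixture as described in the article; "parts" assumed to be by volume.
PARTS_611 = {
    "nitric acid": 6,
    "hydrofluoric acid (40 percent)": 1,
    "acetic acid": 1,
}


def component_volumes(total_liters: float, parts: dict) -> dict:
    """Split a total volume among components in proportion to their parts."""
    total_parts = sum(parts.values())
    return {name: total_liters * share / total_parts for name, share in parts.items()}


if __name__ == "__main__":
    # Approximate spill volume reported in the incident summary.
    for name, liters in component_volumes(30.0, PARTS_611).items():
        print(f"{name}: {liters:.2f} L")
```

On that assumption, a 30-liter spill would contain about 22.5 liters of nitric acid and 3.75 liters each of the hydrofluoric acid solution and acetic acid.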
Damage sustained: The acid spill was quickly contained and neutralized by cleanroom personnel. No acid directly contacted any of the cleanroom equipment. However, the spill generated a considerable amount of hazardous vapor, which quickly permeated approximately half of the machines within the cleanroom and the air handling system before the area was sufficiently vented. The result was light to moderate corrosion appearing on most metallic surfaces of six production tools and a measurement system. Total replacement value of this equipment (new) was placed at US$2.5 million.
Recovery constraints: Fab management wanted the tools returned to service as quickly as possible. However, in addition to decontamination and re-certification of the tools, damage, which needed to be repaired, had also occurred to the cleanroom and the related air handling system. An off-site location was authorized for the tool recovery activities.
Recovery actions: IRT personnel took immediate preservation steps to ensure that no further damage to the tools occurred and prepared them for safe shipment to a specialized regional decontamination facility. When the equipment arrived at the decontamination facility, work was prioritized so that all seven items were completely restored within 21 days and returned to the cleanroom for re-installation and re-certification. OEM representatives provided technical oversight of the recovery effort and assisted in the installation and re-certification of each tool to the customer's original acceptance criteria. Total cost for the recovery of equipment was less than 25 percent of the total replacement value.
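The cost comparison in this case study can be expressed as a small calculation. The Python sketch below takes the US$2.5 million replacement value and the "less than 25 percent" figure from the text and derives an upper bound on the recovery cost. It compares equipment costs only and ignores downtime, shipping and re-installation, none of which the case study quantifies. The function names are illustrative.

```python
REPLACEMENT_VALUE_USD = 2_500_000   # new-for-old value of the seven items
MAX_RECOVERY_FRACTION = 0.25        # recovery cost was reported as below this share


def recovery_cost_ceiling(replacement_value: float, max_fraction: float) -> float:
    """Upper bound on the recovery cost implied by the reported percentage."""
    return replacement_value * max_fraction


def minimum_saving(replacement_value: float, max_fraction: float) -> float:
    """Lower bound on the saving against buying new equipment (equipment cost only)."""
    return replacement_value - recovery_cost_ceiling(replacement_value, max_fraction)


if __name__ == "__main__":
    ceiling = recovery_cost_ceiling(REPLACEMENT_VALUE_USD, MAX_RECOVERY_FRACTION)
    saving = minimum_saving(REPLACEMENT_VALUE_USD, MAX_RECOVERY_FRACTION)
    print(f"Recovery cost: below US${ceiling:,.0f}")
    print(f"Saving versus replacement: above US${saving:,.0f}")
```

On the reported figures, recovery cost less than US$625,000, a saving of more than US$1,875,000 against the replacement value.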
Important lessons: Fab management placed a high priority on minimizing downtime of the production line, and this became the basic decision-making criterion. Since the fab had no previous knowledge of the recovery of technical equipment, or experience working with the organization providing the IRT, a delay of several days was incurred while the fab came to understand the technical issues associated with the recovery process and finalized commercial arrangements. This delay and the associated business interruption costs could have been significantly reduced if an incident recovery plan, which included the IRT concept, had been implemented. Each original equipment manufacturer (OEM) actively supported the recovery and re-certification effort by providing technical specialists to assist decontamination specialists with the unique technical aspects of its tools. Quick response by recovery specialists, rapid decision-making by fab management, and immediate support from the OEMs made it possible to return this facility to normal production in less than 45 days from the time of the event. - BS