Contamination Affects the Reliability of Microelectronics
Contamination can contribute to almost all failure modes for integrated circuits. Inadequate contamination control is costly, resulting in yield and reliability losses.
By Douglas W. Cooper, Ph.D.
The J.D. Power consumer research organization recently reported that in 1994 the most critical determinant of consumer satisfaction with microelectronics was reliability, replacing user support (predominant in 1993), which had replaced ease of use (predominant in 1992), according to CompuServe. Reliability problems are noticeable to the consumer when the product fails, such as when the computer disk crashes and stored data is lost.
Contamination that does not cause yield loss can still cause reliability problems, as discovered by the fabricators of head assemblies for magneto-resistive heads for computer disk memories. Wafers were sliced, the “rows” polished, modified, and cut into “sliders.” Unfortunately, residual chloride from incomplete cleaning led to substantial loss during accelerated aging (reliability) testing at elevated temperature and humidity. Better methods of cleaning and drying were sought to prevent such losses to customers in the field. No quantification of these losses was made available because manufacturers are, in most cases, as secretive about reliability figures as they are about yield.
One quality control handbook1 defines reliability as: “the probability that an item will perform a required function under stated conditions for a stated period of time.”
The important elements are: probability, performance, conditions, and time. Contamination can lessen appreciably the probability that microelectronic elements perform satisfactorily under operating conditions within the required product lifetime. The fraction of items that fail up to final testing before shipping is “yield loss”; the fraction of items that fail after shipping to the customer is “reliability loss.”
Costs associated with reliability include warranty repairs, loss of future business due to loss of reputation, and the costs associated with production factors influencing reliability. In 1979, Crosby2 listed the following as the costs of quality or its absence: “scrap, rework, warranty, service, inspection, engineering changes, purchase order changes, software correction, consumer affairs, audit, quality control, test labor, acceptance equipment, other costs of doing things wrong.” Many of these are related to reliability shortcomings.
Integrated circuits now have millions of elements. For a chip to have adequate reliability, assuming no circuit redundancy, the probability of failure (during the usage period) for any one of the elements must be much less than one in a million. As the number of elements increases, so does the need for improved reliability. The producer controls the production environment, but not the environment of use, so products must have minimal weaknesses to be robust in a variety of customer environments.
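The arithmetic behind this requirement can be sketched briefly. The sketch assumes, as an idealization the article does not state, that element failures are independent and that every element has the same failure probability p over the usage period, so a chip with N elements and no redundancy survives with probability (1 - p)^N. The function names and the example numbers are illustrative only.

```python
import math

def chip_reliability(p_element: float, n_elements: int) -> float:
    """Probability that all n independent elements survive (no redundancy)."""
    # log1p keeps precision when p_element is very small
    return math.exp(n_elements * math.log1p(-p_element))

def max_element_failure_prob(target_reliability: float, n_elements: int) -> float:
    """Largest per-element failure probability that still meets the chip target."""
    # Solves (1 - p)**n = target for p; expm1 avoids cancellation near zero
    return -math.expm1(math.log(target_reliability) / n_elements)

if __name__ == "__main__":
    n = 1_000_000  # hypothetical element count
    for p in (1e-6, 1e-7, 1e-8):
        print(f"p = {p:.0e}: chip reliability = {chip_reliability(p, n):.4f}")
    print(f"for 99% chip reliability: p <= {max_element_failure_prob(0.99, n):.2e}")
```

Under these assumptions, a per-element failure probability of exactly one in a million leaves a million-element chip with only about a 37 percent chance of surviving, which is why the per-element figure must be much smaller.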
Contamination Problems
Contamination is unwanted matter. It is unwanted because it causes problems in a product, either immediately (yield loss if the problem is severe) or after the product has been shipped (reliability loss). Contaminant particles may be part of the materials of construction, or created during fabrication, or created in connection with the operation of the device. Deposited particles can cause short circuits or open circuits (see Figure 1) either immediately or after the particles have interacted with solids, liquids or gases near them.
Particles created during manufacture or operation can cause physical problems that lead to disk drive crashes or light blockage through a lithography mask. Magnetic particles loose in a disk drive can cause intermittent read-write errors when they migrate to a read-write head. Particles can be centers of chemical contamination, leading to corrosion or loss of insulation due to ionic contamination and electromigration. Vapors can be introduced through ventilation or through outgassing or desorbing from surfaces or be generated by evaporating liquids. Vapors can cause physical problems such as poor adhesion, which leads to immediate or delayed delamination, or displacement of the lubricant in disk drives, which leads to crashing or “stiction.” Contaminants in liquids, such as chloride ions or sulfate ions, form salts that become centers of corrosion when humidified, or they degrade the electrical properties of materials and interfaces.
Reliability and Contamination
Reliability problems are often associated with contamination.3 Defects that are not severe enough to cause a chip to fail at initial testing, thus not enough to cause yield losses, can become more severe as the chip ages, leading to breakdown and customer frustration and dissatisfaction. Reliability failures will have probabilities that come from the joint distribution of product strengths and environmental stresses. Contamination control is important not only to improve yield but also to improve reliability.
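The idea that failure probability comes from the joint distribution of strength and stress can be made concrete with a small sketch. It assumes, purely for illustration, that product strength and environmental stress are independent and normally distributed; the article does not specify any distribution here, and the numbers are hypothetical. A failure occurs whenever the stress exceeds the strength.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def failure_probability(strength_mean: float, strength_sd: float,
                        stress_mean: float, stress_sd: float) -> float:
    """P(stress > strength) for independent normal strength and stress."""
    margin_mean = strength_mean - stress_mean
    # The difference of two independent normals is normal; variances add
    margin_sd = math.hypot(strength_sd, stress_sd)
    return normal_cdf(-margin_mean / margin_sd)

if __name__ == "__main__":
    # Hypothetical units; a wider strength spread stands in for a
    # contaminated lot that contains more weak items
    clean = failure_probability(10.0, 1.0, 5.0, 1.0)
    contaminated = failure_probability(10.0, 2.0, 5.0, 1.0)
    print(f"clean lot:        {clean:.2e}")
    print(f"contaminated lot: {contaminated:.2e}")
```

In this sketch a weak tail in the strength distribution raises the failure probability even when the average strength is unchanged.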
Figure 2 shows the fraction of repeated tests during which a particular electronic unit was still functioning at the indicated time.1 If this were for computer chips with time=0 right after final test, then this would be a reliability curve and the fraction good at time=0 would be the yield. In this figure, all units were operable at time=0. The fraction still good declined with time, first relatively rapidly, as “infant mortality” or “early fails” due to the weak portion of the product reliability distribution, then slowly in a roughly “constant failure rate” period sometimes because of random over-stressing, then rapidly again, as “wear-out” becomes important. For microelectronic chips, early fails are caused by manufacturing mistakes and contamination. Wear-out may occur because of material properties and specifics of the environment in use or because of subtle effects caused in part by contamination.
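The three regimes of such a curve are often illustrated with the Weibull distribution, whose failure rate falls with time when the shape parameter is below 1, stays constant when it equals 1, and rises when it exceeds 1. The sketch below is an illustration of that behavior only; it is not the model behind Figure 2, and the shape and scale values are hypothetical.

```python
import math

def weibull_survival(t: float, shape: float, scale: float) -> float:
    """Fraction still functioning at time t: R(t) = exp(-(t/scale)**shape)."""
    return math.exp(-((t / scale) ** shape))

def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Failure rate h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

if __name__ == "__main__":
    scale = 1000.0  # hypothetical characteristic life, in hours
    regimes = (("early fails", 0.5), ("constant rate", 1.0), ("wear-out", 3.0))
    for label, shape in regimes:
        rates = [weibull_hazard(t, shape, scale) for t in (10.0, 100.0, 1000.0)]
        print(f"{label:14s}", [f"{r:.2e}" for r in rates])
```

A full curve of the kind described would combine all three behaviors, for example by summing the failure rates of competing mechanisms.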
A simple categorization of failure mechanisms has three classes. First are threshold stress failures, where the item cannot withstand stresses greater than a certain value, the threshold, and there may be a distribution of such thresholds for the items produced; examples are the maximum power or maximum voltage an item can withstand.4 Modeling this often involves using the Weibull distribution, an extreme value distribution. Testing each item by applying a specified stress level weeds out the weakest without contributing to later failures of this type. Second are cumulative stress failures, where failure is due to the accumulated stress, with no single instance exceeding a threshold; examples are electromigration in a device subjected to high current densities and the failure of solder joints under repetitive heating and cooling cycles. Modeling this often involves using the lognormal distribution. Unfortunately, testing each item by applying a specified stress level weeds out the weakest while contributing to later failures. Third are combined threshold and cumulative stress failures, for example, the cracking of a package (threshold exceeded) followed by corrosion (cumulative).
Note that there has been a series of articles on various types of materials failure mechanisms published by the Institute of Electrical and Electronics Engineers (IEEE), referenced in a recent article.5 The mechanisms studied were: excessive elastic deformation, irreversible plastic deformation, brittle fracture, ductile fracture, elastic buckling, mechanical wear, creep deformation and rupture, cyclic fatigue, interdiffusion, electromigration, migration through a liquid medium, and radiation-induced oxidation of a polymeric material.
DiGiacomo3 lists and models mathematically the following failure mechanisms for loss of reliability in chips, from which the connections to contamination are evident:
1. Corrosion: This leads to open circuits and is the result of a chemical attack on the conductors, often aggravated by moisture or the presence of ionic contaminants or the presence of voltage differences, which can be created (as in a voltaic cell) by the contact of dissimilar materials.
This “galvanic” corrosion can arise even from differences in composition among regions in the same metallic structure or from differences in the electrolyte concentration or composition in contact with the same metal,6 all of which contamination can cause. Hygroscopic contaminants can produce wet conditions at relative humidities much lower than 100 percent RH and dissolved gases in such liquid films can produce electrolytes, facilitating corrosion.
2. Metal migration: This leads to short circuits due to the growth of thin metal structures (dendrites) where they should not grow. Wet migration comes from the presence of water, relatively high current densities, and voltage difference of a few volts or more. Some contaminants retard wet migration. Dry migration is due to metal ions moving through insulator material, like glass, in a voltage gradient, and is favored by higher voltage differences and higher temperatures.
3. Stress corrosion: Metals, glass, plastics under stress are susceptible to the infiltration of foreign molecules that weaken the material.
4. Fatigue: Repetitive stress can lead to breaking of materials or bonds between materials. The stress can be caused by mechanical actions or by changes in temperature interacting with different coefficients of thermal expansion for the materials. Strain, frequency, and temperature are often important variables.
5. Creep: This is the flow of material designed to remain rigid. Creep starts to become important at temperatures above one-half the absolute melting temperature. Contaminants that lower the melting temperature would accelerate this.
6. Electromigration: This is an open-circuit failure mechanism due to the loss of metal at high current densities. Contaminants that affect diffusivity or melting point could affect this.
7. Ionic current leakage: Degradation of material in a humid environment produces ions that increase the conductivity in a region and thus, current-related degradation. Polymeric insulators are susceptible.
8. Dielectric breakdown: Localized electrical field strengths, volts/meter, exceed the insulating capability of the material. Contaminants can aggravate this, as they rarely have the dielectric strength of an insulator such as silicon dioxide.
9. Contact resistance: Oxidation of conductors greatly increases their resistance. Impurities can affect this oxidation, accelerating or decelerating it.
10. Hermeticity: The degree to which the interior is sealed off from the environment will affect the rates of many of these mechanisms.
Almost all of these mechanisms are aggravated or caused by the presence of contamination.
DiGiacomo3 noted that a burn-in period is often used to remove some of the items that fail early, lessening the risk they will reach the customers but shortening the useful life, on average, of the products delivered to customers. Mathematical modeling of burn-in and the most common reliability models are presented by Wood et al.7, who treated the failure mechanisms of dielectric breakdown, electromigration, corrosion, thermo-mechanical void fatigue, and stress-induced void formation. Electromigration of sodium contaminant ions is suggested as a major cause of failures in chips, and this is accelerated by higher temperature and voltage.
Removing the weakest members of the device population through “burn-in” may improve the average failure rate of those remaining, but the economic impacts of burn-in are complicated, as shown in one evaluation of burn-in for integrated circuits,8 which used elevated temperatures to accelerate failures. Burn-in makes more sense as the value of the product increases, the desired reliability level increases, the fraction of reliability losses due to early failures increases, and the costs of testing (including the impact of testing on the non-failing items) decrease.
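The trade-off can be sketched with a deliberately simple population model. It assumes, hypothetically, that a small “weak” fraction of devices fails at a high constant rate and the rest at a low constant rate; this is not the model used in the cited evaluation, and all names and numbers are illustrative. Burn-in costs the units that fail during the test but raises the reliability of the units that are shipped.

```python
import math

def survival(t: float, weak_fraction: float,
             weak_rate: float, strong_rate: float) -> float:
    """Fraction of a mixed weak/strong population still working at time t."""
    return (weak_fraction * math.exp(-weak_rate * t)
            + (1.0 - weak_fraction) * math.exp(-strong_rate * t))

def burn_in_effect(burn_in: float, mission: float, weak_fraction: float,
                   weak_rate: float, strong_rate: float) -> tuple[float, float]:
    """Return (fraction lost during burn-in, field reliability of shipped units)."""
    r_burn = survival(burn_in, weak_fraction, weak_rate, strong_rate)
    r_total = survival(burn_in + mission, weak_fraction, weak_rate, strong_rate)
    # Shipped units are those that survived burn-in, so condition on that
    return 1.0 - r_burn, r_total / r_burn

if __name__ == "__main__":
    # Hypothetical: 5% weak units, rates per hour, 1000-hour mission
    for hours in (0.0, 48.0, 168.0):
        lost, field = burn_in_effect(hours, 1000.0, 0.05, 0.01, 1e-5)
        print(f"burn-in {hours:5.0f} h: lost {lost:.3%}, field reliability {field:.4f}")
```

If there is no weak subpopulation, the constant failure rate makes burn-in pointless in this model: units are lost in test without any gain in field reliability.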
The effects on reliability of encapsulated electronics due to elevated temperature and humidity were measured and modeled by Bazu and Tazlauanu,9 who proposed an extension of the Arrhenius temperature-dependent rate model, exp(-activation energy/thermal energy), to take into account the failure-accelerating aspects of high humidity and electrical stress. Humidity can be expected to make dry contaminants into corrosive solutions.
Problems of degraded conductivity in pin grid arrays used in microelectronic packaging were found, after 25 days in a temperature/humidity test chamber, to be caused by contamination and subsequent corrosion. The contamination and corrosion were caused not only by residual solder flux but also by residuals from the cleaning liquids used, especially chlorides from the wetting agent in the cleaning liquid.10
The reliability of electrical connectors, of the hole, pin or slot-edge types, depends on their cleanliness, with respect to particles and films, as well as the frequency with which the connectors are taken apart and rejoined. Kulwanoski6 described in detail a wide variety of physical and chemical mechanisms, most relating to contamination, that can cause loss of reliability and gave more than a dozen references on connector contact reliability. (See also the book by Uhlig and Revie.)11
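The plain Arrhenius term mentioned here can be turned into an acceleration factor, the ratio of the failure-mechanism rate at a test temperature to the rate at the use temperature. The sketch below covers only that temperature term; the humidity and electrical-stress extension of Bazu and Tazlauanu is not reproduced because the article does not give its form. The activation energy and temperatures are hypothetical example values.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_rate_factor(activation_energy_ev: float, temp_c: float) -> float:
    """exp(-activation energy / thermal energy) at a temperature in Celsius."""
    thermal_energy_ev = BOLTZMANN_EV_PER_K * (temp_c + 273.15)
    return math.exp(-activation_energy_ev / thermal_energy_ev)

def acceleration_factor(activation_energy_ev: float,
                        use_temp_c: float, stress_temp_c: float) -> float:
    """How much faster the mechanism proceeds at the stress temperature."""
    return (arrhenius_rate_factor(activation_energy_ev, stress_temp_c)
            / arrhenius_rate_factor(activation_energy_ev, use_temp_c))

if __name__ == "__main__":
    # Hypothetical: 0.7 eV mechanism, 55 C in use, 125 C in the test chamber
    af = acceleration_factor(0.7, 55.0, 125.0)
    print(f"acceleration factor: {af:.1f}")
    print(f"1000 test hours correspond to about {1000.0 * af:,.0f} use hours")
```

The result is very sensitive to the activation energy, which differs from one failure mechanism to another, so a single factor should not be applied to a whole product without knowing which mechanism dominates.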
Besides the well-known failure modes of electrical “opens” and “shorts,” there can be “soft errors,” where alpha particles from radioactive decay convert a cell within a chip from “0” to “1” or vice versa; such alpha particles can come from low levels of radioactive contamination.12 Unfortunately, like the recent programming problem with the Pentium(TM) chip, such errors can be difficult to detect and impossible to cure without replacing the chips.
Reliability problems may not show up until large numbers of the product have been produced and shipped. The cost of satisfying a customer with a defective product is much greater than the cost of losing the item at final test before it is shipped. The delay and cost factors make reliability problems a serious concern. As shown, contamination can contribute to almost all failure modes for integrated circuits. The cost of inadequate contamination control is not only yield loss, which is generally recognized, but also reliability loss.13
A complete list of the references cited in this article is available from CleanRooms magazine, (603) 891-9230.
Dr. Douglas Cooper received his doctorate in applied physics from Harvard University in 1974. He conducted environmental research for a decade, mostly at Harvard, where he became Associate Professor of Environmental Physics, then carried out contamination control research at IBM's T.J. Watson Research Center for another decade. He is currently Director of Contamination Control at The Texwipe Company (Upper Saddle River, NJ) where he is involved in research and development relating to advanced cleaning materials. Dr. Cooper is the author of over 100 articles published in peer-reviewed journals and has served on various editorial and advisory boards. He is currently a Technical Editor for the Journal of the Institute of Environmental Sciences.
Figure 1. Deposited particles can cause electrical defects, such as opens and shorts: opens are breaks in the conductive path and shorts are unintended conductive bridges across an insulator.
Figure 2. Failure history: Test fractions still functioning vs. time. (Data from Juran and Gryna, 1988).