Known Good Die: A Closer Look
02/01/2005
TEST METHODS AND RELIABILITY SCREENS
BY LARRY GILG
The term “known good die” (KGD) has always generated more heat than light. The original intent of the phrase was to signify that bare die or unpackaged ICs had the same quality and reliability as equivalent packaged devices, and could be shipped directly to customers for assembly in their products. The term came into being as microelectronics companies struggled to make the transition from the mature hybrid microcircuit technologies aimed at the military and aerospace industries to the high-performance multichip modules (MCMs) for high-volume computer and consumer applications in the early 1990s.
MCMs were envisioned as a technology for incorporating large, high-speed die on an interconnecting substrate. There was a flurry of activity to develop high-performance interconnecting substrates that could wring the utmost in performance from the bare, bumped, or TAB die. Advanced packaging came to be viewed as a value-added process that offered maximum performance in the smallest form factor (size and weight) for critical applications where cost was not the prime consideration. The resulting MCM developments gained a reputation as overpriced solutions good for solving niche problems only.
Strategies for obtaining KGD were based on taking advantage of the existing infrastructure for test and burn-in of traditionally packaged components. The favored methods minimize the amount of hardware, tooling, or insertions that add cost to the bare die products. This approach led to the development of KGD carriers, wafer-level burn-in, and high-performance hot-chuck probing, all aimed at effective test and reliability screening for infant mortality failures.
Today’s Requirements
Burn-in is the most common screen for detecting infant-mortality defects caused by manufacturing anomalies. It is usually applied as a 100% screen in the production of leading-edge IC devices to eliminate those containing random latent defects with a high probability of early failure. As the name “burn-in” implies, thermal stress is applied to the devices for extended periods (usually hours), accelerating temperature-related defects. The high-temperature stress is usually accompanied by high-voltage stress, which accelerates different failure mechanisms. Burn-in systems exercise the device under test by powering it up and applying logic patterns of 1s and 0s to the I/O. Some systems can read back the results of the applied patterns, making test during burn-in feasible, a feature favored by memory manufacturers for long-cycle, pattern-sensitive testing of DRAM.
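The acceleration at work here is commonly quantified with standard industry models (the formulas below are general practice, not specific to any vendor or to this article): temperature acceleration follows an Arrhenius relationship, and voltage acceleration is often modeled exponentially.

\[
AF_T = \exp\!\left[\frac{E_a}{k}\left(\frac{1}{T_{\text{use}}} - \frac{1}{T_{\text{stress}}}\right)\right],
\qquad
AF_V = e^{\gamma\,(V_{\text{stress}} - V_{\text{use}})},
\qquad
AF = AF_T \cdot AF_V
\]

Here \(E_a\) is the activation energy of the failure mechanism (in eV), \(k\) is Boltzmann's constant (\(8.617 \times 10^{-5}\) eV/K), temperatures are in kelvin, and \(\gamma\) is an empirically fitted voltage acceleration factor. A few hours of burn-in at elevated temperature and voltage can thus expose defects that would take months to surface at use conditions.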
Alternatives for Bare Die Test and Reliability Screening
Given that reliability is the greatest challenge for bare die suppliers to meet, what alternatives can a die supplier consider when screening is necessary to achieve reliability targets? Three methods have emerged that manufacturers use to produce high-reliability die products at low cost:
- Bare die temporary package;
- Wafer-level burn-in and test (WLBT);
- Statistical post-processing test methods and reliability screens.
Bare Die Temporary Package. The growth of laptop and handheld computers in the mid-1990s led many IC suppliers to evaluate selling bare die. This stimulated the test socket and burn-in fixture industry to develop a number of techniques for testing and burn-in of bare die. While initial costs were high, the lifetime cost of these techniques can be comparable to the cost of a package.
A solution that delivers equivalent probabilities of producing KGD compared to the packaged device is to assemble the bare die into temporary carriers. These carriers serve the purpose of a single chip package to allow the complete final test and burn-in infrastructure currently in place for packaged ICs to be used. Figure 1 shows one implementation of a temporary carrier.
Temporary electrical connection is made to the bond pads, and the device is qualified through test and burn-in processes identical to those for the packaged part, achieving quality and reliability levels comparable to packaged devices. Automatic test equipment (ATE), component handlers, burn-in boards, burn-in ovens, and loaders can all be used. Once the die is qualified, the electrical connections to the bond pads are released and the die is removed from the carrier. The result is a fully tested, qualified IC device with specifications comparable to those of an equivalent packaged part, at nearly the same cost.
WLBT. WLBT is usually understood to incorporate a full-wafer contactor, either a probe card or a sacrificial metal layer that is deposited on the wafer and later removed (Figure 2). Input nodes on each chip may then be “toggled” to exercise the devices. The applied voltage may be significantly higher than the data sheet maximum, providing additional stress that causes devices with certain weaknesses to fail. High voltage, along with elevated temperature, accelerates weak-device failures. Such devices are then detected with functional or parametric tests and eliminated from customer shipments. WLBT has the potential to greatly simplify the back end of IC fabrication lines.
The main obstacle to implementing a WLBT process is development of a full-wafer contact technology with the process capability required for manufacturing. Contact process capability is a function of not only the contactor technology performance, but also the burn-in stress requirements for a given product.
Statistical Test Methods and Reliability Screens. While burn-in has been the most widely used reliability screen, its effectiveness has been questioned for high-power logic devices manufactured in today’s high-quality, low-defectivity fabs. It can be an expensive process, provides no benefit for the vast majority of products falling within the normal population, and may actually degrade the useful life of healthy devices. Moreover, the burn-in process itself is losing effectiveness as a screen. For these reasons, IC manufacturers are investing significant resources to reduce or eliminate the need for burn-in. This trend is good news for the KGD industry, since contacting a die to stimulate the devices during burn-in presents greater challenges at the wafer or bare die stages of IC manufacturing than at final test in a package.
As an alternative to burn-in, enhanced testing techniques using statistical data analysis to screen defects are gaining favor in the microelectronics industry, especially for device types with low shipping volumes, part number profusion, and short product lifetimes that make burn-in an untenable option. The advantages of reliability screening at test instead of burn-in are savings in time, fixtures, equipment, and handling. The KGD implication is that screens can readily be performed at the wafer level with standard probes and testers, so every device can be considered fully conditioned in compliance with data sheet specifications and reliability targets for that process, regardless of the final package in which the device is to be shipped. The test measurements of each die are recorded rather than simply binned, and pass or fail criteria are determined by statistical analysis of the recorded measurements. Outliers to the statistical distribution are suspected of being the early-life failing devices. The challenge for testing using statistical methods is to identify the early failing population while preserving yield.
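As a concrete illustration, here is a minimal sketch of such an outlier screen; it is not any particular manufacturer's flow, and the parameter, threshold, and die identifiers are illustrative. Median and MAD are used rather than mean and sigma so that the limits themselves are not skewed by the very outliers being sought:

```python
import statistics

def flag_outliers(measurements, k=6.0):
    """Flag die whose recorded value lies far outside the lot
    distribution. `measurements` maps die ID -> a recorded parametric
    value (e.g., a leakage current in microamps)."""
    values = list(measurements.values())
    med = statistics.median(values)
    # Median absolute deviation, scaled to approximate one sigma
    # of a normal population.
    mad = statistics.median(abs(v - med) for v in values) * 1.4826
    if mad == 0:
        return set()  # degenerate distribution; nothing to screen
    return {die for die, v in measurements.items()
            if abs(v - med) / mad > k}

# Illustrative lot: one die leaks far more than its siblings and is
# flagged as a potential infant mortality even though it may pass
# every functional test.
iddq = {"die_01": 1.1, "die_02": 0.9, "die_03": 1.0,
        "die_04": 9.5, "die_05": 1.2, "die_06": 1.0}
print(flag_outliers(iddq))  # -> {'die_04'}
```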
An elevated voltage applied to the circuit under test can cause weak devices to age rapidly, fail, or exhibit “outlier” behavior in the statistical distribution of an appropriate parameter. Traditional burn-in is almost always performed at elevated voltage, which contributes significantly to the acceleration of failures during burn-in. Voltage stress, however, does not depend appreciably on time, which makes short-duration implementation on expensive, state-of-the-art ATE feasible. Voltage stress acceleration coupled with an efficient method of outlier detection, such as Iddq testing, has been used for some time to reduce the need for burn-in on advanced processes. This screening method is most effective at identifying defects that manifest as short circuits, causing higher-than-normal leakage. However, because the efficacy of leakage current testing is declining, owing to high background currents in deep-submicron ICs as well as the changing defect Pareto, manufacturers cannot rely solely on voltage stress acceleration.
A reliability screen that has gained prominence for deep-submicron testing is the very-low-voltage sweep, or ultra-low VDD test. The power supply voltage (VDD) starts at a level far below the minimum required for device turn-on. The supply voltage is then stepped upward incrementally, with the device tested at each step to identify the voltage at which it powers up and operates correctly. This voltage is recorded and, once all testing is complete, compared to the statistical mean of the lot. Values lying outside the normal distribution are termed “outliers” and can be considered potential infant mortalities, even if the device passes all subsequent tests. The higher-than-usual voltage required to turn on the device is thought to be a symptom of a resistive contact. Once initialized, the device’s turn-on voltage may move into the normal range in subsequent tests; even so, the device has exhibited anomalous behavior and should be treated with skepticism. Additional off-line testing may be required to ascertain the probability of early life failure among the outlying population and avoid excessive yield losses from false positives.
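A minimal sketch of the sweep itself might look like the following; the tester object and its set_vdd/run_pattern methods are hypothetical stand-ins for a real ATE interface, and the voltage range and step are illustrative:

```python
def vddmin_sweep(tester, die_id, v_start=0.3, v_stop=1.5, v_step=0.02):
    """Step VDD upward from below turn-on and return the first voltage
    at which the die passes a functional pattern, or None if it never
    does. `tester` is a hypothetical ATE wrapper assumed to expose
    set_vdd(volts) and run_pattern(die_id) -> bool; real equipment
    APIs will differ."""
    steps = int(round((v_stop - v_start) / v_step)) + 1
    for i in range(steps):
        v = v_start + i * v_step          # avoid float accumulation
        tester.set_vdd(v)
        if tester.run_pattern(die_id):
            return v                      # recorded turn-on voltage
    return None

# Each die's turn-on voltage is logged; once the lot is complete, die
# whose turn-on voltage is an outlier against the lot distribution are
# flagged, even if they operate correctly at nominal VDD.
```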
Statistical Post-processing
IC fabricators have developed test regimens that separate devices into “bins” representing the results of the entire test suite. Most tests assign the correct pass or fail bin during, or immediately after, each device’s test. With outlier detection, certain test limits are not decided until the entire wafer has been tested. This is accomplished by logging the data during wafer test without making a decision, then processing the data statistically to establish outlier criteria. Either the raw data or combinations of the data, such as the difference between before- and after-stress values, may then be examined for outliers.
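A sketch of this deferred-decision flow, under the assumption that a single parameter is measured before and after stress, might look like this (names and thresholds are illustrative, not a production flow):

```python
import statistics

# Phase 1: during wafer probe, only log the readings; no pass/fail
# decision is made yet.
wafer_log = {}  # die ID -> (pre-stress reading, post-stress reading)

def log_die(die_id, pre_stress, post_stress):
    wafer_log[die_id] = (pre_stress, post_stress)

# Phase 2: after the last die, derive limits from the wafer itself and
# screen on the stress-induced shift, so a die that moves under stress
# is caught even if both readings are individually within specification.
def disposition(k=6.0):
    deltas = {d: post - pre for d, (pre, post) in wafer_log.items()}
    med = statistics.median(deltas.values())
    mad = statistics.median(abs(x - med) for x in deltas.values()) * 1.4826
    return {d for d, x in deltas.items() if mad and abs(x - med) > k * mad}
```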
A good example of statistical post-processing stems from the observation that defects tend to cluster on a wafer, or even throughout a wafer lot. Die that are within specification but exhibit outlier behavior for certain parameters, and that sit among numerous defective die, have a higher probability of becoming infant mortalities.
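This idea, sometimes called good-die-in-bad-neighborhood screening, can be sketched as a simple pass over each passing die’s neighbors on the wafer map; the radius and neighbor-count threshold here are illustrative knobs, not industry values:

```python
def bad_neighborhood(die_results, radius=1, max_bad=3):
    """Flag passing die that sit among clustered failures.
    `die_results` maps (x, y) wafer coordinates -> True (pass) or
    False (fail); positions off the wafer are simply absent."""
    flagged = set()
    for (x, y), passed in die_results.items():
        if not passed:
            continue
        bad = sum(1
                  for dx in range(-radius, radius + 1)
                  for dy in range(-radius, radius + 1)
                  if (dx or dy)
                  and die_results.get((x + dx, y + dy)) is False)
        if bad > max_bad:
            # Passes today, but its neighborhood suggests elevated
            # infant mortality risk; downgrade or screen further.
            flagged.add((x, y))
    return flagged
```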
Reliability Budget
Given that a KGD process may require different, if not additional, test insertions, it is critical that reliability budgets for a given product be scrutinized. The expected rate of system failure in the field depends on the number of components and the reliability of each. This was not an issue when ICs were simply qualified for either commercial or military use. Today, with silicon content increasing in all market segments, the complexity of digital content exploding, and product life cycles shrinking in hyperactive mobile, consumer, and computer markets, sourcing die with acceptable reliability requires a careful analysis of the market requirement and a thorough understanding of appropriate KGD test methods and reliability screens.
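The arithmetic behind such a budget is straightforward. Assuming independent component failures, failure rates add and reliabilities multiply:

\[
\lambda_{\text{system}} = \sum_{i=1}^{n} \lambda_i,
\qquad
R_{\text{system}}(t) = \prod_{i=1}^{n} R_i(t) \approx e^{-\lambda_{\text{system}}\, t}
\]

For example (with illustrative numbers), a module carrying ten die at 100 FIT each presents 1,000 FIT to the system, roughly a 1% chance of failure over 10,000 hours of operation; halving either the die count or the per-die failure rate halves the budget consumed.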
LARRY GILG, managing director, may be contacted at the Die Products Consortium, 3908 Ave. G, Austin, TX 78751; (512) 452-0077; e-mail: [email protected].