Modeling the yield of mixed-technology die
09/01/1998
Dave Horton, New Business Development, Cypress Semiconductor, San Jose, California
The negative binomial model for yield prediction has several shortcomings when applied to devices that contain a mixture of different types of circuitry. This article describes those weaknesses, and proposes a simple methodology for improving the accuracy of the model when used to forecast the yield of such mixed-circuit devices.
As technologies become more advanced, the costs associated with them skyrocket. Companies with wafer fabrication facilities must remain at the forefront of technology, making huge investments to remain competitive with leading-edge products. But they must also manufacture a mix of mature products economically, usually in the same factory and sometimes on the same leading-edge processes.
Forecasting costs becomes crucial. Since bringing an advanced product to market is a lengthy process, a poor cost prediction is only apparent after making virtually the entire investment. In our ROI-centric (return on investment) times, we need to be able to predict with certainty the return for our investors and shareholders.
For mature products running on stable processes and manufactured on a standard assembly and test line, costs can be predicted with a high degree of accuracy. But for advanced-technology products with recently qualified manufacturing flows and no established statistical process control goals, predicting unit manufacturing cost with accuracy becomes challenging.
Die yield prediction is perhaps the most demanding aspect of cost projection. This article describes a method for predicting die yield in a piece-wise manner that takes into consideration a mixture of circuitry, different geometries, and unused die area.
Components of cost
Finished wafer cost reflects most of the fixed and variable expenses of running a fab. The material cost of the silicon wafer is tiny beside the burgeoning capital depreciation of the equipment. Most factories allocate fixed and variable expenses to distribute costs more fairly between mature and leading-edge processes; this keeps the cost of mature wafers down by loading more expenditures onto the wafers that demand the higher-tolerance equipment.
Factory cost loading is significant because capital depreciation usually plays a much greater role than labor costs. You can reassign labor costs when your factory demand goes down, but you cannot so easily reduce depreciation. So a lightly loaded wafer fab is a major headache for wafer cost management.
Another factor affecting the wafer cost is the line yield. Before we can examine the die yield, we have to get wafers through the production line unscathed. Manufacturing costs are inversely proportional to yield. Most factories examine process control criteria at various stages of the production process, and reject wafers that fail to meet the criteria. Thus, the line yield can be stated as the ratio of processed wafers out to unprocessed wafers in, with the final acceptance criterion being the parametric measurements on the test structures located strategically around the wafer (sometimes referred to as Etest).
That leads to an efficiency factor in the cost equation associated with the space allocated to test structures. The test structures are folded into the scribe lanes between die wherever possible, and therefore have little or no yield impact. In many cases, though, it is necessary to devote some number of die sites to test structures. That means a lower potential number of candidate die/wafer, which obviously has a far greater impact as the die size goes up.
Given that wafers are round and die are not, there are inefficiencies associated with edge effects. These inefficiencies are usually predicted by the standard gross die/wafer calculations that companies develop based upon their specific test structures, scribe lanes, and edge geometries.
After probing the die and mapping the good die, the wafer may be stored in a wafer bank before reaching the backend assembly process. Engineers first backgrind the wafer, then saw the wafer into its constituent die before picking the good die. The good die are then assembled into the target packages, tested, marked, inspected, inventoried, and finally shipped to the customer. Each step has an associated cost and yield, and the aggregate, final yield number may be markedly worse than the basic die yield.
None of the constituent yield factors is as complex, nor as tough to manage and predict, as the die yield, however. As die size goes up and geometries go down, the die yield becomes the dominant factor in the total cost equation.
The negative binomial yield model
The SIA 1997 NTRS Yield Model and Defect Budget program adopts the negative binomial model as its yield equation (Eqn. 1, Fig. 1):
Figure 1. The SIA's yield equation.
Y = (1 + AD0/a)^-a   (1)
where Y represents the defect-limited yield, A is the die area, D0 is the defect density, and a is the cluster factor. The negative binomial distribution is commonly used when the number of successes is fixed and we are interested in the failure probabilities for different numbers of faults/die before reaching the fixed number of successes. Rearranging Eqn. 1 gives an expression for the defect density in terms of the observed yield and measured die area:
D0 = [Y^(-1/a) - 1] * a/A   (2)
The cluster factor represents the bunching or clustering of the failure-causing-defects on a die and is roughly proportional to the complexity of the process. For relatively simple processes with fewer process steps and masks, the cluster factor may be 2. Complex processes with more masks are modeled more accurately using higher cluster factors. A company should pick the cluster factor that most accurately models a particular process, based upon a review of historical device yield data.
The SIA uses a cluster factor of 2. Since higher cluster factors (corresponding to less clustering of defects) predict lower yield, it may be prudent for planning purposes to choose a higher cluster factor. For the examples in this article, a cluster factor of 5 is used.
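A minimal sketch of Eqns. 1 and 2 in code; the die area and defect density are the illustrative values used later in this article:

```python
def nb_yield(area_in2, d0, alpha):
    """Eqn. 1: defect-limited yield from the negative binomial model."""
    return (1.0 + area_in2 * d0 / alpha) ** (-alpha)

def defect_density(y, area_in2, alpha):
    """Eqn. 2: back out the defect density from an observed yield."""
    return (y ** (-1.0 / alpha) - 1.0) * alpha / area_in2

# 40 kmil^2 die = 0.04 in^2; D0 = 10 defects/in^2; cluster factor a = 5
y = nb_yield(0.04, 10.0, 5.0)
print(f"{y:.2%}")                    # ~68.06%
print(defect_density(y, 0.04, 5.0))  # round-trips to ~10 defects/in^2
```

Note that Eqn. 2 exactly inverts Eqn. 1, so a measured yield and die area recover the effective defect density for the same cluster factor.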
Besides the choice of cluster factor, there are several issues to be addressed before the equation can be applied with accuracy across a range of products.
Potential inaccuracies with the model
Here are six potential problems that need to be addressed before Eqn. 1 can be used with confidence to predict future die yield.
Process intent vs. circuit reality. Process development aims to maximize the yield of a certain type of circuitry, sometimes called the "process driver." For example, companies in the business of selling logic devices would use leading-edge logic devices as their process driver, while companies in the DRAM business would use a leading-edge DRAM device. The equations and metrics that each company develops and uses in the prediction of manufacturing cost would all be derived using the company's process driver. In some cases, it may be of little importance to predict the way yield varies among device types - after all, if you are in the business of making DRAMs, who cares what the yield of a CPU would be?
But in most cases, there will be some intermingling of device types on a base process. A company that makes programmable logic devices (PLDs) may also develop the capability to manufacture logic devices, or array-based products such as flash memories. That means the yield model of Eqn. 1 cannot be used accurately across the company's range of products without modification.
Circuit geometry vs. process geometry. A company that has successfully developed a market for a device may choose to delay the process migration until market conditions demand the investment required to shrink or redesign. That means either maintaining the older process for the mature device or, if the recipe allows it, fabricating the device using the newer process. In the latter case, the yield of the device would be markedly different (hopefully, better) from that of an identically sized die whose geometry matched the capability of the process.
Also, simple optical shrinks rarely provide the smallest dimensions for which the process was designed. Therefore the yield of a shrunken device is usually better than predicted by using Eqn. 1.
Best possible yield. Assume for the moment that the fab is running perfectly; the yield is defect limited; and the factory is using a significant number of different die sizes in a single process. If you were to plot the graph of yield against die size for this perfect factory, you would expect to see the curve hit 100% yield as the die area approaches zero, as predicted by Eqn. 1. However, empirical observations do not support that prediction (although the engineering pragmatists among us would probably be satisfied with the observed 95-97%).
Unused die area. Perhaps more significant than feature size and circuitry, empty space has a profound impact on yield. A die composed of mainly empty space would yield better than a die of identical size that was packed with leading edge circuitry.
Defect density changes over time. The defect density in a relatively new process will improve as the process goes through the normal yield enhancement steps. To predict the die yield at a point in the future, the defect density learning curve must be applied to the process's defect specification.
Mixture of types of circuitry on the same die. Now let us consider producing a device that contains a mixture of different types of circuitry. Many merchant silicon vendors offer "systems on a chip" capability, whereby logic devices such as CPUs can be integrated with memory, I/O, and other types of circuitry. The yield of such a mixed die is certainly of great interest, but Eqn. 1 does not predict it accurately.
Proposals for improving accuracy
Defect density adjustment. Two of the six issues above are really the same problem - "different circuit" and "shrink" both refer to a feature-size difference between the process and the circuitry. To improve the accuracy of the yield prediction, we can use a different defect density for the specific circuit type or shrink, either via empirical observation of yield over time followed by the use of Eqn. 2, or by developing an algorithm. One simple algorithm that could be used is:
Dr = D0 * (Fmin/Fact)^2   (3)
where D0 is the defect density specified for the process, Fmin is the minimum feature size specified for the process, Fact is the actual feature size of the circuitry, and Dr is the resulting adjusted defect density. The adjusted defect density can then be applied to Eqn. 1.
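A minimal sketch of this adjustment, assuming a quadratic dependence on the feature-size ratio (the exponent is an assumption for illustration; each company should fit its own scaling to historical data):

```python
def adjusted_defect_density(d0, f_min, f_act):
    """Eqn. 3 (assumed quadratic form): scale the process defect
    density by the square of the ratio of the process minimum
    feature size to the actual feature size of the circuitry."""
    return d0 * (f_min / f_act) ** 2

# Circuitry drawn at 0.5 um on a 0.35 um-capable process (illustrative values)
print(adjusted_defect_density(10.0, 0.35, 0.5))  # ~4.9 defects/in^2
```

Since Fact > Fmin for relaxed-geometry circuitry, the adjusted density is lower than the process specification, predicting the better yield typically observed for shrinks run on finer processes.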
Best possible yield. Assuming that whatever phenomenon is preventing the yield from reaching 100% applies to all die sizes, then it is prudent to apply a simple factor to the yield across the board. Equation 4 illustrates this:
Yr = Ys * Y   (4)
where Yr is the adjusted yield, Ys is the adjustment factor (a constant such as 0.96), and Y is the output of Eqn. 1.
Defect density changes over time. The learning rate for improvement of defect density can be modeled as:
Dt = D0(1 - l)^n   (5)
where D0 is the initial defect density, l is the improvement factor per time period, n is the number of the time period of interest in the future, resulting in a predicted defect density Dt.
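Both adjustments are one-liners; a sketch with illustrative parameter values:

```python
def best_possible_yield(y, ys=0.96):
    """Eqn. 4: cap the modeled yield with an empirical factor Ys."""
    return ys * y

def future_defect_density(d0, improvement, n):
    """Eqn. 5: defect density after n periods, at a fractional
    improvement rate per period."""
    return d0 * (1.0 - improvement) ** n

# 10 defects/in^2 today, improving 10% per period, four periods out
print(future_defect_density(10.0, 0.10, 4))  # 6.561 defects/in^2
print(best_possible_yield(0.6806))           # ~0.6534
```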
Unused die area. Some observers may claim that silicon technology has outpaced packaging technology. Many logic devices require a number of pins that makes the device pad-limited inside the package, i.e., the assembly design rules for pad-to-pad spacing demand that the die pad ring size exceed the dimensions of the die core. So it becomes more important to treat the yield implications of empty space objectively. To zero in on a methodical treatment, consider a die with an active area of size 4A and an X% yield. What would the yield become if the die were still of size 4A but contained active area A, and therefore unused space of 3A (Fig. 2)?
Figure 2. Die composed of 1/4 active circuitry, 3/4 unused area.
For the negative binomial yield from Eqn. 1, let us consider simple numbers such as D0 = 10 defects/in.2, logic block size (A) = 10 kmil2 (1 mil = 1/1000th in.), total die size = 40 kmil2, and a = 5. If the entire die (40 kmil2) is active, the yield will be 68.06%. Since there is unused area of 30 kmil2 and defects appearing in that area are not significant, the probability of defects appearing in the 30 kmil2 area = 1 - 74.73% (the 30 kmil2 yield), or 25.27%. The probability of defects appearing anywhere inside the 40 kmil2 die is 1 - 68.06% = 31.94%. Therefore, the probability of defects appearing in the significant 10 kmil2 area is 31.94% - 25.27% = 6.67%. The yield of the 40 kmil2 chip thus becomes 1 - 6.67% = 93.33%.
Equation 6 is a more formal presentation of the above discussion to adjust the yield of a die based on the amount of unused space:
Yres = Yfull_die + (1 - Yunused)   (6)
where Yres is the resulting yield, Yfull_die is obtained from Eqn. 1 using the total die area, and Yunused is obtained from Eqn. 1 using the area of the empty space.
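A sketch of Eqn. 6 that reproduces the numbers from the example above (1 kmil2 = 0.001 in.2):

```python
def nb_yield(area_in2, d0, alpha):
    """Eqn. 1: negative binomial defect-limited yield."""
    return (1.0 + area_in2 * d0 / alpha) ** (-alpha)

def unused_area_yield(total_in2, unused_in2, d0, alpha):
    """Eqn. 6: Yres = Yfull_die + (1 - Yunused)."""
    return (nb_yield(total_in2, d0, alpha)
            + (1.0 - nb_yield(unused_in2, d0, alpha)))

# 40 kmil^2 die with 30 kmil^2 unused; D0 = 10/in^2; a = 5
print(f"{unused_area_yield(0.040, 0.030, 10.0, 5.0):.2%}")  # ~93.33%
```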
Mixture of types of circuitry on the same die. Let us assume that the target die contains a mixture of circuitry and unused area where each type of circuitry would exhibit a different yield for that portion of the die. The defect density relevant to the individual types of circuitry is either specified directly as a process parameter, or it can be calculated from observed yield using Eqn. 2, or derived from circuit geometry using Eqn. 3, assuming that the die area occupied by each type of circuitry is known.
There are two methods that can be used to help improve the accuracy of yield prediction for devices of this type:
Sum of products
First, determine the adjusted defect density appropriate to the individual circuit blocks using Eqn. 3. Then multiply each individual area by the appropriate defect density for the individual product terms. The sum of the product terms can then be applied to Eqn. 1 to determine the yield of the sum of the active areas:
Y = [1 + (A1D1 + A2D2 + ... + AnDn)/a]^-a   (7)
where Y is the yield of the composite die, a is the cluster factor, A1-n are the individual block areas, and D1-n are the individual blocks' defect densities.
Product of yields
Begin by determining the adjusted defect density appropriate to the individual circuit blocks using Eqn. 3. Applying Eqn. 1 then gives the yield of each individual circuit block and multiplying the yields gives the final yield number:
Y = (1 + A1D1/a)^-a * (1 + A2D2/a)^-a * ... * (1 + AnDn/a)^-a   (8)
Equation 7 is very consistent with the basic function and structure of Eqn. 1, but is slightly more difficult to use. Equation 8 is simpler to use, but is less consistent with Eqn. 1, and is in general a more conservative approach for yield prediction.
Neither method is "correct" because of the empirical nature of yield prediction. It is more appropriate to compare the predicted results of each methodology with the measured yield of a number of devices and see which approach reflects reality more closely.
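Both methods can be sketched directly; the block areas and defect densities below are hypothetical:

```python
def sum_of_products_yield(blocks, alpha):
    """Eqn. 7: one negative binomial term over the sum of Ai*Di."""
    ad = sum(area * d for area, d in blocks)
    return (1.0 + ad / alpha) ** (-alpha)

def product_of_yields(blocks, alpha):
    """Eqn. 8: product of the per-block negative binomial yields."""
    y = 1.0
    for area, d in blocks:
        y *= (1.0 + area * d / alpha) ** (-alpha)
    return y

# Two hypothetical blocks: (area in in^2, defect density in defects/in^2)
blocks = [(0.02, 10.0), (0.01, 5.0)]
print(sum_of_products_yield(blocks, 5.0))  # ~0.7835
print(product_of_yields(blocks, 5.0))      # ~0.7820, slightly more conservative
```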
Accuracy of the product of yields method
At first glance, taking the product of yields seems too simple, or at least mathematically unsound. For example, we may wish to find the yield of a device containing two identical blocks of circuitry of area A and defect density D0. The product of the yields is:
(1 + AD0/a)^-a * (1 + AD0/a)^-a = (1 + AD0/a)^-2a   (9)
The right side of the equation no longer looks like Eqn. 1, and it does not include a 2A term that is intuitively implied by the junction of two blocks of area A. Can we reassure ourselves that the approach is accurate enough for practical predictions of die yield?
First, use the example from earlier and rearrange it to stack four active blocks side by side, with no empty space this time (Fig. 3). We know that 40 kmil2 of logic has a yield of 68.06%. Therefore four blocks, each of area 10 kmil2, grouped together must also exhibit a yield of 68.06%. Is our piece-wise approach to yield calculation consistent? Well, almost: (90.57%)^4 = 67.28%. Given the somewhat empirical nature of the field of yield prediction, getting within 1% is close enough for our purposes.
Figure 3. Die composed entirely of active circuitry.
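The four-block consistency check is easy to reproduce numerically:

```python
def nb_yield(area_in2, d0, alpha):
    """Eqn. 1: negative binomial defect-limited yield."""
    return (1.0 + area_in2 * d0 / alpha) ** (-alpha)

# One 40 kmil^2 block vs. four 10 kmil^2 blocks (0.040 vs. 4 x 0.010 in^2)
one_block = nb_yield(0.040, 10.0, 5.0)
four_blocks = nb_yield(0.010, 10.0, 5.0) ** 4
print(f"{one_block:.2%} vs. {four_blocks:.2%}")      # ~68.06% vs. ~67.3%
print(f"difference: {one_block - four_blocks:.4f}")  # under 0.01
```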
In case that approach is not scientific enough, we can plot the difference between Eqn. 1 and Eqn. 9 for different numbers of blocks. In other words, we can view the difference in yield prediction using Eqn. 1 for a single large block of circuitry, vs. the product of a number of smaller sized blocks that together make up the same size as the large block:
ΔY = (1 + AD0/a)^-a - [1 + AD0/(na)]^-na   (10)
where ΔY is the difference in predicted yield between the two terms: the left term is Eqn. 1; the right term is Eqn. 1 adjusted by n, the number of sub-blocks.
Figures 4 and 5 show Eqn. 10 plotted for a two-block and a 10-block circuit, both on the same scale. As in Fig. 1, the curves cover a range of die area and defect density, and a range of cluster factors from 1 to 10. As the number of constituent blocks increases, so does the difference in yield prediction. Similarly, as the total die size and number of defects (A*D0) increase, the difference increases. Conversely, as the cluster factor rises, the difference declines. If the chosen cluster factor for the process in question is 5 or greater, then the difference in yield prediction of the two methods will not exceed 4% with up to 10 constituent blocks of circuitry.
Figure 4. Equation 10 plotted for two circuit blocks.
Figure 5. Equation 10 plotted for 10 circuit blocks.
The difference (ΔY) is always positive: the predicted yield using the piece-wise approach is worse, by a few percent, than that predicted for a single block of circuitry. Some observers may call this conservative for the purposes of yield prediction.
Putting it all together
Here is the recommended approach to predict the yield of a composite device at some time in the future:
Use Eqn. 5 to predict the basic defect density for the time period and Eqn. 3 to derive the adjusted defect density for each block of circuitry in the device.
Use Eqn. 2 to derive the apparent defect density applicable to the mix of circuitry on the device.
Use Eqn. 1 with the apparent defect density together with Eqn. 6 to adjust the yield for unused die area.
Use either Eqn. 7 or 8 to derive the yield of the composite blocks.
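The steps above can be sketched end to end. Everything below is hypothetical: the block tuples, the parameter values, and the quadratic form assumed for Eqn. 3; the product-of-yields method (Eqn. 8) is used for the composite blocks.

```python
def nb_yield(area_in2, d0, alpha):
    """Eqn. 1: negative binomial defect-limited yield."""
    return (1.0 + area_in2 * d0 / alpha) ** (-alpha)

def predict_composite_yield(blocks, unused_in2, d0, improvement, n,
                            alpha=5.0, ys=0.96):
    """blocks: list of (area_in2, f_min, f_act) tuples."""
    dt = d0 * (1.0 - improvement) ** n      # Eqn. 5: future defect density
    y = 1.0
    active = 0.0
    for area, f_min, f_act in blocks:
        dr = dt * (f_min / f_act) ** 2      # Eqn. 3 (assumed quadratic form)
        y *= nb_yield(area, dr, alpha)      # Eqns. 1 and 8: product of yields
        active += area
    # Eqn. 2: apparent defect density for the mix of circuitry
    d_app = (y ** (-1.0 / alpha) - 1.0) * alpha / active
    # Eqn. 6: adjust for unused die area
    y = (nb_yield(active + unused_in2, d_app, alpha)
         + (1.0 - nb_yield(unused_in2, d_app, alpha)))
    return ys * y                           # Eqn. 4: best-possible-yield cap

# Hypothetical die: two circuit blocks plus 10 kmil^2 (0.010 in^2) unused
blocks = [(0.020, 0.35, 0.35), (0.010, 0.35, 0.50)]
print(f"{predict_composite_yield(blocks, 0.010, 10.0, 0.10, 4):.2%}")
```

Consistent with Fig. 6, adding unused area to the same active circuitry raises the predicted composite yield rather than dragging it down.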
Figure 6 shows an example of the results for a die that has three different types of circuitry, plotted for several different amounts of unused silicon area. The figure also shows what a traditional application of the negative binomial model would predict. The yield difference between the traditional model and the proposed methodology is increasingly significant (>10%) as the amount of unused silicon area on the die grows.
Figure 6. Standard negative binomial method compared to proposed composite method.
Conversely, the difference between the two potential methods of Eqns. 7 and 8 is not significant, leading to the conclusion that the user should choose the equation that is easier to implement.
Conclusion
Yield monitoring and improvement is a vital part of every silicon vendor's job. A side-benefit of that effort is the empirical validation and adjustment of a company's yield model for leading-edge devices in the leading-edge process. Using the methodology in this article, those efforts can be applied to forecasting the yield of production devices that are not necessarily close to the leading edge. Improving the accuracy of die yield forecasting allows the company to make informed decisions on manufacturing investments, process migrations, and new product introductions.
DAVE HORTON received his BSc in electronic engineering from Nottingham University, England, in 1973. He has worked for several semiconductor companies in addition to his current employer, Cypress Semiconductor. His assignments at Cypress have included managing design, product engineering, test, and production control for Cypress's VMEbus product line. Cypress Semiconductor, 195 Champion Court, San Jose, CA 95134; ph 408/232-4573, fax 408/943-6848.