Advanced analytics for yield improvement and zero defect in semiconductors

Machine learning based advanced analytics for anomaly detection offers powerful techniques that can be used to achieve breakthroughs in yield and field defect rates.

BY ANIL GANDHI, PH.D. and JOY GANDHI, Qualicent Analytics, Inc., Santa Clara, CA

In the last few decades, the volume of data collected in semiconductor manufacturing has grown steadily. Today, with the rapid rise in the number of sensors in the fab, the industry faces a torrent of data that presents major challenges for analysis. Data by itself isn't useful; it must be converted into actionable information to drive improvements in factory performance and product quality. At the same time, product and process complexities have grown exponentially, requiring new ways to analyze huge datasets with thousands of variables and to discover patterns that go undetected by conventional means.

In other industries such as retail, finance, telecom and healthcare where big data analytics is becoming routine, there is widespread evidence of huge dollar savings from application of these techniques. These advanced analytics techniques have evolved through computer science to provide more powerful computing that complements conventional statistics. These techniques are revolutionizing the way we solve process and product problems in the semiconductor supply chain and throughout the product lifecycle. In this paper, we provide an overview of the application of these advanced analytics techniques towards solving yield issues and preventing field failures in semiconductors and electronics.

Advanced data analytics extends prior methods for achieving breakthrough yields, zero defect and optimized product and process performance. The techniques can be used as early as product development and all the way through high volume manufacturing, and they provide a cost-effective observational supplement to expensive DOEs. They include machine learning algorithms that can handle hundreds to thousands of variables in big or small datasets. This capability is indispensable at advanced nodes, where complex fab process technologies and product functionalities make defects otherwise intractable.

Modeling target parameters

Machine learning based models predict targets such as yield and field defect rates as functions of process, PCM, sort or final test variables used as predictors. In the development phase, the challenge is to eliminate major systematic defect mechanisms and optimize new processes or products to ensure high yields during the production ramp. Machine learning algorithms reduce the number of variables from hundreds or thousands down to the few key variables of importance; this reduction is just sufficient to allow nonlinear models to be built without overfitting. Using the model, a set of rules involving these key variables is derived. These rules provide the best operating conditions to achieve the target yield or defect rate. FIGURE 1 shows an example non-linear predictive model.
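As a rough illustration of this workflow (a minimal sketch, not Qualicent's proprietary method), the snippet below ranks a few hundred synthetic predictors with a tree ensemble, keeps the top variables of importance and fits a nonlinear model on the reduced set; the data and column names are placeholders.

```python
# Sketch: tree-based feature reduction, then a nonlinear model on the reduced set.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 200)),
                 columns=[f"param_{i}" for i in range(200)])   # stand-in for process/PCM/sort data
y = X["param_3"] * X["param_17"] + 0.1 * rng.normal(size=500)  # yield with a hidden interaction

# Step 1: rank predictors and keep only the few variables of importance (< 15).
ranker = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
top_vars = pd.Series(ranker.feature_importances_, index=X.columns).nlargest(10).index

# Step 2: fit a nonlinear model on the reduced feature set, limiting overfitting.
model = GradientBoostingRegressor(random_state=0).fit(X[top_vars], y)
print("variables kept:", list(top_vars))
```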

FIGURE 1. Predictive model example.


FIGURE 2 is another example of rules extracted from a model, showing that when all conditions of the rule hold across the three predictors simultaneously, the result is lower yield. Standard regression techniques failed to discover this signal because of the influence of a large number of manufacturing variables: each variable individually has a small, negligible influence, but together they create noise that masks the signal. Standard regression techniques available in commercial software are therefore unable to detect the signal in these instances and are not of practical use for process control. So how do we discover rules such as the ones shown in Fig. 2?

FIGURE 2. Individual parameters M, Q and T do not exert influence while collectively they create conditions that destroy yield. Machine learning methods help discover these conditions.


Rules discovery

Conventionally, a parametric hypothesis is made based on prior process domain knowledge and then the hypothesis is tested. For example, to improve an etest metric such as threshold voltage, one could start with a hypothesis that connects this backend parameter with RF power on an etch process in the frontend. However, it is often impossible to form a hypothesis from domain knowledge alone because of the complexity of the processes and the variety of possible interactions, especially across several steps. So alternatively, a generalized model with cross terms is proposed, the significant coefficients are kept and the rest are discarded. This works when the number of variables is small but fails with a large number of variables. With 1,100 variables (a very conservative number for fabs) there are 221 million possible 3-way interactions and about 600,000 2-way cross terms on top of the linear coefficients!
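These counts can be verified directly; the short check below assumes nothing beyond standard combinatorics.

```python
# Quick check of the interaction counts quoted above for 1,100 variables.
from math import comb

n = 1100
print(comb(n, 2))   # 604,450 two-way cross terms (~0.6 million)
print(comb(n, 3))   # 221,228,700 three-way interactions (~221 million)
```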

Fitting these coefficients would require a number of samples or records that is clearly not available in the fab. Recognizing that most of the variables and interactions have no bearing on yield, we must first reduce the feature set size (i.e., the number of predictors) to a manageable limit (< 15) before applying any model; several machine learning techniques based on derivatives of decision trees are available for feature reduction. Once the feature set is reduced, exact models are developed using a palette of techniques such as advanced variants of piece-wise regression.
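The sketch below is a generic stand-in for this flow (the proprietary variants are not public): synthetic data contain a hidden three-variable interaction, and a shallow decision tree fitted to the reduced feature set yields branches that read as rules of the form shown in Fig. 2.

```python
# Generic rules-discovery stand-in: a shallow tree is a piecewise model whose
# leaves correspond to rules such as "M > a AND Q < b AND T < c -> low yield".
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(1)
data = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["M", "Q", "T"])
low_yield = (data["M"] > 0.5) & (data["Q"] < 0.0) & (data["T"] < 0.0)   # hidden 3-way rule
yield_pct = 95.0 - 10.0 * low_yield + rng.normal(scale=0.5, size=1000)

tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=30, random_state=0)
tree.fit(data, yield_pct)
print(export_text(tree, feature_names=["M", "Q", "T"]))   # branches print as readable rules
```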

In essence, what we have described above is discovery of the hypothesis, whereas traditionally one starts with a hypothesis to be tested. The example in Fig. 2 had 1,100 variables, most of which had no influence; six had measurable influence (three of them are shown), and all were hard to detect because of dimensional noise.

The above type of technique is part of a group of methods classified as supervised learning. In this type of machine learning, one defines the predictors and target variables, and the technique finds the complex relationships or rules governing how the predictors influence the target. In the next example we also use unsupervised learning, which allows us to discover clusters that reveal patterns and relationships between predictors that can then be connected to the target variables.

FIGURE 3. Solar manufacturing line conveyor, sampled at four points for colorimetry.


FIGURE 3 shows a solar manufacturing line with four panels moving on a conveyor. The end measure of interest that needed improvement was cell efficiency. Measurements are made at the anneal step for each panel at the four locations (1, 2, 3, 4) shown in FIGURE 3. The ratio between measurement sites with respect to a key metric called colorimetry was discovered to be important; this was discovered by employing clustering algorithms, which are part of unsupervised learning. A subsequent supervised model found this ratio to influence PV solar efficiency as part of a 3-way interaction.

FIGURE 4: The ratios between 1, 2, 3, 4 colorimetry were found to have clusters and the clusters corresponded to date separation.


In this case, without unsupervised machine learning methods it would have been impossible to identify the ratio between two predictors as an important variable affecting the target: the relationship was not known, so no hypothesis could be formed to test it among the large number of metrics and associated statistics that were gathered. Further investigation identified DATE as the determining variable for the clusters.
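A hedged sketch of this clustering step follows; k-means is used only as a generic example of unsupervised learning (the article does not name the specific algorithm), and the panel data below are synthetic placeholders.

```python
# Cluster the ratios between colorimetry sites, then cross-tabulate cluster
# membership against DATE to check for the date separation seen in Fig. 4.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
n = 400
shift = rng.integers(0, 2, n)                              # two hidden date regimes
panels = pd.DataFrame({
    "colorimetry_1": 80 + 2 * shift + rng.normal(scale=0.5, size=n),
    "colorimetry_2": 80 + rng.normal(scale=0.5, size=n),
    "colorimetry_3": 80 - 2 * shift + rng.normal(scale=0.5, size=n),
    "colorimetry_4": 80 + rng.normal(scale=0.5, size=n),
    "DATE": np.where(shift == 1, "after_X", "before_X"),
})

ratios = np.column_stack([
    panels["colorimetry_1"] / panels["colorimetry_3"],     # ratios between measurement sites
    panels["colorimetry_2"] / panels["colorimetry_4"],
])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(ratios)
print(pd.crosstab(labels, panels["DATE"]))                 # clusters line up with DATE
```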

Ultimately the goal was to create a model for cell efficiency. The feature reduction described earlier is performed, followed by advanced piecewise regression. The resulting model, validated with 10-fold cross validation (build the model on 80% of the data, test against the remaining 20%, and repeat ten times with a different random sample each time), is a complex non-linear model whose key element is a 3-way interaction, shown in FIGURE 5; the dark green area represents the condition that drops the median efficiency by 30% from best-case levels. This condition (colorimetry < 81, Date > X and N2 < 23.5) creates the exclusion zone that should be avoided to improve cell efficiency.
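The validation scheme described above, ten repeated random 80/20 splits, can be reproduced with a shuffle-split procedure; the sketch below uses synthetic regression data in place of the solar dataset.

```python
# Ten repeated random 80/20 train/test splits, scoring a nonlinear model.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = make_regression(n_samples=400, n_features=6, noise=5.0, random_state=0)
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)   # 80/20, repeated 10x
scores = cross_val_score(GradientBoostingRegressor(random_state=0),
                         X, y, cv=cv, scoring="r2")
print(f"mean R^2 = {scores.mean():.2f} +/- {scores.std():.2f}")
```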


FIGURE 5. N2 (x-axis) < 23.5, colorimetry < 81 and Date > X represent the “bad” condition (dark green) where the median cell efficiency drops by 30% from best case levels.

Advanced anomaly detection for zero defect

Throughout the production phase, process control and maverick part elimination are key to preventing field failures, both in early life and over the rest of the device operating life. This is particularly crucial for automotive, medical device and aerospace applications, where field failures can result in loss of life or injury and associated liability costs.

The challenge in screening potential field failures is that these are typically marginal parts that pass individual parameter specifications. With increased complexity and hundreds to thousands of variables, monitoring a handful of parameters individually is clearly insufficient. We present a novel machine learning-based approach that uses a composite parameter that includes the key variables of importance.

Conventional single parameter maverick part elimination relies on robust statistics for single parameter distributions. Each parameter control chart detects and eliminates outliers but may eliminate good parts as well; single parameter control charts are found to have high false alarm rates, resulting in significant scrap of good material.
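For reference, a common single parameter scheme of this kind sets robust limits from the median and the median absolute deviation (MAD); the exact chart rules used in the study are not specified, so the following is only an illustrative sketch.

```python
# Robust single-parameter outlier screen: flag points beyond k scaled-MADs from the median.
import numpy as np

def robust_outliers(x, k=6.0):
    """Return a boolean mask of points more than k scaled-MADs from the median."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))   # scaled MAD, ~sigma for normal data
    return np.abs(x - med) > k * mad            # True = flagged as maverick

readings = np.r_[np.random.default_rng(3).normal(1.0, 0.02, 500), 1.4]
print(robust_outliers(readings).sum(), "part(s) flagged")
```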

In this novel machine learning based method, the composite parameter uses a distance measure from the centroid in multidimensional space. Just as in single parameter SPC charts, data points that lie farthest from the distribution and cross the limits are maverick parts and are eliminated. In that sense the implementation of this method is very similar to conventional SPC charts, while the algorithmic complexity is hidden from the user.
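One standard centroid-distance measure is the Mahalanobis distance; the proprietary distance used here may differ, but the sketch below illustrates the mechanics of a composite distance chart.

```python
# Composite-distance chart sketch: Mahalanobis distance of each part from the centroid.
import numpy as np

def composite_distance(X):
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                                   # centroid in multidimensional space
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))     # inverse covariance
    d = X - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", d, cov_inv, d))

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 8))                            # 8 key parameters per part (synthetic)
dist = composite_distance(X)
limit = np.percentile(dist, 99.5)                         # illustrative chart limit
print((dist > limit).sum(), "maverick part(s) above the composite limit")
```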

FIGURE 6. Comparison of single parameter control chart for the top parameter in the model and Composite Distance Control Chart. The composite distance method detected almost all field failures without sacrificing good parts whereas the top parameter alone is grossly insufficient.


See FIGURE 6 for a comparison of the single parameter control chart of the top variable of importance versus the composite distance chart. TABLES 1 and 2 show the confusion matrices for these charts. With the single parameter approach, the topmost contributing parameter detects 1 out of 7 field failures; we call this the accuracy. However, only one out of 21 declared fails is actually a fail; we call this the purity of the fail class. More failures could potentially be detected by lowering the limit in the top chart, but then the purity of the fail class, which was already poor, rapidly degrades to unacceptable levels.

TABLE 1. Top Parameter


TABLE 2. Composite Parameter


With the composite distance method, on the other hand, 6 out of 7 fails are detected (good accuracy). The cost of this detection is also low (high purity), because 6 of the 10 declared fails are actual field failures. That is far better than 1 out of 21 in the incumbent case, and significantly better than the single top-parameter chart even with its limit lowered slightly.
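Using only the counts quoted above, the accuracy (recall of the fail class) and purity (precision of the fail class) of the two charts work out as follows.

```python
# Accuracy and purity computed from the counts reported for Tables 1 and 2.
top_param = {"detected": 1, "actual_fails": 7, "declared_fails": 21}
composite = {"detected": 6, "actual_fails": 7, "declared_fails": 10}

for name, m in [("top parameter", top_param), ("composite distance", composite)]:
    accuracy = m["detected"] / m["actual_fails"]      # recall of the fail class
    purity = m["detected"] / m["declared_fails"]      # precision of the fail class
    print(f"{name}: accuracy = {accuracy:.2f}, purity = {purity:.2f}")
```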

We emphasize two key advantages of this novel anomaly detection technique. First, its multivariate nature enables detection of marginal parts that not only pass the specification limits for individual parameters but are also within distribution for every parameter taken individually; the composite distance successfully identifies marginal parts that fail in the field. Second, the method significantly reduces the false alarm risk compared to single parameter techniques, which reduces the cost associated with the producer's risk (alpha risk) of rejecting good units. In short: better detection of maverick material at lower cost.

Summary and conclusion

Machine learning based advanced analytics for anomaly detection offers powerful techniques that can be used to achieve breakthroughs in yield and field defect rates. These techniques can crunch large datasets with hundreds to thousands of variables, overcoming a major limitation of conventional techniques. The two key methods explored in this paper are as follows:

Discovery – This set of techniques provides a predictive model that contains the key variables of importance affecting target metrics such as yield or field defect levels. Rules discovery (a supervised learning technique), among many other methods that we employ, discovers rules that provide the best operating or process conditions to achieve the targets, or alternatively identifies exclusion zones that should be avoided to prevent loss of yield and performance. Discovery techniques can be used during the early production phase, when the need to eliminate major yield or defect mechanisms to protect the high-volume ramp is greatest, and they are equally applicable in high volume production.

Anomaly Detection – This method, based on the unsupervised learning class of techniques, is an effective tool for maverick part elimination. Composite distance process control, based on Qualicent's proprietary distance analysis method, provides a cost effective way to prevent field failures. At leading semiconductor and electronics manufacturers, the method has predicted actual automotive field failures that occurred at top carmakers.
