How to make a sensor smarter
Part Two of Two: Operation

02/01/2001

Robert H. McCafferty, RHM Consulting, Sandy Hook, Connecticut

Overview
The first part of this article, "Part One, Technology: How to make a sensor smarter" (October 2000), described some techniques for extracting more information about a wafer fab process by taking a fresh look at the behavior and interaction of metrology data over time. This second part discusses ways to use those tools to improve processes and productivity.


Figure 1. The pressure in an etch chamber shows marginal control.

The motivations for a semiconductor manufacturer to use the data processing methods described in the first part of this article are product quality and problem avoidance/resolution. The quality motivation tends to drive first-principles comprehension, process characterization, pattern recognition (for process control), and parallel coordinate investment. In essence, all of this will be directed toward deriving an optimized process that is controlled (typically in real time) to produce the desired ends. This can be accomplished, if necessary, by governing reaction kinetics through manipulation of parameter trajectories, although this is rare compared to endpoint detection and analogous techniques. By whatever means, the underlying objective is to deliver a higher percentage of good, invariantly processed wafers at the output of any process.


Figure 2. Template-driven detection of hardware failure.

Similarly, the problem avoidance and resolution task stimulates use of univariate numerical analysis (for incipient problem detection) and of pattern recognition to detect nonconforming parameter behavior and to find known fault signatures. Backing these are more global, multivariate methods of chemometric and parallel coordinate analysis, which detect unusual conditions and identify the origin of process-level faults, respectively. In both lines of application, however, one is engaging the output of instrumentation to develop and exploit information at a process level rather than generating more or better numbers to archive as process data.

Incipient control problem detection
It should be obvious that the bulk etch pressure trace pictured in Fig. 1 is not a desirable situation. This is particularly true because the reactor that produced it continued to run without being interrupted by its on-board controller, since the feedback control loop involved kept the chamber pressure within its guardbands.

Hence — whether due to a poorly tuned throttle valve control loop, excessive chamber residue build-up, or some other cause — this is clearly a problem in the making. It could bring a tool down and potentially compromise wafers, but it is not yet a visible problem. To avoid this issue and its resulting mean free path variation (which in turn may generate sidewall slope, etch rate, and uniformity problems), one is well advised to run a real-time standard deviation check on reactor pressure data. By engaging real-time analysis designed to detect degraded control loop performance, manufacturing engineers operating this equipment have some advance warning. (Only fairly exotic techniques — at least in semiconductor equipment — such as adaptive control would fix this automatically.) This allows a savvy crew to find a hole in WIP and take the equipment off-line for "just in time" maintenance, at essentially no cost to factory throughput.
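For illustration, the sketch below (Python with NumPy) implements such a rolling standard-deviation check. The window size, alarm limit, units, and synthetic pressure trace are all assumptions chosen for the example, not values from any particular reactor.

  import numpy as np

  # Illustrative rolling standard-deviation monitor for a chamber-pressure
  # datastream. WINDOW and SIGMA_LIMIT are assumed values; in practice the
  # limit would be derived from a baseline of healthy runs.
  WINDOW = 50          # samples per rolling window
  SIGMA_LIMIT = 0.8    # mTorr

  def pressure_monitor(samples, window=WINDOW, limit=SIGMA_LIMIT):
      """Yield (index, rolling sigma, alarm flag) once the window fills."""
      buf = []
      for i, p in enumerate(samples):
          buf.append(p)
          if len(buf) > window:
              buf.pop(0)
          if len(buf) == window:
              sigma = np.std(buf, ddof=1)
              yield i, sigma, sigma > limit

  # Synthetic trace: a control loop whose oscillation slowly grows.
  t = np.arange(2000)
  trace = 300 + 0.002 * t * np.sin(0.3 * t) + np.random.normal(0, 0.1, t.size)
  for i, sigma, alarm in pressure_monitor(trace):
      if alarm:
          print(f"sample {i}: rolling sigma {sigma:.2f} mTorr exceeds limit")
          break

Because the check runs sample by sample, it can flag a degrading loop well before pressure breaches its guardbands.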

Finding nonconforming trajectories
Mechanistically figuring out when any one of the numerous datastreams available from manufacturing tooling does not follow its typical pattern — i.e., simply does not look right — is often the key to recognizing that a problem has occurred and to driving its diagnosis. To accomplish this, all relevant signals — those with a significant bearing on the performance of the equipment or the viability of the process it orchestrates — must be identified. Usually this is not difficult; there are typically more than enough signal candidates to consider.

Each signal must then be broken down (typically along the lines of process recipe steps) and modeled as a template describing its expected behavior. This is done in terms of a lexicon of primitive shapes — steps, ramps, straights, etc. — including gaps over sections where signal behavior is of no interest, with governing parameters for each primitive shape statistically fit to a sample signal or, ideally, an aggregate of sample signals.

Once complete, this battery of "conformance" templates is used to parse signals from tool datastreams, either in real time or post process; any template for which no match is found indicates a problem and reports back an error condition. Once a fault is detected, recognition templates can be engaged to localize and classify it, or, for faults evading automated classification, user-driven debugging tools can be used to analyze why a signal failed to match its conformance template. Hence, by automated means, a broad spectrum of tool signals can be tested in parallel for adherence to expected norms. Note that Fig. 2 actually represents a failure condition, since only two of the pattern matches found are normal; an equipment problem had in fact been detected.
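As a minimal sketch of this approach (not the lexicon or API of any commercial product), the fragment below fits a line to each recipe-step segment of a signal and tests it against an ordered template of assumed primitives ("flat," "ramp-up," "ramp-down"), with fixed tolerances standing in for statistically trained governing parameters.

  import numpy as np

  def fit_segment(segment):
      """Fit a line to a segment; return its slope and residual sigma."""
      x = np.arange(len(segment))
      slope, intercept = np.polyfit(x, segment, 1)
      resid = segment - (slope * x + intercept)
      return slope, np.std(resid)

  def matches(segment, primitive, slope_tol=0.01, noise_tol=0.5):
      """True if the segment conforms to the named primitive shape."""
      slope, noise = fit_segment(segment)
      if noise > noise_tol:                 # too ragged to match anything
          return False
      if primitive == "flat":
          return abs(slope) <= slope_tol
      if primitive == "ramp-up":
          return slope > slope_tol
      if primitive == "ramp-down":
          return slope < -slope_tol
      raise ValueError(primitive)

  def conforms(signal, template):
      """template: list of (primitive, start, end) tuples per recipe step."""
      for primitive, start, end in template:
          if not matches(np.asarray(signal[start:end]), primitive):
              return False, primitive       # report which element failed
      return True, None

  # A gas-flow step expected to ramp up, hold flat, then ramp down.
  sig = np.concatenate([np.linspace(0, 10, 50),
                        np.full(100, 10.0),
                        np.linspace(10, 0, 50)])
  template = [("ramp-up", 0, 50), ("flat", 50, 150), ("ramp-down", 150, 200)]
  print(conforms(sig, template))            # (True, None)

A production system would, of course, support a richer shape lexicon and fit its tolerances from aggregates of known-good signals rather than hard-coding them.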


Detecting specific fault signatures
Once a fault condition is detected, it must be pinpointed and, if at all possible, classified in some recognizable and standardized sense to hasten resolution and drive "tool learning" against its future recurrence. This problem is somewhat simpler than overall conformance checking because — once an issue has been discovered and analyzed by engineering means — one need only assemble a template to match that specific failure mode rather than the pattern of an entire signal. Consider the chamber pressure abnormality of Fig. 3a, where a chamber sealing issue or transient outgassing from an unexpected source has created a glitch in what should be a smooth pressure rise curve. Given that spikes of this nature leave no other artifacts around them, they are easily matched as a "glitch": a sharp rise followed by an almost equally sharp drop in the measured signal level. In the lexicon of a commercial pattern recognition system known as Patterns, this amounts to a "trending-up" followed immediately (with no gap or dwell time) by a "trending-down," with governing parameters statistically synthesized at the click of a mouse against the failing sample signal. Such parameters effectively train a template to the specific class of feature being sought, so a "glitch" template matches only glitches and not other rising-then-falling behavior, with matching results — and hence the "pressure-glitch" classification — as shown in Fig. 3b.
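A hedged sketch of such a signature matcher follows. The excursion height and width thresholds are assumed values of the kind one would train against the failing sample signal, and the detector is a generic spike finder rather than the Patterns template engine.

  import numpy as np

  # Generic "glitch" signature: a sharp rise immediately followed by a
  # sharp fall. RISE_MIN and WIDTH_MAX are assumed, trainable thresholds.
  RISE_MIN = 5.0    # minimum excursion height (signal units)
  WIDTH_MAX = 10    # maximum samples from rise start to fall end

  def find_glitches(signal, rise_min=RISE_MIN, width_max=WIDTH_MAX):
      """Return indices where an up-then-immediately-down spike occurs."""
      sig = np.asarray(signal, dtype=float)
      hits = []
      for i in range(1, len(sig) - 1):
          # local peak: both neighbors strictly lower
          if sig[i] > sig[i - 1] and sig[i] > sig[i + 1]:
              lo = max(0, i - width_max)
              hi = min(len(sig), i + width_max + 1)
              base = np.median(sig[lo:hi])   # robust local baseline
              if sig[i] - base >= rise_min:
                  hits.append(i)
      return hits

  # Smooth pressure rise with one injected transient.
  t = np.linspace(0, 1, 200)
  pressure = 100 * (1 - np.exp(-5 * t))
  pressure[120] += 8.0                      # the glitch
  print(find_glitches(pressure))            # -> [120]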


Figure 3. a) Abnormality in a typically smooth pressure rise curve, and b) fault classification using a "pressure-glitch" template.

Reaction level sensing
To draw conclusions about what is occurring at either the wafer level or the reaction chemistry level within a semiconductor process reactor, one must exploit whatever degree of tool, reaction, and wafer-level sensing is available, in combination with human intelligence. Output of the sensors engaged must have direct bearing on the reaction state — as does data from an optical emission spectrometer (OES) or plasma impedance monitor (PIM) — or, if feasible, on the actual wafer state, as in the case of a laser reflectometer. Further, the signal (or signals) selected must effectively constitute a "heartbeat" of the desired reaction, faithfully reflecting what is actually transpiring, as it occurs, rather than simply measuring secondary reactions or some other, obliquely related quantity.

Once that heartbeat has been discerned — the laser trace in Fig. 4 being a particularly classic case — one must devise a template to match it. In the laser endpoint case this is easily done, as indicated by Fig. 5, once again by training a simple "trending-up"/"trending-down" template. Algorithm development follows, since one must analyze real-time signals and automatically draw the same conclusions a skilled process engineer would draw — if, in fact, an engineer could conclude anything, given that most single-wafer processes run far too quickly for human decision making to operate effectively. This can lead to quite sophisticated algorithms; the laser trace endpoint algorithm, for example, was designed to: (1) recognize film transitions, (2) calculate photoresist and TEOS etch rates, (3) calculate and identify process change, and (4) call out the endpoint [1]. Some of this is shown in Fig. 6.
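The toy endpoint caller below illustrates the last of those steps under stated assumptions: the trace oscillates with interference fringes while the film etches and goes quiet once it clears, so a sustained drop in smoothed slope activity is declared endpoint. The window, threshold, hold count, and synthetic trace are invented for the example; fringe-period analysis for etch rate is omitted.

  import numpy as np

  # Toy endpoint caller for a laser reflectometer trace: interference
  # fringes while the film etches, a quiet trace after it clears. The
  # window, quiet threshold, and hold count are assumed values.
  def call_endpoint(trace, window=25, quiet=0.015, hold=50):
      """Return the index where sustained quiet begins, else None."""
      activity = np.convolve(np.abs(np.diff(trace)),
                             np.ones(window) / window, mode="same")
      quiet_run = 0
      for i, a in enumerate(activity):
          quiet_run = quiet_run + 1 if a < quiet else 0
          if quiet_run >= hold:
              return i - hold + 1          # start of the quiet run
      return None

  # Synthetic trace: fringes for 600 samples, then the film clears.
  t = np.arange(1000)
  trace = np.where(t < 600, 1 + 0.5 * np.sin(2 * np.pi * t / 40.0), 1.0)
  print(call_endpoint(trace))              # prints an index near 600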


Figure 4. Laser endpoint detection scenario for etch process control in a film stack.

With other instruments more appropriate for low open-area etching scenarios, such as OES and PIM devices, rate information cannot be determined, but etch uniformity can be inferred by examining the trace slope at endpoint. In either event, a manufacturer is far better informed by taking such measurements than by accepting the instrument output alone.


Figure 5. Section of a laser trace for etch process control and the corresponding "trending-up"/"trending-down" template.

Parallel coordinate analysis
Mining information from large bodies of summary data, particularly if it is for fault detection and localization purposes, is always easier if human visualization and hence cognizance can be engaged. Finding out "what's wrong with this picture" is a human proclivity, even though it is not effective in all cases. Hence, experimental data for the metals manufacturing process portrayed in Fig. 7 — where the objective was to maximize the parameter (strength) plotted on the left on parallel coordinate axes — proved most surprising when carefully examined. Parameter X1, which was on/off in nature and expected to produce better results when on, did so. Somewhat unexpectedly, parameter X3 (experimental run number), which was anticipated to have no impact, probably did have an influence. This parallel coordinate analysis identified the real likelihood of a previously unknown wear mechanism.


Figure 6. Partial algorithm for laser trace endpoint detection. Note the algorithm variables, template elements ("trending-up" and "trending-down") and branching constructs.

More unsettling is parameter X4. Conventionally run at the low end of the displayed range, it produced no high-strength results there and in fact contributed strongly to good results when operated at the high end of its displayed range. Parameters X5 (when sufficient data is selected), X6, X7, and X8 were initially thought to govern the entire outcome of this process and to yield optimal results only across a narrow spectrum of settings. Here, though, they turned out to produce viable results across virtually the full breadth of their available range. These results illustrate a process dominated by factors thought to be outside the system, a situation identified only through parallel coordinate analysis.
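For readers wishing to experiment, a display of this general kind can be sketched with pandas. The synthetic dataset below, with columns named X1 through X8 and a strength response dominated by X4, merely echoes the shape of this finding; it is not the experiment's data.

  import numpy as np
  import pandas as pd
  import matplotlib.pyplot as plt
  from pandas.plotting import parallel_coordinates

  # Synthetic stand-in for the Fig. 7 experiment: eight process parameters
  # plus a strength response that, like the article's X4 case, is dominated
  # by one variable. Column names and banding are assumptions.
  rng = np.random.default_rng(0)
  n = 60
  df = pd.DataFrame({f"X{i}": rng.uniform(0, 1, n) for i in range(1, 9)})
  df["strength"] = 0.8 * df["X4"] + 0.2 * rng.uniform(0, 1, n)
  df["band"] = pd.cut(df["strength"], bins=[0, 0.5, 0.8, 1.0],
                      labels=["low", "mid", "high"])

  cols = ["strength"] + [f"X{i}" for i in range(1, 9)]
  parallel_coordinates(df[cols + ["band"]], "band", colormap="viridis")
  plt.title("Strength vs. process parameters on parallel coordinates")
  plt.show()

Coloring each observation's polyline by its outcome band is what lets the eye pick out patterns such as the X4 effect described above.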


Figure 7. Parallel coordinate display of metallurgical process data. Note the absence of high strength results at all but the highest levels of variable X4.

Chemometric analysis
Based on the results of Sematech's J88 project on fault detection and classification for metal etch data (including induced faults) taken over three distinct time periods, it is clear that chemometric methods are at least part of the solution to the general fault detection and classification problem. Experience from that project, where a broad variety of models and analytical techniques were applied to test data, strongly suggests that a 50-60% capture rate can be achieved for process faults in arbitrary semiconductor data [2]. Further, the simplest technique tested — principal component analysis on parameter mean values, which is relatively easy to execute — proved competitive in sensitivity with the most complicated methods evaluated. Hence, under the presumption that models are updated to maintain their sensitivity and robustness, the J88 recipe for chemometric fault detection is:

  1. Reduce sample data for initial model building to mean parameter values of each observation.
  2. Mean center and scale the resulting dataset.
  3. Derive principal components from the scaled, centered data set. (Note that there will be as many principal components as parameters in the original data.)
  4. Keep the minimum number of principal components necessary to capture an acceptable amount of variation apparent in the original dataset.
  5. Build a model, ideally graphically (and potentially even using parallel coordinates), of the relationship between retained principal components.
  6. Compare new data to the existing model, declaring a fault condition when something is obviously wrong, either by visual examination or when the Q and T² statistics display excessive magnitude.
  7. Regenerate a comparison model either periodically or based on an event trigger to include new process data.
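A compact sketch of steps 1 through 7 in plain NumPy appears below, applied to synthetic data. The 99th-percentile limit on the training Q statistic is an assumed stand-in for the formal Q and T² confidence limits used in chemometrics practice.

  import numpy as np

  # Sketch of the J88-style recipe: PCA on mean parameter values, with Q
  # (residual) and T^2 (within-model) statistics for fault detection.

  def build_model(X, var_target=0.95):
      """X: observations x parameters, one row of mean values per run."""
      mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
      Z = (X - mu) / sd                                 # step 2
      U, S, Vt = np.linalg.svd(Z, full_matrices=False)  # step 3
      frac = np.cumsum(S**2) / np.sum(S**2)
      k = int(np.searchsorted(frac, var_target)) + 1    # step 4
      P = Vt[:k].T                                      # retained loadings
      lam = S[:k]**2 / (len(X) - 1)                     # PC variances
      return mu, sd, P, lam

  def q_t2(x, mu, sd, P, lam):
      """Step 6 statistics for a single new observation."""
      z = (x - mu) / sd
      t = P.T @ z                                       # scores
      resid = z - P @ t
      return float(resid @ resid), float(np.sum(t**2 / lam))

  # Train on healthy runs in which parameters 0 and 1 track each other.
  rng = np.random.default_rng(1)
  X = rng.normal(0, 1, (200, 10))
  X[:, 1] = 0.9 * X[:, 0] + 0.1 * rng.normal(0, 1, 200)
  mu, sd, P, lam = build_model(X)
  Q_lim = np.percentile([q_t2(x, mu, sd, P, lam)[0] for x in X], 99)

  # A new run that breaks the learned correlation should trip the Q limit.
  new_run = rng.normal(0, 1, 10)
  new_run[0], new_run[1] = 2.0, -2.0
  Q, T2 = q_t2(new_run, mu, sd, P, lam)
  print(f"Q={Q:.2f} (limit {Q_lim:.2f}), T2={T2:.1f}, fault={Q > Q_lim}")

Step 7, model regeneration, amounts to rerunning build_model on a dataset refreshed with recent, verified-good runs.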

An interesting parallel
Semiconductor manufacturers have always tracked the outcome of in-line electrical testing on a lot basis (and occasionally at a wafer level) and reacted vigorously to any unexpected results. This reaction has included not only tracking down processing errors or culpable equipment, typically through some form of time-sliding technique, but also modeling product yield and performance as a function of in-line test results. The latter allowed characterization groups — the forerunner of yield management organizations — to tune the line and capitalize on rare bonanzas such as an overetch processing error that removed residual dielectric to serendipitously boost yield.

Similarly, since approximately the time of 0.5µm processing, organizations willing to drop previous misconceptions have implemented cohesive defect management strategies. These effectively put 16Mb and subsequent programs on a much faster yield learning track than would otherwise have been achieved, by following steps closely analogous to what had previously occurred with in-line electrical testing. Hence, where characterization groups had evolved test plans, frequently building test structures in unused silicon to accommodate that end, defect groups created array defect monitors and devised inspection strategies. Where electrical characterization groups developed tests for specific failure modes (e.g., M1 to M1 shorts), defect management organizations engaged a review and classification strategy against some fraction of inspection results. This served to classify defects by appearance and location (e.g., metal 1 stringers) while establishing their relative frequency. Moreover, characterization and defect organizations, respectively, correlated test results to line performance and calculated defect kill rates through analysis of bit fail maps, both to drive learning.

For those organizations astute enough to engage it, process fault treatment will follow a similar tack. Fault detection is entirely analogous to in-line electrical test and defect inspection, while fault classification — the key to learning — is exactly parallel to defect classification. Finally, correlation to wafer-level test results to generate kill rates will separate false positives from the real thing and drive tool learning.

Technology adoption
Metrology, being yet another floorspace-consuming, unit hour-inflating, throughput-constraining, non-value-added activity, has long been considered a dirty word by semiconductor manufacturers. However, the information provided is frequently vital, and as the stakes inherent in leading-edge manufacturing have steadily increased, so has manufacturers' willingness to invest. The size of that bet is driven by three primary factors:

  • the capital equipment cost of the factory,
  • the potential revenue/processed wafer, and
  • the risk of incorrectly processed wafers reaching test.

With 300mm manufacturing, all of those factors take a quantum jump, meaning that, contrary to previous circumstances, the manufacturing stakes and intricacy of process tooling should now be sufficient to support:

  • mechanized fault detection,
  • process fault classification to the point of identifying signatures, and
  • elimination of classified process faults through tool learning, potentially followed by automated self-correction measures such as adaptive control.

Moreover, since much of what needs to be done involves engineering applications work (making already existing sensors smarter) rather than substantial capital investment, this decision, at least, should be a relatively simple one for the industry.

Conclusion
Inevitably, making sensors smarter must happen where the data is. In essence, for the techniques made feasible by smarter sensors — fault detection and classification, kill rate correlation, and tool learning — to be deployed effectively, equipment suppliers will have to test tool-level mechanisms, while the control engineering groups of semiconductor manufacturers drive the application of the technology. This will involve both adopting the outcome of equipment suppliers' work (if not implementing tool-level mechanisms directly themselves) and experimentally engaging process-level (multivariate and generally post-process) methods within their own lines. As competitive technology coalesces in the marketplace and best-of-breed methods emerge, integration of disparate pieces such as chemometric and parallel coordinate methods will become appropriate, along with investigation of process fault correction technologies. Much of the sensing technology necessary to do this is effectively in hand; only an investment of time and intellect is required to yield capabilities that keep pace with (as opposed to trailing) semiconductor industry need.

Acknowledgments
Screen captures of Cornerstone 3.0 and Patterns 3.5 software used to illustrate this article were included with permission from Brooks Automation Inc., Chelmsford, MA. Screen captures of Curvaceous Visual Explorer for Windows used to illustrate this article were included with permission from Curvaceous Software, Buckinghamshire, UK.

The author gratefully acknowledges Brooks Automation, Curvaceous Software, and Eigenvector Research for allowing use of their products, as well as Lam Research and Scientific Systems for the invaluable provision of illustrative data. In particular, I would like to thank Peggy Bigelow, Robin Brooks, Jimmy Hosch, Justin Lawler, Tom Ni, Fred Terry, and Barry Wise for their effort and assistance.

References

  1. R.H. McCafferty, "Etch Endpoint Detection Via Pattern Recognition," Proceedings of the Sematech AEC/APC Symposium XI, pp. 871-882, 1999.
  2. B.M. Wise, N.B. Gallagher, "Process Monitoring and Fault Detection Using Multivariate Methods," Short Course Given in Association with Sematech AEC/APC Symposium XI, Vail, Colorado, Sept. 1999.

Robert H. McCafferty received bachelor's and master's degrees in mechanical engineering and a master of computer science degree from the University of Virginia. He worked for more than a decade in equipment and process control, including adaptive control implementation, at IBM, Burlington. He finished his career with IBM managing efforts to optimize circuit design against the effects of manufacturing variability. He consulted for a subsidiary of BBN, which eventually became part of Brooks Automation, specializing in semiconductor and pattern recognition assignments. He now practices independently. RHM Consulting, 6 Lone Oak Meadows, Sandy Hook, CT 06482; ph/fax 203/270-1626, e-mail [email protected].