How to make a sensor smarter

10/01/2000

Part I — Technology

Robert H. McCafferty, RHM Consulting, Sandy Hook, Connecticut

Overview
Typical wafer fabs take advantage of only a small portion of the information contained in the large quantity of metrology data that they generate. By looking at the behavior and interaction of parameters over time in new ways, much more real-time information about the process and equipment can be derived. Part I of this article reviews some of these tools, which can be used to improve processes and productivity.

For those with data and the will to decode it, there are few unknowns. From arrhythmic heartbeats to ringing control loops and etch endpoint traces, a time-varying signal carries far more information in the pattern of its behavior than in its raw magnitude or engineering value alone.

In nearly all such instances, a relatively unsophisticated sensor is transmitting data that is subsequently converted into engineering units, but as much (or more) information is to be garnered by analysis of signal behavior over time. For example, a chamber pressure or laser trace signal may indicate mean free path or reacting surface reflectance, but patterns in their temporal behavior finger an oscillating control loop or pinpoint vertical position within a film stack.

Hence, the technique of creating a "virtual sensor" from the behavior of one or more signals renders garden-variety sensors not simply measuring devices but recognition instruments for real-time events. Those events reflect the operational condition of process tooling and the process state itself. The information garnered by careful real-time and post-process signal analysis can therefore be used as a rather intelligent form of process control.


Figure 1. Optical emission spectrometer trace at 438nm for oxide etch.

Sensors in the wafer fab
Semiconductor fabrication in the year 2000 is blessed with — or perhaps beset by — a wealth of measurement capability. At an equipment level, this runs the gamut of sophistication from simple positional sensors (throttle valve angle, shut-off valve setting, etc.) to feedback control transducers servicing regulation of power, pressure, flow, and other key process parameters, to complex and expensive systems such as optical emission spectrometers (OES). Fortunately, the first two sensor categories (positional and regulatory) are effectively free since they come with any piece of functional equipment. Additional, and frequently invaluable, external sensing systems are bolt-ons exhibiting varying degrees of integration within capital equipment. In any event, they produce a cornucopia of numbers, with each instrument effectively converting some facet of tool, reaction, or process behavior into an unending stream of numerical data. The occasionally bewildering (and typically under-utilized) data stream can be priceless, but it is more often simply an ever-expanding stream of raw numbers rather than information. Inherently, in their native forms, the instruments involved cannot yield the latter or draw conclusions beyond the boundaries of their programming.

Virtual sensors do a better job of this, effectively delivering tool state information by design, reaction state information through calculation, and wafer state information via inference (or, occasionally, limited observation). Being a conjugate of one or more sensors, these are able to make deductions based on an array of inputs. In essence, virtual sensors are a "system of sensors," intended to encapsulate and mimic some facet of human intelligence. Specific examples of these abound in niches of the marketplace, ranging from etch endpoint systems driven by residual gas analyzers and OES systems, to purely software systems producing, by varying means, some measure of "equipment health."

Since creation of virtual sensors requires only that one wring out of a pile of numbers information useful to running a semiconductor factory (or any component therein), we are limited, in essence, only by imagination. Fortunately, over the past two decades the US semiconductor industry has proven to be imaginative. Consequently, there are a number of well-established, albeit unevenly applied, tools to make our sensors smarter, including:

  1. first principles knowledge at a reaction level,
  2. thorough process characterization,
  3. time series pattern recognition,
  4. parallel coordinate analysis, and
  5. chemometric analysis.

First principles
There is no substitute for a process engineer's having some grip on what goes on inside the reactor. In any application, cognizance of a process' underlying chemistry and physics will reap far greater dividends than lavish investment in software or metrology equipment. The revelations returned from this may be simple — such as not installing steppers next to a bank of windows in the Texas sun, or not letting chemical reagents for epitaxial growth sit around indefinitely before use — but they frequently preclude innumerable headaches.

On a more technical note, the OES trace of Fig. 1 illustrates behavior of SiF emissions (measured at 438nm) throughout silicon dioxide etching. As a knowledgeable process engineer can tell you, this trace spikes at plasma ignition, drops quickly to a reasonably stable level throughout much of the etch, then tails off somewhat further to an endpoint as the etched oxide film clears. The sharp rise beyond the endpoint occurs as a large area of photoresist eventually etches away, leaving the plasma chemistry free to attack unprotected silicon-bearing material beneath the resist.


Figure 2. Fundamental voltage trace for polysilicon etch from a plasma impedance monitor.

Equally useful is the behavior of a fundamental plasma voltage trace (Fig. 2) produced by a plasma impedance monitor during the course of polysilicon etching. Breakthrough, bulk, and overetch steps can clearly be seen, with the latter providing another endpoint decision cue, since fundamental voltage stabilizes to a fixed level only as the etched polysilicon film clears. In both cases, real-time data will tell those knowledgeable of reaction physics not only exactly when the process is complete, but also with approximately what uniformity the wafers are cleared. Such knowledge is fundamental to multiplying the utility of sensor data.
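
To make the endpoint cue concrete, the following is a minimal sketch of how such a decision could be coded: declare endpoint once a smoothed trace has held a near-constant level for a run of samples after its initial transient, roughly mirroring the way the fundamental voltage of Fig. 2 settles only as the polysilicon clears. The function name, window sizes, and tolerances are illustrative assumptions, not those of any commercial endpoint system.

    import numpy as np

    # Minimal endpoint-detection sketch: declare endpoint once a smoothed trace
    # (e.g., a fundamental voltage signal like that of Fig. 2) has held a
    # near-constant level for `hold` consecutive samples after the initial
    # transient. Window sizes and tolerances below are illustrative only.
    def detect_endpoint(trace, window=10, hold=25, tol=0.01, skip=50):
        smoothed = np.convolve(trace, np.ones(window) / window, mode="valid")
        steady = 0
        for i in range(skip, len(smoothed) - 1):
            if abs(smoothed[i + 1] - smoothed[i]) < tol * abs(smoothed[i]):
                steady += 1
                if steady >= hold:            # level has stabilized: film has cleared
                    return i + window         # offset for the smoothing window
            else:
                steady = 0                    # still drifting: keep etching
        return None                           # no endpoint found in this trace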

Process characterization
Beyond placing comprehension of reactor level chemistry and physics on a numerical plane and serving as the key to optimized process development, fine-grained characterization work can additionally bring considerable control advantage. For run-by-run work — where a process recipe is adjusted (within limits) based on a fixed model, with product measurement information fed forward as well as back — purely statistical modeling is satisfactory. To be of utility in real time, however, one must essentially crack the underlying, differential equation code of what makes a process tick. Academically approached on a generic level, that can be an exorbitantly tall order. Restricted to a particular reactor (e.g., for etching), chemistry, and set of target films, however, the problem reduces to identification of governing class parameters followed by empirical modeling in their terms. Hence, previously reported characterization work for silicon dioxide etching in an IBM reactor fit the bill for enabling real-time control for the particular tool, film, and chemistry [1].


Figure 3. Dimensionless parameter plot of silicon dioxide etching.

Essentially, the high correlation coefficient log-linear dimensionless parameter relationship illustrated by Fig. 3 allows a sophisticated real-time control scheme to practice the process equivalent of inertial navigation. In effect, by tracking the ratio of incident RF power (PI) divided by chamber pressure (P) and reactant total flowrate (Ft), one can monitor — or control — an oxide etching reaction as it occurs at the wafer's surface. When combined with information from other instruments and control techniques, such as those associated with endpoint detection, this is an extraordinarily powerful method.
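
As a rough illustration of what this "inertial navigation" might look like in code, the following sketch compares a real-time reading against a hypothetical log-linear model in the dimensionless ratio of PI to the product of P and Ft. The coefficients, units, and alarm band are placeholders; in practice they would come from characterization work of the kind behind Fig. 3 [1].

    import math

    # Hypothetical fitted log-linear model: log(response) = A + B * log(PI / (P * Ft)).
    # The coefficients and alarm band are placeholders, not values from ref. [1].
    A, B = 1.2, 0.85
    TOLERANCE = 0.05   # allowed fractional deviation from the model prediction

    def monitor_sample(rf_power_w, pressure_mtorr, total_flow_sccm, measured_response):
        """Compare one real-time sample against the dimensionless-parameter model.
        Returns (predicted_response, within_band)."""
        ratio = rf_power_w / (pressure_mtorr * total_flow_sccm)
        predicted = math.exp(A + B * math.log(ratio))
        within_band = abs(measured_response - predicted) / predicted <= TOLERANCE
        return predicted, within_band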

Pattern recognition
A key to getting answers in real time is an ability to algorithmically recognize, and react appropriately to, features or patterns apparent in signal data from modern manufacturing tooling, while still carrying out any useful calculations. The objectives are to drive sophisticated forms of process control, find specific fault signatures, and check conformance of an overall signal profile to its expected norm. Discriminant-based techniques do this by extracting a vector of characteristic pattern features, with future recognition performed by locating a pattern's feature vector in feature space. To accomplish this via the syntactic means discussed here, which represent an intricate pattern through hierarchical decomposition into simpler sub-patterns, individual signals are parsed against templates created with a small lexicon of primitive shapes, shown in Fig. 4.


Figure 4. A collection of generic features that might be found in a time series plot.

This library of primitive shapes, each tailored to match a test shape of interest by statistical calculation of primitive fitting parameters, can consequently be combined to match signals of arbitrary complexity and profile. Real-time or post-process parsing then amounts to associating each shape-enumerating segment of a template with a corresponding stretch of signal data, and reporting a match (of the entire template) if, and only if, all specified shapes are found. Figure 5 illustrates such a scenario, where the recognition engine has been instructed to report on matching of all individual shapes, which are consequently outlined within separate boxes. In addition to facilitating process control decisions, such capability is immensely useful when interrogating large bodies of archival data for specific (typically fault) signatures.
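
A toy version of this syntactic parsing, sketched below, classifies each fixed-length segment of a signal as one of three primitives (rise, flat, fall) by its least-squares slope and reports a template match only if the full sequence of primitives appears in order. The primitive names, segment length, and slope thresholds are illustrative assumptions, not the lexicon of any particular recognition engine.

    import numpy as np

    # Toy syntactic matcher: label fixed-length segments with primitive shapes,
    # then look for the template sequence in the resulting string of labels.
    PRIMITIVES = {
        "rise": lambda s: s > 0.1,
        "fall": lambda s: s < -0.1,
        "flat": lambda s: abs(s) <= 0.1,
    }

    def classify_segments(signal, seg_len=20):
        shapes = []
        for start in range(0, len(signal) - seg_len + 1, seg_len):
            seg = signal[start:start + seg_len]
            slope = np.polyfit(np.arange(seg_len), seg, 1)[0]   # least-squares slope
            for name, test in PRIMITIVES.items():
                if test(slope):
                    shapes.append(name)
                    break
        return shapes

    def template_match(shapes, template=("rise", "flat", "fall")):
        """Report a match only if the entire template appears, in order."""
        n = len(template)
        return any(tuple(shapes[i:i + n]) == template
                   for i in range(len(shapes) - n + 1))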


Figure 5. Shape recognition via syntactic methods for a complex, repetitive signal.

Parallel coordinates
One neat trick to analyze large bodies of post-process summary data rapidly and visually arose from the IBM Scientific Center in Santa Monica, CA, during the mid-'80s. By plotting all dimensions — i.e., all parameters — of a dataset on parallel axes rather than attempting to display them on conventional Cartesian coordinates, one can see at a glance all values associated with a particular process run (Fig. 6) [2]. The only limitation is how many axes will fit across the width of one screen. One observation becomes a series of line segments connecting points on the parallel coordinate axes into a single, continuous, polygonal line. These polygonal lines essentially define a channel where the process has operated before. By engaging the analytical facilities of Curvaceous Visual Explorer (CVE), the system used to generate Fig. 6, a query on observations between the red markers can be executed. The system narrows that channel to only those lots exhibiting both good yield and desirable speed sort. The optimization capabilities of such a system are obvious, particularly when one realizes that if lot-level observations are plotted chronologically from right (where wafers start) to left (final test), then any particular lot can be steered from operation to operation as it clears various measurement gates in line. Even if this approach is not used to "yield learn" in that fashion, "wild duck" observations are always immediately obvious — making parallel coordinate techniques highly effective in detecting off-pattern events and therefore a desirable tool for multivariate SPC.
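
For those who want to experiment with the idea outside a dedicated package, the short sketch below draws the same kind of plot from lot-level summary data using the pandas plotting helper; the parameter names and values are invented, and the lone "wild duck" lot shows up as the one polygonal line that leaves the channel.

    import pandas as pd
    import matplotlib.pyplot as plt
    from pandas.plotting import parallel_coordinates

    # Invented lot-level summary data; each row (lot) becomes one polygonal line.
    df = pd.DataFrame({
        "X1": [1.0, 1.1, 0.9, 2.5],
        "X2": [3.2, 3.0, 3.1, 1.0],
        "X3": [0.5, 0.6, 0.4, 0.9],
        "X4": [7.1, 7.3, 7.0, 5.2],
        "lot_class": ["good", "good", "good", "wild duck"],
    })
    parallel_coordinates(df, class_column="lot_class", color=["tab:blue", "tab:red"])
    plt.title("Each lot traces one continuous line across the parameter axes")
    plt.show()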

Chemometrics
As its name suggests, chemometrics deals generally with handling of measurement data in the field of chemistry. Specifically, it engages mathematical and statistical methods to deduce state information regarding a chemical system from the set of all measurements taken. Hence, for the purposes of making smarter sensors useful in semiconductor manufacturing, this is the purely mathematical technique of wringing event information from piles of numbers. As shown in Fig. 7, this is done by transforming (after various scaling and mean centering operations) multidimensional data into principal components — orthogonal linear combinations of the original variables — then modeling principal component relationships and discerning (ideally graphically) deviations from the norm.


Figure 6. Parallel coordinate plot of VLSI fabrication data. Note the "black hole" in parameter X15.

Such deviations take two forms: those that are excessive in magnitude but still within the pattern of the model, and those stemming from observations to which the model simply has no relevance. This technique formed a mainstay of the effort in Sematech's J88 project for etch fault detection and classification. In that context, events detected after equipment maintenance — a change to the system — should be reflected in a large Q (residual) statistic, while faults from intentionally injected measurement bias (e.g., in etchant flowrate) exhibit a large Hotelling's T². In either event, from a process perspective, something unusual has happened; and in uncovering that, chemometric analysis has done its job, even though what happened remains an open question. That answer is often readily found, however, by discerning the principal component furthest from the bounds of normal behavior and examining its loadings (the contribution from each original variable) to deduce likely candidates for investigation.
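
A minimal sketch of these two statistics, assuming scikit-learn and a training set of known-good runs, follows; the number of retained components and any control limits are left as placeholders, since in practice they are set from the training data itself.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    def fit_pca_monitor(X_train, n_components=3):
        """Fit a scaled PCA model on known-good runs."""
        scaler = StandardScaler().fit(X_train)
        pca = PCA(n_components=n_components).fit(scaler.transform(X_train))
        return scaler, pca

    def score_run(x, scaler, pca):
        """Hotelling's T-squared (distance within the model plane) and Q
        (distance off the model plane) for a single observation x."""
        xs = scaler.transform(x.reshape(1, -1))
        scores = pca.transform(xs)[0]                        # projection onto the PCs
        t2 = float(np.sum(scores**2 / pca.explained_variance_))
        residual = xs[0] - pca.inverse_transform(scores.reshape(1, -1))[0]
        q = float(np.sum(residual**2))                       # squared prediction error
        return t2, q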

Technology advantages and economic benefit
Leveraging the "smarter sensor" pieces described above brings a spectrum of technical advantages to any organization adventurous enough to make the investment. Key among these is an ability to run reactions in a fashion that achieves exactly their desired wafer state, thereby largely negating the variability of incoming film conditions and allowing a fab to carry out processes that otherwise could not be reliably executed. Detecting precursors of tool failure allows their interception before equipment issues develop and product is compromised. Hence, maintenance for a still-viable tool can be scheduled at a time most advantageous for the factory rather than on a "tool-down" basis. Scrap risk due to process faults can be reduced to the frequency of detection, with the latter running on both a detailed, univariate level (real-time or post-process) and on a more holistic, process level (generally executed only post-process). Multivariate-detected faults can then be narrowed to likely equipment issues by principal component loadings (contributions), inspection, or parallel coordinate analysis, while all equipment issues (whether detected by univariate or multivariate methods) can be subjected to fault signature recognition. This, in turn, serves as both an identification (what to fix) and classification (type of fault) mechanism, with correlation of fault classifications to wafer test results yielding kill rates and consequently driving tool learning. The latter is crucial for the industry to advance its equipment technology, rather than simply reacting to reported "equipment health" events.


Figure 7. Relationship between principal component (PC) model and unusual observations.

Considering depreciation alone, without accounting for consumables, floor space, and power, semiconductor equipment costs run at approximately $25/hour/million dollars of capitalized book value simply for idle tooling. Since idle tools can neither add value to wafers nor inadvertently render them stone dead, however, the equation changes considerably when product is introduced into equipment. Recent work at the MIT Sloan School of Management suggests (rather strongly) that the burn rate of equipment that runs with a vengeance can be excruciatingly high [3]. Tools that process wafers in an apparently correct fashion but in fact create functionality issues that evade in-line testing, defect inspection, and metrology and then must be diagnosed at final test can easily cost the organization $10,000/minute! Such analysis, moreover, was done before the advent of 300mm processing. Since $100,000 200mm wafers are not unheard of and $100,000 300mm wafers will clearly not be uncommon, the dollar advantage to smarter sensors that catch problems at (or before) their inception may soon be no less economically significant than the technology advantage of superior process control.
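
As a quick sanity check on the $25/hour figure, straight-line depreciation of $1 million of book value over roughly five calendar years lands in the same neighborhood; the five-year schedule here is simply an assumption made for illustration.

    # Back-of-the-envelope check of the ~$25/hour idle-tool figure.
    # Assumes straight-line depreciation over five calendar years (an assumed
    # schedule chosen purely for illustration).
    book_value_dollars = 1_000_000
    years = 5
    hours = years * 365 * 24                              # 43,800 hours
    print(f"${book_value_dollars / hours:.2f}/hour")      # ~$22.83/hour, same order as $25/hour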

Conclusion
The cost of continuing to run a process in the wafer fab after it has gone bad is enormous. Real-time diagnosis of problems in the process or equipment is therefore a very valuable capability. Tools such as time series shape recognition, parallel coordinate plots, and chemometric analysis can create this ability by extracting more information from sensors than their raw data reveals.

Acknowledgments
Screen captures of Cornerstone 3.0 and Patterns 3.5 software used to illustrate this article were included with permission from Brooks Automation Inc. Screen captures of Curvaceous Visual Explorer for Windows used to illustrate this article were included with permission from Curvaceous Software. The diagram used in Fig. 7 was included with permission of Eigenvector Research Inc. The author also gratefully acknowledges Lam Research and Scientific Systems for the invaluable provision of illustrative data. In particular, he would like to thank Peggy Bigelow, Robin Brooks, Jimmy Hosch, Justin Lawler, Tom Ni, Fred Terry, and Barry Wise for effort and assistance without which this work would not have been possible.

References

  1. R.H. McCafferty, "Dimensionless Parameters of Reactive Ion Etching," 39th Electronic Components Conference Proceedings, p. 754, 1989.
  2. E.W. Bassett, "IBM's IBM Fix," Industrial Computing, Vol. 14, No. 4, p. 24, 1995.
  3. C. Weber, V. Sankaran, G. Scher, K. Tobin, "Quantifying the Value of Ownership of Yield Analysis Technologies," presented at ASMC99, Boston, Mass., Sept. 1999.

Robert McCafferty earned his bachelor's and master's degrees in mechanical engineering and a master of computer science degree at the University of Virginia. He worked for over a decade in equipment and process control — including adaptive control implementation — at IBM, Burlington. He finished his career with IBM managing efforts to optimize circuit design against the effects of manufacturing variability. He consulted for a subsidiary of Bolt, Beranek, and Newman (BBN), which eventually became part of Brooks Automation, specializing in semiconductor and pattern recognition assignments. He now practices independently. RHM Consulting, 6 Lone Oak Meadows, Sandy Hook, CT 06482; ph/fax 203/270-1626, e-mail [email protected].