Using wavelet filtering for monitoring plasma conditions
11/01/2001
Byungwhan Kim, Sejong University, Seoul, Korea
Wookyung Choi, Jungang University, Seoul, Korea
Overview
Plasma states of equipment can be monitored using a wavelet in conjunction with a radio-frequency impedance-match monitor system. This method helps prevent any failure in plasma during etch and deposition processing, which could critically diminish process performance.
Plasmas play a crucial role in etch and deposition for wafer processing. Since a failure in the plasma can critically diminish process performance, plasma states should be stringently monitored. Rather than external sensor data [1], it is more productive to monitor internal plasma variables, which provide increased sensitivity and abundant diagnostic clues. Numerous in situ sensors, such as optical emission spectroscopy, have been used for plasma diagnostics and control [2].
Figure 1. Real-time plasma monitoring system.
An alternative plasma monitoring system was developed in which plasma states were detected by examining electrical positions of motors in the match network control system. This was primarily motivated by expected sensitivity of motors to variations in plasma impedance, since motors attempt to reach steady positions dependent on the magnitude of plasma impedance [3].
Unfortunately, this method did not provide sensitivity to all process variations. In particular, sensitivity to variations in gas flows was extremely low, which limited the applicability of the method. To circumvent this sensitivity problem, we have looked at a wavelet-filtering technique and its application to radio frequency (rf) impedance match data for plasma diagnosis.
Figure 2. DWT patterns of match phase position as a function of source power.
The wavelet is a family of functions whereby a signal is transformed in both time and frequency domains. Applying the wavelet to a signal yields a vector of filtered coefficients. Abnormal plasma conditions induced by variations in factors in equipment plasma can be detected by measuring the relative sensitivity of vectors.
Through experimentation, we optimized factors related to the wavelet and developed two metrics that characterize and compare both raw and wavelet filtered patterns as they apply to plasma diagnosis. (The principal author has also published significant work in plasma modeling for etch; see the accompanying sidebar, "Artificial neural networks for modeling plasma processes.")
Acquisition of experimental data
The system used in our experiments generates a low-pressure planar plasma from an array of magnetic cores mounted on the top of a vacuum chamber (Fig. 1). A match network consists of three fixed capacitors (C1, C2, and C3), a variable vacuum capacitor (C4), and a variable transformer (represented in Fig. 1 as mutual inductance, M).
Figure 3. Total percent sensitivity as a function of DB type.
The variable transformer consists of a primary coil with four turns and a secondary coil with three turns. Its action is achieved by varying the coupling of the mutual inductance between the two coils. As the primary coil is rotated inside the secondary by an impedance motor, the mutual inductance is varied from zero to maximum coupling. Since the inductive reactance of the primary coil with more than four turns usually exceeds the output impedance of the generator, C1 is connected in series with the coil to balance it. The coil is rotated until the secondary circuit load coupled via M leads to a 50 Ω impedance at the rf source. At the same time, the secondary coil is rotated using a phase motor to make the impedance purely resistive.
Figure 4. Total percent sensitivity as a function of decomposition level (L).
To collect data, we interfaced a multifunction board (PCI-20428W-1) in a PC to a signal control panel (EASYDAS-5BP) off of the plasma control subsystem. The plasma monitor's I/Os communicate with the multifunction board's programming and data registers via an I3 bus. Monitored variables are displayed on the PC via a flow diagram created with Visual Designer software [4].
In our experiments, we varied several process factors, including source power and the argon and oxygen flow rates. Among the six variables that we monitored, the two electrical positions of the match motors were selected as indicators of abnormality in plasma states. In each experiment, these positions were initially set to 5.94V and 6.80V for the impedance and phase motors, respectively.
Basic theory of discrete wavelet
To fully comprehend our work and its application, it is important to understand the basic theory of a discrete wavelet. This is a time-frequency theory that uses a family of functions called wavelets to express or approximate a given function. In the transformation, a signal needs to be of length 2^n, and the filtering procedure is repeated n times, creating n levels of different scales, each scaled by a factor of two. The family of wavelets is derived from a single function by dilation and translation, and it is well localized in both time and frequency.
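The repeated filtering described above can be sketched with the simplest wavelet, the Haar wavelet (equivalent to DB1). This is a minimal illustration rather than the authors' implementation; the function names and the eight-point trace are hypothetical.

```python
import math

def haar_step(signal):
    """One filtering pass: scaled pairwise sums (approximation) and differences (detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def haar_dwt(signal, levels):
    """Return [approx_L, detail_L, ..., detail_1] after `levels` filtering passes."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    if not 1 <= levels <= n.bit_length() - 1:
        raise ValueError("levels must be between 1 and log2(len(signal))")
    approx = list(signal)
    details = []
    for _ in range(levels):
        # Each pass halves the approximation and stores the detail band it removes.
        approx, detail = haar_step(approx)
        details.insert(0, detail)
    return [approx] + details

# Hypothetical 8-point trace (2**3 samples), so at most three levels are possible.
trace = [6.80, 6.81, 6.79, 6.80, 6.62, 6.61, 6.63, 6.62]
coeffs = haar_dwt(trace, levels=3)
band_sizes = [len(band) for band in coeffs]
print(band_sizes)
```

Each pass halves the number of approximation coefficients, which is why a signal of length 2^n supports at most n levels.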
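For f(t) ∈ L²(R), f(t) can be approximated at different degrees of resolution in a hierarchical fashion [5] as given by
$$f(t) \approx \sum_{k} c_{j_0,k}\,\phi_{j_0,k}(t) + \sum_{j}\sum_{k} d_{j,k}\,\psi_{j,k}(t)$$
where the two basic functions are φ(t), called a scaling function, and ψ(t), called a mother wavelet. Here φ_{j,k} and ψ_{j,k} denote their dilated and translated versions, and c_{j_0,k} and d_{j,k} are the scale and wavelet coefficients. From the given wavelet and scaling function, a wavelet transformation matrix (W) [6] is established. By applying the wavelet to a signal (Y), a vector of filtered coefficients (C) is directly obtained as
$$C = WY$$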
The wavelet decomposes Cj (the vector of scale coefficients of f at level j) into a coarser approximation at level j-m, m=2k+l. The selection of a suitable level for the hierarchy depends on the function to be approximated. If W is an orthogonal wavelet matrix, the original signal Y can be recovered by the inverse discrete transformation.
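The matrix view of the transform can be sketched as follows, assuming the coefficients are obtained as the product of W and the signal Y. The example uses an orthonormal Haar matrix, so the inverse transformation is simply the transpose. The helper names and sample values are hypothetical.

```python
import math

def haar_matrix(n):
    """Build the n x n orthonormal Haar matrix (n must be a power of two)."""
    if n == 1:
        return [[1.0]]
    half = haar_matrix(n // 2)
    s = 1.0 / math.sqrt(2.0)
    rows = []
    # Averaging rows: each entry of the smaller matrix is repeated twice and rescaled.
    for row in half:
        rows.append([s * v for v in row for _ in range(2)])
    # Differencing rows: one (+s, -s) pair per neighbouring pair of samples.
    for i in range(n // 2):
        r = [0.0] * n
        r[2 * i] = s
        r[2 * i + 1] = -s
        rows.append(r)
    return rows

def matvec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

# Hypothetical signal; not data from the experiments.
Y = [6.80, 6.78, 6.75, 6.71, 6.66, 6.60, 6.53, 6.45]
W = haar_matrix(len(Y))
C = matvec(W, Y)                 # forward transform
Y_rec = matvec(transpose(W), C)  # inverse transform, valid because W is orthogonal
max_err = max(abs(a - b) for a, b in zip(Y, Y_rec))
print(max_err)
```

For a non-orthogonal W the transpose would no longer serve as the inverse, which is why the orthogonality condition matters for recovery.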
Measures of sensitivity
As an illustration, Daubechies (DB) wavelets [5] were applied to the steady position of phase match motors with variations in source power. The DB wavelets are compactly supported orthonormal wavelets (a DB family is typically written as DBN, where N refers to the order). We noticed from previous work that the phase position provides improved sensitivity over impedance position for the same variations in process factors [3]. The steady data comprised 32 data points. The two variables that govern the DB filter efficiency, the DB function and the decomposition level, were set to the same value, giving the results shown in Fig. 2.
Figure 2 shows that the vectors of filtered coefficients move down noticeably with an increase in power. Depending on the combination of DB function and level, various vectors of filtered coefficients can be obtained. This implies that both variables need to be optimized to improve position sensitivity to factor variation. To do this, we introduced two metrics: percent sensitivity (PS) and total percent sensitivity (TPS).
TPS is defined as
$$\mathrm{TPS} = \sum_{j=\mathrm{Min}}^{\mathrm{Max}-\Delta j} \mathrm{PS}(j+\Delta j,\, j)$$
where j represents a level of a factor (i.e., source power), and Min and Max are the 500W and 1000W levels of j. The Δj indicates a 100W increment of j. PS(j+Δj, j) is the percent sensitivity of position as power varies from level j to level j+Δj. It is computed over the k sampled data points (32 in this study), where x(i,j) represents the ith data element at the jth level of power.
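The two metrics can be sketched in code under explicit assumptions. The exact PS formula is not reproduced here, so the sketch assumes PS is the mean absolute relative change between two patterns, expressed in percent, and that TPS is the sum of PS over adjacent factor levels. The function names and sample values are hypothetical.

```python
def percent_sensitivity(x_next, x_base):
    """ASSUMED form of PS: mean absolute relative change between two patterns, in percent.

    x_base holds x(i, j) and x_next holds x(i, j + dj) for i = 1..k.
    Elements of x_base must be nonzero.
    """
    k = len(x_base)
    if k == 0 or k != len(x_next):
        raise ValueError("patterns must be non-empty and of equal length")
    return 100.0 / k * sum(abs(a - b) / abs(b) for a, b in zip(x_next, x_base))

def total_percent_sensitivity(patterns):
    """ASSUMED form of TPS: sum of PS over consecutive factor levels (patterns ordered by level)."""
    return sum(percent_sensitivity(patterns[i + 1], patterns[i])
               for i in range(len(patterns) - 1))

# Hypothetical patterns at 500 W ... 1000 W (4 samples each instead of 32).
levels_w = [500, 600, 700, 800, 900, 1000]
patterns = [[6.80 - 0.0004 * (p - 500)] * 4 for p in levels_w]
ps_values = [percent_sensitivity(patterns[i + 1], patterns[i]) for i in range(5)]
tps = total_percent_sensitivity(patterns)
print(ps_values, tps)
```

Six power levels in 100W steps give five adjacent pairs, so five PS values contribute to one TPS.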
Diagnostic sensitivity
To find an optimal set of our two variables, we computed TPS as a function of DB type. For each DB type, rf power was incrementally varied from 500W to 1000W. Figure 3 exhibits variations in TPS as a function of DB type. A higher TPS for a specific DB type means that it is more sensitive to variations in source power.
TPS appears to vary appreciably with DB type. For type four and below, the corresponding TPSs change more noticeably.
The best sensitivity (i.e., the highest TPS) is achieved at DB1.
After setting DB type to DB1, we examined variations of TPS as a function of decomposition level (Fig. 4).
Figure 5. Percent sensitivity as a function of source power.
As depicted in Fig. 4, variations in TPS are also considerable depending on the decomposition level. Since the original match data consisted of 32 elements, equivalent to 2^5, the data can be decomposed at five different levels. Increasing the level in excess of five appears to degrade diagnostic sensitivity. In the end, we found that the optimal set yielding the highest diagnostic sensitivity was L5 and DB1. Using these optimal settings, we then applied the wavelet to filter phase position and computed percent sensitivity (PS) to examine diagnostic sensitivity. Figure 5 shows our results compared with original match data. The plots in Fig. 5 show that wavelet filtered patterns yield better sensitivity than raw data in all five level-to-level variations in power.
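The selection procedure, choosing the DB type first and then the decomposition level, can be sketched as a two-stage search. Everything here is illustrative: `two_stage_search`, the `start_level` used while comparing DB types, and the contrived score table are assumptions, not values from the experiments.

```python
def two_stage_search(score, db_orders, levels, start_level):
    """Pick the DB order with the highest score at a fixed level, then the best level for it."""
    best_db = max(db_orders, key=lambda n: score(n, start_level))
    best_level = max(levels, key=lambda lvl: score(best_db, lvl))
    return best_db, best_level

# Contrived score table standing in for measured TPS values (hypothetical numbers).
toy_tps = {(n, lvl): 10.0 / n + lvl for n in range(1, 9) for lvl in range(1, 6)}

best = two_stage_search(lambda n, lvl: toy_tps[(n, lvl)],
                        db_orders=range(1, 9),
                        levels=range(1, 6),
                        start_level=1)
print(best)
```

A two-stage search evaluates far fewer combinations than a full grid, at the cost of possibly missing an optimum that depends on the interaction between the two variables.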
Figure 6. Percent sensitivity as a function of argon flow rate.
Our results with argon flows are given in Fig. 6. Here, the optimal set is DB7 and L4. In other words, the highest TPS was achieved when a DB wavelet of order seven was used with decomposition level four. In nearly all cases, the wavelet-based PS demonstrates enhanced sensitivity over raw data. The efficacy of wavelet filtering is most significant as argon flow rate changes from 150 sccm to 170 sccm; here the disappearance of the raw data-based PS implies that positions are insensitive to variations in argon flow in this range. In contrast, the wavelet-based PS is numerically high, so it is likely to identify an anomaly in plasma states caused by a failure in argon flow rate.
Figure 7. Percent sensitivity as a function of oxygen flow rate.
We observed the most noticeable efficacy of wavelet filtering when the PSs of wavelet filtered and raw data were compared as a function of oxygen flow rate (Fig. 7). Here, the optimal set is DB8 and L4. The data in Fig. 7 show that PS for raw data appears only when oxygen flow rate changes from 60 sccm to 70 sccm. This means that if just the raw data were monitored, it would rarely be possible to detect an oxygen-induced abnormal plasma state.
In all these examples, we have shown that the wavelet can be effectively used to enhance diagnostic sensitivity of monitored sensor data for process diagnosis.
Conclusion
A discrete wavelet transformation was applied to filter rf impedance match data for process monitoring and diagnosis. Factors such as the type of filter function and level were optimized for each equipment factor while characterizing both filtered and raw data with two diagnostic metrics, defined as PS and TPS. Comparisons with the raw data revealed that wavelet filtering yields an improvement in detecting plasma anomalies. Most significantly, the wavelet enabled identification of some anomalies induced by variations in factors whose impact on positions, and hence on the observed plasma states, is not as significant. It is thus expected that by wavelet filtering other types of in situ diagnostic data, plasma states can be monitored with improved sensitivity.
Acknowledgments
This work was supported by Korea Research Foundation Grant (KRF-2000-003-E00160). We thank W. Kim for his work involved with the accompanying sidebar.
References
- B. Kim, G.S. May, "Real-time diagnosis of semiconductor manufacturing equipment using a hybrid neural network expert system," IEEE Trans. Comp. Pack. Manufact. Technol., vol. 20, no. 1, Jan. 1997.
- J.O. Stevenson, et al., "A plasma process monitor/control system," Surf. Interf. Analy., vol. 26, 124, 1998.
- B. Kim, C. Lee, "Monitoring plasma impedance match characteristics in a multipole inductively coupled plasma for process control," J. Vac. Sci. Technol. A, vol. 18, no. 1, Jan./Feb. 2000.
- Visual Designer, Reference Manual, Intelligent Instrumentation.
- I. Daubechies, Ten Lectures on Wavelets, Philadelphia: Society for Industrial and Applied Mathematics, 1992.
- S.G. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 7, 1989.
Byungwhan Kim received his BS and MS in electrical engineering from Korea University and his PhD from Georgia Institute of Technology. He is an assistant professor in electronic engineering at the Sejong University, 98, Goonja-Dong, Kwangjin-Gu, Seoul 143-747, Korea; ph 82/2-3408-3729, fax 82/2-3408-3329.
Wookyung Choi received his BS in physics from Chonnam National University. He is an MS candidate in electrical engineering at Jungang University.
Artificial neural networks for modeling plasma processes
Rather than relying upon simulations, plasma-based IC processes are mostly developed through extensive experimentation. This stems from the difficulty of constructing models of the highly complex particle dynamics within the plasma. Historically, plasmas have been modeled using first principle physics involving continuity, momentum balance, and energy balance inside a high frequency, high intensity electric field, magnetic field, or both. Physical models attempt to derive self-consistent solutions to complex physical equations by means of computationally intensive numerical simulation methods, which typically produce distribution profiles of electrons and ions within the plasma.
Because physical understanding of plasma discharges is weak, physical models are often subject to many assumptions, leading to some discrepancy between predicted and actual behaviors. When a discrepancy becomes fairly large, optimization of equipment or processes is hindered.
To circumvent difficulties inherent in physical models, qualitative approaches have been developed, including statistical response surface models (RSMs) and artificial neural networks (ANNs). The first application of ANN to plasma etch modeling demonstrated a prediction accuracy significantly improved over RSM [1]. Since then, ANNs have contributed to the understanding of plasma processes by modeling a variety of plasma processes [2-4] and, recently, pure plasma discharge [5].
Artificial neural networks
Among ANNs, the back propagation neural network (BPNN) is the most widely used in plasma data modeling [6]. The learning ability of BPNN can be attributed to many parallel processing units ("neurons" crudely resembling human brain functionality). These are interconnected so knowledge is stored in the weights between neurons. Each neuron contains the weighted sum of its inputs filtered by an exponential sigmoid function, endowing the network with the ability to generalize with an added degree of freedom not available with RSM.
BPNN consists of one or more layers of neurons that receive ("input layer"), process ("hidden layers"), and transmit ("output layer") information regarding relationships between input factors and corresponding responses. The input layer of neurons receives external information and the output layer transmits it to the outside world. The hidden layers of neurons do not interact with the outside world, but perform classification and feature extraction tasks on information provided by the input and output layers.
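The layered structure can be sketched as a forward pass in which every neuron applies a sigmoid to the weighted sum of its inputs. This is a minimal illustration: the logistic form of the sigmoid, the absence of bias terms, the layer sizes, and the weight values are all assumptions.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, used here as one common exponential sigmoid (an assumption)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights):
    """Each row of `weights` holds one neuron's connection strengths to the previous layer."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def forward(inputs, hidden_w, out_w):
    """Input layer -> one hidden layer -> output layer."""
    hidden = layer_forward(inputs, hidden_w)
    return layer_forward(hidden, out_w)

# Hypothetical network: 2 inputs, 3 hidden neurons, 1 output neuron.
hidden_w = [[0.4, -0.7], [0.1, 0.9], [-0.5, 0.3]]
out_w = [[0.8, -0.2, 0.6]]
prediction = forward([0.5, -0.2], hidden_w, out_w)
print(prediction)
```

All knowledge of the network sits in `hidden_w` and `out_w`, so training amounts to adjusting those numbers.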
Development of a predictive model
A predictive BPNN model is constructed by training it with data prepared typically by a statistical experimental design. The BP algorithm by which the network is trained begins with a random set of weights (i.e., connection strengths between neurons). An input vector normalized so all input data lie between -1 and 1 is presented to the network. The output is calculated using the initial weight matrix. The calculated (or predicted) output is compared to actually measured output, and the squared difference between the two determines the system error.
In the BP algorithm, error is minimized via a gradient descent approach in which weights are adjusted in the direction of decreasing error. Conventionally, this rule is called the generalized delta rule [6]. By adjusting weighted connections recursively using the rule for all units in the network, accumulated error over all input vectors is minimized until it reaches a predefined tolerance limit.
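The training loop described above can be sketched for a small network with one hidden layer. This is a simplified illustration, not the authors' code: it assumes tanh hidden units, a linear output neuron, no bias terms, per-sample weight updates, and a toy data set whose inputs already lie between -1 and 1. The names `train_bpnn`, `lr`, and `tolerance` are hypothetical.

```python
import math
import random

def train_bpnn(samples, n_hidden=3, lr=0.1, tolerance=1e-3, max_epochs=5000, seed=0):
    """Train by gradient descent on squared error; returns (predict, error history)."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Start from a random set of weights (connection strengths between neurons).
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def run(x):
        hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]
        return hidden, sum(vh * ah for vh, ah in zip(v, hidden))

    def total_error():
        return sum((run(x)[1] - t) ** 2 for x, t in samples)

    history = [total_error()]
    for _ in range(max_epochs):
        for x, t in samples:
            hidden, y = run(x)
            delta_out = y - t  # error signal at the linear output neuron
            # Propagate the error signal back through the tanh hidden units.
            delta_hidden = [delta_out * vh * (1.0 - ah * ah) for vh, ah in zip(v, hidden)]
            for h in range(n_hidden):
                v[h] -= lr * delta_out * hidden[h]
                for i in range(n_in):
                    w[h][i] -= lr * delta_hidden[h] * x[i]
        history.append(total_error())
        if history[-1] < tolerance:  # stop at the predefined tolerance limit
            break
    return (lambda x: run(x)[1]), history

# Toy training data: (normalized inputs, target response); hypothetical values.
samples = [([-1.0, -1.0], -0.2), ([-1.0, 1.0], -0.8), ([1.0, -1.0], 0.8), ([1.0, 1.0], 0.2)]
predict, history = train_bpnn(samples)
print(history[0], history[-1])
```

Changing `seed` changes the initial weights and therefore the trained model, which is the randomness effect discussed in the sidebar.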
Once a network is trained, model suitability is evaluated using test data.
Developing a BPNN model is complicated by the presence of many training factors whose optimal values are initially unknown. They may involve training tolerance, the number of hidden neurons, types and gradients of activation functions, and magnitude and distribution of initial weights [7]. Among them, the last factor is the most difficult to optimize since the prediction accuracy can be varied significantly in an unpredictable way depending on the random distribution of initial weights.
Thus, in most cases of modeling, models are largely built by optimizing only the first hidden neuron variable owing to its simplicity; this limits model prediction accuracy. Even in this case, however, models have rarely been optimized under the randomness in initial weights.
Apart from the main effect of each factor, various interactions among factors should also be optimized because of their significant impact on the prediction accuracy. In some of our recent work [7], we optimized interaction effects among factors by applying a genetic algorithm (GA) [8] to the models constructed on the basis of sampled best models.
Plasma etch modeling
We have used BPNN to model a plasma etch process. This involved collecting data from a plasma etch system using a Langmuir probe. Plasma characteristics modeled included electron density, electron temperature, and plasma potential.
We conducted a 2^4 factorial experiment to characterize relationships between process factors and plasma parameters. Factors varied included RF source power, process pressure, chuck holder position, and Cl2 flow rate. Training factors experimentally adjusted included training tolerance, gradient of bipolar sigmoid function, and magnitude of initial weights. Random effects of initial weights on prediction accuracy were examined by generating multiple models for a given set of training factors. The number of hidden neurons was fixed at four. Each training factor was experimentally tuned until the BPNN model reached its best prediction accuracy.
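A two-level full factorial design over four factors can be enumerated as follows. The factor names come from the text; the coded levels -1 and +1 stand in for the low and high settings, whose physical values are not given here.

```python
from itertools import product

# Factor names from the text; -1 / +1 are coded low / high levels (actual settings not given).
factors = ["source_power", "pressure", "chuck_position", "cl2_flow"]

# Every combination of low and high levels: 2**4 = 16 runs.
runs = [dict(zip(factors, levels)) for levels in product((-1, 1), repeat=len(factors))]

for run in runs[:3]:
    print(run)
print(len(runs))
```

Each factor appears at its low level in eight runs and at its high level in the other eight, which is what allows main effects to be estimated independently.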
For comparison, we also constructed RSMs using a second-order polynomial. Using root mean squared error (RMSE), we measured prediction accuracy on our test data comprising eight experiments, not including the training data. Here, for example, our RMSEs for electron density were 0.321 × 10^11/cm^3 for BPNN and 0.973 × 10^11/cm^3 for RSM, illustrating a ~34% improvement of BPNN over RSM. (Factors optimized for the density model were 0.08 for tolerance, 1.90 for gradient, and ±1.4 for magnitude of initial weights.) Similarly, for electron temperature and plasma potential, our optimized models illustrated ~40% and ~52% improved prediction capabilities for BPNN compared to RSM.
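For reference, RMSE over a set of test experiments can be computed as in the sketch below. The predicted and measured values are hypothetical placeholders for eight test runs, not the data behind the figures quoted above.

```python
import math

def rmse(predicted, measured):
    """Root mean squared error between model predictions and measurements."""
    if not predicted or len(predicted) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(predicted))

# Hypothetical values for eight test experiments (arbitrary units).
measured = [2.1, 3.4, 1.8, 4.0, 2.9, 3.3, 2.2, 3.7]
model_a = [2.0, 3.6, 1.9, 3.8, 3.0, 3.1, 2.4, 3.6]
model_b = [2.6, 2.9, 2.3, 3.2, 3.5, 2.7, 2.8, 3.0]

print(rmse(model_a, measured), rmse(model_b, measured))
```

A lower RMSE on data withheld from training indicates better generalization, which is the basis for comparing the two model types.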
Practicality and considerations
ANN is a practical modeler that constructs a predictive model of experimentally characterized raw data. Compared to a physical model, an ANN model can be built without any assumption while adopting variations in process or equipment factors simply through "retraining." Such models can be built rapidly, contributing to rapid prototypes of plasma processes and equipment, as well as real-time diagnosis and control. ANNs can also improve insight about plasma discharges.
Developing an optimal neural network model, however, is complicated by the presence of many training factors, particularly the randomness in initial weights. In most previous work, this aspect has been neglected because there was no way to identify optimal initial weights. Another drawback common to published works is that models were constructed without taking into account interaction effects among factors. In this sense, the ANN models developed are somewhat limited in prediction accuracy. From our previous work, we believe that by adopting an optimization technique such as GA, these limitations can partly be circumvented.
References
- C. Himmel, et al., "A Comparison of Statistically-based and Neural Network Models of Plasma Etch Behavior," Proc. 4th Internat'l Semi. Manufact. Sci. Symp., pp. 124-129, 1992.
- B. Kim, G.S. May, "Reactive ion etch modeling using neural networks and simulated annealing," IEEE Trans. Comp. Pack. Manufact. Technol., vol. 19, no. 1, pp. 3-8, 1996.
- B. Kim, et al., "Characterizing metal-masked silica etch process in a CHF3/CF4 inductively coupled plasma," J. Vac. Sci. Technol., A, vol. 17, no. 5, pp. 2593-2597, 1999.
- B. Kim, et al., "Use of neural networks to model low temperature tungsten etch characteristics in high density SF6 plasma," J. Vac. Sci. Technol., A, vol. 18, no. 2, pp. 417-422, 2000.
- B. Kim, G.T. Park, "Modeling plasma equipment using neural networks," IEEE Trans. Plasma Science, vol. 29, no. 1, pp. 8-12, 2001.
- D.E. Rumelhart, J.L. McClelland, Parallel Distributed Processing, Cambridge, M.I.T. Press, 1986.
- B. Kim, S. Park, "An optimal neural network plasma model: a case study," Chemometr. Intell. Lab. Syst., vol. 56, no. 1, pp. 39-50, 2001.
- D.E. Goldberg, Genetic Algorithms in Search, Optimization & Machine Learning, Addison Wesley, 1989.