


Nonlinear models used to address epi layer uniformity


07/01/2004







Engineers in semiconductor fabs depend on trial-and-error experimentation to improve thickness uniformity, a quality measure for epitaxial layers grown by CVD. New nonlinear modeling techniques reported in the last 10–12 years offer a tremendous advantage in quantitatively describing the effects of process variables on the deposition-layer thickness profile, and have been used successfully for many processes. In the case described here, the nonlinear technique turned out to be an order of magnitude more accurate than linear models.

Epitaxial silicon wafers are commonly used as substrates for processing discrete electronic devices, ICs, and microelectromechanical devices, and chemical vapor deposition (CVD) is used to deposit the epitaxial layer. Depending on the application, the thickness of the epitaxial layer can vary from <1µm to >100µm. In some cases, it is of prime importance to minimize thickness variations in the epitaxial layer, including both wafer-to-wafer and within-wafer variations. Such structures are processed in horizontal single-wafer reactors that have sufficient stability and adjustability to minimize thickness variations. For moderately thick layers, atmospheric processing at temperatures of 1100–1150°C is used with H2 as the carrier gas and trichlorosilane (TCS) as the source gas. Such processes achieve relatively high throughputs with deposition rates of 3.5–5µm/min.

The growth rate of the epitaxial layer is affected by the temperature and the concentration of TCS in the carrier gas and, because of the high process temperature, is primarily limited by the diffusion of silicon-containing species through the boundary layer in the gas stream. The boundary layer thickness is a function of the gas velocity. As a result, the deposition rate changes if the carrier-gas flow rate is changed, even if the TCS mole fraction is kept constant. Additionally, the growth rate varies strongly along the direction of the flow. Typically, the local growth rate may be 25%–35% higher at the leading edge of the wafer than at the trailing edge.


Figure 1. Schematic diagram of the CVD reactor: a) top view and b) side view. The H2 + TCS gas mixture is fed into the growth chamber through five adjustable injectors (I1–I5). The silicon wafer rests on a SiC-coated susceptor that is rotated at 35–45 rpm to ensure a rotationally symmetric thickness distribution.

To achieve good thickness uniformity, the wafer is rotated during the deposition. As a consequence, the thickness distribution is radially symmetric. Small adjustments to the radial thickness profile can be made by changing the temperature profile of the wafer. For robust processes, however, the adjustment is done mainly by changing the lateral velocity profile of the gas stream, which also changes the boundary layer thickness and the local deposition rate in the lateral direction. For this velocity profile control, the reactor used (ASM Epsilon) was fitted with five adjustable injectors. In practice, there are only three adjustable parameters, because the flow pattern is usually kept as symmetric as possible to avoid possible backstreaming effects in the growth chamber. This means that injector 1 has the same gap as injector 5, and likewise injectors 2 and 4. Using the nomenclature of Fig. 1, I1 = I5, I2 = I4.
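The symmetry constraint above (I1 = I5, I2 = I4) can be illustrated with a minimal sketch. The function names and the gap values below are hypothetical and are not taken from the reactor software or from the experiments; the sketch only shows how three free parameters determine all five injector gaps.

```python
def expand_injector_gaps(outer, intermediate, center):
    """Return gaps for injectors I1..I5 under the symmetry I1 = I5, I2 = I4."""
    return [outer, intermediate, center, intermediate, outer]


def is_symmetric(gaps, tol=1e-9):
    """Check that a five-element gap setting obeys I1 = I5 and I2 = I4."""
    i1, i2, _i3, i4, i5 = gaps
    return abs(i1 - i5) <= tol and abs(i2 - i4) <= tol


# Illustrative gap values in arbitrary units (assumed, not from the article).
gaps = expand_injector_gaps(outer=1.0, intermediate=1.5, center=2.0)
print(gaps)                 # [1.0, 1.5, 2.0, 1.5, 1.0]
print(is_symmetric(gaps))   # True
```

Working with the three free values rather than five keeps any later search or model input consistent with the symmetric flow pattern by construction.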

Modeling CVD processes

Physical modeling of CVD is relatively complicated and time-consuming even for the simplest reactor geometries, and conventional statistical techniques of empirical modeling are linear and do not perform well. (For reference, mathematical modeling of CVD processes has been reviewed by Komiyama et al. [1].)

The modeling of CVD processes was traditionally done by computational fluid dynamics in the 1980s, and empirical modeling was frowned upon, for good reasons. In those days, computing power had just reached the level where solving several partial differential equations in space and time had become feasible. Finite element methods were relatively new, especially when used for realistic problems. On the other hand, because empirical modeling did not have the benefit of the new techniques based on artificial neural networks, it usually did not work well.

While nature does not follow the simplicity of a linear approach, proponents of linear techniques point to that simplicity and to the possibility of adding nonlinear terms to a linear regression. In practice, however, nonlinear terms are often not added, and even when they are, the resulting approximation is not efficient.
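To make "adding nonlinear terms in linear regression" concrete, here is a minimal sketch that assumes a single input and a quadratic term chosen in advance. The data are synthetic and noise-free, generated only for illustration, and the helper names are hypothetical. The model stays linear in its coefficients, so ordinary least squares still applies, but the analyst has to guess the right terms beforehand.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations act on both sides at once.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x


def fit_least_squares(features, ys):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    p = len(features[0])
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(p)]
           for i in range(p)]
    xty = [sum(f[i] * y for f, y in zip(features, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def sum_squared_error(features, ys, coeffs):
    """Sum of squared differences between fitted and observed values."""
    return sum((sum(c * v for c, v in zip(coeffs, f)) - y) ** 2
               for f, y in zip(features, ys))


# Synthetic, noise-free data from y = 1 + 2x + 3x^2 (illustration only).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]

linear_features = [[1.0, x] for x in xs]            # intercept and x
quadratic_features = [[1.0, x, x * x] for x in xs]  # adds the nonlinear term x^2

b_lin = fit_least_squares(linear_features, ys)
b_quad = fit_least_squares(quadratic_features, ys)
sse_lin = sum_squared_error(linear_features, ys, b_lin)
sse_quad = sum_squared_error(quadratic_features, ys, b_quad)
print(sse_lin > sse_quad)  # True: the added x^2 term captures the curvature
```

With several interacting process variables, the number of candidate terms grows quickly, which is one way to read the remark that this approach is not efficient.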

Today, nonlinear empirical modeling is a far more viable approach for process improvements than physical modeling. To do so, one can use new techniques of nonlinear modeling, such as artificial neural networks, which have the so-called universal approximation capability [2] that makes them suitable for most function approximation tasks encountered in process industries. The user does not need to know the type and severity of nonlinearities while developing the models.

Artificial neural networks

Structurally, and to a smaller extent functionally, artificial neural networks resemble the networks of neurons in biological systems. Like the networks of neurons in the brain, artificial neural networks consist of neurons in layers directionally connected to others in the adjacent layers (Fig. 2).


Figure 2. A typical feedforward neural network with one hidden layer.

Many types of neural networks have been in use in process industries for about ten years, often for process control, product development, and other practical uses [3, 4]. The multilayer perceptron, a kind of feedforward neural network, is the most common; most neural network applications in industry [5–13] are based on it. Nonlinear modeling can also be done in a number of other ways.

In a feedforward neural network, the output of each neuron i is given by

$$y_i = \sigma\left( w_{i0} + \sum_{j} w_{ij} x_j \right)$$

where the activation function is usually the logistic sigmoid, given by

$$\sigma(a) = \frac{1}{1 + e^{-a}}$$

The incoming signals to the neuron are x_j, and w_ij are the weights for the connections from the incoming signals to the ith neuron. The w_i0 terms are called biases. This results in a set of algebraic equations that relate the input variables to the output variables. So for each observation (a set of input and output variables), the outputs can be predicted from these equations based on a given set of weights. The training procedure aims at determining the weights that result in the smallest sum of squares of prediction errors.
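The neuron equation described above (a bias plus a weighted sum of inputs, passed through the logistic sigmoid) and the training objective can be sketched as follows. This is a minimal illustration, not the software used in the study: the function names are hypothetical, and the single linear output neuron is an assumption made here for simplicity.

```python
import math


def logistic(a):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-a))


def neuron_output(inputs, weights, bias):
    """Output of one neuron: sigmoid of the bias plus the weighted input sum."""
    return logistic(bias + sum(w * x for w, x in zip(weights, inputs)))


def network_output(inputs, hidden_weights, hidden_biases, out_weights, out_bias):
    """One hidden layer of sigmoid neurons feeding one linear output (assumed)."""
    hidden = [neuron_output(inputs, w, b)
              for w, b in zip(hidden_weights, hidden_biases)]
    return out_bias + sum(w * h for w, h in zip(out_weights, hidden))


def sum_of_squared_errors(observations, params):
    """Training objective: sum over observations of (prediction - target)^2."""
    return sum((network_output(x, *params) - y) ** 2 for x, y in observations)


# Hand-picked weights (illustration only): zero hidden weights give hidden
# outputs of exactly 0.5, so the result is easy to verify by hand.
params = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [1.0, 3.0], 0.5)
print(network_output([1.0, 2.0], *params))  # 2.5
```

Training then amounts to searching for the `params` that make `sum_of_squared_errors` as small as possible over all observations.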

A variety of training methods are in use today. About ten years ago, back-propagation was the most common training method. Today, good optimization methods, like the Levenberg-Marquardt method [14–16], are most often used.
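As a rough illustration of Levenberg-Marquardt training, the sketch below fits a very small network (one input, two sigmoid hidden neurons, one linear output) to synthetic data. Everything in it is an assumption made for illustration: the network size, the tanh-shaped data, the starting weights, the forward-difference Jacobian, and the simple identity damping with a tenfold increase or decrease. Production implementations are considerably more refined.

```python
import math


def logistic(a):
    """Numerically stable logistic sigmoid."""
    if a >= 0.0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def predict(w, x):
    """1 input, 2 sigmoid hidden neurons, 1 linear output (7 weights in w)."""
    h1 = logistic(w[0] + w[1] * x)
    h2 = logistic(w[2] + w[3] * x)
    return w[4] + w[5] * h1 + w[6] * h2


def residuals(w, data):
    return [predict(w, x) - y for x, y in data]


def sse(w, data):
    return sum(r * r for r in residuals(w, data))


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def levenberg_marquardt(w, data, iterations=100, lam=1e-2, eps=1e-6):
    """Minimise the sum of squared residuals with damped Gauss-Newton steps."""
    w = list(w)
    n = len(w)
    for _ in range(iterations):
        r = residuals(w, data)
        # Forward-difference Jacobian by columns: cols[j][i] = d r_i / d w_j.
        cols = []
        for j in range(n):
            shifted = list(w)
            shifted[j] += eps
            cols.append([(rs - ri) / eps
                         for rs, ri in zip(residuals(shifted, data), r)])
        # Damped normal equations: (J^T J + lam * I) step = -J^T r.
        lhs = [[sum(p * q for p, q in zip(cols[a], cols[b]))
                + (lam if a == b else 0.0) for b in range(n)] for a in range(n)]
        rhs = [-sum(p * q for p, q in zip(cols[a], r)) for a in range(n)]
        try:
            step = solve(lhs, rhs)
        except ZeroDivisionError:
            lam = min(lam * 10.0, 1e8)  # singular system: add damping, retry
            continue
        trial = [wi + si for wi, si in zip(w, step)]
        if sse(trial, data) < sum(ri * ri for ri in r):
            w, lam = trial, max(lam / 10.0, 1e-8)  # accept: less damping
        else:
            lam = min(lam * 10.0, 1e8)             # reject: more damping
    return w


# Synthetic training data (illustration only): samples of tanh(x).
data = [(x / 2.0, math.tanh(x / 2.0)) for x in range(-4, 5)]
w0 = [0.1, 0.5, -0.3, 0.8, 0.0, 0.5, -0.5]  # arbitrary starting weights
w_fit = levenberg_marquardt(w0, data)
initial_error, final_error = sse(w0, data), sse(w_fit, data)
print(final_error < initial_error)  # True: only improving steps are accepted
```

The damping parameter `lam` moves the step between a Gauss-Newton step (small `lam`) and a short gradient-descent step (large `lam`), which is the core idea of the method.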

Comparing linear and nonlinear modeling

Nonlinear modeling is empirical or semi-empirical modeling that takes into account at least some of the nonlinearities. To approximate the nonlinearities as best as possible from a limited amount of data, or from data very dilute in information content, it is often desirable to combine the neural network approach with other empirical modeling, or parts of physical models.

Experiments were carried out with different conditions that resulted in 59 observations of the thickness profile. NLS 031 software was used to develop the nonlinear models for predicting the thickness profile.


Figure 3. A comparison of measured and predicted values of thickness at the edge from the nonlinear model. (All thicknesses measured in microns.)

Figure 3 shows a comparison of measured values of the deposition layer thickness at the edge and values predicted from a nonlinear model; the model predicts the thickness at the center even more accurately. In contrast, Fig. 4 shows the results from a linear regression model with the same input and output variables.


Figure 4. A comparison of measured and predicted values of thickness from the linear regression model. (All thicknesses measured in microns.)

Often linear and nonlinear models have similar accuracies. Nonlinear models should be at least as accurate as linear models and are often better in many respects, but the improvement may be small in magnitude. For this data set, there is a visible difference between the linear and nonlinear models.


Figure 5. A comparison of error measures for linear and nonlinear models.

Figure 5 shows the results of modeling the deposition layer thicknesses at three locations. For each of the three thicknesses, the error variance of the linear model is an order of magnitude larger than that of the nonlinear model.
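The article does not spell out how the error measures in Fig. 5 were computed. One common choice, assumed here, is the sample variance of the prediction errors. The sketch below uses placeholder numbers chosen only so the arithmetic is easy to check; they are not measurements from the study.

```python
def error_variance(measured, predicted):
    """Sample variance of the prediction errors (measured minus predicted)."""
    errors = [m - p for m, p in zip(measured, predicted)]
    mean_error = sum(errors) / len(errors)
    return sum((e - mean_error) ** 2 for e in errors) / (len(errors) - 1)


def variance_ratio(measured, predicted_a, predicted_b):
    """How many times larger the error variance of model A is than that of B."""
    return error_variance(measured, predicted_a) / error_variance(measured, predicted_b)


# Placeholder values in microns (hypothetical, for illustration only).
measured = [1.0, 2.0, 3.0, 4.0]
model_a = [0.0, 2.0, 4.0, 4.0]   # errors: 1, 0, -1, 0
model_b = [0.5, 2.0, 3.5, 4.0]   # errors: 0.5, 0, -0.5, 0
print(variance_ratio(measured, model_a, model_b))  # 4.0
```

Applied to the measured thicknesses and the two sets of model predictions, a ratio near 10 would correspond to the order-of-magnitude difference described above.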


Figure 6. Effect of the opening of injector 4 on thickness at the edge, keeping other injector gaps, temperature, and flow rates constant. (All thicknesses measured in microns.)

Figure 6 shows the effect of injector gap 4 on the thickness at the edge, while other input variables are kept constant.

The LUMET system concept, developed by Nonlinear Solutions Oy over a period of several years, is a framework in which nonlinear models may be implemented for use. Since nonlinear models are relatively complicated and unwieldy for algebraic manipulations, it is necessary to provide them in a framework that makes it possible for engineers to use the models without a detailed understanding of how they work. This concept has been used in a variety of applications, from steel rods to fiberoptic cables, and each implementation looks different and often serves different purposes. Using the LUMET concept, it is possible to determine process variables that satisfy the desired conditions, if such a solution exists. In the table, only injector gaps 3, 4, and 5 are allowed to vary within the limits shown, and the thickness is desired to be as uniform as possible. The third column shows a solution for the injector gaps, the best the system could find.
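The idea of searching for injector gaps that give the most uniform thickness can be sketched with a simple grid search. This is not how LUMET works internally, which the article does not describe. The stand-in model, its coefficients, the gap limits, and the function names below are all hypothetical; in practice the trained nonlinear model would take the place of `stand_in_model`.

```python
import itertools


def stand_in_model(gap3, gap4, gap5):
    """Hypothetical placeholder for a trained model (NOT the article's model).

    Returns made-up thicknesses in microns at the center, mid-radius and edge.
    """
    center = 10.0 + 0.8 * (gap3 - 1.0)
    middle = 10.0 + 0.6 * (gap4 - 1.2) - 0.1 * (gap3 - 1.0)
    edge = 10.0 + 0.9 * (gap5 - 0.9) ** 2 + 0.2 * (gap4 - 1.2)
    return center, middle, edge


def spread(thicknesses):
    """Non-uniformity measure: largest minus smallest predicted thickness."""
    return max(thicknesses) - min(thicknesses)


def grid(low, high, steps):
    """Evenly spaced values from low to high, both ends included."""
    return [low + (high - low) * i / (steps - 1) for i in range(steps)]


def find_most_uniform(model, limits, steps=21):
    """Try every grid combination within the limits; keep the smallest spread."""
    axes = [grid(lo, hi, steps) for lo, hi in limits]
    return min(itertools.product(*axes), key=lambda gaps: spread(model(*gaps)))


# Assumed limits for gaps 3, 4 and 5 in arbitrary units (not from the table).
limits = [(0.5, 1.5), (0.5, 2.0), (0.5, 1.5)]
best = find_most_uniform(stand_in_model, limits)
print(best, spread(stand_in_model(*best)))
```

Because each model evaluation is cheap, even an exhaustive search like this is feasible for three variables; a proper optimizer would scale better to more variables or finer resolution.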

Conclusion


Because even relatively large nonlinear models may take only a fraction of a second to compute outputs, they are generally much faster than physical models, which involve differential equations and take much longer to compute the variables of interest. All this comes at a price, however: nonlinear modeling is complicated for the average engineer. The equations can be unwieldy and cumbersome to reorganize, and they contain several parameters that are difficult to interpret. It requires a good amount of experience and expertise to develop even relatively simple nonlinear models suitable for industrial use. Nonlinear model development requires appropriate data, which is often difficult to produce, particularly when the underlying nonlinearities are not well known or when experimentation is expensive. Development also takes time, which adds to the cost: even a month is a relatively short time for developing industrial models.

Results, however, show that the benefits are often well worth the costs. Unlike physical modeling, which is based on the laws of physics and simplifying assumptions, nonlinear modeling describes the process directly from data without those simplifying assumptions, and it has a much higher capacity to describe the real world than the usually rather simplistic equations used in physical models. In other words, nonlinear models are often better models, and they contribute to tighter process control and superior products.

Acknowledgment

Epsilon is a trademark of ASM International NV.

References

  1. H. Komiyama, Y. Shimogaki, Y. Egashira, "Chemical Reaction Engineering in the Design of CVD Reactors," Chem. Eng. Sci., Vol. 54, No. 13–14, pp. 1941–1957, July 1999.
  2. K. Hornik, M. Stinchcombe, H. White, "Multilayer Feedforward Networks are Universal Approximators," Neural Networks, Vol. 2, No. 5, pp. 359–366, 1989.
  3. A. Bulsari, ed., Neural Networks for Chemical Engineers, Elsevier, Amsterdam, Netherlands, 1995.
  4. A. Bulsari, "Quality of Nonlinear Modelling in Process Industries," Internal Report NLS/1998/2.
  5. A. Bulsari, M. Lahti, "Nonlinear Modelling Secondary Coating from Expensive Experimental Data," Proc. International Wire and Cable Symposium, pp. 302–305, Nov. 2001.
  6. A. Bulsari, M. Lahti, "Nonlinear Models Guide Secondary Coating of OFCs," Wire and Cable Technology International, Vol. 29, No. 5, pp. 40–43, Sept. 2001.
  7. A. Bulsari, J. Fredriksson, T. Lehtinen, "Neural Networks for Quality Control in the Wire Rod Industry," Wire Industry, Vol. 67, pp. 253–258, March 2000.
  8. A. Bulsari, P. Hooli, "More Accurate Alloying with Neural Networks," Stainless Steel World, Vol. 12, pp. 54–57, Nov. 2000.
  9. A. Bulsari, J. Fredriksson, T. Lehtinen, "Uuden Sukupolven Laatujärjestelmät Sisältävät Epälineaarisia Malleja," Vuoriteollisuus, No. 1, pp. 38–41, 1999.
  10. P. Myllykoski, A. Bulsari, "Selection of Influential Variables for Modelling Cold Rolling of Thin Sheets," Proc. EANN, pp. 155–158, 1997.
  11. A. Bulsari, A. Käppi, "Prediction of Compressive Strength and Compaction Degree of Concrete," Proc. EANN, pp. 181–184, 1998.
  12. A. Bulsari, M. Lahti, "Optimising Secondary Coating of OFCs with Nonlinear Models," Wire and Cable Technology International, Vol. 30, No. 5, pp. 44–46, Sept. 2002.
  13. A. Bulsari, P. Pitkänen, B. Malm, "Nonlinear Modelling Paves the Way to Bespoke Polymers," British Plastics and Rubber, No. 12/02, pp. 4–5, Dec. 2002.
  14. P.E. Gill, W. Murray, M.H. Wright, Practical Optimisation, Academic Press, London, pp. 136–140, 1981.
  15. K. Levenberg, "A Method for the Solution of Certain Nonlinear Problems in Least Squares," Quart. J. Appl. Math., Vol. 2, pp. 164–168, 1944.
  16. D.W. Marquardt, "An Algorithm for Least-squares Estimation of Nonlinear Parameters," J. Soc. Indust. Appl. Math., Vol. 11, pp. 431–441, June 1963.

Abhay Bulsari received his bachelor's degree from the Indian Institute of Technology in Mumbai, and his PhD from the U. of Virginia. He may be reached at Nonlinear Solutions Oy, PL 953, 20101 Turku 10, Finland; ph 358/2-215-4721.

Veli-Matti Airaksinen received his MSc in technical physics at the Helsinki U. of Technology and his PhD at Glasgow U. He is director of the Micronova Microelectronics Centre at the Helsinki U. of Technology; he co-authored this article while at Okmetic Oyj in Vantaa, Finland. He may be reached at Micronova, P.O. Box 3500, 02015 HUT (Espoo), Finland; ph 358/9-451-6075.