
Plasma etch rates: The impact of mask transmittance

08/01/2001

Massimiliano Pindo, DuPont Photomasks Inc., Round Rock, Texas

Overview
The relationship between plasma etch rates and the area to be etched is described as a "loading effect." This can be theoretically explained and is typically taken into account for both fixed-time and endpoint-etch processes. In particular, transmittance-related mask features determine the size of the exposed areas, thus directly contributing to the loading effect both globally (macroloading) and locally (microloading). This article discusses the impact of mask characteristics on plasma-etching performance when endpoint detection is used. Also discussed is how mask suppliers can perform transmittance characterization by properly extracting information from standard layout data.

Plasma systems with endpoint detection are typically used to etch polysilicon and metal [1]. Etch rate depends on the exposed area ("loading effect"). If this dependence is neglected, severe physical variations of nominal etching conditions may occur at the wafer level, resulting in poor statistical process control of minimum-size patterns. Ultimately, this can lead to yield loss. The loading effect is therefore of fundamental importance in wafer manufacturing. Because the exposed area is determined by the photolithography-masking step preceding the etching process, this article will focus on mask characteristics, in particular, mask transmittance.

Figure 1. A typical reticle layout made of three columns and four rows of dice.

In fine tuning the etch recipes to achieve the best process performance, the evaluation of the wafer exposed area is a key issue, which leads to the necessary evaluation of transmittance-related mask characteristics. A typical example is offered by polysilicon etch steps, wherein the same recipe may lead to polysilicon under- or over-etch in devices featuring different mask transmittances. This can be better understood, for instance, by thinking of the intrinsically different pattern density of a wafer containing DRAMs as opposed to one containing microprocessors.

Figure 2. a) A wafer-saturating map with stepper shots covering the whole available surface, which patterns the dice to the wafer edge. This is the result of a stepper tool with a relatively large field (2x). b) A wafer-nonsaturating map with fewer stepper shots reaching the wafer edge. This is the result of a stepper tool with a small exposure field (4x, 5x). A stepper field that is one-fourth of the one shown in Fig. 2a is assumed.

Because of its quadratic dependence on the wafer radius, the loading effect must also be taken into account in preparing the transition to larger-diameter wafers. In fact, assuming the same pattern density and minimum size technology, the film quantity to be etched will linearly increase with the wafer area and the physical/chemical process conditions inside the etching reactor will be much different. This is only a first-order effect, a macroloading effect. Second-order effects such as microloading and aspect ratio dependence may also be present if the transition to larger wafer diameters is accompanied by a reduction in minimum feature size [2].
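The quadratic dependence can be made concrete with a short calculation. The Python sketch below compares the film area to be etched on two wafer sizes at equal pattern density. The 200mm and 300mm diameters are assumed here purely for illustration, and the function names are hypothetical.

```python
import math


def wafer_area_mm2(diameter_mm: float) -> float:
    """Area of a wafer modeled as a full circle (flat/notch ignored)."""
    return math.pi * (diameter_mm / 2.0) ** 2


def etched_area_ratio(d_old_mm: float, d_new_mm: float) -> float:
    """Ratio of areas to be etched, assuming identical pattern density."""
    return wafer_area_mm2(d_new_mm) / wafer_area_mm2(d_old_mm)


# Assumed example diameters (not taken from the article): 200 mm -> 300 mm.
ratio = etched_area_ratio(200.0, 300.0)
print(f"film area to be etched grows by a factor of about {ratio:.2f}")
```

A diameter increase of 1.5x therefore corresponds to about 2.25x more film to remove at the same transmittance, which is the first-order macroloading change discussed above.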

The goal of this article is to discuss the impact of local and global mask transmittance characteristics on wafer manufacturing. As a general guideline, it is useful to keep in mind that the Fourier transform of the mask transmittance determines the exposure light diffraction pattern impinging on a wafer, which is critical in determining lithography performance.

Mask transmittance and macroloading
In this discussion, e-beam and laser-written quartz/chrome binary masks are assumed to be perfectly repairable, making local transmittance imperfections close to repair areas insignificant. The same applies to perfectly phase-shifted masks (PSM) because the phase-shifted areas on the mask (typically MoSi), although not covered by chrome, are equivalent to opaque regions.

Clear-field masks, used at polysilicon and metal levels, define device features in the chrome remaining on the mask. Conversely, dark-field masks, typically used for contacts and vias, define openings by removing chrome from the mask. In both cases, the total clear (glass) area of the reticle, divided by the total reticle exposure field, gives the mask transmittance, which is directly related to the portion of the wafer surface to be etched.

The transmittance evaluation is based on layout software tools that evaluate the percentage of the total area occupied by the polygons drawn at each level. Clear fields are "reversed" for maskmaking purposes so that, as in the case of dark fields, polygons drawn on a particular layout level comprise the total written mask area, its glass surface.

Most reticles are generated by processing (fracturing) two files: a device file and a frame file for each assigned mask level (Fig. 1). The software code provides the total glass area for both device (GAD) and frame (GAF). Ancillary patterns used for mask fabrication and alignment can be neglected because they are not printed on the wafer.

Any reticle is composed of one or several arrayed frames made of scribe streets and filled with dice. Its transmittance, T, is, in either case, equivalent to the transmittance of a single frame filled with dice. For example, suppose there are M columns times N rows of dice in each frame. Each die carries a transparent area GAD, which is multiplied by M*N for the whole frame. The total scribe street clear area, including contributions of test and ancillary structures, is given by GAF. Summing up and dividing by the total frame area, SF, the reticle transmittance is given by

T = [GAF + (M*N)GAD]/SF
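As a quick numerical illustration of the expression above, the following minimal Python sketch evaluates T for a frame of three columns and four rows of dice, as in Fig. 1. The function name and the area values are hypothetical, chosen only to make the arithmetic easy to follow; they do not come from an actual layout.

```python
def reticle_transmittance(gaf: float, gad: float, m_cols: int,
                          n_rows: int, sf: float) -> float:
    """T = [GAF + (M*N)*GAD] / SF, with all areas in the same unit."""
    if sf <= 0:
        raise ValueError("frame area SF must be positive")
    t = (gaf + m_cols * n_rows * gad) / sf
    if not 0.0 <= t <= 1.0:
        raise ValueError("clear area exceeds frame area; check inputs")
    return t


# Hypothetical values (mm^2): 6.0 of clear scribe-street area (GAF),
# 2.0 of clear area per die (GAD), 100.0 of total frame area (SF).
T = reticle_transmittance(gaf=6.0, gad=2.0, m_cols=3, n_rows=4, sf=100.0)
print(f"T = {T:.2f}")  # (6 + 12*2) / 100
```

With these assumed numbers the dice contribute 24 of the 30 clear units, so the frame contribution (GAF) is not negligible and should not be dropped from the sum.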

Geometry shrinks will also impact transmittance. Because of wafer-dicing tool tolerances, die spacing and scribe street widths must be multiples of tens of microns. The frame windows, filled with die, are necessarily bigger than the die size and are thus forced to have sides that are also multiples of tens of microns. On the other hand, geometry shrinks are usually expressed in microns or fractions of microns, obtained by fracturing layouts with 10, 15, or even 20% reduced address units (reduced design spot sizes). This results in either blank space between the die edge and the scribe streets or an overlap of the two. In this case, evaluating the reticle transmittance is more complex. While transmittance can still be evaluated according to the above equation, mask polarity and die/scribe separation or overlap must be factored in.

Exposure fields
The evaluation of reticle transmittance is only the first step needed to complete the calculation for determining the total area to be etched. It is also necessary to consider the repetition of exposure fields on a wafer. During exposure, a number of stepper "shots" cover the wafer area, the number of shots being a function of the exposure field size and of the wafer diameter. Room must also be provided for wafer identification, and coverage is dictated in part by die-mapping requirements (Fig. 2a, 2b).

However, is the given reticle transmittance, as evaluated above, equal to the wafer exposed area (WEA) obtained by the exposure shots? What happens if some stepper fields are incomplete because of die mapping rules and/or rounded shapes? A reasonable answer is WEA = T.

This would be true if shot maps were fully contained on the wafer surface. In fact, in this case, the wafer area not covered by the stepper shots is known and is usually "dark," and does not contribute to WEA. Nevertheless, microloading edge effects can still be present, as will be discussed.

For wafer surface "saturating" maps that totally fill the wafer surface with die, WEA = T is a suitable first-order approximation. Wafer-saturating maps are often found in ASIC processing where one usually tries to exploit the entire wafer surface for economic reasons. In this case, the close-to-the-edge transmittance strictly depends on the way the scribe street is truncated by the wafer edge.
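Whether WEA = T is a good approximation depends on how many stepper fields are truncated by the wafer edge. The Python sketch below counts complete, truncated, and fully off-wafer fields for a centered shot map. It is only an illustration under simplifying assumptions that are not in the text: the wafer is a perfect circle (no flat, no edge exclusion or identification area), and the function name and dimensions are hypothetical.

```python
import math


def classify_shots(wafer_radius: float, field_w: float, field_h: float,
                   nx: int, ny: int):
    """Count (full, partial, outside) fields of a centered nx-by-ny shot map.

    Assumption: the wafer is an ideal circle centered on the shot map.
    """
    full = partial = outside = 0
    for j in range(ny):
        for i in range(nx):
            x0 = (i - nx / 2.0) * field_w
            y0 = (j - ny / 2.0) * field_h
            x1, y1 = x0 + field_w, y0 + field_h
            # Farthest corner and nearest point of the field from the center.
            far = math.hypot(max(abs(x0), abs(x1)), max(abs(y0), abs(y1)))
            near = math.hypot(min(max(x0, 0.0), x1), min(max(y0, 0.0), y1))
            if far <= wafer_radius:
                full += 1
            elif near >= wafer_radius:
                outside += 1
            else:
                partial += 1
    return full, partial, outside


# Hypothetical case: 100 mm wafer radius, 50 mm x 50 mm fields, 4 x 4 map.
full, partial, outside = classify_shots(100.0, 50.0, 50.0, 4, 4)
print(full, partial, outside)
```

In a saturating map the truncated fields are exposed as well, so the larger their share, the more the way each field is cut by the wafer edge matters for the transmittance close to the edge.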

Usually, the shot map is centered, symmetrical with respect to both x (parallel to the flat) and y (orthogonal to the flat) axes. Because of the two axial symmetries, any incomplete shot has a counterpart both in the left/right and top/bottom directions. Moreover, a third symmetry, central with respect to the map center, obtained by composing an x symmetry with a y, is also a valid symmetry for the shot map. Therefore, to avoid clear pattern nonhomogeneity at the wafer level, one should try to reproduce at the frame level a symmetrical clear area distribution along the x, y axes or the frame center. However, the symmetry around the x axis is unavoidably "broken" by the flat; therefore, even in this case, the conclusion WEA = T is valid only to first- or second-order approximations. Wafer flat elimination would provide an advantage not only in limiting the impact of global loading effects but also in simplifying its quantitative evaluation to first-order considerations based on pattern placement symmetry.

Microloading effects within scribe streets
Because of the specificity of each pattern placed in the scribe streets for wafer processing and measurement, microloading effects may still be present even when the clear area is symmetrically distributed, impacting other ancillary structures and neighboring dice. One solution is to design structures, properly positioned in the scribe streets, to monitor such effects.

Figure 3. a) Grid is 505µm along x and 740µm along y. b) Grid is 101µm along x and 140µm along y.

The precise determination of microloading effects is quite difficult and is strongly case-dependent. A clear area distribution function, fC(x, y), can be introduced, with the coordinates x and y varying on the whole extension of the scribe streets. Typically, no analytical expression for fC(x, y) is available. Therefore, domain partition — segmentation in smaller rectangular cells composing the scribe street area — is needed. The clear area distribution fC(x, y) is evaluated at the center of each rectangle and is assumed to be constant inside. The size of the rectangles into which the domain has been divided should be the smallest allowed by computational limits to best evaluate the average local transmittance ratio.
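The domain partition described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool used for the article: it assumes the clear area is supplied as a list of non-overlapping, axis-aligned rectangles (x0, y0, x1, y1) and that the region dimensions are exact multiples of the cell size. Rather than sampling fC at each cell center, it computes the average clear fraction of each cell from the overlap areas. All names and the two example rectangles are hypothetical.

```python
def clear_fraction_grid(clear_rects, width, height, cell_w, cell_h):
    """Return grid[j][i] = clear-area fraction of cell (column i, row j).

    Assumption: clear_rects do not overlap one another.
    """
    nx = round(width / cell_w)
    ny = round(height / cell_h)
    grid = [[0.0] * nx for _ in range(ny)]
    cell_area = cell_w * cell_h
    for (x0, y0, x1, y1) in clear_rects:
        for j in range(ny):
            for i in range(nx):
                cx0, cy0 = i * cell_w, j * cell_h
                # Overlap of the clear rectangle with this cell.
                ox = max(0.0, min(x1, cx0 + cell_w) - max(x0, cx0))
                oy = max(0.0, min(y1, cy0 + cell_h) - max(y0, cy0))
                grid[j][i] += ox * oy / cell_area
    return grid


# Hypothetical clear rectangles on a 4040 x 5180 region, 505 x 740 cells.
rects = [(0.0, 0.0, 505.0, 370.0),      # half of the first cell
         (505.0, 0.0, 1515.0, 740.0)]   # two complete cells
grid = clear_fraction_grid(rects, 4040.0, 5180.0, 505.0, 740.0)
print(len(grid[0]), "x", len(grid), "cells; first row:", grid[0][:3])
```

The triple loop is deliberately naive. For real layouts with many polygons, only the cells actually touched by each rectangle would be visited, which is where the computational limits mentioned above come into play.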

Loading effects depend on the difference between the average clear area fraction within the rectangle and the average transmittance of the mask. The clear area fraction, globally evaluated for the whole mask, determines the duration of the etch process by changing the reactor endpoint detection time [3-5]. The impact of the clear area distribution on local etch performance can be easily seen. The endpoint time is set once for each wafer and is necessarily the same as it would be if the wafer were uniformly covered with the average clear area distribution. If, locally on a wafer, more material must be etched because of a larger-than-average clear area density, a local under-etch will likely occur because the endpoint time is too early for that region, and vice versa.
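The sign of the expected local effect follows directly from the deviation of each cell from the global average. The Python sketch below encodes only that qualitative rule. It assumes equal-area cells, so that the plain mean of the cells equals the global clear fraction, and the function names, the tolerance parameter, and the example values are hypothetical.

```python
def loading_deviation(grid):
    """Return (global mean, per-cell deviation from the mean).

    Assumption: all cells have the same area.
    """
    cells = [f for row in grid for f in row]
    mean = sum(cells) / len(cells)
    return mean, [[f - mean for f in row] for row in grid]


def etch_tendency(deviation: float, tol: float = 0.0) -> str:
    """Qualitative tendency for one cell at a single wafer-level endpoint."""
    if deviation > tol:
        return "under-etch"   # more film than average to remove locally
    if deviation < -tol:
        return "over-etch"    # less film than average to remove locally
    return "nominal"


# Hypothetical 2 x 2 map of clear-area fractions.
example = [[0.2, 0.8],
           [0.5, 0.5]]
mean, dev = loading_deviation(example)
tendencies = [[etch_tendency(d, tol=0.05) for d in row] for row in dev]
print(mean, tendencies)
```

The tolerance stands in for whatever deviation a given recipe can absorb; it has to be determined experimentally and is not something this sketch can supply.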

Microloading effects inside a device
The considerations described above can be applied to any pattern appearing on the mask and, therefore, also, to each die placed inside the frame windows.

To illustrate microloading effects inside a device, a mask for a thick Al/Si/Cu metal layer is used to exploit its strong discrimination between a CMOS device and the rest of the circuit [6]. The full device area, 4040 x 5180µm², has been partitioned according to two different grids: one made of 8 x 7 cells of 505µm x 740µm size and a second consisting of 40 x 37 cells, each one 101µm x 140µm large.

The extent of the clear area inside each box has been automatically evaluated using software written in C, which scans the whole device surface with the fracturing tool. The 2-D scalar functions representing the clear area percentage in both cases are shown in Fig. 3a and 3b, respectively.

It can be seen that large differences arise in both cases when comparing the CMOS portion of the circuit, where power metal line content is very low and transmittance values high, and the power portion. This means that microloading effects can be expected to be as severe for mixed technologies as for pure CMOS submicron technologies, albeit for different reasons.

Figure 4. The grid is the same as in Fig. 3b; the function scale unit is, however, 10% instead of 20%.

Note that the finer the grid, the better the mask local transmittance features are known. In fact, as expected, the finer grid of Fig. 3b provides more information than Fig. 3a. Moreover, from Fig. 3b, one may also guess the transmittance function to be quasicontinuous with no abrupt change from 0 to 100%. Indeed, if the function scale of Fig. 3b is made twice as fine, as shown in Fig. 4, wherever the starting point is, one has to pass through all the intermediate colors in order to reach regions corresponding to the lowest and highest transmittance values. Abrupt discontinuities would only be encountered if a grid were used with a cell size of the same order of magnitude as (or smaller than) the minimum feature size/spacing of the layer under consideration (in this case 5µm). This fine-grain grid process would lead to a dramatic increase in the number of boxes and would be extremely CPU-time consuming. Fully digitized information would be the output. The transmittance would jump from 0 to 1, with abrupt discontinuities when passing through neighboring boxes along any path enclosed in the device area. Nevertheless, this corresponds to the digitization of the layout itself and carries no additional information. It is essentially equivalent to the binary file used to write the mask. On the other hand, a mesh that is too coarse would not provide useful information. Given the trade-offs, a proper grid cell size must be determined.

Determining grid size
The following is a method for determining grid size:

1. Experimentally determine a threshold corresponding to the maximum difference of clear area percentage that can be tolerated to preserve device functionality and reliability by a given etching recipe. This threshold value is dependent on the process, on the mask level, and on the layout. For volume manufacturing, it is extremely important to limit the number of available recipes to simplify operations. Test wafers with properly designed isolated/dense patterns can be run to determine the threshold value.
2. In analyzing a new product mask, the starting mesh size is dependent on the threshold value as determined in step 1. It should be kept as large as possible, but if the recipe is sensitive to very small variations of the exposed areas, grids have to be made fine from the beginning. For example, a grid whose cell area is 505 x 740µm² is already large enough because a 60% clear area variation is experimentally found to be tolerated by the thick metal etching recipe, and such a transmittance difference has already been determined with this grid for some pairs of adjacent cells (see Fig. 3a).
3. For a fixed grid size, compute the clear area percentage for all the boxes. If there is at least one pair of cells whose clear area percentage difference is bigger than the threshold value determined in step 1, stop; otherwise, make the grid finer and recompute until the difference is larger than the threshold.
4. As soon as the grid size potentially showing microloading effects has been found, the process stops. A "microloading distance DµL" can be defined as the distance between the centers of the two closest cells showing a clear area percentage difference greater than the assigned threshold. A device showing a very short DµL is very sensitive to microloading effects and vice versa. Clearly, this analysis can be done before running wafers in the line, thus allowing for preventive actions at design and/or process steps.
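The procedure above can be sketched as follows. This Python fragment is an illustration under stated assumptions rather than the author's implementation: the clear-area grids are taken as already computed for each candidate cell size, every pair of cells is compared (not only adjacent ones), and the function names and example values are hypothetical.

```python
import math
from itertools import combinations


def microloading_distance(grid, cell_w, cell_h, threshold):
    """Smallest center-to-center distance between two cells whose
    clear-area fractions differ by more than threshold, or None."""
    cells = [(i, j, f) for j, row in enumerate(grid)
             for i, f in enumerate(row)]
    best = None
    for (i1, j1, f1), (i2, j2, f2) in combinations(cells, 2):
        if abs(f1 - f2) > threshold:
            d = math.hypot((i1 - i2) * cell_w, (j1 - j2) * cell_h)
            if best is None or d < best:
                best = d
    return best


def first_revealing_grid(grids, threshold):
    """grids: (cell_w, cell_h, grid) tuples ordered from coarse to fine.
    Stop at the first grid with an above-threshold pair (step 3)."""
    for cell_w, cell_h, grid in grids:
        d = microloading_distance(grid, cell_w, cell_h, threshold)
        if d is not None:
            return cell_w, cell_h, d
    return None


# Hypothetical coarse grid (fractions 0..1) with 505 x 740 cells and a
# 60% threshold, echoing the figures quoted in step 2.
coarse = [[0.1, 0.3],
          [0.8, 0.4]]
result = first_revealing_grid([(505.0, 740.0, coarse)], threshold=0.6)
print(result)
```

Comparing all pairs costs on the order of the square of the number of cells, which remains manageable for a 40 x 37 grid. Restricting the comparison to neighboring cells would be a natural simplification for much finer meshes.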

Plasma etching
When a thick metal layer is etched, like the one under consideration, devices may show large clear area variations. For the specific product under consideration, the robust maximum difference threshold of 60% was found with a large grid. In fact, DµL is equal to 740µm, the size of the grid cell along the y axis. This means that microloading is not likely to occur. This is somewhat expected because the etching recipe of such a thick layer must ensure reduced CD differences on wafer, ultimately the reason for the robust 60% threshold value.

Metal strip profile can still vary across the wafer [7], however, and local transmittance features close to the layout digitalization limit may still play an important role. More precisely, the strip profile is expected to be more positive (bottom wider than the top) in device regions where less film has to be etched (low-transmittance or "closed" areas). When compared to high-transmittance areas ("open" areas), the formation of polymers created by the chemical reactions with the photoresist is, in closed areas, locally enhanced. Polymers accumulate at the bottom of the strip, thus making its cross section trapezoidal instead of rectangular.

Therefore, even in the case of noncritical layers, the use of "tiling" techniques during layout preparation is recommended. Extensive use of dummy features not interfering with device functionality reduces clear area distribution variations, thus smoothing microloading/profile effects and making wafer manufacturing more robust. DµL maps can immediately show where, how many, and which tiling structures are needed.

Conclusion
Wafer manufacturing at endpoint-regulated dry etch steps can be improved by full characterization and/or timely preparation of the associated mask transmittance features. Mask suppliers can also benefit from mask transmittance evaluation when faced with endpoint-controlled plasma etch processes such as binary masks processed with dry etch and PSM manufacturing. Because of the cost of high-end masks and wafers, making such preventive checks before mask writing or wafer-lot launching would save considerable time and money.

Acknowledgments
The author warmly thanks A. Campora and R. Ruffoni of STMicroelectronics (Agrate, Italy) for helpful discussions.

References

1. C.Y. Chang, S.M. Sze, ULSI Technology, McGraw-Hill, p. 362.
2. S. Tandon, Semiconductor International, p. 75, March 1998.
3. W.S. Ruska, Microelectronic Processing, McGraw-Hill, p. 221.
4. C.J. Mogab, J. Electrochem. Soc., Vol. 124, p. 1262, 1977.
5. B. Chapman, Glow Discharge Processes, John Wiley & Sons, p. 341.
6. A. Andreini et al., in Smart Power ICs: Technologies and Applications, eds. B. Murari, F. Bertotti, G.A. Vignola, Springer.
7. W.N.G. Hitchon, Plasma Processes for Semiconductor Fabrication, Cambridge University Press, p. 104.

Massimiliano Pindo received his PhD in elementary particle physics from the University of Milan. He worked on advanced semiconductor detectors at the European Center of Nuclear Research (CERN, Geneva), and was a device engineer at STMicroelectronics, working on photomask procurement. Pindo is account manager in Italy for DuPont Photomasks Inc., 131 Old Settlers Blvd., Round Rock, TX 78664; ph 512/310-6500, fax 512/255-9627.