The evolution of DRAM cell technology
05/01/1997
B. El-Kareh, IBM Microelectronics, Austin, Texas
G.B. Bronner, IBM Semiconductor Research and Development Center, Hopewell Junction, New York
S.E. Schuster, IBM Thomas J. Watson Research Center, Yorktown Heights, New York
The high density and low cost of DRAMs have earned them a predominant role in computer main memories. During the past 35 years, the number of DRAM bits/chip has increased by a factor of four every three years and the cost/bit has declined by roughly the same factor. This paper describes technological breakthroughs that allowed the design and optimization of high-density and cost-efficient DRAM cells.
Personal computers, workstations, communication systems, and graphic subsystems have seen rapid growth in the demand for larger and faster memories. These systems use memories to write, store, and retrieve large amounts of data. The data represent information or program instructions that are coded in combinations of the binary digits (bits) "1" and "0," and can be used by the microprocessor for further manipulation.
The main attributes of memory are cost, reliability, density, speed, and power. Throughout the years, data storage has used several memory technologies - among others, magnetic tape, hard disks, floppy disks, core memory, optical disks, and semiconductor memories. Magnetic and optical media, as well as some specially designed semiconductor ROM, belong to the nonvolatile memory (NVM) family: the information last written in those memories is retained "permanently," even after the power supply is removed. Volatile memories lose the stored information when the power is turned off; some also lose it if not "refreshed" periodically. RAM, which constitutes most semiconductor memory, allows any part of the memory to be read or written as fast as any other part. In contrast, serial access can be very slow, depending on the location of the bit being accessed.
Semiconductor memories were introduced in the late 1960s and early 1970s. They greatly increased storage density and speed, and reduced cost. The basic semiconductor RAM storage element is the cell, duplicated once for every bit. Each memory cell can store one bit of information, a "1" or a "0."
Semiconductor RAMs can be categorized in roughly three groups: nonvolatile (NVRAM), static (SRAM), and dynamic (DRAM), each of which has several variants (Fig. 1). No single set of memory characteristics satisfies all system needs, and all three memory types are typically used in a computer. NVRAM is used for permanent storage of information or instructions that can be accessed in a ROM mode, such as the basic input/output system (BIOS) chips used for initialization of personal computers during startup. An SRAM cell stores data in a flipflop consisting of logic transistors. It retains its information as long as power is applied to the cell, but loses it when power is removed. A DRAM cell is small and consists of one transistor and one capacitor. The capacitor stores the data and the transistor transfers the data to and from the capacitor. Because the readout is destructive, every read operation must be immediately followed by a write-back operation that restores the data just read. "Dynamic" means that the cell must also be refreshed periodically; otherwise, it would lose its information over time because of cell leakage.
Figure 1. Typical semiconductor memory cells: a) nonvolatile; b) six-device static (flipflop); c) one-device dynamic.
The internal processor speed improved by a factor of about 40 in 20 years, but despite innovations in technology and design, DRAM speed improved only by a factor of less than two (Fig. 2). Microprocessor clock frequencies presently run up to about 500 MHz and may reach 1 GHz by the end of the decade, while the fastest DRAM is only expected to reach a 200-MHz data rate in this time period.
Figure 2. Comparison of microprocessor and RAM speed. (Adapted from B. Prince, High Performance Memories, John Wiley & Sons, New York, 1996.)
The mismatch between processor and RAM speed is more pronounced for DRAMs than for SRAMs because the cycle time of a DRAM is inherently longer. Devices in DRAMs are optimized for low leakage rather than for speed, as in SRAMs and logic circuits. SRAM cache memory reduces the performance gap with processors. For the same lithography generation, SRAM memory can occupy 8-16× the space of DRAM. The present DRAM cost is about 0.1 millicent ($0.000001)/bit, declining at a rate of ~26%/generation. Since die cost rises with die size, SRAMs can cost more than eight times as much per bit as DRAMs. SRAMs are therefore used where speed is important, and DRAMs where cost-efficient storage of large amounts of data is required. A faster DRAM or cheaper SRAM could eliminate the need to use both parts in the same system [1].
Figure 3. How a DRAM cell works.
How did DRAM cell structures evolve from their first conception to their present 3-D, one-device cell structure? This paper describes the technological breakthroughs that contributed to their fast evolution, with the number of bits/chip increasing by a factor of four every three years, from 4 Kbit in the early 1970s to 1 Gbit by the end of the century.
How a DRAM cell works
A DRAM cell consists of a storage capacitor and a transfer device (MOSFET) that acts as a switch (Fig. 3). The presence of charge in the capacitor indicates a logical "1" and its absence a logical "0." Cells are arranged in arrays of rows (word lines, or W/L) and columns (bit lines, or B/L) that are orthogonal to each other. Multiple sub-arrays replace a single large array to shorten the word and bit lines and thereby reduce the time to access cells. For example, a 256-Mbit array typically consists of sixteen 16-Mbit sub-arrays. Word lines control the gates of the transfer transistors and the bit lines are connected to sense amplifiers. One can consider an adjacent bit line pair, B/L and its complement /B/L, as a single bit line folded in the middle, broken, and connected to a shared sense amplifier. In an alternative configuration, called open bit lines, sense amplifiers are placed between two sub-arrays, thus connecting each sense amplifier to one bit line in each array. Open bit lines result in more compact layouts but offer less noise immunity than folded bit lines. When a word line is selected, all transfer devices connected to it turn on and charge transfer occurs between the storage capacitors and the bit lines crossing the word line. Consider, for example, the pair B/L and /B/L in Fig. 4. Before a read operation, B/L and /B/L are shorted together and all bit line pairs are pre-charged to a voltage, Vb, halfway between the internal power supply voltage, VDD, and ground. To read the cell, the selected word line (W/L 0 in this example) is raised to VDD, turning on all transfer devices connected to it; devices connected to other word lines are not affected. Each sense amplifier detects the polarity of the charge transfer by measuring the voltage difference, ΔVb, between B/L (to which charge was transferred) and /B/L (the reference), and thus determines whether the cell stored a logic "1" or a logic "0."
Figure 4. Illustration of a read operation with folded bit lines.
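To make the sequence above concrete, the short Python sketch below walks through a folded-bit-line read for a stored "1" and a stored "0," using assumed, representative capacitance and supply values (they are not figures from the article). It applies the charge-sharing relation derived in the next paragraph and shows the polarity the sense amplifier detects before the write-back restores the full level.

```python
# Minimal sketch of a folded-bit-line read, with assumed illustrative values.
C_S, C_B, V_DD = 40e-15, 400e-15, 2.5   # storage cap (F), bit line cap (F), supply (V)

def read_cell(v_cell):
    """Return (bit read, bit line voltage) after the selected word line is raised."""
    v_ref = V_DD / 2                                   # precharge level of B/L and /B/L
    v_bl = (C_S * v_cell + C_B * v_ref) / (C_S + C_B)  # charge sharing on B/L
    bit = 1 if v_bl > v_ref else 0                     # sense amp compares B/L with /B/L
    return bit, v_bl

for stored in (V_DD, 0.0):                             # a stored "1" and a stored "0"
    bit, v_bl = read_cell(stored)
    print(f"stored {stored:.1f} V -> read {bit}, bit line at {v_bl*1e3:.0f} mV "
          f"(signal {abs(v_bl - V_DD/2)*1e3:.0f} mV); write-back restores a full {bit}")
```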
The signal created by charge transfer is very small (100-200 mV) because when the transfer device is turned on, charge redistributes between a small storage capacitor and the capacitance of the bit line. The latter can be more than 10× higher due to the large number of devices on a single bit line and other stray capacitances. The magnitude of the signal on the bit line, ΔVb, therefore depends on the ratio of storage capacitance to bit line capacitance:
ΔVb = ±(VDD/2) × Cs/(Cs + Cb)
where VDD = the internal power supply voltage, Cb = the bit line capacitance, and Cs = the storage node capacitance.
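The sketch below sweeps the ratio Cb/Cs in this relation for assumed values of VDD and Cs, illustrating how quickly the signal shrinks as the bit line capacitance grows relative to the storage capacitance.

```python
# How the bit line signal shrinks as Cb grows relative to Cs.
# VDD and Cs are assumed, illustrative values.
V_DD, C_S = 2.5, 40e-15

for ratio in (5, 10, 20):                    # Cb / Cs
    c_b = ratio * C_S
    dvb = (V_DD / 2) * C_S / (C_S + c_b)     # charge-sharing signal
    print(f"Cb = {ratio:2d} x Cs  ->  dVb = {dvb*1e3:5.0f} mV")
```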
This behavior may be best understood by comparing charge to water and replacing the storage capacitor with a miniature water container and the switch with a water valve. The bit line would then play the role of a large water tank (see Fig. 3). To "read" the level of water in the miniature container, the water in the tank is first brought to a low reference level. Opening the valve will either increase or decrease the water level in the tank, depending on whether the miniature container was almost full or almost empty. The change in the tank water level can be very small, depending on the ratio of the water capacity of the miniature container to that of the tank. An appropriately chosen ratio can ensure detection of any change in water level.
Since VDD and Cb are reduced by about the same factor from one generation to the next, ΔVb remains adequate only if Cs is maintained above a critical value, typically 30-40 fF. The signal on the bit line is small and demands a sensitive amplifier. Also, the stored charge decays quickly because of the inherent leakage of the cell; the cell retention time typically ranges from milliseconds to hundreds of milliseconds. A periodic refresh is therefore necessary to restore the charge before its level drops below a critical value at which a "1" can no longer be distinguished from a "0."
Cell considerations
The minimum storage capacitance of the DRAM cell is determined by the sense amplifier sensitivity and signal-to-noise ratio, data retention, and single-event upsets (soft errors, i.e., nondestructive upsets) due to alpha particles or cosmic rays. It has remained around 30-40 fF/cell throughout the DRAM generations. When designing one-transistor-cell DRAMs, one must ensure adequate charge storage in the cell while maintaining a high array efficiency (array to chip area ratio), and sensitive enough sense amplifiers that are also sufficiently small to fit into the area left between columns. The stored charge in the cell can be increased by:
1. increasing the capacitor area,
2. thinning the node dielectric,
3. using higher dielectric constant material for the node dielectric,
4. increasing the junction capacitance (in early cell structures),
5. increasing the voltage on the gate of the transfer device, and
6. reducing cell leakage to increase the cell retention time (thus ensuring that adequate charge remains in the capacitor for the duration of a refresh cycle).
There are, however, constraints with these six approaches.
1. DRAMs are optimized for low cost and high yield, so the die size (and die cost) must be minimized as the number of bits/chip increases. While lithography scaling alone has reduced the die area by a factor of 0.5 (0.7 linearly, i.e., 0.7 × 0.7 = 0.49, ~0.5)/generation, the number of bits/chip has quadrupled during the same period. The technology has therefore focused on decreasing the cell area faster than allowed by lithography alone, without decreasing the storage capacitance of the cell, to avoid a substantial increase in die area. This effort led to the development of three-dimensional cell capacitors, such as trench capacitors built into the silicon substrate and stacked structures constructed above the silicon surface. With these new structures, the increase in die size is limited to only 40-50% every three years.
2. Primary constraints in reducing the node dielectric thickness include dielectric integrity, reliability, and leakage. For a given maximum voltage across the insulator, the dielectric must be thick enough to keep the electric field below a specified limit.
3. Materials with higher dielectric constants than silicon dioxide have been used where applicable. Examples include oxidized nitride (ON) and oxidized nitride/oxide (ONO), both presently used in DRAM cell capacitors; and tantalum oxide (Ta2O5) and barium strontium titanate (BaxSr1-xTiO3), both still in the research and development stage. Considerations such as ease of deposition, thermal stability, integrity, and reliability limit the choices of materials and flexibility in using them.
4. In planar structures of early DRAM generations, the junction capacitance in parallel to the dielectric capacitance was a sizable fraction of the total node capacitance. The node capacitance can be increased by raising the background dopant concentration in the junction, resulting in higher depletion capacitance. For a given voltage across the junction, however, the increase in dopant concentration is limited by impact ionization and tunneling effects that increase cell leakage.
5. The maximum voltage on the node capacitor is one threshold voltage below the voltage applied to the word line (source-follower configuration). Circuit techniques are therefore used to "boost" the voltage on the gate of the transfer device above the externally applied voltage, compensating for the threshold voltage of the transfer device and allowing storage of the full power supply voltage on the capacitor. Reliability considerations require increasing the MOSFET gate dielectric thickness when applying a boost, to limit the field in the gate oxide to less than about 4×10^6 V/cm. Increasing the gate oxide thickness of a MOSFET, however, reduces its speed, resulting in slower peripheral logic circuits and longer access time. This can be avoided by a more complex process with two different gate oxide thicknesses on the same chip, one for the array transfer device and a thinner oxide for the logic circuits.
6. Finally, the cell capacitor leakage must be kept very small to allow longer retention times, reducing power dissipation and improving performance. Assume, for example, an initial voltage of 2.5 V on the capacitor. As the node leaks, this voltage decays. If one allows a maximum decay of 1 V during a refresh cycle of 512 ms, then for a cell capacitance of 40 fF the maximum node leakage may not exceed ~0.08 pA (worked out in the sketch following this list). Leakage components include subthreshold leakage, thermal generation of electron-hole pairs, impact ionization, and tunneling. The threshold voltage of the transfer device must be high enough to reduce subthreshold leakage. In some DRAM technologies, a potential increase in thermal generation, due to metal precipitates and crystal damage, prohibits direct metal deposition and ion implantation in the array area. An increase in the transfer device threshold voltage, though, necessitates a word line boost; otherwise, the charge stored in the cell would be reduced considerably. The low leakage requirement imposes a delicate balance between the unit processes used to fabricate the cell.
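The following sketch reproduces the leakage-budget arithmetic of item 6 and, for item 5, shows the gate-oxide thickness implied by the 4×10^6 V/cm field limit under an assumed boosted word line voltage (the 3.6-V figure is an illustrative assumption, not a value from the article).

```python
# Item 6: how much node leakage a 40-fF cell tolerates if its voltage may
# droop by at most 1 V between refreshes.
C_S = 40e-15          # storage capacitance, F
DELTA_V_MAX = 1.0     # allowed voltage decay between refreshes, V
T_REFRESH = 512e-3    # refresh period, s

i_leak_max = C_S * DELTA_V_MAX / T_REFRESH        # I = C * dV / dt
print(f"Maximum node leakage: {i_leak_max * 1e12:.2f} pA")       # ~0.08 pA

# Item 5: gate-oxide thickness needed to keep the field below the limit,
# for an *assumed* boosted word line voltage.
V_BOOST = 3.6         # assumed boosted gate voltage, V (illustrative only)
E_MAX = 4e6           # maximum allowed oxide field, V/cm
t_ox_min_cm = V_BOOST / E_MAX
print(f"Minimum gate oxide thickness: {t_ox_min_cm * 1e7:.0f} nm")  # ~9 nm
```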
Figure 5. Trends in DRAM cell and die sizes.
Engineers have continuously optimized one or more of the above approaches within their constraints, and sought a trade-off between the conflicting requirements. The following sections describe breakthroughs in technology that allowed the design and optimization of high-density and cost-efficient DRAM cells.
Evolution of cell technology
During the past 15 years, DRAM cell size has decreased from 34 µm² for 1 Mbit to 0.25 µm² for 1 Gbit (Fig. 5). This rapid evolution may be best described in two stages: the era of the planar cell capacitor, from 16 Kbit to 1 Mbit, and the era of 3-D cell capacitors, from 4 Mbit to 1 Gbit. Although all structures are truly 3-D, the term is used here to emphasize the extension of the cell capacitor above or below the silicon surface.
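A rough check of this trend, sketched below, uses only the cell-size endpoints quoted above and the fourfold growth in bits per generation; it ignores support circuitry and therefore tracks array area rather than total die area. The result is consistent with the 40-50% growth per generation cited earlier.

```python
# Rough check of the scaling trend in Fig. 5: cell area shrinks from
# 34 um^2 (1 Mbit) to 0.25 um^2 (1 Gbit) over five generations, while
# bits/chip quadruple each generation.
CELL_1M = 34.0      # um^2
CELL_1G = 0.25      # um^2
GENERATIONS = 5     # 1M -> 4M -> 16M -> 64M -> 256M -> 1G

shrink_per_gen = (CELL_1G / CELL_1M) ** (1 / GENERATIONS)
array_growth_per_gen = 4 * shrink_per_gen          # 4x more bits, smaller cells

print(f"Cell area factor per generation:   {shrink_per_gen:.2f}")        # ~0.37
print(f"Array area growth per generation:  {array_growth_per_gen:.2f}x") # ~1.5x
```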
The planar cell and its variants. The structure used for 16-Kbit to 1-Mbit DRAMs, from the early 1970s to the mid-1980s, belongs to the planar one-device cell family. In this structure, the transfer device is an n-channel or p-channel MOSFET and the capacitor is placed horizontally alongside the transfer device, typically occupying 30% of the cell area. CMOS was first introduced in peripheral circuits for 64-Kbit DRAMs in 1983. Despite its added complexity, CMOS is attractive because of its reduced standby power and noise sensitivity, simplified logic design, and reduced number of transistors in the support circuits, increasing the array efficiency above 50%. Where p-channel transfer devices are used in a CMOS process, the array is typically placed in a large n-well that is also effective in reducing the soft-error rate (SER). The total node capacitance is the sum of the insulator (MOS) capacitance between the n+ layer and the top capacitor plate, and the depletion capacitance between the n+ layer and the substrate. The structure in Fig. 6 is a high-capacitance (High-C) implanted cell introduced to increase the junction capacitance/unit area [2]. Ion implantation was used to increase the boron concentration in the substrate beneath the n+ region, increasing the capacitance/unit area without substantially increasing the threshold voltage of the transfer device. By the advent of the 4-Mbit DRAM in the mid-1980s, this technique could no longer maintain the node capacitance as the capacitor area continued to shrink.
Figure 6. Typical planar cell used for 16 Kbit-1 Mbit: implanted high-C structure is introduced to the double poly structure to increase the junction capacitance/unit area.
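To illustrate the gain from the High-C implant, the sketch below evaluates the standard one-sided step-junction depletion capacitance per unit area for two substrate doping levels, showing the roughly square-root dependence on doping that the boron implant exploits. The doping levels, built-in potential, and reverse bias are assumed, illustrative values.

```python
import math

# Depletion (junction) capacitance per unit area for a one-sided step junction:
#   C_j = sqrt(q * eps_si * N_A / (2 * (V_bi + V_R)))
# Doping levels, built-in potential, and reverse bias are assumed values.
Q = 1.6e-19            # elementary charge, C
EPS_SI = 1.04e-12      # permittivity of silicon, F/cm

def c_junction(n_a_cm3, v_bi=0.8, v_r=2.0):
    """Depletion capacitance in fF/um^2 for substrate doping n_a_cm3 (cm^-3)."""
    c_per_cm2 = math.sqrt(Q * EPS_SI * n_a_cm3 / (2 * (v_bi + v_r)))   # F/cm^2
    return c_per_cm2 * 1e15 / 1e8                                      # -> fF/um^2

for n_a in (1e16, 1e17):    # before and after an assumed boron implant
    print(f"N_A = {n_a:.0e} cm^-3  ->  C_j = {c_junction(n_a):.2f} fF/um^2")
```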
Trench and stack capacitor cells and their variants. The arrival of 4-Mbit DRAM necessitated the introduction of three-dimensional cell structures to continue decreasing the cell size without reducing the effective capacitor area. One method to reduce the cell dimensions is to place the capacitor inside a trench. The other is to stack the capacitor above the silicon surface.
Table 1. Comparison of trench and stacked capacitor cells.
Table 1 compares the two cell types. They have been used successfully in the 4- to 256-Mbit generations and should be extendible to 1 Gbit and beyond. For most companies, manufacturing experience is the most important consideration in choosing one cell type or the other for a subsequent generation [3].
Table 2. DRAM technology trends from 4 Mbit to 1 Gbit (first-generation designs).
Table 2 shows the trend from 4 Mbit to 1 Gbit. "First generation" refers to the initial design, before the ground rules are shrunk within the lithography capability. The array size increases from 4 Mbit to 1 Gbit by a factor of 256, while the cell size decreases by a factor of 40, and the die size increases by only a factor of ~5. The internal voltage is lowered in step with dimensional scaling to limit the horizontal and vertical electric fields and to reduce the active power dissipation. Lowering cell leakage results in longer retention time. Increasing the capacitor area and the capacitance/unit area allows a constant storage capacitance to be maintained while scaling down the cell area.
While both cell structures use many common technology elements, they are fabricated differently. The two trench capacitor cell designs in Fig. 7 require deep trenches (~6-7 µm) etched in silicon with nearly vertical sidewalls. The capacitor in Fig. 7a is formed between the isolated electrodes in the substrate and a common plate inside the trenches. In Fig. 7b, the isolated electrode is inside the trench and the common plate in the substrate. Because of the resulting large capacitor area, a "conventional" oxidized nitride (NO) node dielectric can be used without aggressively reducing its thickness. Processing NO is well understood and the insulator can sustain thermal cycles after trench processing. Another advantage is the high integrity and reliability of the dual dielectric. Trench etching and NO dielectrics are also scalable to 1-Gbit dimensions with conventional processing [4].
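A ballpark estimate, sketched below, shows why such deep trenches recover the required 30-40 fF with a conventional NO dielectric. The trench is treated as a simple cylinder and the NO film by an oxide-equivalent thickness; the opening diameter and equivalent thickness are assumed values, and the depth is taken from the ~6-7 µm range quoted above.

```python
import math

# Ballpark deep-trench storage capacitance. Dimensions and the NO
# oxide-equivalent thickness below are illustrative assumptions.
EPS_0 = 8.85e-12        # vacuum permittivity, F/m
K_OX = 3.9              # relative permittivity of SiO2 (oxide equivalent)

depth = 7e-6            # trench depth, m (from the ~6-7 um range in the text)
diameter = 0.3e-6       # assumed trench opening, m
t_eq = 5e-9             # assumed oxide-equivalent NO thickness, m

area = math.pi * diameter * depth             # sidewall area dominates
c_node = EPS_0 * K_OX * area / t_eq

print(f"Trench capacitance: {c_node * 1e15:.0f} fF")   # ~45 fF for these values
```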
In the stacked capacitor cell, the capacitor, placed below or above the bit line, consists of a thin dielectric film between two poly layers (Fig. 8). Topography issues in the array, such as etching of high-aspect-ratio vias and depth-of-focus limitations, restrict increasing the capacitor area by extending its height. Alternative ways have been used to increase the node capacitance while reducing the area occupied by the capacitor, without appreciably increasing the capacitor height. These include the use of fins, cylinders, and textured hemispherical grains (HSG) to increase the effective capacitor area; and high-k dielectrics, such as tantalum oxide and barium strontium titanate, to increase the capacitance/unit area. These insulators are, however, still in development and are not as easily formed as thin oxide, NO, or ONO. They are not compatible with trench capacitors because of their limited ability to withstand high-temperature processing.
Figure 7. a) Original trench capacitor cell structure; b) buried strap trench capacitor cell for 64 Mbit to 256 Mbit.
Figure 8. a) Original stacked capacitor cell; b) stacked capacitor cell with capacitor over bit line.
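The appeal of the high-k dielectrics mentioned above can be put in rough numbers with the sketch below, which compares capacitance per unit area for ONO and Ta2O5 films. The thicknesses and the Ta2O5 permittivity are assumed, illustrative values, not data from the article.

```python
# Capacitance per unit area for two node dielectrics, illustrating why
# high-k films relax the area (and height) a stacked capacitor needs.
# Thicknesses and permittivities are assumed, illustrative values.
EPS_0 = 8.85e-12   # F/m

def cap_per_um2(k, thickness_m):
    """Parallel-plate capacitance per square micrometer, in fF/um^2."""
    return EPS_0 * k / thickness_m * 1e-12 * 1e15   # F/m^2 -> fF/um^2

c_ono = cap_per_um2(k=3.9, thickness_m=5e-9)       # ONO at ~5 nm oxide equivalent
c_ta2o5 = cap_per_um2(k=25.0, thickness_m=10e-9)   # Ta2O5, assumed k ~ 25, 10 nm

print(f"ONO:   {c_ono:.1f} fF/um^2")     # ~6.9 fF/um^2
print(f"Ta2O5: {c_ta2o5:.1f} fF/um^2")   # ~22 fF/um^2, roughly 3x more per unit area
```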
DRAM as technology driver
DRAM is one of the microelectronics industry's highest-volume parts, driving key segments of the manufacturing infrastructure. Because of its emphasis on high density, low cost, and long retention time, the DRAM is the ideal product for setting a roadmap on minimum feature size, defect density, and material quality. Technology elements introduced to achieve these goals are equally essential for stacked and trench capacitor cells.
Isolation. For proper scaling, the isolation pitch in the array must match the wiring pitch of the bit line. For 256-Mbit DRAM, this means scaling isolation pitches down to 0.55-0.60 µm, and for 1-Gbit DRAM, using pitches of 0.35-0.40 µm. Shallow trench isolation (STI) appears best to fit these needs. STI consists essentially of etching 0.20-0.50 µm deep trenches, filling the trenches with oxide, and planarizing the surface by CMP. This process may be more complex than the conventional local oxidation of silicon (LOCOS), and some companies may lack the experience in CMP required for the efficient use of STI. LOCOS may, however, not be applicable at ground rules below 0.25 µm [3].
Lithography. DRAM is the most aggressive driver of lithography because of its emphasis on density. Every DRAM generation begins by scaling lithography by a factor of 0.7 (Table 2). It is important to be able to use a given set of lithography tools for more than one technology generation. Even with phase-shift masking (PSM), though, i-line lithography used for feature sizes of 0.35 µm and above in 4- to 64-Mbit DRAMs is not extendible to the 0.25-µm ground rules required for 256 Mbit. DUV (248 nm) tools are therefore required for all critical levels in 256-Mbit DRAM. Recent work shows that advanced illumination techniques (off-axis, annular, or quadrupole) and phase-shift masks (attenuating PSM) allow printing of 0.18-µm features with 248-nm DUV for 1-Gbit DRAM with acceptable depth of focus (DOF), extending the lifetime of lithography tools to other generations [3].
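The sketch below uses the standard Rayleigh expressions to indicate how resolution-enhancement techniques extend 248-nm tools toward 0.18-µm features. The numerical aperture and the k1/k2 process factors are assumed, illustrative values; enhancement techniques such as off-axis illumination and attenuating PSM are modeled simply as a lower effective k1.

```python
# Rayleigh estimates for 248-nm (DUV) lithography. NA, k1, and k2 below
# are assumed, illustrative values.
WAVELENGTH = 248e-9   # DUV exposure wavelength, m
NA = 0.6              # assumed numerical aperture

def resolution(k1):
    return k1 * WAVELENGTH / NA          # minimum printable feature, m

def depth_of_focus(k2):
    return k2 * WAVELENGTH / NA ** 2     # usable depth of focus, m

print(f"Conventional (k1 ~ 0.6):  {resolution(0.6) * 1e6:.2f} um")    # ~0.25 um
print(f"With RET    (k1 ~ 0.45):  {resolution(0.45) * 1e6:.2f} um")   # ~0.19 um
print(f"DOF         (k2 ~ 0.5):   {depth_of_focus(0.5) * 1e6:.2f} um")  # ~0.34 um
```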
Self-alignment and etching. Self-alignment has allowed shrinkage in horizontal cell dimensions. One example is the common bit line contact etched self-aligned to adjacent word lines to reduce cell size. Complete etching of bit line contacts between tightly spaced word lines, without bit line to word line shorts, imposes strict requirements on etching and etch selectivity.
Planarization. The use of CMP to maintain a planarized surface throughout the process becomes more of an issue as one moves to sub-0.25-µm dimensions. Figure 9 shows the expected trends in allowable DOF for DRAM generations from 64 Mbit to 4 Gbit. The allowable device topography is the difference between the expected stepper DOF and the root-mean-square of the wafer flatness variation and stepper focus variation. The allowable topography that the integration scheme can introduce while still being able to print minimum images drops quickly (Fig. 9). Topography may become a problem for stacked capacitor cells, making it difficult to use tight-pitch wiring in conjunction with the cell structure in 64-Mbit and subsequent generations [3].
Figure 9. Allowable device step as a function of DRAM generation.
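A simple budget calculation, sketched below, illustrates the relationship just described: the step height the integration scheme may introduce is the stepper's usable DOF minus the wafer-flatness and focus variations combined in quadrature. All of the numbers are assumed examples, not values from Fig. 9.

```python
import math

# Topography budget: allowable step = DOF - quadrature sum of focus-budget
# consumers. All numbers below are assumed, illustrative values.
dof = 0.60              # usable depth of focus, um
wafer_flatness = 0.30   # wafer flatness variation, um
focus_control = 0.20    # stepper focus variation, um

allowable_step = dof - math.sqrt(wafer_flatness**2 + focus_control**2)
print(f"Allowable device topography: {allowable_step:.2f} um")   # ~0.24 um
```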
Metallization and planarization. The back end of the line (BEOL) faces a number of challenges as dimensions scale down. Topography introduced by memory cell integration causes problems for metal deposition (step coverage), metal etching (long overetch required), and DOF. In addition, one must face the standard BEOL issues, such as high-aspect-ratio gap filling with insulators, and reliability concerns. While cell arrays can be planarized by techniques such as reflowed doped glass (borophosphosilicate glass, or BPSG), the thermal budget required for reflow becomes increasingly incompatible with the rest of the process as dimensions are scaled further [3]. CMP was first introduced for 4-Mbit DRAM and has been used in all subsequent generations [5].
One of the challenges is multilevel metallization (MLM). Most 256-Mbit DRAMs adopt two or three levels of wiring, with either tungsten or tungsten silicide bit lines and two levels of aluminum. While CMP planarizes the surface prior to metal deposition and etching, standard metal deposition, patterning, and subsequent insulator deposition all re-introduce topography associated with metal height and pitch. One can use additional CMP steps to re-establish planarity, but at much higher cost. An alternative integration scheme is the damascene interconnect process, whereby a trough is etched in SiO2, metal is deposited to fill the trough, and the excess metal is polished off the surface to leave a metal-filled trough. Figure 10 shows the simultaneous formation of line and stud in the dual damascene version of this technique [5]. This process leaves a flat surface after the completion of every metal level. It also transforms the problem of filling high-aspect-ratio gaps between metal lines with oxide, as in conventional metal patterning, into that of filling narrow grooves etched into oxide with metal - a more efficient process [3]. Collimated sputtering, metal reflow, and chemical vapor deposition make metals better suited to such high-aspect-ratio filling. Dual damascene processes have been demonstrated in 64-Mbit and 256-Mbit DRAM programs [5].
Figure 10. Dual damascene metal: a) via and trough patterning; b) via etch; c) trough opening; d) via and trough etch; e) metal deposition; and f) metal planarization.
Defects and yield. Accelerating yield learning and minimizing the time to market are crucial to recovering the large investments now required to establish a state-of-the-art manufacturing site; the capital investment is presently over $1 billion. Despite the rapid reduction in defect densities, only a small fraction of manufactured DRAM chips will have entirely perfect cells and peripheral circuits. If all dice with one or more defective cells were discarded, the resulting yield would be too low and the cost/chip prohibitively high. The effective yield is increased substantially by repairing memories with a limited number of defective cells, mostly using laser-blown fuses. This is accomplished by adding a certain number of spare (redundant) rows and columns to the array and permanently substituting spare rows and columns for those containing defective cells [6]. In some cases, memory repair increases yield from <1% to >50%. With efficient redundancy schemes, the yield of DRAM chips is mostly limited by irreparable defects in peripheral circuits.
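A deliberately simplified Poisson yield model, sketched below, shows how numbers like these can arise. Real redundancy replaces whole rows and columns; here each defect is simply assumed repairable as long as spares remain, and the mean defect count per die is an assumed, illustrative value.

```python
import math

# Simplified Poisson yield model for repair with redundancy.
# The mean defect count and spare count are assumed, illustrative values.
MEAN_DEFECTS = 5.0   # assumed average number of defective cells per die
SPARES = 5           # assumed number of defects repairable with spare rows/columns

def poisson_cdf(k, lam):
    """Probability of at most k defects on a die (Poisson statistics)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

yield_no_repair = poisson_cdf(0, MEAN_DEFECTS)        # only perfect dice count
yield_with_repair = poisson_cdf(SPARES, MEAN_DEFECTS) # dice repairable with spares

print(f"Yield without repair: {yield_no_repair:.1%}")    # ~0.7%
print(f"Yield with repair:    {yield_with_repair:.1%}")  # ~62%
```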
Conclusion
DRAMs play a key role in computers, workstations, communication systems, and graphic subsystems. DRAM technology also drives the manufacturing infrastructure for the electronic industry. During the past 35 years, rapid advances in technology allowed an increase in bits/chip by a factor of four every three years and a decline in cost/bit by roughly the same factor. This trend should continue in the next decade. A dramatic reduction in cell size is possible by introducing three-dimensional cell structures in conjunction with lithography scaling and advances in doping, etching, planarization, and multilevel metallization. These cell structures have been successfully used for 4-Mbit to 256-Mbit DRAMs and should be extendible to 1 Gbit and beyond.
References
1. B. Prince, Semiconductor Memories, John Wiley & Sons, New York, 1991.
2. C.G. Sodini, T.I. Kamins, "Enhanced Capacitor for One-Transistor Memory Cell," IEEE Trans. Electron Dev., ED-23, pp. 1185-1190, 1976.
3. G.B. Bronner, "DRAM Technology Trends for 256 Mbit and Beyond," 1996 International Electron Devices and Materials Symposium, Symposium A, Hsinchu, Taiwan, pp. 75-82, Dec. 1996.
4. K.P. Muller, et al., "Trench Storage Node Technology for Gigabit DRAM Generations," 1996 IEDM Digest of Technical Papers, pp. 507-510, 1996.
5. J.G. Ryan, R.M. Geffken, N.R. Poulin, and J.R. Paraszczak, "The Evolution of Interconnection Technology at IBM," IBM J. Res. Develop., 39 (4), pp. 371-381, 1995.
6. S.E. Schuster, "Multiple Word/Bit Line Redundancy for Semiconductor Memories," IEEE J. Solid State Circuits, SC-13 (5), pp. 698-703, 1978.
BADIH EL-KAREH has been working on strategic technologies in microprocessors at IBM Microelectronics in Austin, TX, since July 1996. Prior to that, he was a senior scientist at IBM, Hopewell Junction, NY. He is the author of a book on VLSI silicon device characterization and a book on modern processing technologies. IBM Microelectronics, ph 512/838-6723, fax 512/838-6486.
GARY BRONNER received his BS degree from Brown University and his MS and PhD degrees from Stanford University, all in electrical engineering. Since 1996, he has managed Process Integration for IBM`s 1-Gbit DRAM alliance. He is a senior member of the IEEE.
STANLEY SCHUSTER received his BS and MS degrees in electrical engineering from New York University in 1962 and 1969. He joined the IBM Research Division in 1965. Currently, he is researching the design of microprocessors and novel memory architectures. Schuster is a Fellow of the IEEE and a member of the IBM Academy of Technology.