
Leakage reduction in SOCs using gate-length biasing

09/01/2006

Leakage control in hand-held and wireless devices is a challenge that grows with each new IC technology node. After the foundry engineers the transistor, designers have much of the control over system-on-chip (SOC) leakage during the design flow from architecture to GDS. Qualcomm employed a novel method of addressing leakage, developed by Blaze DFM and enabled by Chartered Semiconductor as the foundry partner, that improved die yield by 20% on a 90nm SOC chip. Optimized designs have significant strategic and financial value, particularly for leading-edge ultra-low power mobile products in the early stages of development and production ramp.

Sorin Dobre, Ke Cao, Matt Severson, Charlie Matar, Omer Sheikh, Qualcomm CDMA Technologies, San Diego, California

Leakage power in sub-100nm processes strongly affects battery life, package cost, and leakage (Iddq) sort yield for designs targeted at mobile applications. The reduction of leakage power is a critical design challenge. A Qualcomm mobile baseband processor SOC was optimized for leakage using a new methodology in a triple-threshold voltage (triple-Vt), 90nm low-leakage foundry process.

Qualcomm deployed a new design automation tool, Blaze MO from Blaze DFM, that uses small gate-length biases (typically up to 10% of drawn CD) to reduce leakage in sub-100nm low-power designs by 10%-40%. The biasing of the transistor gates is performed by the foundry during optical proximity correction (OPC) mask data preparation (MDP), using a special annotation layer that is added to the GDSII/OASIS database. The OPC biasing and the accompanying optical rules check (ORC) verification are qualified and endorsed by our foundry partners.

The GDSII annotation layers contain rectangular shapes that overlap the gate region of polysilicon, and an annotation box extends 20nm on each side of the gate. The optimized GDSII will contain a few extra layers corresponding to the annotation layers; the original design layout will be unaltered. Two directives are supported by our foundry’s OPC flow:

CD biasing. Transistors’ gate lengths are adjusted to improve timing and/or leakage power. The foundry’s allowed range of motion of CD biases is between -4nm and +6nm in increments of 2nm for this particular 90nm low-power process.

OPC error biasing. The sign (±) of error of OPC algorithms is allowed to change on a per-device basis, which influences the starting point of the iterative OPC solution search. A positive error-bias is likely to result in a larger CD while a negative bias is likely to result in a smaller CD. Every variant of a library cell has two versions corresponding to the two variations of OPC error bias.
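As a rough illustration of the annotation idea (not the actual Blaze MO or foundry data format), the sketch below builds an annotation rectangle around a gate and checks a requested CD bias against the -4nm to +6nm, 2nm-step window quoted above. The names `Rect` and `annotate_gate` are hypothetical, coordinates are assumed to be in nanometers, and the 20nm margin is assumed to apply on all four sides of the gate shape.

```python
from dataclasses import dataclass

# Qualified CD biases for this 90nm process, per the text: -4nm..+6nm in 2nm steps.
ALLOWED_CD_BIASES_NM = tuple(range(-4, 7, 2))  # (-4, -2, 0, 2, 4, 6)
# Annotation box overhang beyond the gate (assumed here to apply on all four sides).
ANNOTATION_MARGIN_NM = 20


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in nanometers (hypothetical helper type)."""
    x0: int
    y0: int
    x1: int
    y1: int


def annotate_gate(gate: Rect, cd_bias_nm: int) -> tuple[Rect, int]:
    """Return the annotation rectangle and the validated CD bias for one gate."""
    if cd_bias_nm not in ALLOWED_CD_BIASES_NM:
        raise ValueError(f"CD bias {cd_bias_nm}nm is outside the qualified window")
    m = ANNOTATION_MARGIN_NM
    box = Rect(gate.x0 - m, gate.y0 - m, gate.x1 + m, gate.y1 + m)
    return box, cd_bias_nm
```

In this sketch the original gate shape is never modified; only a separate annotation record is produced, which mirrors the point that the design layout itself stays unaltered.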

The foundry made some important but simple adjustments to their OPC scripts in order to properly interpret the new annotation layers. This took them a few days and need only be done once per process. It is not necessary to change the OPC scripts on a design-by-design basis.

The optimization process required zero incremental wall-clock time. On the design side, the optimization is performed concurrently with physical verification. On the manufacturing side, OPC does not run any longer than it would have on an un-optimized design.

These transforms typically reduce leakage variability by 25%-40% for two reasons. First, due to the exponential nature of the gate-length/Ioff tradeoff, an increased gate length inherently reduces leakage variability. Second, Vt variability is inversely proportional to the square root of device area, so a longer gate (and thus a larger device area) also tightens the Vt distribution. Global gate-length bias optimization with incremental timing and signal integrity (SI) analysis ensures that timing constraints for the design are honored during optimization. Simultaneous improvements in timing and leakage are possible because of the nonlinear Ion-Ioff tradeoff with gate-length biasing.
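The two variability arguments can be written out to first order. The relations below are a simplified sketch, not the foundry's device model: the mismatch coefficient A_Vt, the fitting length λ, and the pure-exponential form of Ioff are assumptions introduced for illustration.

```latex
% Area-dependent Vt mismatch (A_{V_t} is an assumed process constant, W L the gate area)
\sigma_{V_t} = \frac{A_{V_t}}{\sqrt{W L}}
\quad\Rightarrow\quad
\frac{\sigma_{V_t}(L + \Delta L)}{\sigma_{V_t}(L)} = \sqrt{\frac{L}{L + \Delta L}}

% Example: a bias of \Delta L / L = 0.10 gives \sqrt{1/1.1} \approx 0.953,
% i.e. roughly 4.7% lower \sigma_{V_t}.

% First-order exponential leakage model (assumed form; \lambda is a fitting length)
I_{\mathrm{off}}(L) \approx I_0 \, e^{-L/\lambda}
\quad\Rightarrow\quad
\sigma_{I_{\mathrm{off}}} \approx \left| \frac{dI_{\mathrm{off}}}{dL} \right| \sigma_L
= \frac{I_{\mathrm{off}}(L)}{\lambda} \, \sigma_L
```

Under this simplified model, the absolute leakage spread caused by a given CD variation σ_L scales with Ioff itself, so lengthening the gate lowers both the mean leakage and its spread.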

We start with a library of prequalified cell variants with different transistor gate lengths, and model the power/performance characteristics of the cells after biasing. The selection of cell variants is performed with respect to the process window, design rules, and SPICE models, as validated by the foundry. The new biasing tool automatically identifies cell variants that lead to maximum leakage optimization within a given library size or cell characterization budget, while the layout of the cells does not change and nothing is added to the library.

From a design team’s perspective, we now have a fine-grain method of optimization that is orthogonal and additive with respect to existing leakage power reduction techniques such as three Vt settings as well as different gate drive strengths (1×, 2×, 4×, etc.). Each of these “moves” is comparatively coarse-grained, such that changing from one Vt to the next will typically result in a 30% speed change and a 5× leakage change. By contrast, applying a +2nm CD bias might imply a 2% speed change and a 25% leakage savings. Essentially, this new optimizer can find slack that is left over by the existing “coarse” optimizations, and can flexibly convert timing slack to leakage power savings.
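To make the slack-to-leakage conversion concrete, here is a deliberately simplified sketch. It is not the Blaze MO algorithm: it looks at a single timing path, offers only one move (+2nm bias), and uses the illustrative 2% delay and 25% leakage figures from the paragraph above. The `Cell` type, its fields, and `bias_path` are hypothetical, and cell delays are assumed to be positive.

```python
from dataclasses import dataclass

DELAY_PENALTY = 0.02   # +2nm bias: about 2% slower (illustrative figure from the text)
LEAKAGE_SAVING = 0.25  # +2nm bias: about 25% less leakage (illustrative figure)


@dataclass
class Cell:
    name: str
    delay_ps: float     # cell delay on the path, assumed > 0
    leakage_nw: float   # cell leakage before biasing
    biased: bool = False


def bias_path(cells: list[Cell], slack_ps: float) -> float:
    """Greedily spend path slack on +2nm biases; return total leakage saved (nW)."""
    # Prefer cells that save the most leakage per picosecond of added delay.
    ranked = sorted(cells, key=lambda c: c.leakage_nw / c.delay_ps, reverse=True)
    saved_nw = 0.0
    for cell in ranked:
        added_delay = cell.delay_ps * DELAY_PENALTY
        if added_delay <= slack_ps:  # only bias if the path still meets timing
            cell.biased = True
            slack_ps -= added_delay
            saved_nw += cell.leakage_nw * LEAKAGE_SAVING
    return saved_nw
```

A production flow would re-run incremental timing and SI analysis across all paths sharing a cell; the per-path budget here only illustrates why small, cheap moves can use slack that a full Vt swap cannot.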

Leakage optimization

Qualcomm must meet an extremely competitive spec for any device it provides for cellular phones. Functionality, talk time, and standby time are budgeted; the chip cannot exceed budget. Power consumption is critical, and power specs are tight. For the validation of this new DFM technique, Qualcomm chose to optimize a baseband processor, the “CPU of the phone,” which integrates DSPs, an embedded processor and controllers, and additional cores for video and graphics (Fig. 1).

Figure 1. Top-level layout of baseband processor SOC.

For baseband processors, leakage power is a concern not only in standby modes, but also in active modes when many devices are leaking and circuit-architectural design techniques cannot help. Only multi-Vt and gate-length modulation techniques can help with talk time.

Enablement. Foundry enablement and project support were provided by Chartered Semiconductor Manufacturing and the Common Platform Alliance (Chartered, IBM, and Samsung) in the form of 1) a pre-qualified CD biasing window (±6nm in the Common Platform’s 90LP process), 2) modified OPC scripts and DRC/LVS decks, provided by Chartered, to recognize Blaze’s use of GDSII annotation layers as a means of passing design-driven CD biasing to the foundry, and 3) all mask and wafer processing, measurement, and test.

The foundry enablement process included two essential qualification steps: SPICE model-to-silicon correlation, and RET flow validation. For SPICE model-to-silicon correlation, we worked with Blaze engineers and our foundry partner to develop a Test Element Group (TEG) for measuring the effect of small changes in gate length on device I-V characteristics at multiple temperatures. We also used this TEG to verify that Ion/Ioff variability was not increased by biasing. Along with the TEG, we created layout structures that enabled us to assess the integrity of post-OPC gate shapes at each level of gate bias to ensure lack of distortion. Finally, we used this TEG to validate that bias annotations were correctly implemented through the foundry RET flow.

Design. The baseband processor SOC occupies approximately 50 mm² of die area, and contains nine major macro blocks and approximately 550K standard-cell instances at the top level. Major blocks include an ARM9 core, multiple DSP cores, and video and graphics processing cores (Fig. 1). Signoff for the design is performed with PrimeTime SI; the Blaze tool does not require any change in this signoff during its optimization.

Figure 2. Average leakage current at 1.2V/25°C before and after individual gate-length optimization throughout ~70% of the SOC (arbitrary units).

Several variant libraries were characterized to enable a rich set of simulation experiments with a variety of degrees of freedom and objectives during optimization. Only positive biases (increased gate lengths) for leakage optimization were used. The library variant characterization time was reduced to ~2 days by characterizing a limited set of variants for combinational cells only. The restriction to only combinational cells left approximately 30% of the leakage reduction potential (for sequential elements) unavailable for optimization. The only difference between the original design and the optimized design is the poly mask; all the other mask layers remain unchanged.

Results

An “A-B reticle” experiment was performed in which copies of the original taped-out design and the optimized design were manufactured side-by-side on the same wafer. Silicon measurements from four wafers containing several thousand total die confirmed significant benefits of this optimization. At the 1.2V/25°C (voltage/temperature) corner, the average leakage was reduced by 20%, the leakage variability by 28%, and the sort yield improved by 20 points at the given Iddq and Fmax sort criteria. The optimization did not lower functional yield. Simulated average leakage current at the same corner was reduced by 17.4%. Silicon measurements at the same corner showed a 20.4% reduction, supporting the underlying SPICE model accuracy and confirming the simulated results. Simulated leakage savings reached 26.7% at the 1.2V/125°C corner with more aggressive optimization.

Figure 3. Percentage of die meeting spec as FMAX_1 is swept, showing improvement in parametric yield due to optimized gate-length biasing.

Figure 3 shows the yield improvement after gate length optimization as a key Fmax test is swept. Yield has been normalized to an arbitrary value in this figure; however, the normalized yield gain corresponds to approximately 20 points of actual yield. The vertical dotted line represents the actual threshold frequency of the Fmax test, again normalized to arbitrary units. The pass criteria also included various Iddq threshold criteria at 1.2V/25°C and 1.35V/25°C, as well as two other Fmax tests.

The design is moved off of the Iddq vs. Fmax tradeoff curve that is available for foundry-side exploitation, because each transistor is individually moved along the Ion vs. Ioff curve based on its respective performance requirement in the design. There is no way for the foundry to make this adjustment without this information. With these new annotations, adjustments can easily be handled by the foundry during OPC, leading to a much better match between design and process for the device.

The prospects for CD biasing as a technique for leakage and variability reduction remain strong going into the 65nm and 45nm technology nodes. At 65nm, a qualified CD biasing range of 6nm has been established by most of Qualcomm’s foundry partners. Coupled with the steeper Lgate vs. Ioff curve in 65nm technology, we expect much stronger results than at the 90nm node. At 45nm, multi-Vt disappears since reduced supply voltages do not leave enough headroom, so gate length modulation is the only leakage reduction technique available.

Conclusion

With the Blaze-modified database and one new poly mask, a baseband SOC can achieve 20% reduction of full-chip leakage (at room temperature) and 30% variability reduction. Designers can use an annotated poly layer to shorten yield ramp time. The RTL or gate level netlist is not affected at all and no new IP libraries or cores are needed. In addition, this new tool has significant value as a solution to process variation. While we were able to achieve these improvements using Blaze MO software on specific chips, the software may produce different results for different chips in different process nodes.

Additional accomplishments of the Qualcomm-Blaze-Chartered project include the successful definition of a tapeout flow for the Blaze methodology, as well as definition of the Blaze MO design kit (GDSII layer map, qualified CD biasing range, GDSII annotation methodology, modified DRC/LVS decks, foundry-specific configuration file, etc.). With this groundwork, any foundry can supply a design kit to enable this proprietary CD biasing methodology, and indeed, design kits have been issued for 65LP processes at TSMC and the Common Platform Alliance.

There is some up-front work for the foundry engineers, who must update the OPC scripts and the design kits to account for the annotation layers. The benefits far outweigh these costs, because the foundry can offer its customers a superior process without any additional equipment expenditures. Moreover, the gate CD modulation methodology offers value both early in the process lifetime (15-20% shift of median Iddq) and at process maturity (reduced variability). For the particular design we studied, the biggest win comes from the jump from x% to x+20% parametric yield, a step-function gain along the yield learning curve. This result also implies far faster ramps to yield maturity.

Acknowledgements

Blaze MO is a trademark of Blaze DFM Inc. PrimeTime is a registered trademark of Synopsys Inc.

Sorin Dobre received his BSEE and MSEE degrees from Politehnica U., Bucharest, Romania, and is staff engineer at Qualcomm CDMA Technologies, 5775 Morehouse Dr., San Diego, CA 92121-1714; ph 858/651-8568.

Ke Cao received his MSEE from the U. of Minnesota and is a DFM engineer responsible for methodology and development at Qualcomm CDMA Technologies.

Matt Severson received his MSEE from Brigham Young U. and is staff engineer and the low-power lead for 7th-generation MSMs at Qualcomm CDMA Technologies.

Charlie Matar received his BSEE from the U. of Texas at Austin and his MSEE from Southern Methodist U., Dallas, TX, and is director of engineering for SOC design at Qualcomm CDMA Technologies.

Omer Sheikh received his BSEE from UC Berkeley and is senior engineer working on ASIC designs at Qualcomm CDMA Technologies.

Nanoscale product challenges: Leakage variation, yield, and yield ramp

Leakage variation. As the semiconductor industry moves into volume production at 90nm and starts up the 65nm ramp, leakage current and its variability pose serious challenges to device performance and threaten to drastically increase the time-to-volume ramp of new products designed in these nodes. With delays in high-k gate dielectrics and nonplanar device architectures, device speed improvements at each successive technology node come with thinner and leakier gate oxides, and with lower threshold voltages that are much more vulnerable to process variability. Further, the inability to tightly control device gate length leads to unworkable levels of leakage-current variability, since subthreshold leakage rises exponentially as gate CD shrinks.

Yield. As a fabless company, Qualcomm designs its chips with a very aggressive yield objective across the voltage and temperature process window. A die either meets spec and can be sold, or fails and cannot be sold. A given die contributes to yield if it passes all functional and parametric tests. Functional tests check for short- and open-faults typically induced by “random defectivity.” Parametric tests check for die that are functional but do not meet performance specifications embodied in frequency (Fmax) and leakage (Iddq) tests. In a mature process, the Iddq yield loss should be small (<1%), but with process shifts (e.g., a slight variation in Ion) and poly-layer re-spins (e.g., tweaking of the SRAM bit cell) this yield loss can easily increase to 2-3%.
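The pass/fail definition of yield above can be expressed in a few lines. This is only a sketch with hypothetical names (`Die`, `sort_yield`) and a single Fmax test and a single Iddq limit; an actual sort program applies several Fmax and Iddq criteria at more than one voltage.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Die:
    functional: bool   # passed all functional (short/open fault) tests
    fmax_mhz: float    # measured maximum frequency
    iddq_ma: float     # measured quiescent (leakage) current


def sort_yield(dice: Sequence[Die], fmax_min_mhz: float, iddq_max_ma: float) -> float:
    """Fraction of dice that pass the functional, Fmax, and Iddq criteria."""
    if not dice:
        return 0.0
    passing = sum(
        1
        for d in dice
        if d.functional and d.fmax_mhz >= fmax_min_mhz and d.iddq_ma <= iddq_max_ma
    )
    return passing / len(dice)
```

Because a die must clear both limits at once, lowering leakage without giving up frequency moves dice from the Iddq-fail bin into the passing bin, which is the parametric yield gain discussed in the main article.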

Yield ramp. High-volume consumer devices are among the first products to venture into a new process technology, as a result of market pressures to reduce costs and enable additional functionality. The move to 90nm has proven to be difficult for new, high-performance products. Much of the speed of the new process had to be sacrificed in order to meet the leakage requirement, and to cut variation between fast and slow parts. The designers and the foundry work together to position the product chip along an “Ion vs. Ioff curve” that is intrinsic and fixed for the process, in such a way that the product chip has adequate yield.

The yield ramp is lengthened by design bug fixes that require re-spins; for example, a new poly mask may be made with an OPC recipe different from the original, causing a partial reset of the parametric yield-related process ramp. Indeed, current 90nm products have experienced 12-month ramp times, and there is no sign that ramp times for 65nm will be shorter. Hence, any “design for manufacturability” (DFM) technique that can reduce average leakage power and leakage variability will allow us to accelerate the yield learning curve.

In the case of Qualcomm, being the first on a new process node means a long process ramp with the foundry. Others that follow at the same foundry will receive a partial “free ride.” However, design-specific improvements using DFM techniques will not provide the same free ride to followers.