Who Will Be Responsible When a Stacked Die Fails? So Far, The Answer Isn’t Clear
The shift to stacking die raises some interesting technology challenges: how to model through-silicon vias (TSVs), how to deal with physical effects such as stress, heat, and noise, and how to deal with electrical issues such as electrostatic discharge. But the more immediate hurdle to jump-starting this market appears to be on the business side.
From a technology standpoint, a 2.5D stack initially can be built using the same formula and methodologies as PCBs, multi-chip modules and systems in package, which are hardly new concepts. In fact, they are well tested and have been widely deployed in the past. Building a 3D stack (figure) with TSVs will be tougher because it relies on new technology that has not been well tested and proven. But in both cases, there will be an additional challenge: turning this into a commercially viable model that allows massive re-use, particularly of analog from previous process generations, as well as merchant IP.
The IP in these stacked die may include everything from a subsystem to a full chip, and it may include internally developed die, subsystems, and blocks as well as commercially developed versions. That raises some interesting challenges, though, because what constitutes a good die in a planar configuration may not work as well — or sometimes not at all — in a stacked configuration.
The big question is: Who is responsible for that failure? Is it the maker of the subsystem or die? And if so, which one? Is it the maker of the die that doesn’t work in a particular stacked configuration or the maker of the die that caused the other die to fail? In some cases, two known good die may both fail when stacked together. And does the liability rest with the company putting those die together because its engineers should have characterized the physical effects of the die that were being stacked?
This becomes even murkier as design meets manufacturing. Die need to be thinner in stacked configurations, which means the physical effects are more pronounced. Noise from an I/O interface, for example, may have no effect at all on a single die, but it could disrupt a sensitive analog signal in a stacked arrangement. In addition, there are new technologies that need to be accounted for, most notably TSVs, which don’t necessarily expand and contract at the same rate as silicon.
“At advanced nodes, the die is thin so the TSVs through the die can add to the complexity of stress,” said Amit Marathe, manager of reliability and modeling at GlobalFoundries. “Initially there will have to be restrictive design rules to make sure that stress does not interfere with device performance. You need sufficiently long distances between the devices on a chip.”
Getting this formula right is critical for the semiconductor industry’s continued progress, especially since multiple companies need to share information. That kind of information has been considered proprietary in the past, and inside many companies it is still considered a competitive edge. Still, the business case for cooperation is strong. In addition to NRE costs and missed market windows, stacked die also multiply the bill of materials by the number of layers.
“What 3D stacking adds is more ways of hurting or helping ourselves,” said Drew Wingard, chief technology officer at Sonics. “Once you winnow the choices based on economics, then you have to figure out what are the high-level user benefits. You may get more features, save power or energy, and optimize on cost. You can see that in Apple’s approach to SoCs. Some of their products have been done with components that are behind the competition, but their focus on the user is so strong that they always hit it right.”
3D also provides the opportunity to improve performance and energy efficiency by widening the channels and shortening the distance that signals need to travel. In fact, it was the possibility of enormous improvements in performance that first led to research on die stacking at companies such as IBM in the early 2000s. At the time, saving power was considered a secondary concern. Since then, with the growing focus on mobile devices, the priorities have been reversed, with re-use of analog and flexibility in what gets stacked on something else now a close second.
“This opportunity is massively wide and very appealing to a lot of customers,” said Simon Segars, executive vice president and general manager of ARM’s Physical IP Division. “But it also has a bearing on how you develop IP, which will have thousands of connectors up to memory. We have R&D work going on with IP, bandwidth issues, and the physics of how to put this all together.”
Standards organizations, most notably Si2, are working to define how the technology goes together. But to really get stacking going, one of the big challenges is less about the technology and more about the ability to share responsibility, and risk, to make the stacked die work as planned.
– Ed Sperling