Embedded computing modules (ECMs), which employ silicon circuit board (SiCB) technology, are beginning to compete with FR-4 as the solution of choice in many design cases. The reasons for this include improved packing density, improved speed, and reduced power. This paper addresses some of the improvements in power consumption available to users of ECMs.
December 31, 2009 – SiCB technology allows bare die FPGAs, CPUs, and memory to be placed together on a single silicon substrate. The SiCB can also include various associated support parts such as oscillators, bypass caps, calibration resistors, etc. Once the components are merged together on the silicon substrate, the SiCB is typically attached to a rugged substrate (e.g. ceramic or FR-4) using BGA technology. The result is an ECM that can be placed on the customer’s FR-4 board, again using BGA technology.
The power savings within an ECM come from a reduction in I/O power. In a typical system as described below, an immediate reduction of 7.4W, or 22% in overall power, can be achieved on an ECM as compared with an equivalent FR-4 based system. There are two primary sources for this reduction: (1) Reduced I/O parasitics due to the use of bare die and closer distances between die, and (2) Elimination of termination resistors or on-die termination due to the improved I/O parasitics and the lossy characteristics of SiCB wiring.
This paper will explain how reduced I/O parasitics and removal of termination resistors can lead to power reductions of 22% in a typical system. These results can be easily adapted to other system configurations to find the power reduction potential in a variety of systems. As ECMs become more popular, additional savings can be achieved through optimization of the FPGA I/O cells for the SiCB environment.
Example system configuration
A typical system that can take advantage of SiCB technology is shown in Figure 1 below. This system consists of an FPGA with 8 channels of DDR3 SDRAM memory. Each memory channel consists of two 1Gb ×16 DDR3 SDRAMs with a shared command/address bus and a ×32 data channel. The DDR3 interface operates at 533MHz (1066Mbps). For the SiCB-based implementation, all of the memory interface signals can be contained within the ECM. Only the FPGA I/Os that are used for other purposes need to be routed out of the ECM.
Figure 1: Typical system implemented on conventional FR-4 and new SiCB technology.
A key step in finding the power savings is to determine the number of I/Os required to implement the 8-channel DDR3 memory system. Each channel has its own DDR3 clock and command/address bus. For DDR3, there are 2 clock signals (CK/CK#) and 22 command/address signals (13 address, 3 bank address, 3 command, 1 chip select, 1 clock enable, and 1 ODT control pin). The data interface for a single channel consists of 4 bytes of data. Each byte consists of 8 data signals, 2 data strobe signals (differential, bi-directional), and 1 data mask, for a total of 11 data signals per byte. Therefore, the full channel consists of 4×11=44 data signals. Note that for a Read operation, the data mask is not used, so the full data channel for a Read consists of 4×10=40 data signals.
Table 1 summarizes the number of I/Os related to the memory interface:
Table 1: Number of DDR3 memory interface signals.
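The signal counts quoted above can be tallied with a short script. This is only a sketch of the arithmetic in the text, not a reproduction of Table 1; the constant and function names are illustrative and do not come from the original analysis.

```python
# Per-channel DDR3 signal counts, taken from the text above.
CHANNELS = 8
CLOCK_SIGNALS = 2  # CK and CK#
COMMAND_ADDRESS = {
    "address": 13,
    "bank_address": 3,
    "command": 3,
    "chip_select": 1,
    "clock_enable": 1,
    "odt": 1,
}
BYTES_PER_CHANNEL = 4
DATA_PER_BYTE = 8
STROBES_PER_BYTE = 2  # differential, bi-directional strobe pair
MASKS_PER_BYTE = 1    # data mask, used for Writes only


def channel_signal_counts():
    """Return the number of signals of each type in one memory channel."""
    write_data = BYTES_PER_CHANNEL * (DATA_PER_BYTE + STROBES_PER_BYTE + MASKS_PER_BYTE)
    read_data = BYTES_PER_CHANNEL * (DATA_PER_BYTE + STROBES_PER_BYTE)
    return {
        "clock": CLOCK_SIGNALS,
        "command_address": sum(COMMAND_ADDRESS.values()),
        "write_data": write_data,
        "read_data": read_data,
    }


def total_interface_pins():
    """Physical FPGA pins used by the memory interface across all channels."""
    counts = channel_signal_counts()
    # Write data already includes every physical data pin (mask included).
    per_channel = counts["clock"] + counts["command_address"] + counts["write_data"]
    return CHANNELS * per_channel


if __name__ == "__main__":
    print(channel_signal_counts())
    print(total_interface_pins())
```

Under these counts, each channel uses 68 pins, so the full 8-channel interface occupies 544 FPGA I/Os.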
Power savings through signal wiring parasitics
It is visually apparent from Figure 1 that the overall dimensions and the interconnect lengths between FPGA and memory are significantly reduced for the SiCB-based implementation. In addition, the removal of the FPGA and memory packages removes significant package parasitics from the signal path. These reductions contribute to reduced power through reduced parasitic capacitance, while providing a cleaner signaling environment as well.
Signal trace parasitics
For the FR-4 implementation, the signal routing length is estimated to be around 58mm based on recent siXis designs. This includes about 45mm of 50Ω trace routing and about 13mm of routing within the FPGA and memory packages. The 45mm trace is assumed to be a typical microstrip construction with 170μm width, 17μm thickness, and 100μm dielectric thickness. The relative permittivity of the dielectric is assumed to be 4.5. With these parameters, the capacitance and inductance of the 45mm line are about 5.4pF and 13.7nH, respectively.
For the SiCB-based implementation, the signal routing length is reduced by about 60%, from 58mm to about 23mm in the worst case. For the silicon process, the 23mm length corresponds to a capacitance of about 3.7pF, an inductance of about 8.4nH, and a resistance of about 23Ω at DDR3 frequencies. Higher ratios of resistance to inductance are also possible, depending on the particular requirements of the signaling channel. Thus, the signal channel behaves like a lossy transmission line where reflections are significantly damped by the resistance of the line.
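As a sanity check on the quoted trace parasitics, the characteristic impedance and propagation delay can be derived from the total capacitance and inductance using the lossless transmission-line relations Z0 = sqrt(L/C) and delay = sqrt(L·C). The sketch below applies them to both sets of figures. For the SiCB line the lossless relations are only a rough approximation, because they ignore the 23Ω series resistance; the function name is illustrative.

```python
import math


def line_impedance_and_delay(capacitance_f, inductance_h):
    """Lossless-line estimates from the total C (farads) and L (henries) of a trace.

    Returns (characteristic impedance in ohms, one-way delay in seconds).
    """
    z0 = math.sqrt(inductance_h / capacitance_f)
    delay = math.sqrt(inductance_h * capacitance_f)
    return z0, delay


# Values quoted in the text for the 45mm FR-4 trace and the 23mm SiCB route.
fr4_z0, fr4_delay = line_impedance_and_delay(5.4e-12, 13.7e-9)
sicb_z0, sicb_delay = line_impedance_and_delay(3.7e-12, 8.4e-9)

# Fractional reduction in total routing length (58mm to 23mm).
length_reduction = (58 - 23) / 58

if __name__ == "__main__":
    print(f"FR-4: Z0 = {fr4_z0:.1f} ohm, delay = {fr4_delay * 1e12:.0f} ps")
    print(f"SiCB: Z0 = {sicb_z0:.1f} ohm, LC delay = {sicb_delay * 1e12:.0f} ps")
    print(f"Length reduction = {length_reduction:.0%}")
```

The FR-4 figures come out close to 50Ω, consistent with the stated trace impedance, and the length reduction works out to roughly 60%.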
Package parasitics
The packaged FPGA and DRAM contain additional capacitance and inductance that can be eliminated when moving from an FR-4 implementation to an SiCB-based implementation. The removal of the package inductance improves signaling, while the removal of the capacitance reduces the power required to switch the signals.
Packaged and bare die parasitic capacitance for the DDR3 memory and FPGA have been estimated based on a review of IBIS models from various suppliers. For the DDR3 memory, the review showed that the ratio of die and package capacitance to total capacitance was consistent among various memory vendors. Based on this conclusion, the die capacitance for the DRAM is estimated as a percentage of the common datasheet values. The package parasitics of the FPGA were estimated using the Altera Stratix family of FPGAs, including IBIS models for the Stratix III and datasheets for both the Stratix III and Stratix IV. The Stratix IV datasheet lists an input capacitance of 8pF. Using the breakdown of package vs. die capacitance from the IBIS model, an estimate for the Stratix IV package and die capacitance was obtained.
The results for packaged and bare die DDR3 and FPGA capacitance are shown in Table 2 below. The C_Total value represents the total capacitance for the packaged FPGA or memory. The C_Die value is the remaining capacitance that applies when using bare die. For reference, the average package inductance is shown as well. The FPGA and DRAM package inductance for an SiCB-based implementation is reduced to zero.
Table 2: Typical FPGA and DDR3 memory die and package parasitics.
Power savings due to reduced parasitics
The power savings due to reduced signal capacitance can now be calculated using the estimated signal capacitance for FR-4 and SiCB-based implementations and the number of signals required to support the memory interface. This estimation is done using the standard formula for the power required by a toggling signal to charge its capacitance:
$$P = n \cdot T \cdot C \cdot f \cdot V^2$$
where:
n = number of signals switching
T = Average toggle rate; number of low to high transitions per clock cycle
C = Total capacitance of the signal
f = Clock frequency (T*f = frequency of the toggling signal)
V = Voltage swing of the toggling signal
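The terms defined above combine into a one-line calculation. The following is a minimal sketch, assuming the standard relation P = n·T·C·f·V² with T counted as low-to-high transitions per clock cycle. The 10pF capacitance and 1.5V swing in the example are placeholder values chosen for illustration, not figures from Table 3.

```python
def switching_power(n, toggle_rate, capacitance_f, clock_hz, swing_v):
    """Power (watts) needed to charge the capacitance of n toggling signals.

    n             -- number of signals switching
    toggle_rate   -- average low-to-high transitions per clock cycle (T)
    capacitance_f -- total capacitance of one signal, in farads (C)
    clock_hz      -- clock frequency in hertz (f)
    swing_v       -- voltage swing of the signal in volts (V)
    """
    return n * toggle_rate * capacitance_f * clock_hz * swing_v ** 2


# Hypothetical example: one clock signal (T = 1) with an assumed 10pF load,
# a 533MHz clock, and an assumed full 1.5V swing.
example_power_w = switching_power(1, 1.0, 10e-12, 533e6, 1.5)

if __name__ == "__main__":
    print(f"{example_power_w * 1e3:.2f} mW")
```

Because the power scales linearly with C, any capacitance removed from the signal path translates directly into a proportional reduction in switching power.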
Table 3 shows the tabulation of each term and the final power consumed by the memory interface for both FR-4 and SiCB-based implementations. Note that the voltage swing has been derived from the termination settings, which are detailed in the next section. Due to the lossy nature of the SiCB signal channel, the voltage swing is similar and taken to be the same for both implementations.
The following switching assumptions are used for the switching power calculation:
- Memory interface is running continuous read/write operations with interleaved banks
- Clocks run full-time (100% toggle rate)
- New command/address presented at each clock cycle (50% toggle rate)
- Data toggles at the maximum rate (100% of clock toggle rate for DDR)
- Access is equally split between Read and Write operations (50% toggle rate each)
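One way to read the switching assumptions above is as a toggle-weighted signal count, which can then be multiplied by C·f·V² to give the switching power. The sketch below encodes that reading. Mapping "100% toggle rate" to T = 1 and "50% toggle rate" to T = 0.5 is an interpretation of the list, and the group names and table layout are illustrative rather than taken from Table 3.

```python
CHANNELS = 8

# (group name, signals per channel, toggle rate T, fraction of time active)
SIGNAL_GROUPS = [
    ("clock", 2, 1.0, 1.0),             # clocks run full-time
    ("command_address", 22, 0.5, 1.0),  # new command/address every clock cycle
    ("write_data", 44, 1.0, 0.5),       # Writes occupy half of the accesses
    ("read_data", 40, 1.0, 0.5),        # Reads occupy the other half (no data mask)
]


def effective_toggling_signals(groups=SIGNAL_GROUPS, channels=CHANNELS):
    """Sum of n * T over all groups, weighted by the time each group is active."""
    return sum(channels * n * t * active for _, n, t, active in groups)


if __name__ == "__main__":
    print(effective_toggling_signals())
```

Under this reading the interface behaves like 440 signals each making one low-to-high transition per clock cycle.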
Table 3: Power consumption due to signal capacitance.
[1] The ODT pin is unused in the SiCB-based implementation because the on-die termination is disabled.
Power savings through termination resistors
As discussed above, the signal channel on an SiCB tends to be lossy compared to a typical FR-4 implementation. For the signal distances used on SiCBs, this lossy nature can be turned into an advantage because the signals are inherently damped and do not require termination resistors to provide excellent signal quality. Because SiCB wires are much shorter than FR-4 wires, the reduced signal propagation speed due to R*C delay is more than compensated for by the shorter length.
This section will analyze the power consumed by a typical FR-4 termination scheme for DDR3 as shown in Figure 2. This scheme is consistent with recommendations of FPGA vendors when using discrete DDR3 components. Once the power required for the FR-4 termination scheme is calculated, this power can be translated into savings for the SiCB-based implementation, which requires no termination.
Figure 2: Typical DDR3 termination scheme for FR-4 implementation.
The power required for the termination scheme can be estimated by simply looking at the DC current consumed when the signal is driven to a high or a low state. The power related to charging the capacitance of the signal line has already been considered in the previous section.
To analyze the power for driving a high or low in each termination case, one needs to solve the simple resistance network associated with each termination type. Since the FR-4 implementation uses nearly lossless transmission lines, the transmission line can be considered as a short circuit for the DC analysis. For each analysis, the output driver input is shorted to 0V or VDD, and the resulting output voltage (VOUT) and DC power can be calculated. These results are shown in Table 4 below as PFPGA(High/Low) and PMEM(High/Low).
Note that the power has been calculated separately for the FPGA and the memory. This was done to identify which portions of the termination current flow from the FPGA power supply and which portions flow from the memory power supply. To find the total power delivered from the FPGA and the memory, the single signal results are scaled according to the number of signals, the percentage of time that the signal drives high/low, and the percentage of time that the data is driven for Write vs. Read commands. This final calculation is summarized by the following equations:
$$P_{FPGA,total} = DT \cdot (\text{Total signals}) \cdot \left[ HT \cdot P_{FPGA}(\text{High}) + (1 - HT) \cdot P_{FPGA}(\text{Low}) \right]$$
$$P_{MEM,total} = DT \cdot (\text{Total signals}) \cdot \left[ HT \cdot P_{MEM}(\text{High}) + (1 - HT) \cdot P_{MEM}(\text{Low}) \right]$$
where:
DT = Percentage of time that the signal is driven (not tri-stated)
Total signals = Total number of signals of the particular type
HT = Percentage of time that the signal is driven high; (1-HT) = Percentage of low time
PFPGA(High/Low) = DC power consumed by the FPGA for terminating a single high/low signal
PMEM(High/Low) = DC power consumed by the memory for terminating a single high/low signal
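The scaling described by these terms can be sketched as a small helper that is applied once with the FPGA per-signal powers and once with the memory per-signal powers. It assumes the total is the product of the driven fraction, the signal count, and the time-weighted average of the high and low powers. The numbers in the example are placeholders, not values from Table 4.

```python
def total_termination_power(drive_fraction, total_signals, high_fraction,
                            p_high_w, p_low_w):
    """Scale single-signal DC termination power (watts) to a whole signal group.

    drive_fraction -- DT, fraction of time the signal is driven (not tri-stated)
    total_signals  -- number of signals of this type
    high_fraction  -- HT, fraction of driven time spent high
    p_high_w       -- DC power for one signal driven high
    p_low_w        -- DC power for one signal driven low
    """
    average_per_signal = high_fraction * p_high_w + (1 - high_fraction) * p_low_w
    return drive_fraction * total_signals * average_per_signal


# Hypothetical example: 352 write-data signals driven half of the time,
# high half of that time, with an assumed 15mW per signal in either state.
example_total_w = total_termination_power(0.5, 352, 0.5, 0.015, 0.015)

if __name__ == "__main__":
    print(f"{example_total_w:.2f} W")
```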
Table 4: Output voltage and DC power dissipation for typical DDR3 termination scheme.
[2] For the termination analysis, the differential clock is treated as one pair rather than two independent signals.
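The DC resistance-network analysis described above can be illustrated with a generic center-tapped termination: a driver with some output resistance feeding a pull-up to VDD and a pull-down to ground. This is a sketch under stated assumptions, not the scheme of Figure 2. The 34Ω driver and the 120Ω/120Ω split termination are hypothetical values chosen for illustration, and the sketch lumps the driver and termination onto one supply rather than separating FPGA and memory power as Table 4 does.

```python
def terminated_output(vdd, r_driver, r_pullup, r_pulldown, drive_high):
    """Solve the DC network of a driver into a split (pull-up/pull-down) termination.

    The transmission line is treated as a short circuit, so the driver and both
    termination resistors meet at a single node. Returns (VOUT in volts, total
    power drawn from the supply in watts).
    """
    g_driver = 1.0 / r_driver
    g_up = 1.0 / r_pullup
    g_down = 1.0 / r_pulldown
    g_total = g_driver + g_up + g_down

    if drive_high:
        # Driver and pull-up both source current from VDD into the node.
        v_out = vdd * (g_driver + g_up) / g_total
        # All supply current returns to ground through the pull-down.
        supply_current = v_out / r_pulldown
    else:
        # Driver ties the node toward ground; only the pull-up sources current.
        v_out = vdd * g_up / g_total
        supply_current = (vdd - v_out) / r_pullup

    return v_out, vdd * supply_current


# Hypothetical values: 1.5V supply, 34-ohm driver, 120-ohm pull-up and pull-down.
v_high, p_high = terminated_output(1.5, 34.0, 120.0, 120.0, drive_high=True)
v_low, p_low = terminated_output(1.5, 34.0, 120.0, 120.0, drive_high=False)

if __name__ == "__main__":
    print(f"High: VOUT = {v_high:.3f} V, P = {p_high * 1e3:.2f} mW")
    print(f"Low:  VOUT = {v_low:.3f} V, P = {p_low * 1e3:.2f} mW")
```

With a symmetric termination the high and low cases dissipate the same DC power, and this power is drawn whenever the signal is driven, regardless of whether it toggles. Removing the termination removes that contribution entirely.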
Summary of FR-4 vs. ECM power dissipation
Typical FPGA and memory core power
Having calculated the I/O power dissipation for switching the signal line and for termination, the final step before identifying the power savings is to estimate the core power required for the DDR3 memory and for the FPGA. The ×16 DDR3 SDRAM is assumed to operate at 533MHz (1066Mbps), which corresponds to a speed bin of DDR3-1066. The operating mode is assumed to be continuous interleaved burst mode (IDD7). A review of various memory vendors showed that the current specifications vary considerably from about 200mA to about 380mA. Therefore, the average of four leading memory vendors was taken, which corresponds to about 265mA per device. At 1.5V, this represents about 400mW per device or about 6.4W for the full memory system.
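The memory core power estimate above is simple arithmetic and can be checked directly. The variable names in this sketch are illustrative.

```python
IDD7_AVERAGE_A = 0.265   # average IDD7 current per device, from the text
SUPPLY_V = 1.5           # DDR3 supply voltage
CHANNELS = 8
DEVICES_PER_CHANNEL = 2  # two x16 devices form one x32 channel

device_count = CHANNELS * DEVICES_PER_CHANNEL
power_per_device_w = IDD7_AVERAGE_A * SUPPLY_V
memory_core_power_w = power_per_device_w * device_count

if __name__ == "__main__":
    print(f"{device_count} devices")
    print(f"{power_per_device_w * 1e3:.1f} mW per device")
    print(f"{memory_core_power_w:.2f} W total")
```

The unrounded result is about 398mW per device and about 6.36W for all 16 devices, which matches the rounded figures of 400mW and 6.4W.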
The core power for the FPGA is estimated based on a heavily utilized Altera Stratix IV configuration. The estimates were obtained from the power calculation software provided by Altera. Note that this calculation did not include termination power or the power required to switch the output signals, because these components of the power consumption have been calculated in previous sections. Table 5 shows the breakdown of the core power for the FPGA.
Table 5: FPGA core power for a typical configuration.
Total power consumption and savings through SiCB technology
With all power contributions now accounted for, the total power consumption for the typical system can be summarized as shown in Table 6 below. As the results show, a typical system such as the one used for this analysis can benefit from a power savings of 7.4W when using embedded computing modules (ECMs) containing SiCB technology. For the example FPGA configuration, which represents a heavily utilized Stratix IV, this translates to a reduction in power of 22%. For systems with lower FPGA resource utilization and therefore less overall power consumption, the relative savings offered by SiCB technology would of course increase. Additionally, as ECMs become more popular, additional savings can be achieved through optimization of the FPGA I/O cells for the SiCB environment.
Table 6: Power dissipation for FR-4 vs. ECM implementation of example system.
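The relationship between the absolute and relative savings can be sketched as follows. The FR-4 baseline is not restated in the text here, so it is back-calculated from the quoted 7.4W and 22%. The 25W system in the example is hypothetical, and the sketch assumes the memory interface, and therefore the absolute saving, stays the same.

```python
def implied_baseline(saved_w, savings_fraction):
    """Total baseline power implied by an absolute saving and its percentage."""
    return saved_w / savings_fraction


def relative_savings(saved_w, baseline_w):
    """Fraction of the baseline power removed by the saving."""
    return saved_w / baseline_w


SAVED_W = 7.4  # I/O power saving quoted in the text

# Baseline implied by the quoted figures (about 33.6W).
fr4_total_w = implied_baseline(SAVED_W, 0.22)

# Hypothetical system with a lighter FPGA load and the same memory interface.
lighter_system_savings = relative_savings(SAVED_W, 25.0)

if __name__ == "__main__":
    print(f"Implied FR-4 total: {fr4_total_w:.1f} W")
    print(f"Savings on a 25W system: {lighter_system_savings:.1%}")
```

This illustrates the point made above: because the I/O saving is fixed by the memory interface, its share of total power grows as the FPGA core power shrinks.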
Biography
David Blaker is VP of engineering and manufacturing at siXis Inc., 3021 E. Cornwallis Road, Research Triangle Park, NC USA; 919-248-9193, e-mail: [email protected].
References
(1) IBIS models and datasheets from various 1Gb ×16 DDR3 SDRAM memory vendors: [a] Micron: MT41J64M16LA; [b] Samsung: K4B1G1646E; [c] Hynix: H5TQ1G63AFP; [d] Elpida: EDJ1116BBSE
(2) IBIS models and datasheets from FPGA vendors: [a] Altera Stratix III (datasheet and IBIS model); [b] Altera Stratix IV (datasheet only, IBIS model not available)
(3) Altera AN-520-1.1, "DDR3 SDRAM Interface Termination and Layout Guidelines," May 2009.
(4) Altera PowerPlay Early Power Estimator (v9.0.1 B4)