# Optimization of Reliability and Power Consumption in Systems on a Chip

T. Simunic<sup>1</sup>, K. Mihic<sup>2</sup>, G. De Micheli<sup>2</sup>

| <sup>1</sup> CSE Department, UCSD, | <sup>2</sup> CSL, Stanford U., |  |  |
|------------------------------------|--------------------------------|--|--|
| 9500 Gilman Drive                  | 353 Serra Mall                 |  |  |
| La Jolla, CA 92093                 | Stanford, CA 94305             |  |  |
| tajana@ucsd.edu                    | kmihic@stanford.edu            |  |  |

Abstract. Aggressive transistor scaling, decreased voltage margins, and increased processor power and temperature have made reliability assessment a much more significant issue in design. Although the reliability of devices and interconnect has been broadly studied, here we characterize reliability at the system level, considering component-based System on Chip designs. Reliability is strongly affected by system temperature, which is in turn driven by power consumption. Thus, component reliability and power management should be addressed jointly. We present a joint reliability and power management optimization problem whose solution is an optimal management policy. When careful joint policy optimization is performed, we obtain a significant improvement in energy consumption (40%) while meeting the reliability constraint at all operating temperatures.

## 1 Introduction

Advances in technology lead to higher device density and operating frequency, and consequently to higher power dissipation and operating temperatures. To deal with such problems, *dynamic power management* (DPM) has been applied in various forms, to both single and networked on-chip components [3],[5]. Reducing energy consumption to the required levels ensures correct and useful operation of the integrated systems. DPM also affects the reliability of the system components. Curbing power dissipation helps lower device temperatures and reduce the effect of temperature-driven failure mechanisms, thus making components more reliable. On the other hand, aggressive power management policies can decrease the overall component reliability because of the degradation effect that temperature cycles have on modern IC materials [9],[12]. As a result, there is a need to evaluate *System on Chip* (SoC) reliability along with power consumption and performance. There are several interesting problems that can be considered.

The first problem is to determine whether or not, for a given system topology, DPM affects reliability and to find out whether such an effect is beneficial. The second problem is to include reliability as an objective or constraint in the policy optimization. The third problem is the combined search for system topologies and joint DPM policies to achieve reliable low-energy design. All problems involve both run-time strategies and design issues. In this paper we focus on the first two problems. The first one enables us to understand the relationship between run-time power management and reliability analysis. We evaluate the reliability, performance and power consumption of computational elements (cores) in SoCs by modeling system-level reliability as a function of failure rates, system configuration and management policies. The overall objective is to introduce design constraints, such as *mean time to failure* (MTTF), in the design space spanned by performance and energy consumption. The major novelty and contribution of this paper is the definition of a joint dynamic power management (DPM) and dynamic reliability management (DRM) optimization method that yields optimal system-level run-time policies. In addition, we evaluate the effect of the policy on single core and multi-core systems. Experimental results show that with careful joint optimization we can reduce energy consumption by 40% while meeting both reliability and performance constraints.

The rest of the paper begins with a discussion of related work. Our approach for assessing and optimizing reliability and power is presented in Sections 3 and 4. Optimization results for a typical SoC design are presented in Section 5.

## 2 Related Work

Integrated systems have been in production for a while in the form of Systems on Chips (SoCs). A number of issues related to SoC design have been discussed to date, ranging from managing power consumption to addressing problems with interconnect design. Previous work on energy management of networked SoCs mainly focused on controlling the power consumption of interconnects, while neglecting power management of the cores. A stochastic optimization methodology for core-level dynamic voltage and power management of multi-processor SoCs using a closed-loop control model has been presented in [3].

Reliability of SoCs is another area of increasing concern. A good summary of research contributions that combine performance and reliability measures is given in [1]. An approach to improve system reliability and increase processor lifetime by implementing redundancy at the architecture level is discussed in [2]. A number of fault-tolerant micro-architectures have been proposed that can handle hard failures at a performance cost [6]. Minimizing energy and performance by exploiting architecture and application-level adaptability has been presented in [8]. The RAMP methodology models chip MTTF as a function of the failure rates of individual structures on chip due to different failure mechanisms [7]. Soft (or transient) failure mechanisms and their effect on power consumption have been studied by a number of researchers (e.g. [22], [23]). An incorrect signal level due to cross talk is an example of a soft failure. In this work we address hard failure mechanisms, which cause irrecoverable component failures. An open interconnect line due to electromigration is an example of a hard failure. An overview of the most commonly observed hard failure mechanisms that affect current semiconductor technologies is given in [11]. The effect of a temperature gradient on the electromigration failure mechanism is investigated in [12].

The connection between fast thermal cycling and thin film cracking (interlayer dielectric, interconnections) is described in [13] and a model is given in [32]. A model for Time-Dependent Dielectric Breakdown is developed in [14]. Our work presents the first unified methodology for optimization of reliability, power and performance in SoCs.

## 3 Reliability Modeling

Our objective is to optimize system-level power consumption under reliability and performance constraints. The *reliability* of a component (or system) is a probability function R(t), defined on the interval  $[0,\infty)$ , that the component (system) operates correctly with no repair up to time *t*. The *failure rate* of a component (or system) is the conditional probability per unit time that the component (or system) fails in the interval  $[t, t + \Delta t]$ , given correct operation up to time *t*. The mean time to failure (MTTF) is the expected time at which a component fails, i.e.  $MTTF = \int_0^\infty R(t)dt$ . In the particular case that the failure rate  $\lambda_f$  is constant with time, the MTTF is  $1/\lambda_f$  and the reliability is  $R(t) = e^{-\lambda_f t}$ .
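The constant-failure-rate case can be checked numerically. The following is a minimal sketch, assuming an illustrative failure rate that corresponds to a 10-year MTTF; the function names and the truncation of the integral are our own choices, not part of the method described here.

```python
import math

HOURS_PER_YEAR = 8760.0


def reliability(t_hours: float, failure_rate: float) -> float:
    """R(t) = exp(-lambda_f * t) for a constant failure rate."""
    return math.exp(-failure_rate * t_hours)


def mttf_numeric(failure_rate: float, horizon_factor: float = 50.0,
                 steps: int = 200_000) -> float:
    """Approximate MTTF = integral of R(t) dt with the trapezoidal rule.

    The infinite upper limit is truncated at horizon_factor / failure_rate,
    where R(t) is already negligible.
    """
    t_end = horizon_factor / failure_rate
    dt = t_end / steps
    total = 0.5 * (reliability(0.0, failure_rate) + reliability(t_end, failure_rate))
    for i in range(1, steps):
        total += reliability(i * dt, failure_rate)
    return total * dt


# Illustrative value only: a failure rate equivalent to a 10-year MTTF.
lam = 1.0 / (10.0 * HOURS_PER_YEAR)
mttf_closed_form = 1.0 / lam
mttf_integrated = mttf_numeric(lam)
r_at_mttf = reliability(mttf_closed_form, lam)

print(f"closed form: {mttf_closed_form / HOURS_PER_YEAR:.3f} years")
print(f"integrated:  {mttf_integrated / HOURS_PER_YEAR:.3f} years")
print(f"R(MTTF) = {r_at_mttf:.4f}")
```

Note that the probability of surviving up to the MTTF is only 1/e, so a 10-year MTTF does not mean that most components last 10 years.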

In general, failure rates depend on time because of material aging and on temperature because of thermodynamic issues. In this work we focus on the reliability of components during their useful life and thus we neglect aging, but we do consider temperature dependence. We assume that components can be in different operational states (e.g., *active, idle, sleep*) characterized by parameters such as voltage and frequency, which determine the component temperature. Thus failure rates can be considered constant within any given operational state. We consider the three failure mechanisms most commonly considered by the semiconductor industry: *Electromigration* (EM), *Time Dependent Dielectric Breakdown* (TDDB) and *Thermal Cycles* (TC).

**Electromigration** is a result of momentum transfer from electrons to the ions that make up the interconnect lattice. It leads to opening of metal lines/contacts, shorting between adjacent metal lines, shorting between metal levels, increased resistance of metal lines/contacts, or junction shorting. The MTTF due to the EM process is commonly described by Black's model:

$$MTTF_{EM} = A_o (J - J_{crit})^{-n} e^{\frac{Ea}{kT}}$$
(1)

where  $A_o$  is an empirically determined constant, J is the current density in the interconnect,  $J_{crit}$  is the threshold current density, T is the absolute temperature, and k is Boltzmann's constant,  $8.62*10^{-5}$  eV/K. For aluminum alloys the activation energy  $E_a$  and the exponent n are 0.7 eV and 2, respectively. The value of MTTF for EM can also be obtained by silicon measurements for the cores. We model the EM failure rate for idle and active states only, because the leakage current present in the sleep state is not large enough to cause migration:

$$\lambda_{core,s}^{EM} = A_o' (J_s - J_{crit})^n e^{\frac{-Ea}{kT_s}}; \forall s = active, idle$$
(2)
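To give a feel for the temperature term in Black's model, the sketch below evaluates only the Arrhenius factor, using the activation energy and Boltzmann's constant quoted above. It assumes the current density is identical at both temperatures, so the prefactor cancels; the function name and the chosen temperature pair are illustrative.

```python
import math

K_BOLTZMANN_EV = 8.62e-5  # Boltzmann's constant in eV/K, as quoted in the text


def arrhenius_mttf_ratio(t_ref_c: float, t_hot_c: float, ea_ev: float = 0.7) -> float:
    """Return MTTF(t_ref) / MTTF(t_hot) for the exp(Ea / kT) term alone.

    Temperatures are given in degrees Celsius and converted to kelvin.
    Assumes the same current density at both temperatures (our assumption),
    so the (J - J_crit)^-n prefactor cancels out of the ratio.
    """
    t_ref_k = t_ref_c + 273.15
    t_hot_k = t_hot_c + 273.15
    return math.exp(ea_ev / K_BOLTZMANN_EV * (1.0 / t_ref_k - 1.0 / t_hot_k))


ratio_50_vs_90 = arrhenius_mttf_ratio(50.0, 90.0)
print(f"EM lifetime at 50 C is about {ratio_50_vs_90:.1f}x the lifetime at 90 C")
```

Under these assumptions, the EM lifetime drops by more than an order of magnitude between 50°C and 90°C, which is why lowering temperature through power management has such a strong effect on this mechanism.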

Time Dependent Dielectric Breakdown is a wear-out mechanism of the dielectric due to electric field and temperature. The mechanism causes the formation of conductive paths through dielectrics, shorting the anode and cathode. In this work we use the field-driven model:

$$MTTF_{TDDB} = A_o e^{-\gamma E_{ox}} e^{\frac{Ea}{kT}}$$
(3)

where  $A_o$  is an empirically determined constant,  $\gamma$  is the field acceleration parameter and  $E_{ox}$  is the electric field across the dielectric. The activation energy,  $E_a$ , for intrinsic failures in SiO<sub>2</sub> is found to be 0.6-0.9 eV and for extrinsic failures about 0.3 eV [11]. The failure rate due to the TDDB mechanism can be defined as follows:

$$\lambda_{core,s}^{TDDB} = A_{o}' e^{\gamma E_{ox,s}} e^{\frac{-Ea}{kT_{s}}} ; \forall s = active, idle, sleep$$
(4)

Temperature cycling induces plastic deformations of materials that accumulate every time the cycle is experienced. This eventually leads to the creation of cracks, fractures, short circuits and other failures of metal films and interlayer dielectrics, as well as fatigue at the package and die interface. The effect of low-frequency thermal cycles, such as turning a device on/off during normal operation, has been well studied by the packaging community [11]. Thermal cycles that occur with higher frequencies and on chip, instead of just at the interface with the package, are gaining in importance as power management gets more aggressive, feature sizes get smaller and low-k dielectrics become more prevalent in the fabrication process [9]. Recent work [12] showed that such cycles play the major role in the cracking of thin film metallic interconnects and dielectrics. The expected number of thermal cycles before core failure is given in Equation (5). It depends not only on the temperature range between power states  $(T_{max}-T_{min})$  but is also strongly influenced by the average temperature in the sleep state,  $T_{avg,s}$ , and the molding temperature of the package process,  $T_{mold}$ . The exponent q ranges from 6 to 9, and  $C_{1,2}$  are fitting constants defined in [12] for on-chip structures. Mechanical properties of the interlayer dielectric layers are very dependent on the nature of the processing steps. As a result, when  $T_{avg,s}$  increases, the stress buildup on the silicon due to the package decreases, resulting in a longer lifetime.

$$N_{f} = C_{o} \left[ C_{1} (T_{\max} - T_{\min}) - C_{2} (T_{avg} - T_{mold}) \right]^{-q}$$
(5)

$$\lambda_{core,s}^{TC} = C_{o} \left[ (T_{active} - T_{s}) - (T_{avg,s} - T_{mold}) \right]^{-q} t^{-1} \quad \forall s = sleep$$
(6)
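Because the exponent q is large, the number of cycles to failure in Equation (5) is very sensitive to the bracketed stress term. The sketch below shows only this relative sensitivity; it deliberately avoids the fitting constants, whose values are not given here, and the 10% increase is an illustrative number of our own.

```python
def cycles_ratio(stress_scale: float, q: float) -> float:
    """Return N_f(new) / N_f(old) when the bracketed term of Equation (5)
    is multiplied by stress_scale. The constant C_o cancels in the ratio."""
    return stress_scale ** (-q)


# Illustrative: a 10% larger bracketed term, at both ends of the q range.
for q in (6, 9):
    print(f"q = {q}: cycles to failure scale by {cycles_ratio(1.10, q):.3f}")
```

With q between 6 and 9, a 10% increase in the stress term roughly halves the expected number of cycles to failure.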

Since a component fails when at least one of the failure mechanisms occurs, we express the overall component failure rate as a sum of failure rates due to all three mechanisms.
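A small numeric illustration of this sum, as a sketch with made-up rates: if each mechanism on its own would allow a 30-year MTTF, the component as a whole reaches only 10 years. The dictionary keys and the helper names are hypothetical.

```python
HOURS_PER_YEAR = 8760.0


def core_failure_rate(rates_by_mechanism: dict) -> float:
    """Overall failure rate: the sum of the per-mechanism failure rates."""
    return sum(rates_by_mechanism.values())


def mttf_years(failure_rate_per_hour: float) -> float:
    """MTTF in years for a constant failure rate given per hour."""
    return 1.0 / (failure_rate_per_hour * HOURS_PER_YEAR)


# Hypothetical rates: each mechanism alone corresponds to a 30-year MTTF.
rates = {m: 1.0 / (30.0 * HOURS_PER_YEAR) for m in ("EM", "TDDB", "TC")}
combined_mttf = mttf_years(core_failure_rate(rates))
print(f"combined MTTF: {combined_mttf:.1f} years")
```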

#### System Reliability

Systems are interconnections of components. We use the term *core* to refer to one of the SoC components that performs a computing, storage or communication function. From a reliability analysis standpoint, components can be viewed as in *series (parallel)* if the overall correct operation hinges upon the *conjunction (disjunction)* of the correct operation of components. For example, a system consisting of a processor, a bus and a memory is seen as the series interconnection of three components. A system is therefore characterized by its topology, i.e., by a *reliability graph* [15]. In this work we use reducible graphs, as they display the conjunctive and disjunctive relations among components. Thus, system reliability can be computed bottom-up, by considering series/parallel compositions of sub-systems. When failure rates are constant, the failure rate of a series composition is the sum of the failure rates of each component:  $\lambda_{system} = \sum \lambda_{core_i}$ .

Systems with parallel structures offer built-in redundancy. Such systems can either have all components concurrently operating (*active parallel*) or only one component active while the rest are in low power mode (*standby parallel*). An active parallel combination has higher power consumption and lower reliability than standby parallel, but also a faster response time to the failure of any one component. The combined failure rate of M active components,  $\lambda_{fap}$ , is defined using the binomial coefficient,  $C_i^M$ , and the active failure rate,  $\lambda_f$  [15]. The binomial sum has units of time, so it gives the reciprocal of the rate:  $\lambda_{fap}^{-1} = \sum_{i=1}^{M} (-1)^{i-1} \frac{C_i^M}{i\lambda_f}$ . Since our goal is to minimize power consumption while improving system reliability, in this work we focus on standby parallel configurations with only one active component. In this case, the failure rate is:  $\lambda_{fsp} = \lambda_{fs}/M$  [15].
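The two parallel configurations can be compared with a short sketch. It reads the binomial sum as the expected lifetime of M identical active-parallel components with constant failure rate, and it assumes ideal standby (spares do not fail while waiting and switch-over is perfect); both readings are assumptions on our part, and the function names are hypothetical.

```python
from math import comb


def active_parallel_mttf(failure_rate: float, m: int) -> float:
    """Expected lifetime of m identical components running concurrently,
    evaluated with the alternating binomial sum."""
    return sum((-1) ** (i - 1) * comb(m, i) / (i * failure_rate)
               for i in range(1, m + 1))


def standby_parallel_mttf(failure_rate: float, m: int) -> float:
    """Expected lifetime with one active component and m - 1 ideal spares,
    i.e. an effective failure rate of failure_rate / m."""
    return m / failure_rate


lam = 1.0  # normalized failure rate, so lifetimes are in units of 1 / lam
active_two = active_parallel_mttf(lam, 2)
standby_two = standby_parallel_mttf(lam, 2)
print(f"active parallel,  M=2: {active_two:.2f}")
print(f"standby parallel, M=2: {standby_two:.2f}")
```

For two components, the standby configuration doubles the expected lifetime while the active configuration extends it by only one half, which is consistent with the choice of standby parallel configurations in this work.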



Figure 1 System Model

## 4 Joint Policy Optimization

We can define an optimization problem given a system topology and a set of component operational states characterized by failure rate, power consumption and performance. Cores are modeled with a *power and reliability state machine* (PRSM) as shown in Figure 1, a state diagram relating service levels to the allowable transitions among them. Multiple cores are a reliability network of series and parallel combinations of single core PRSM models.

A single-core PRSM characterizes each state by its failure rate,  $\lambda_{core,state}$ , and power consumption,  $P_{state}$ . Thus, active state *i* is characterized by the failure rate  $\lambda_{core,active_i}$ , the frequency and voltage of operation,  $f_i, V_i$ , which are equivalent to the core processing rate  $\varphi_{f_i}$ , and the power consumption  $P_{a_i}$ . We assume for simplicity that the workload and the core's data processing times follow exponential distributions with rates  $\varphi_{workload}$  and  $\varphi_{core,f_i}$ . More complex distributions can also be used [4],[5]. In the idle state a core is active but not currently processing data. The sleep state represents one or more low power states a core can enter. *TransitionToSleep* and *TransitionToActive* states model the time and power consumption required to enter and exit each sleep state. Transition times to/from low-power states follow a uniform distribution with average transition times  $t_{ts}$ ,  $t_{ta}$  [10]. The arcs represent transitions between the states with the associated transition times and rates. The transitions can occur due to normal operation of the system, or because a command (action) is given as part of the management policy. We define two actions: "go to sleep", which causes a transition to the sleep state, and "continue", which allows the system to continue normal operation.

In order to obtain the failure rate for each state we need to evaluate the failure rates of each of the three mechanisms described in Section 3 as functions of the component temperature. The expected temperature in a state is estimated using the reference active state temperature  $T_{active}$ , the expected time spent in a state *s* due to an action *a*, y(s,a), and the state's steady-state temperature  $T_{state,ss}=T_{active}P_{state}/P_{active}$ :

$$T_{state} = (T_{active} - T_{state,ss})e^{-\frac{y(s,a)}{\tau}} + T_{state,ss}$$
(7)

The reference active state temperature, shown in Equation (8), is defined using R<sub>thdie</sub> and R<sub>thpackage</sub>, the thermal resistances of the die and package, for a reference frequency and voltage of operation in the active state,  $f_0, V_0$ . The thermal RC constant,  $\tau \approx c\rho a^2$ , uses  $c = 10^6 J/m^3 K$  for the silicon thermal capacitance,  $\rho = 10^{-2} m K / W$  for the thermal resistivity [10], and a wafer thickness *a* of 0.1-0.6mm.

$$T_{active} \propto P_{active} (R_{th \, die} + R_{th \, package})$$
 (8)
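Equation (7) and the thermal RC constant can be evaluated directly. The sketch below uses the silicon constants quoted above and the DSP power values from Table 1; the reference temperature of 90 is illustrative and is treated here as a temperature rise above ambient, which is our reading rather than something stated in the text.

```python
import math


def thermal_time_constant(thickness_m: float, c: float = 1e6, rho: float = 1e-2) -> float:
    """tau ~ c * rho * a^2, with c in J/(m^3 K), rho in m K / W, a in meters."""
    return c * rho * thickness_m ** 2


def state_temperature(t_active: float, p_state: float, p_active: float,
                      time_in_state: float, tau: float) -> float:
    """Equation (7): exponential approach from T_active to the state's
    steady-state temperature T_ss = T_active * P_state / P_active."""
    t_ss = t_active * p_state / p_active
    return (t_active - t_ss) * math.exp(-time_in_state / tau) + t_ss


tau_thin = thermal_time_constant(0.1e-3)   # 0.1 mm wafer
tau_thick = thermal_time_constant(0.6e-3)  # 0.6 mm wafer
print(f"tau: {tau_thin:.1e} s to {tau_thick:.1e} s")

# DSP core from Table 1: P_active = 1.1 W, P_sleep = 0.01 W.
for dwell in (0.0, tau_thick, 5.0 * tau_thick):
    temp = state_temperature(90.0, 0.01, 1.1, dwell, tau_thick)
    print(f"time in sleep = {dwell:.1e} s -> T = {temp:.2f}")
```

With these constants the time constant lies between roughly 0.1 ms and a few milliseconds, so sleep periods longer than a few milliseconds are long enough for a core to complete most of a thermal cycle.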

The power management policy can either shorten or lengthen the lifetime of a core. Lower power consumption results in lower temperature and thus lower EM and TDDB failure rates. On the other hand, the thermal cycling failure rate rises as the frequency of switching between power states increases [7],[12]. Joint optimization of power, performance and reliability is needed to arrive at a policy that meets all constraints. The formulation of the optimization problem, shown in Equation (9), is based on the Semi-Markov Decision Process model [4],[5].

$$\begin{array}{ll} \min & \sum_{c=1}^{N} cost_{energy,c} \\ s.t. & \sum_{a \in A} f(s',a) - \sum_{a \in A} \sum_{s \in S} m(s'|s,a) f(s,a) = 0; \quad \forall s', \forall c \\ & \sum_{a \in A} \sum_{s \in S} y(s,a) f(s,a) = 1; \quad \forall c \\ & cost_{perf,c} < Perf_{const,c}; \quad \forall c \\ & Tpl\left(\lambda_c\right) \leq \operatorname{Rel}_{const}; \quad \forall c \\ & \lambda_c = \sum_{i \in F} \sum_{a \in A} \sum_{s \in S} \lambda_{core}^i(s,a) y(s,a) f(s,a) \end{array}$$

(9)

This linear program minimizes the energy cost over all cores,  $cost_{energy,c}$ , under a set of constraints, so it can be solved with a standard linear program solver. The unknowns are the state-action frequencies f(s,a), which represent the expected number of times that the system is in state *s* when command *a* is issued. The management policy is derived for each core that has a low power state where the "go to sleep" command can be given. The policy takes the form of a table of probabilities for issuing each command *a* in state *s*:  $p(s,a) = f(s,a) / \sum_{a'} f(s,a')$ .
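Once the state-action frequencies are known, the policy table follows by normalizing them per state. The sketch below assumes the policy is this ratio of frequencies; the state and action names and the frequency values are made up for illustration and are not solver output.

```python
def policy_from_frequencies(freq: dict) -> dict:
    """Map state-action frequencies f(s, a) to probabilities
    p(s, a) = f(s, a) / sum over a' of f(s, a')."""
    totals = {}
    for (state, _action), value in freq.items():
        totals[state] = totals.get(state, 0.0) + value
    return {
        (state, action): value / totals[state]
        for (state, action), value in freq.items()
        if totals[state] > 0.0  # states that are never visited get no entry
    }


# Hypothetical frequencies for a core with one idle and one active state.
f = {
    ("active", "continue"): 0.6,
    ("idle", "continue"): 0.3,
    ("idle", "go_to_sleep"): 0.1,
}
policy = policy_from_frequencies(f)
for (state, action), prob in sorted(policy.items()):
    print(f"p({state}, {action}) = {prob:.2f}")
```

In this example the core would be sent to sleep on one out of four visits to the idle state.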

The first constraint shown in Equation (9) is a "balance equation" which specifies that, for each core *c*, the number of entries to any state has to equal the number of exits. Here m(s'|s,a) is the probability of arriving in state *s'* given that the action *a* was taken in state *s*. The second constraint specifies that the sum of probabilities over each core's states and actions has to equal one. The third constraint specifies that each core's expected performance penalty for transitioning into low power states has to be lower than the specified limit, *Perf<sub>const,c</sub>*. We next describe the reliability constraint, represented by the last two lines in Equation (9), since definitions of the other constraints are in [5].

The reliability constraint, Tpl, is a function of the system topology, i.e. Tpl=f(series, parallel combinations). For example, with series combinations  $Tpl = \sum \lambda_{core,s}$ , and with parallel standby  $Tpl=\lambda_{core,standby}/N_{standby}$ . Clearly, a reliability network normally has a number of series and parallel combinations of cores. Each core's failure rate,  $\lambda_c$ , as shown in the last line of Equation (9), is in turn a sum over the failure mechanisms,  $i \in \{EM, TDDB, TC\}$ , when the core is in the state *s* and the action *a* is given. For example, the reliability constraint is given in Equation (10) for a core that has one *active* (A), *idle* (I) and *sleep* (S) state and two actions: *go to sleep* (S) and *continue* (C). The failure rate in each state,  $\lambda_{core,state}$ , is a sum of failure rates due to the failure mechanisms active for that state, as described in Section 3.

$$\lambda_{A} y(A, C) f(A, C) + \lambda_{I} y(I, C) f(I, C) + \lambda_{I} y(I, S) f(I, S) + \lambda_{S} y(S, C) f(S, C) \leq \operatorname{Rel}_{const}$$
(10)
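The left-hand side of Equation (10) is a weighted sum that is easy to evaluate for a candidate policy. In the sketch below, all failure rates, times and frequencies are invented for illustration, and the bound is taken as the failure rate equivalent to a 10-year MTTF, which is our assumption about how the constraint would be set.

```python
HOURS_PER_YEAR = 8760.0


def expected_failure_rate(lam: dict, y: dict, f: dict) -> float:
    """Left-hand side of Equation (10): sum over the state-action pairs of
    lambda_state * y(s, a) * f(s, a)."""
    return sum(lam[state] * y[(state, action)] * f[(state, action)]
               for (state, action) in f)


# Hypothetical per-state failure rates (per hour) for states A, I and S.
lam = {"A": 1.2e-5, "I": 8.0e-6, "S": 6.0e-6}
# Hypothetical state-action frequencies and expected times, with sum(y * f) = 1.
f = {("A", "C"): 0.5, ("I", "C"): 0.2, ("I", "S"): 0.05, ("S", "C"): 0.25}
y = {pair: 1.0 for pair in f}

rel_const = 1.0 / (10.0 * HOURS_PER_YEAR)  # assumed bound: 10-year MTTF
rate = expected_failure_rate(lam, y, f)
meets_constraint = rate <= rel_const
print(f"expected failure rate: {rate:.3e} per hour")
print(f"bound:                 {rel_const:.3e} per hour")
print(f"constraint met: {meets_constraint}")
```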

We have thus far shown how to perform synthesis of optimal power, reliability and performance policy. We next present the optimization results for SoCs.

Table 1. SoC Parameters

| IP block   | P<sub>active</sub> [W] | P<sub>idle</sub> [W] | P<sub>sleep</sub> [W] | t<sub>ts</sub> [s] | t<sub>ta</sub> [s] |
|------------|------------------------|----------------------|-----------------------|--------------------|--------------------|
| DSP [17]   | 1.1                    | 0.5                  | 0.01                  | 250u               | 100n               |
| Video [18] | 0.44                   | N/A                  | 0.07                  | 110m               | 0.9                |
| Audio [19] | 0.11                   | 0.03                 | 3e-3                  | 6u                 | 0.13               |
| I/O [20]   | 1e-3                   | N/A                  | 6e-6                  | 100n               | 6u                 |
| DRAM [21]  | 1.58                   | 0.37                 | 1e-2                  | 16n                | 16n                |



Figure 2 System on a Chip

## 5 Results

The methodology presented in this work has been tested on the SoC shown in Figure 2. The inputs to the optimizer are the power, reliability and performance characteristics of each core, along with a reliability network topology. The output is a set of management policies obtained from the state-action frequencies f(s,a), which are the unknowns in Equation (9). The policies determine when each core can enter any one of its low-power states.

Power and performance characteristics of the cores come from the datasheets [17]-[21] and are summarized in Table 1. Each core supports multiple power modes (*active*, *idle*, *sleep* and *off*). The off state, with zero power consumption, is supported by all cores. Transition times between the active and sleep states are defined by  $t_{ts}$  and  $t_{ta}$ . Reliability rates for each failure mechanism (EM, TDDB, TC) are based on actual silicon measurements obtained for 95nm technology. For confidentiality reasons we are unable to provide their exact values. Each of the cores in the system is designed to meet an MTTF of 10 years. Each core's workload and data consumption rates ( $\varphi_{workload}$  and  $\varphi_{core,f_i}$ ) are obtained from cycle-accurate simulation of algorithms running on the cores (e.g. MPEG video, MP3 audio). The optimization results have been successfully validated against analytical models [15] for simpler reliability networks. We first present results of single core optimization, followed by a discussion of design changes to the core that influence reliability. Then we show system level optimization results for the whole SoC.



Figure 3 Optimization of Single Cores

Figure 4 Design Case

#### 5.1 Single Core Optimization

We optimize the power consumption of each core presented in Table 1 while keeping the minimum lifetime requirement of 10 years. The objective is to observe how cores based on the same 95nm technology and of comparable dimensions, but with different power consumption, respond to DPM. Optimization is performed at three internal chip temperature corners (25, 50, 90°C) in order to set the die operating points close to those defined in the datasheets [17]-[21]. The maximum power savings achievable at a specified temperature, given the MTTF constraint of 10 years, are shown in Figure 3. In the lower range of temperatures (25°C-50°C) most of the cores react positively to DPM and allow the maximum power savings to be achieved. Figure 3 shows that the maximum power savings decrease for the DSP, Video and Audio cores working at 90°C. This decrease is due to the thermal cycling failure mechanism. Thus, a system designed to meet a specific MTTF requirement without power management may fail sooner once DPM is introduced. One way to address this problem is to redesign the core.

Influencing the lifetime of a power-managed core by changing the design is a matter of finding the equilibrium between related physical parameters. In Figure 4 we show the results of design updates made to the RAM core. The EM failure rate is lowered by widening critical metal lines. The core area expanded by 5%, the current density dropped by 20% and the core temperature dropped by 2%. Although both EM and TDDB gain from the design change, the TC failure rate increases sufficiently to worsen the net reliability by 10%.


Figure 5 Power Savings for Standby Off

Figure 6 Power Savings for Standby Sleep

#### 5.2 SoC Optimization

Here we examine the influence of redundant components on the overall system reliability. We use the SoC shown in Figure 2 with the core parameters given in Table 1 and the operating characteristics described in the previous section. Since all cores are essential to correct SoC operation, the initial reliability network is their series combination. Unfortunately, although each core meets the MTTF requirement, the overall system does not. Therefore, we add redundant components to the SoC at the cost of increased area. Two redundancy models described in Section 3 are studied: standby sleep, with redundant cores in the sleep state until needed, and standby off, with redundant cores turned off. Figures 5 and 6 show that the best power savings are obtained with the standby off model. However, this model has the largest wakeup delay for redundant components. The standby sleep model shown in Figure 6 gives more moderate power savings with a faster activation time. Results for both models show that not all cores can operate reliably at the highest temperature (e.g. the absence of power savings for the AUDIO core at 90°C shows that the reliability constraint is not met). Thus, we expand the reliability network to have DSP, AUDIO and I/O with one redundant component in standby sleep and another in standby off, while VIDEO and RAM remain with a single redundant component in standby sleep. The new system meets the MTTF of ten years at the cost of a die area increase, while achieving power savings of 40% and a faster response time to component failure.
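The shortfall of the initial series network can be seen with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that each of the five cores sits exactly at the 10-year MTTF target with a constant failure rate; the measured rates are not available, so these are not the values used in the experiments.

```python
def series_mttf_years(core_mttfs_years: list) -> float:
    """MTTF of a series network with constant failure rates:
    the system failure rate is the sum of the core failure rates."""
    system_rate_per_year = sum(1.0 / mttf for mttf in core_mttfs_years)
    return 1.0 / system_rate_per_year


# Five cores (DSP, Video, Audio, I/O, DRAM), each assumed to be at 10 years.
system_mttf = series_mttf_years([10.0] * 5)
print(f"series system MTTF: {system_mttf:.1f} years")
```

Under this assumption the series system reaches only 2 years, which illustrates why the cores' individual compliance does not carry over to the system.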

## 6 Conclusion

In this work we show that a functional and highly reliable core may fail to meet the lifetime requirement once power management is enabled, due to the thermal cycling failure mechanism. As technology scales down, limitations set by thermal cycling are going to be an even more important factor in system design. Thus the methodology we presented in this work for joint optimization of reliability, power consumption and performance is going to be even more crucial. With our optimizer we show that we can obtain large power savings on SoCs while meeting the reliability constraint.

## References

- [1] M. D. Beaudry. Performance-related reliability measures for computing systems. IEEE Trans. on Comp., c-27:540(6), June 1978.
- [2] P. Shivakumar et al. Exploiting microarchitectural redundancy for defect tolerance. In 21st Intl. Conference on Computer Design, 2003.
- [3] T. Simunic, S. Boyd, "Managing Power Consumption in Networks on Chips," Design, Automation and Test in Europe, pp. 110-116, 2002.
- [4] Q. Qiu, Q. Wu, M. Pedram, "Dynamic Power Management in a Mobile Multimedia System with Guaranteed Quality-of-Service," Design Automation Conference, pp. 701-707, 2001.
- [5] T. Simunic, L. Benini, P. Glynn, G. De Micheli, "Event-driven Power Management," IEEE Transactions on CAD, pp.840-857, July 2001
- [6] E. Rotenberg. Ar/smt: A microarchitectural approach to fault tolerance in microprocessors, Intl. Symp. on Fault Tolerant Comp, 1998.
- [7] J. Srinivasan, S. V. Adve, P. Bose, J. Rivers, C.K. Hu, "RAMP: A Model for Reliability Aware Micro-Processor Design," IBM Research Report, December 29, 2003
- [8] J. Srinivasan and S. V. Adve. Predictive dynamic thermal management for multimedia applications. In Proc. of the 2003 Intl Conf. on Supercomputing, 2003.
- [9] J. Srinivasan, P. Bose, J. Rivers, "The Impact of Technology Scaling on Processor Lifetime Reliability," UIUC CS Technical Report, December 2003
- [10] K. Skadron, T. Abdelzaher, M.R. Stan, "Control-Theoretic Techniques and Thermal-RC Modeling for Accurate Localized Dynamic Thermal Management", Proc. of the 8th Intl. Symp. on High-Performance Computer Architecture (HPCA'02)
- [11] "Semiconductor Device Reliability Failure Models", International Sematech Technology Transfer document 00053955A-XFR, 2000
- [12] H.V. Nguyen, "Multilevel Interconnect Reliability on the effects of electro-thermomechanical stresses", Ph.D. dissertation, Univ. of Twente, Netherlands, March 2004.
- [13] M. Huang, Z. Suo, "Thin film cracking and ratcheting caused by temperature cycling", J. Mater. Res. v.15, n.6, pp. 1239 (4), Jun 2000.
- [14] R. Degraeve, J.L. Ogier, et al., "A New Model for the Field Dependence of Intrinsic and Extrinsic Time-Dependent Dielectric Breakdown", IEEE Trans. on Elect. Devices, 472(10),v.45,n.2, Feb. 1998
- [15] E.E. Lewis, Introduction to Reliability Engineering, Wiley 1996.
- [16] T. Simunic, K. Mihic, G. De Micheli, "Reliability and Power Management of Integrated Systems," DSD 2004.
- [17] "TMS320C6211, TMS320C6211B, Fixed-Point Digital Signal Processors", Texas Instruments, 2002.
- [18] "SAF7113H datasheet", Philips Semiconductors, March 2004
- [19] "SST-Melody-DAP Audio Processor", Analog Devices, 2002.
- [20] "MSP430x11x2, MSP430x12x2 Mixed Signal Microcontroller", Texas Instruments, August 2004.
- [21] "RDRAM 512Mb", Rambus, July 2003.
- [22] A. Maheshwari, W. Burleson, R. Tessier, "Trading off Reliability and Power-Consumption in Ultra-Low Power Systems", ISQED 2002.
- [23] P. Stanley-Marbell, D. Marculescu, "Dynamic Fault-Tolerance and Metrics for Battery Powered, Failure-Prone Systems," ICCAD 2003.