These other root causes, which can generally be categorized into over-voltage, over-current or over-power, are in fact more prevalent causes of failure than ESD by a wide margin. This is due in large part to the lack of coherent design and mitigation strategies. One of the main reasons for this is that EOS root causes are widely varied and very application dependent. As a result, no simple broad models for these other root causes have emerged comparable to Human-Body Model (HBM) and Charged-Device Model (CDM) for ESD. Common device design practices have not been developed to the same extent, system level approaches tend to be ad hoc and responsibility for controlling potential sources in manufacturing tends to be diffused or non-existent.
So the electronics industry has continued to be faced with a major portion of device failures without a way of addressing them in a concerted fashion. This has been true for decades. The Pareto chart in Figure 1 is typical, although some organizations include more detail. The EOS or ESD assignments are mostly made from initial failure analysis reports since rigorous root cause analysis is seldom done.
The breadth of possible root causes for EOS was nicely summarized by Kaschani and Gaertner in their 2011 paper [2]. Around the same time, attempts to organize and characterize the phenomena that cause EOS were beginning, especially in the automotive industry. Many in the field were calling for standards organizations to establish EOS standards and methods analogous to what had been successful in tackling ESD. Such standards have not been forthcoming, and this is no surprise. For example, establishing standards for device-level EOS testing demands taking into account many different situations and possibilities for the stresses involved. Agreeing on one or two standards as in ESD would be a daunting, if not impossible, task.
Nonetheless some industry groups began forming working groups and technical committees to look for ways to make progress. An ad hoc Working Group was convened in the ESD Association (ESDA) in 2011 to begin to bring some order to the chaos. This was a precursor to work undertaken in the Industry Council on ESD Targets. These efforts will culminate in the release of a white paper on EOS in 2016 [3].
Why would an ESD-focused group like the Industry Council issue a white paper on EOS? The connection to ESD that inspired the two-year effort was a misconception prevailing in the electronics industry that low ESD robustness of devices is one of the primary root causes of EOS damage. However, as it turns out, the document evolved into a major comprehensive review of work on EOS. There will be more about that later in this article.
What is EOS?
When these various groups began to meet, a serious problem emerged: there was a wide disparity in the understanding of what EOS meant. It turned out that major segments of the industry were using the term in different ways, and this had a direct impact on how organizations attacked the problem. Here are some assumptions and important points about the term EOS:
Many engineers are accustomed to seeing the designations EOS or EOS/ESD as the “cause of failure” in physical failure analysis reports. This leaves the impression that ESD and EOS are alternate things of the same kind.
As a result, many view EOS as a type (or collection of types) of stresses just as in the case of ESD. However, an ESD is an event independent of whether there is a “victim” or failed device at all. Whenever there is a sudden transfer of charge between two objects at different potentials (definition of ESD), there is an ESD event.
An overstress is clearly something qualitatively different from ESD. The only way there can be an overstress is if there is some information about how much stress a victim device can be expected to withstand. Using this point of view, an electrical stress (i.e., applied voltage or current – intentional or not) only becomes an overstress if it exceeds some device limit that is usually included in the device data sheet. That is, we only know if we have an EOS if we know that the stress exceeded a device specification. This means that we also need a consistent way of communicating and defining specifications. This is done in terms of documented limits such as the absolute maximum rating (AMR) found in device data sheets. The EOS White Paper discusses the conceptual link between EOS and AMR.
Many of us first learned of electrical overstress from some form of the Wunsch-Bell curves for power-to-failure based on some specific geometries and mathematical models for thermal failure of devices. An example is given in Figure 2.
These plots are instructive in that they convey a concept of failure depending on the duration and magnitude of pulses which is of course physically reasonable. The pulse duration influences the amount of heat that can flow away from the failure site and solutions to the heat equation result in the different slopes in the plot. However, the typical presentation of these plots conveyed some assumptions that many of us have had to unlearn, such as that all ESD and other possible root causes happen according to the same simple mechanism.
For example, a typical plot does not include the effect of pulse rise time which is an important factor in determining where and how a device might fail. It is only a short logical jump from this single-mechanism view to believing (incorrectly) that one protection strategy will apply to all or most root causes and therefore that better ESD protection will better protect devices from other EOS root causes. This is not true.
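The pulse-duration dependence conveyed by these plots can be illustrated numerically. The following Python sketch evaluates the commonly quoted Wunsch-Bell expression for power per unit area to failure, P/A = sqrt(pi * k * rho * Cp) * (Tfail - T0) / sqrt(t), which applies only in the intermediate pulse-duration regime where the log-log slope is -1/2. The function name, the silicon material constants (approximate room-temperature values) and the use of the silicon melting point as the failure temperature are assumptions made for illustration. In keeping with the caution above, this single-mechanism model ignores rise time and should not be read as a model for EOS root causes in general.

```python
import math

# Approximate room-temperature values for silicon. These are illustrative
# assumptions; real material properties change with temperature.
K_SI = 1.5          # thermal conductivity, W/(cm*K)
RHO_SI = 2.33       # density, g/cm^3
CP_SI = 0.7         # specific heat, J/(g*K)
T_MELT_SI = 1687.0  # melting point, K (assumed failure temperature)


def wunsch_bell_power_density(pulse_s, t_fail_k=T_MELT_SI, t_ambient_k=300.0):
    """Power per unit area to failure (W/cm^2) for a rectangular pulse.

    Valid only where heat diffuses away from the failure site during the
    pulse (the t**-0.5 part of a Wunsch-Bell style plot).
    """
    if pulse_s <= 0:
        raise ValueError("pulse duration must be positive")
    prefactor = math.sqrt(math.pi * K_SI * RHO_SI * CP_SI)
    return prefactor * (t_fail_k - t_ambient_k) / math.sqrt(pulse_s)


if __name__ == "__main__":
    for pulse_s in (100e-9, 1e-6, 10e-6):
        p = wunsch_bell_power_density(pulse_s)
        print(f"{pulse_s:8.1e} s -> {p:10.3e} W/cm^2")
```

Quadrupling the pulse duration halves the computed power density, which is the -1/2 slope seen on a log-log plot.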
The relationship and contrast between the terms EOS and ESD are represented pictorially in Figure 3.
Practical Definitions for EOS and AMR
As mentioned earlier, working groups attempting to produce a common view of EOS had considerable difficulty in reaching agreement. People working in failure analysis for example tend to categorize device damage according to the physical characteristics of the damage site while those working in device characterization are more focused on the limits of device performance and the consequences of exceeding those limits. A large amount of time in the early EOS strategy meetings was spent trying to reconcile tightly held views about EOS and related terminology. This time was spent because a common practical approach was seen as essential for further discussions and crucial for communication between suppliers and customers.
Prior experience had shown that misunderstanding EOS can lead to wasting resources in search of root causes in the wrong direction and in protection design changes that do not improve quality or reliability. A common understanding of EOS allows device manufacturers to provide clear maximum electrical limits. When these limits are clearly communicated, system manufacturers can incorporate devices into their systems while providing an environment in which the devices can safely operate. Considerations of types of stresses (DC/AC), duration of stresses and latent effects were among the issues discussed before arriving at a proposed common set of terms and definitions.
The EOS White Paper also calls for more precise use of terms. Differentiation is thus made among an EOS event, EOS damage, and an EOS root cause. An “EOS event” is notable when it results in damage in system operation, particularly if the device is permanently damaged. This is called a failure related to “EOS damage.” Finally, an “EOS root cause” is that action or set of actions that created the situation that caused the damage.
The wide variety of root causes is summarized in Figure 4. These are the root causes which must be addressed to decrease the incidence of EOS damage and device failures.
The following definition of EOS was adopted and used as the basis of all discussion in the EOS White Paper:
“An electrical device suffers an electrical overstress event when a maximum limit for either the voltage across, the current through, or power dissipated in the device is exceeded and causes immediate damage or malfunction, or latent damage resulting in an unpredictable reduction of its lifetime.”
This definition is strongly coupled to what is meant by a “maximum limit.” The EOS White Paper presents a practical interpretation of EOS in terms of maximum operating conditions and AMR. A generalized view of AMR is presented since some common sources (e.g., JEDEC) only define AMR in terms of voltage. In general, an AMR is understood to represent the point beyond which a device may be damaged by a particular stress. Each possible stress has its own AMR. The AMR is assigned by and is the sole responsibility of the supplier. It may include considerations of acceptable failures-in-time (FIT), but this linkage is not usually described in a data sheet. The AMR also depends on the level of guard banding, and different AMRs may be cited for different stress durations.
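The limit-checking half of this definition can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the White Paper: the `Limits` class, the `exceeded_limits` function and the numeric ratings are hypothetical and do not come from any real data sheet. The sketch only reports which limits an applied stress exceeds; whether damage or malfunction actually results, as the definition also requires, has to be established separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Hypothetical absolute maximum ratings for one device pin."""
    voltage_v: float
    current_a: float
    power_w: float


def exceeded_limits(voltage_v, current_a, amr):
    """Return the names of the AMR entries that an applied stress exceeds."""
    power_w = abs(voltage_v * current_a)
    exceeded = []
    if abs(voltage_v) > amr.voltage_v:
        exceeded.append("voltage")
    if abs(current_a) > amr.current_a:
        exceeded.append("current")
    if power_w > amr.power_w:
        exceeded.append("power")
    return exceeded


if __name__ == "__main__":
    amr = Limits(voltage_v=6.0, current_a=0.5, power_w=2.0)  # made-up values
    print(exceeded_limits(5.0, 0.3, amr))   # within every limit
    print(exceeded_limits(5.0, 0.45, amr))  # voltage and current fine, power too high
    print(exceeded_limits(7.0, 0.1, amr))   # voltage limit exceeded
```

Each stress is compared with its own limit, reflecting the point that every possible stress has its own AMR. A fuller version would also key the limits on stress duration, since different AMRs may be cited for different durations.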
The relationship of AMR to other device terms and limits is displayed in Figure 5.
In general, the astute system manufacturer should understand that, while an operating region may exist between the specified maximum operating condition and the AMR values (Region B), this region is there to provide a buffer for stress events to avoid system disruption and allow resumption of normal operation after the stress. This region has many restrictions for operation, and any attempt to operate in this region must be discussed with and agreed to by the supplier. Additionally, not every device will fail immediately upon experiencing an event above AMR (Region C in Figure 5).
However, this is still an EOS event and is considered high risk for latent damage and likely future permanent damage. Even in Region D, the probability of immediate damage (blue curve) is not a vertical line, but any unit experiencing an event exceeding AMR will experience latent EOS damage. Finally, a well-written AMR will often be specific to the environment in which the manufacturer expects the device to operate. The AMR is not only the manufacturer’s definition of the maximum electrical and thermal limits; it also defines the limits of the manufacturer’s responsibility when the component is damaged as a result of exceeding those limits.
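The regions described above can be expressed as a simple classification of one stress value against the two published limits. The sketch below is an illustration under stated assumptions: the function name and the numbers are hypothetical, the label "A" for the normal operating region is assumed because the text does not name it, and Regions C and D are reported together because the text gives no numeric boundary between them.

```python
def classify_region(stress, max_operating, amr):
    """Place one stress value relative to the maximum operating condition and the AMR.

    Returns "A" at or below the maximum operating condition (assumed label),
    "B" between the maximum operating condition and the AMR, and
    "C/D" above the AMR, which counts as an EOS event.
    """
    if not 0 <= max_operating <= amr:
        raise ValueError("expected 0 <= max_operating <= amr")
    if stress <= max_operating:
        return "A"
    if stress <= amr:
        return "B"
    return "C/D"


if __name__ == "__main__":
    # Made-up supply-voltage limits: 3.6 V maximum operating, 4.6 V AMR.
    for volts in (3.3, 4.0, 5.5):
        print(volts, classify_region(volts, max_operating=3.6, amr=4.6))
```

A result of "B" is not permission to operate there; as noted above, operation in that region has to be agreed with the supplier.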
Alternate for the Term “EOS” in Failure Analysis
The definition for EOS presented here was chosen as the most practical and clear approach for communication between suppliers and users of electronic devices. It is important to note that, in the broader electronics industry, the term EOS will continue to be used in other ways, and this must be taken into account, especially in communications with failure analysis engineers:
Failure analysis engineers are likely to assign (some would say prematurely) the term EOS to any visible damage that appears to have been the result of excessive voltage or current. These assignments are often based on experience and may often be correct. However, the failure analysis engineer often makes this assignment without knowledge of the maximum limits of operation or any information on the real-world electrical event, and therefore does not know whether the device experienced EOS, per the chosen definition, or whether it was a defective device that failed under a stress within the operating limits.
The failure analysis engineer may argue that any device that is charred, burned or partially vaporized has very likely been overstressed. There will continue to be a large community of engineers who will use the term EOS this way despite attempts here to drive towards a common language. An alternate term for the initial physical failure analysis observation has been proposed. The term electrically induced physical damage (EIPD) is used in the White Paper as the term that should be used by failure analysis engineers when no clear communication has been completed with the customer as to possible root causes of the damage. The definition of EIPD is:
“Damage to an integrated circuit due to electrical/thermal stress beyond the level which the materials could sustain. This can be melting of silicon, fusing of metal interconnects, thermal damage to package material, fusing of bond wires and other damage caused by excess current or voltage.”
EIPD is recommended to be used when it has not yet been determined if a unit experienced an EOS event by the definition above. That determination can only be made after the supplier and customer have worked together to investigate root causes.
More on Confusion between EOS and ESD
As mentioned earlier, ESD is merely one type of electrical stress that can exceed specific capabilities of a device. EOS is a much broader term for outcomes that can arise from a multitude of stresses and root causes. It is critical to understand, therefore, that if EOS refers to many independent possible root causes, there can be no single protection strategy for EOS damage. In particular, since many device users seem to be confused by this, it must be stated clearly: ESD protection does not provide any predictable protection for EOS root causes other than ESD.
This misconception has been refuted convincingly in JEDEC publications JEP155 [4] and JEP157 [5] where it is shown that the incidence of EOS-induced failures is independent of the level of HBM and CDM robustness. Rather, improvement and mitigation of EOS failure causes will only advance through better communication between the supplier and the customer. This includes proper understanding of AMR, realistic specifications for it, finding the root cause of EOS damage incidents, and identifying the field and system application issues.
EOS in Manufacturing
In addition to this comprehensive effort on EOS by the Industry Council, the ESDA has convened a Working Group (WG23) collecting and developing best practices for the mitigation of EOS root causes in manufacturing. This can be thought of as an effort to elevate EOS root cause mitigation to the level currently in place for ESD (e.g., S20.20 [6]). While there is a long way to go before there is an EOS equivalent of S20.20, the goal is to increase the incidence of EOS-based audits and measurements in manufacturing and to achieve a commensurate decrease in EOS-induced damage and failures. WG23 hopes to release its first document in 2016.
Conclusion
EOS has long been a major cause of yield loss and field failures in the electronics industry. However, concerted efforts to reduce this large class of device failures have been rare and ineffective. Initiatives in the ESDA and a concerted effort by the Industry Council on ESD Targets have led to a soon-to-be-published white paper on EOS with the view of providing a step function improvement in EOS mitigation. It was soon realized in this effort that even the basic terms and definitions about electrical failure of devices needed to be revisited and aligned. This article has focused on the results of this revisiting of fundamental concepts and has reviewed and explained the new terms and definitions being proposed by the Council. These changes may require a major shift in thinking in some segments of the industry. Efforts are also continuing in the ESDA WG23 on EOS mitigation in manufacturing.
Notes and References
- This does not mean that ESD is not a continuing and future concern. The technology roadmap for future device ESD sensitivities indicated that ESD protection design and improved factory controls will continue to be needed as much as ever.
- “The Impact of Electrical Overstress on the Design, Handling, and Application of Integrated Circuits,” K.T. Kaschani and R. Gaertner, EOS/ESD Symposium Proc. EOS-33 (2011)
- White Paper 4: Understanding Electrical Overstress – EOS, Industry Council on ESD Targets (to be published in 2016)
- White Paper 1: A Case for Lowering Component Level HBM/MM ESD Specifications and Requirements (2010). Available from the ESDA website at www.esda.org/whitepapers, or at www.jedec.org as JEDEC Publication JEP155.
- White Paper 2: A Case for Lowering Component Level CDM ESD Specifications and Requirements (2010). Available from the ESDA website at www.esda.org/whitepapers, or at www.jedec.org as JEDEC Publication JEP157.
- ANSI/ESD S20.20-2014 – ESD Association Standard for the Development of an Electrostatic Discharge Control Program for Protection of Electrical and Electronic Parts, Assemblies and Equipment (Excluding Electrically Initiated Explosive Devices)
Dr. Welsher has served as Chairman of the ESDA Standards Committee and as Technical Program and General Chair of the EOS/ESD Symposium. He is currently a member of the ESDA Board of Directors, and just completed a term as the President. He has also been active in the JEDEC Quality and Reliability Committee and Board of Directors. He is a member of the Industry Council Core Team and is Chair of ESDA WG23 (EOS Best Practices in Manufacturing).
Dr. Welsher holds a B.S. in Chemistry from Florida State University and a Ph.D. in Chemical Physics from the University of Texas at Austin. He can be reached at terry@dangelmayer.com.
About the EOS/ESD Association
The EOS/ESD Association is the largest industry group dedicated to advancing the theory and the practice of ESD avoidance, with more than 2000 members worldwide. Readers can learn more about the Association and its work at www.esda.org.