*Editor's Note: The paper on which this article is based was originally presented at the 2017 IEEE Product Safety Engineering Society Symposium, where it received recognition as the Best Symposium Paper. It is reprinted here, with permission, from the proceedings of the 2017 IEEE Product Safety Engineering Society International Symposium on Product Compliance Engineering. Copyright 2017 IEEE.*

Since its introduction in 2011, International Organization for Standardization (ISO) standard 26262 [1], a functional safety standard for automotive electrical and/or electronic (E/E) systems, has effectively mitigated two categories of failure. The first category is systematic failures, which are prevented by safety measures in the verification and validation development processes. The second category is safety issues caused by hardware failures, which are prevented by safety mechanisms. These safety measures provide logical and quantitative assurance that is not achievable with traditional quality management (QM) strategies.

ISO 26262 defines the probabilistic metric for random hardware failures (PMHF), which is a metric related to the probability of a safety goal violation caused by a random hardware failure, and the architecture metrics that are discussed in papers such as [2] and [3].

Target Values for Hardware

The architecture metrics provide average diagnostic coverages that are relative values [2]. On the other hand, the PMHF is an absolute value of the average failure rate of an item. Although some formulas are shown in Part 10 8.3.3 of [1], a detailed explanation of the PMHF is not provided [4].

Before we start the PMHF discussion, we focus on the definition of the probability of failure (PoF) and related equations shown in [5] as follows:

$$PoF_{item,t} \equiv \Pr\{X_{item} \le t\} = F_{item}(t) = 1 - e^{-\lambda_{item} t}, \qquad f_{item}(t) = \lambda_{item} e^{-\lambda_{item} t} \tag{1}$$

where

- $PoF_{item,t}$: the probability of failure of an item until time *t*;
- $X_{item}$: the random variable that represents the failure-free operating time of an item, which has an exponential distribution;
- $F_{item}(t)$: the unreliability, i.e., the cumulative distribution function (CDF) of the failure of an item until time *t*;
- $\lambda_{item}$: the failure rate of an item; and
- $f_{item}(t)$: the probability density function (PDF) of the failure of an item at time *t*.

The PMHF is related to the probability of a violation of a safety goal due to a random hardware failure within the vehicle lifetime. It is expressed as the average PoF over the vehicle lifetime, as in (2), using (1) and the Taylor expansion of the exponential function under the assumption $\lambda_{item} T_{lifetime} \ll 1$, which is applied throughout the discussion:

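$$M_{PMHF} \equiv \frac{1}{T_{lifetime}} PoF_{item,T_{lifetime}} = \frac{1}{T_{lifetime}}\left(1 - e^{-\lambda_{item} T_{lifetime}}\right) \approx \lambda_{item} \tag{2}$$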

where

- $M_{PMHF}$: the PMHF of an item; and
- $T_{lifetime}$: the vehicle lifetime.

The PMHF can eventually be considered as an average failure rate of an item based on (2).
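As a rough numerical check of this statement, the sketch below evaluates the PoF from (1), divides it by the lifetime, and compares the result with the failure rate itself. The function names and the numeric values (a failure rate of 100 FIT and a lifetime of 10^5 hours) are assumptions chosen only for illustration, so that the product of failure rate and lifetime is much smaller than 1.

```python
import math

def pof(lam: float, t: float) -> float:
    """Probability of failure by time t for an exponentially distributed lifetime, as in (1)."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is very small.
    return -math.expm1(-lam * t)

def pmhf_average(lam: float, t_life: float) -> float:
    """Average PoF per unit time over the lifetime: PoF at t_life divided by t_life."""
    return pof(lam, t_life) / t_life

# Hypothetical values, chosen only so that lam_item * t_lifetime << 1.
lam_item = 1e-7     # failure rate in 1/h (100 FIT)
t_lifetime = 1e5    # vehicle lifetime in hours

m_pmhf = pmhf_average(lam_item, t_lifetime)
relative_gap = abs(m_pmhf - lam_item) / lam_item
print(f"average PoF per hour: {m_pmhf:.4e}")
print(f"failure rate:         {lam_item:.4e}")
print(f"relative gap:         {relative_gap:.2%}")
```

With these values the two quantities differ by well under one percent, which is the sense in which the PMHF can be read as an average failure rate.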

Related Work

Although detailed methods for calculating a PMHF are not available in the literature, some papers focus on the PMHF metric ([3], [6], and [7]). The authors of [4] derive the PMHF using an expression of PoF based on a probability calculation tool, such as fault tree analysis (FTA). Although the PoF of each basic event can be calculated, we believe that using observable parameters, such as failure rates and diagnostic coverages (DCs), which are employed in Part 10 8.3.3 of [1], is more effective for improving the understanding of these parameters and for creating an FTA model.

In [4], the authors discriminate between the following two cases with respect to the latent fault of a safety mechanism:

*a) failures of a safety mechanism that are "latent" (e.g., not detectable); and*

*b) failures of a safety mechanism that are detectable but occur within the diagnostic test interval.*

However, the discussion in [4] is not based on the conditional probability, as shown in [8]. Although the authors introduce the conditional probability in [8], the latent fault calculation has some issues, which are detailed in the subsection "PoF Calculation of Dual Point Failure." The opposite situation, in which the mission function is in a latent fault state, is not addressed. This paper aims to clarify that the dual point failure (DPF) should be the failure caused by the second fault when the first fault is a latent fault, where the first faulty element may be either a mission function or a safety mechanism.

Target Subsystem

Figure 1 shows the target subsystem named "SUBS." SUBS includes a mission function "M" and a safety mechanism "SM" that supervises M; SM is the primary safety mechanism.^{1} The notations M and SM are from Part 10 8.3.3 of [1]. Two secondary safety mechanisms^{2} exist, one each for M and SM: "SM2M" and "SM2SM." Here, SM and SM2M can be the same element. We can observe three parameters, namely failure rate ($\lambda$), diagnostic coverage ($DC$), and diagnostic period ($\tau$), for each element according to Part 10 8.1.7 of [1], as shown in Figure 1. With respect to the element M, let $\lambda$ be $\lambda_M$, and let $DC$ and $\tau$ be non-existent because the mission function does not have diagnostic capability.

^{1}Primary safety mechanism: safety mechanism to prevent faults from violating a safety goal.

^{2}Secondary safety mechanism: safety mechanism to prevent latent faults.

With respect to the element SM, let $\lambda$ be $\lambda_{SM}$, $DC$ be $K_{M,FMC,RF}$, and $\tau$ be zero because SM works within the fault tolerant time interval (FTTI). For SM2M, let $\lambda$ be zero (the lemma that proves this follows), $DC$ be $K_{M,FMC,MPF}$, and $\tau$ be $\tau_M$. For SM2SM, let $\lambda$ be zero, $DC$ be $K_{SM,FMC,MPF}$, and $\tau$ be $\tau_{SM}$, according to Part 10 8.3.3 of [1].

**Lemma: Secondary Safety Mechanisms Never Fail**

Before we prove that the failure rate of a secondary safety mechanism can be taken as zero, we introduce the notation "A → B," which indicates that element "B" receives a fault while element "A" is in a latent fault state. The small dual-point PoF for SM → M in the period $(t, t + dt]$ can be defined as follows:

$$\Delta PoF_{SM \to M,t} \equiv \Pr\{\text{SM is in a latent fault state at } t \,\cap\, \text{M receives a fault in } (t, t + dt]\} \tag{3}$$

We do not have to consider the fault of SM2M (the secondary safety mechanism for M) in this case because it is considered after time $t + dt$. When we consider the two cases in which SM2SM (the secondary safety mechanism for SM) is in a fault or non-fault state, $\Delta PoF_{SM \to M,t}$, as defined in (3), can be expressed as follows:

$$\begin{aligned}
\Delta PoF_{SM \to M,t} = {} & \Pr\{\text{SM2SM is in a fault state at } t \,\cap\, \text{SM is in a latent fault state at } t \,\cap\, \text{M receives a fault in } (t, t + dt]\} \\
& + \Pr\{\text{SM2SM is not in a fault state at } t \,\cap\, \text{SM is in a latent fault state at } t \,\cap\, \text{M receives a fault in } (t, t + dt]\}
\end{aligned} \tag{4}$$

The first term in (4) requires faults in three elements (SM2SM, SM, and M); according to [1], a failure caused by three or more faults is classified as a safe fault. As a result, we can calculate the PMHF by considering only the second term of (4), i.e., by assuming that SM2SM is never in a fault state. The same argument applies to SM2M in the M → SM case; thus, we assume that both secondary safety mechanisms are never in fault states.

Calculation of the PMHF

**PoF Calculation of Single Point Failure**

The PMHF shall be expressed as the sum of a single point failure (SPF) term and DPF terms because a failure that is caused by three or more faults is classified as a safe fault, as previously explained. Because the fault of a safety mechanism does not cause a violation of a safety goal by definition, we obtain $PoF_{SPF}$ based on the conditional probability as

$$\begin{aligned}
PoF_{SPF,t} &= \Pr\{\text{M is in a non-prevention state at } t\} \\
&= \Pr\{\text{M is in a fault state at } t \,\cap\, \text{the fault is not prevented by SM}\} \\
&= \Pr\{\text{the fault is not prevented} \mid \text{M is in a fault state at } t\} \cdot \Pr\{\text{M is in a fault state at } t\} \\
&= (1 - K_{M,FMC,RF}) \Pr\{X_M \le t\} \\
&= (1 - K_{M,FMC,RF}) F_M(t)
\end{aligned} \tag{5}$$

where

- $K_{M,FMC,RF}$: the failure mode coverage of M with respect to residual faults (Part 10 8.3.3 of [1]);
- $X_M$: the random variable that represents the failure-free operating time of M; and
- $F_M(t)$: the CDF of the failure of M.

Therefore, applying (5) to (2), we obtain (6):

$$M_{PMHF,SPF} = (1 - K_{M,FMC,RF}) \lambda_M = \lambda_{RF} \tag{6}$$

where

$\lambda_{RF}$: the residual failure rate of M.
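The single-point term in (6) is a one-line calculation. The sketch below is a minimal illustration; the function name and the numeric values (100 FIT for M and 99% residual-fault coverage) are hypothetical and are not taken from the paper.

```python
def residual_failure_rate(lam_m: float, k_m_fmc_rf: float) -> float:
    """Residual failure rate of M as in (6): the share of M's faults not prevented by SM."""
    if not 0.0 <= k_m_fmc_rf <= 1.0:
        raise ValueError("coverage must lie between 0 and 1")
    return (1.0 - k_m_fmc_rf) * lam_m

# Hypothetical inputs for illustration only.
lam_m = 1e-7         # failure rate of the mission function in 1/h (100 FIT)
k_m_fmc_rf = 0.99    # failure mode coverage of M with respect to residual faults

lam_rf = residual_failure_rate(lam_m, k_m_fmc_rf)
print(f"residual failure rate: {lam_rf:.3e} per hour")
```

With full coverage the single-point term vanishes, and with no coverage it equals the failure rate of M.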

**PoF Calculation of Dual Point Failure**

*SM → M case*

In equation (7) of [8], the authors calculate the probability of a DPF caused by "Logic" when "Monitor" is in a fault state (*Monitor* → *Logic* in our notation):

$$\Pr\{\text{subsystem gets a failure during time } (t, t + dt)\} = \Pr\{\text{Monitor is in a fault at time } t \,\cap\, \text{Logic gets a failure during time } (t, t + dt)\} \tag{7}$$

Here, we assume that the Monitor fault does not cause a dependent failure; that is, the event {Monitor is in a fault at time $t$} and the event {Logic gets a failure during time $(t, t + dt)$} are independent. To match the notation of the previous sections, we rewrite Logic as M, Monitor as SM, and Subsystem as SUBS. We rewrite (7) as follows:

$$\Pr\{\text{SUBS receives a failure in } (t, t + dt]\} = \Pr\{\text{SM is in a fault state at time } t\} \cdot \Pr\{\text{M receives a fault in } (t, t + dt]\} \tag{8}$$

Then, the left-hand side of (8) can be rewritten as

$$\Pr\{\text{SUBS receives a failure in } (t, t + dt]\} = \Pr\{t < X_{SUBS,SM \to M} \le t + dt\} = f_{SUBS,SM \to M}(t)\,dt \tag{9}$$

where

- $X_{SUBS,SM \to M}$: the random variable that represents the failure-free operating time of SUBS for SM → M; and
- $f_{SUBS,SM \to M}(t)$: the PDF of the failure of SUBS at time $t$ for SM → M.

The fault of SM must be a latent fault to cause the DPF; thus, the first term on the right-hand side of (8) (and of (7) in [8]) should be written as Pr{SM is in a *latent* fault state at time $t$}, as explained in (3). This term can consequently be expressed using the following conditional probability:

$$\begin{aligned}
\Pr\{\text{SM is in a latent fault state at } t\} &= \Pr\{\text{SM is in a fault state at } t \,\cap\, \text{the fault of SM is not detected}\} \\
&= \Pr\{\text{the fault of SM is not detected} \mid \text{SM is in a fault state at } t\} \cdot \Pr\{\text{SM is in a fault state at } t\} \\
&= (1 - K_{SM,FMC,MPF}) \Pr\{X_{SM} \le t\} \\
&= (1 - K_{SM,FMC,MPF}) F_{SM}(t)
\end{aligned} \tag{10}$$

where

- $K_{SM,FMC,MPF}$: the failure mode coverage of SM with respect to multi-point faults (Part 10 8.3.3 of [1]);
- $X_{SM}$: the random variable that represents the failure-free operating time of SM; and
- $F_{SM}(t)$: the CDF of the failure of SM.

The second term on the right-hand side of (8) can be expressed as follows:

$$\Pr\{\text{M receives a fault in } (t, t + dt]\} = \Pr\{t < X_M \le t + dt\} = f_M(t)\,dt \tag{11}$$

where

- $f_M(t)$: the PDF of the failure of M.

Applying (3), (9), (10), and (11) to (8) yields the following expression:

$$\Delta PoF_{SM \to M,t} = f_{SUBS,SM \to M}(t)\,dt = (1 - K_{SM,FMC,MPF}) F_{SM}(t) \cdot f_M(t)\,dt \tag{12}$$

Then, equation (12) produces the following integral form:

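$$PoF_{SM \to M,T_{lifetime}} = \int_0^{T_{lifetime}} f_{SUBS,SM \to M}(t)\,dt = (1 - K_{SM,FMC,MPF}) \int_0^{T_{lifetime}} F_{SM}(t) f_M(t)\,dt \tag{13}$$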

Thus, applying (1) to (13) produces the following expression:

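$$PoF_{SM \to M,T_{lifetime}} \approx \frac{1}{2} (1 - K_{SM,FMC,MPF}) \lambda_{SM} \lambda_M T_{lifetime}^2 = \frac{1}{2} \lambda_{SM,MPF,lat} \lambda_M T_{lifetime}^2 \tag{14}$$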

where

- $\lambda_{SM,MPF,lat}$: the failure rate of SM with respect to latent multi-point faults.

We consequently obtain the PMHF of SM → M by applying (14) to (2) as follows:

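$$M_{PMHF,SM \to M} = \frac{1}{T_{lifetime}} PoF_{SM \to M,T_{lifetime}} \approx \frac{1}{2} \lambda_{SM,MPF,lat} \lambda_M T_{lifetime} \tag{15}$$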
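The chain from (12) to the lifetime average can be checked numerically. The sketch below integrates the right-hand side of (12) over the lifetime with a midpoint rule, divides by the lifetime as in (2), and compares the result with the first-order Taylor value, half of $(1 - K_{SM,FMC,MPF}) \lambda_{SM} \lambda_M T_{lifetime}$. The function name, the step count, and all numeric inputs are assumptions made for illustration only.

```python
import math

def dual_point_pof_sm_to_m(lam_m, lam_sm, k_sm_mpf, t_life, steps=20_000):
    """Midpoint-rule integral of (1 - K) * F_SM(t) * f_M(t) over [0, t_life], following (12)."""
    dt = t_life / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        cdf_sm = -math.expm1(-lam_sm * t)       # F_SM(t) = 1 - exp(-lam_sm * t)
        pdf_m = lam_m * math.exp(-lam_m * t)    # f_M(t) = lam_m * exp(-lam_m * t)
        total += cdf_sm * pdf_m * dt
    return (1.0 - k_sm_mpf) * total

# Hypothetical inputs, chosen so that each rate times the lifetime is << 1.
lam_m = 1e-7        # failure rate of M in 1/h
lam_sm = 5e-8       # failure rate of SM in 1/h
k_sm_mpf = 0.9      # coverage of SM faults by SM2SM
t_lifetime = 1e5    # vehicle lifetime in hours

pmhf_numeric = dual_point_pof_sm_to_m(lam_m, lam_sm, k_sm_mpf, t_lifetime) / t_lifetime
pmhf_taylor = 0.5 * (1.0 - k_sm_mpf) * lam_sm * lam_m * t_lifetime
print(f"numeric integral / lifetime: {pmhf_numeric:.4e}")
print(f"first-order approximation:   {pmhf_taylor:.4e}")
```

The numeric value sits slightly below the first-order value because the Taylor expansion drops negative higher-order terms; the gap shrinks as the product of failure rate and lifetime decreases.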

According to [4], we can classify the failure scenario into the two categories listed as *a)* and *b)* in the "Related Work" section. Equation (15) corresponds to case *a)*, so its left-hand side should be rewritten as $M_{PMHF,SM \to M,lat}$.

For case *b)*, we assume a Markov process, i.e., that a fault of SM that is detected by SM2SM will be perfectly repaired (as good as new) and that the repair time will be ignored. Thus, we obtain (16) as

(16)

where

$\lambda_{SM,MPF,det}$: the failure rate of SM with respect to detected multi-point faults.

Therefore, combining (15) and (16) to consider both cases yields the following expression:

(17)
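To make the combination of the two cases concrete, the sketch below adds a latent contribution and a detected contribution for SM → M. It is an illustration under our own assumptions, not a transcription of (16) or (17): we assume that the latent share of SM faults, $(1 - K_{SM,FMC,MPF}) \lambda_{SM}$, is exposed for the whole lifetime, that the detected share, $K_{SM,FMC,MPF} \lambda_{SM}$, is exposed only for one diagnostic period $\tau_{SM}$ before the as-good-as-new repair, and that each exposure contributes half of the product of the two failure rates and the exposure time. The function name and numeric values are hypothetical.

```python
def pmhf_sm_to_m(lam_m, lam_sm, k_sm_mpf, t_life, tau_sm):
    """Assumed SM -> M dual-point contribution: latent share over t_life, detected share over tau_sm."""
    lam_sm_lat = (1.0 - k_sm_mpf) * lam_sm    # latent multi-point fault rate of SM
    lam_sm_det = k_sm_mpf * lam_sm            # detected multi-point fault rate of SM (assumed split)
    return 0.5 * lam_m * (lam_sm_lat * t_life + lam_sm_det * tau_sm)

# Hypothetical inputs for illustration only.
lam_m = 1e-7        # failure rate of M in 1/h
lam_sm = 5e-8       # failure rate of SM in 1/h
k_sm_mpf = 0.9      # coverage of SM faults by SM2SM
t_lifetime = 1e5    # vehicle lifetime in hours
tau_sm = 10.0       # diagnostic period of SM2SM in hours

contribution = pmhf_sm_to_m(lam_m, lam_sm, k_sm_mpf, t_lifetime, tau_sm)
print(f"SM -> M dual-point contribution: {contribution:.5e} per hour")
```

Under these assumptions the latent term dominates whenever the diagnostic period is much shorter than the lifetime, which is why the coverage of the secondary safety mechanism matters more than its test interval in this example.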

Although Part 10 8.3.3 of [1] provides and describes only the equation for SM → M, we are advised to multiply by two when we consider both cases, "SM → M" and "M → SM." However, because this approach provides an inexact result, we derive the exact result in the next section.

*M → SM case*

A typical example of a redundant subsystem is a body control module (BCM), which is a type of electronic control unit (ECU). It may have circuitry that includes headlight drivers driven by a microcomputer, with backup hardware circuitry to maintain visibility if the microcomputer stops. Following the generalization explained in the previous section, we assume the general redundant subsystem shown in Figure 2. In this situation, a latent fault exists even for the M → SM case. For convenience, we introduce the probability coefficient $K_{M,FMC,det}$, which will eventually be eliminated. It is defined in (18) and refers to the ratio of faults detected by the primary SM. This coefficient is 100% in a non-redundant subsystem and 0% in a redundant subsystem (e.g., 1oo2). Conversely, we define an *intrinsic* redundant subsystem as one whose $K_{M,FMC,det}$ is 0%. On the other hand, we may refer to a subsystem that includes dual-core lock-step (DCLS) as an *extrinsic* redundant subsystem: although it has redundant processor cores, its $K_{M,FMC,det}$ is 100%.

We define $K_{M,FMC,det}$ as

$$K_{M,FMC,det} \equiv \Pr\{\text{a fault is detected by SM} \mid \text{M is in a prevention state at } t\} \tag{18}$$

We obtain the small dual-point PoF for the M → SM case using (3), (5), (11), and (18):

$$\begin{aligned}
\Delta PoF_{M \to SM,t} &= \Pr\{\text{M is in a latent fault state at } t \,\cap\, \text{SM receives a fault in } (t, t + dt]\} \\
&= \Pr\{\text{M is in a prevention state at } t \,\cap\, \text{the fault is not detected by SM}\} \cdot \Pr\{\text{SM receives a fault in } (t, t + dt]\} \\
&= \Pr\{\text{the fault is not detected by SM} \mid \text{M is in a prevention state at } t\} \cdot \Pr\{\text{M is in a prevention state at } t\} \cdot \Pr\{\text{SM receives a fault in } (t, t + dt]\} \\
&= (1 - K_{M,FMC,det}) K_{M,FMC,RF} F_M(t) f_{SM}(t)\,dt
\end{aligned} \tag{19}$$
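The role of the detection coefficient in (19) is easiest to see by evaluating the expression at its two extremes. The sketch below is a minimal illustration with a hypothetical function name and hypothetical numeric inputs; it evaluates the last line of (19) for a non-redundant subsystem (detection coefficient of 1) and for an intrinsic redundant subsystem (detection coefficient of 0 and residual-fault coverage of 1).

```python
import math

def delta_pof_m_to_sm(t, dt, lam_m, lam_sm, k_m_det, k_m_rf):
    """Last line of (19): (1 - K_det) * K_RF * F_M(t) * f_SM(t) * dt."""
    cdf_m = -math.expm1(-lam_m * t)            # F_M(t) = 1 - exp(-lam_m * t)
    pdf_sm = lam_sm * math.exp(-lam_sm * t)    # f_SM(t) = lam_sm * exp(-lam_sm * t)
    return (1.0 - k_m_det) * k_m_rf * cdf_m * pdf_sm * dt

# Hypothetical inputs for illustration only.
lam_m, lam_sm = 1e-7, 5e-8    # failure rates in 1/h
t, dt = 5e4, 1.0              # evaluation time and interval width in hours

# Non-redundant subsystem: every prevented fault is also detected by SM.
non_redundant = delta_pof_m_to_sm(t, dt, lam_m, lam_sm, k_m_det=1.0, k_m_rf=0.99)
# Intrinsic redundant subsystem: SM prevents every fault of M but detects none.
intrinsic = delta_pof_m_to_sm(t, dt, lam_m, lam_sm, k_m_det=0.0, k_m_rf=1.0)

print(f"non-redundant:       {non_redundant:.3e}")
print(f"intrinsic redundant: {intrinsic:.3e}")
```

In the non-redundant case the M → SM contribution is exactly zero, so only the single-point term and the SM → M term remain; in the intrinsic redundant case the full product of the CDF of M and the PDF of SM contributes.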

Applying the same argument to both cases, *a)* and *b)* in the "Related Work" section, we obtain the following expression:

(20)

Adding (6), (17), and (20), we finally obtain the following generalized equation:

(21)

Applying the PMHF Equation to Non-Redundant and Redundant Subsystems

**Non-Redundant Subsystem**

For an item with non-redundant and redundant subsystems, we can apply (21) to both subsystems. In a non-redundant subsystem, because both the fault detection and the prevention of the safety goal violation are performed by SM (the primary safety mechanism), $K_{M,FMC,det}$ equals 1 (100%). Applying this relationship to (21) yields the following expression:

(22)

This is the same formula that is described in Part 10 8.3.3 of [1].

**Intrinsic Redundant Subsystem**

Because an intrinsic redundant subsystem does not have a residual fault, we assume that $K_{M,FMC,det} = 0$ and $K_{M,FMC,RF} = 1$ (100%). Applying these relationships to (21) generates the following expression:

(23)

Conclusion and Future Work

In this study, we have presented generalized formulas for calculating the PMHF in non-redundant and redundant subsystems using observable parameters: the failure rates of a mission function and a safety mechanism, the diagnostic coverages of the primary and secondary safety mechanisms, and the diagnostic periods of the secondary safety mechanisms. These formulas expand the scope of application of the PMHF calculation under ISO 26262. Because the PMHF of an item can be quantitatively calculated using FTA, we plan to prepare FTA models based on the methodology described in this paper.

References

1. ISO/TC 22/SC 3, "ISO 26262-5:2011(E)," ISO, 2011.
2. S. H. Jeon, J. H. Cho, Y. Jung, S. Park, and T. M. Han, "Automotive hardware development according to ISO 26262," in 13th Int. Conf. Advanced Commun. Technol. (ICACT2011), Seoul, 2011, pp. 588–592.
3. N. Adler, S. Otten, M. Mohrhard, and K. D. Müller-Glaser, "Rapid safety evaluation of hardware architectural designs compliant with ISO 26262," in Int. Symp. Rapid Syst. Prototyping (RSP), Montreal, QC, 2013, pp. 66–72.
4. N. Das and W. Taylor, "Quantified fault tree techniques for calculating hardware fault metrics according to ISO 26262," in IEEE Symp. Product Compliance Eng. (ISPCE), Anaheim, CA, 2016, pp. 1–8.
5. A. Birolini, Quality and Reliability of Technical Systems: Theory, Practice, Management, 2nd ed., Springer, 1997, p. 365.
6. V. Rupanov, C. Buckl, L. Fiege, M. Armbruster, A. Knoll, and G. Spiegelberg, "Early safety evaluation of design decisions in E/E architecture according to ISO 26262," in Proc. 3rd Int. ACM SIGSOFT Symp. Architecting Critical Syst., Bertinoro, Italy, 2012, pp. 1–10.
7. K. L. Leu, H. Huang, Y. Y. Chen, L. R. Huang, and K. M. Ji, "An intelligent brake-by-wire system design and analysis in accordance with ISO-26262 functional safety standard," in Int. Conf. Connected Veh. Expo (ICCVE), Shenzhen, 2015, pp. 150–156.
8. M. Takeichi, Y. Sato, K. Suyama, and T. Kawahara, "Failure rate calculation with priority FTA method for functional safety of complex automotive subsystems," in Int. Conf. Quality, Rel., Risk, Maintenance, Safety Eng., Xi'an, 2011, pp. 55–58.

**Atsushi Sakurai** is the Chief Executive Officer and Chief Technology Officer of FS-Micro Corporation. He can be reached at sakurai@fs-micro.com.
