Editor’s Note: The paper on which this article is based was originally presented at the 2024 IEEE International Symposium on Product Compliance Engineering (ISPCE), held in Chicago, IL in April-May 2024, where it received the 2024 Best Paper Award. It is reprinted here with the gracious permission of the IEEE. Copyright 2025, IEEE.
Introduction
The International Electrotechnical Commission (IEC) has established the IEC 62368-1 standard, grounded in hazard-based safety engineering (HBSE) principles, as a pivotal framework for the design and evaluation of audio, video, and information and communication technology (ICT) equipment. HBSE emphasizes identifying and mitigating risks across a variety of potential hazards by evaluating the safety of a product under normal operating, abnormal operating, and single-fault conditions. The standard organizes energy sources into five categories based on their capacity for energy transfer and potential harm: electrical energy sources (ES), thermal energy sources (TS), mechanical energy sources (MS), radiation energy sources (RS), and power sources (PS) [1-2].
IEC 62368-1 addresses numerous hazards, including electric shock, mechanical, heat, radiation, chemical, and fire risks. Yet its current edition largely presumes that safeguards are built in or take the form of physical hardware, and it gives minimal explicit attention to control-based safety, especially where hazard prevention depends significantly on, or is facilitated by, software. In the era of digitalization and the Internet of Things (IoT), software increasingly governs devices, including vital safety functions such as overtemperature protection, fire prevention, and other types of hazard monitoring and control, so this gap in considering software’s role in safety assurance demands thorough examination [3-4]. After an extensive literature review [5-14], the authors propose, for the first time in this field, the concept of a “Composition-Based Safety View,” which explains the nature and characteristics of safety at the product level. Figure 1 uses overlapping regions to illustrate the meaning of this concept.

As Figure 1 shows, three overlapping areas still need to be covered by IEC 62368. Area ① represents functional-safety-related software that is not affected by remote communications over the public network. Area ② represents functional-safety-related software that can also be affected by remote communications over the public network. Area ③ represents software that is not functional-safety related but can affect the HBSE evaluation results and can also be affected by remote communications over the public network; for example, a remote software update may change the safety-critical operating parameters (e.g., the RPM or duty cycle) of a DC fan that runs at a constant speed under normal operating conditions.
This paper argues that the current HBSE standard exhibits a deficiency in encompassing software or control-oriented safety assessments, leaving a vital facet of product safety unexplored and heightening the potential for safety incidents arising from software malfunctions or systemic failures. The exploration of introducing the control-oriented model into HBSE is essential for achieving comprehensive product safety in the software-driven era. This investigation aims to address this fundamental oversight and bridge the identified gap.
Typical Modular Switching Fan Tray Design and Hazard Analysis
Figure 2 shows a typical control board design for a modular switching fan tray. Under the current IEC 62368-1 requirements, each fan should be tested individually in a locked condition. In the usual cases of a stuck fan or a single disabled fan, the failure of one fan does not affect the operation of the others, and the system continues to operate without significant degradation in cooling. However, if a fan shorts internally, the power bus (i.e., P54V_CTL) is shorted as well. To protect the whole system, the hot-swap circuit turns off the power entry of the entire fan tray, which stops all fans from spinning.
In this situation, the chassis temperature rises rapidly because there is no forced cooling, and overheating follows. To avoid fire or thermal hazards, the microcontroller must immediately report the issue to the system (global) controller (i.e., the CPU) through I2C (inter-integrated circuit); the CPU then makes the decision and sends a power-off command to the power supply via PMBus (power management bus), shutting down the chassis in time. More and more protection designs in ICT equipment rely on a microcontroller or processor in this way, which makes the protection a combination of hardware and software. Unfortunately, IEC 62368 does not provide any information on how to evaluate the integrity and robustness of such control-based protection.
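The escalation path described above can be sketched as a small simulation. This is only an illustrative model, not the authors' firmware: the class names, the message strings, and the in-memory lists standing in for the I2C and PMBus links are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class TrayState(Enum):
    RUNNING = auto()
    POWER_CUT = auto()  # hot-swap circuit has opened the tray's power entry


@dataclass
class FanTray:
    """Hypothetical fan tray: an internal short on one fan trips the hot-swap."""
    fan_count: int = 4
    state: TrayState = TrayState.RUNNING

    def inject_internal_short(self) -> None:
        # A shorted fan shorts the shared power bus, so the hot-swap
        # removes power from the whole tray and every fan stops.
        self.state = TrayState.POWER_CUT

    def fans_spinning(self) -> int:
        return self.fan_count if self.state is TrayState.RUNNING else 0


@dataclass
class SystemController:
    """Hypothetical CPU-side logic; the lists stand in for I2C and PMBus."""
    i2c_inbox: list = field(default_factory=list)
    pmbus_outbox: list = field(default_factory=list)

    def poll(self) -> None:
        # On a loss-of-cooling report, command the power supply off.
        if "FAN_TRAY_POWER_LOSS" in self.i2c_inbox:
            self.pmbus_outbox.append("POWER_OFF")


def microcontroller_report(tray: FanTray, cpu: SystemController) -> None:
    """Hypothetical tray microcontroller: report the loss of forced cooling."""
    if tray.fans_spinning() == 0:
        cpu.i2c_inbox.append("FAN_TRAY_POWER_LOSS")


def run_scenario() -> list:
    """Inject the fault and return the commands sent towards the power supply."""
    tray, cpu = FanTray(), SystemController()
    tray.inject_internal_short()
    microcontroller_report(tray, cpu)
    cpu.poll()
    return cpu.pmbus_outbox
```

Every step in this chain (detection, reporting, decision, power-off) runs in software, which is why a failure in any of them leaves the overheating condition unmitigated.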
Based on ISO/IEC Guide 51 [15], the definition of safety is “freedom from unacceptable risk,” while risk is a “combination of the probability of occurrence of harm and the severity of that harm.” Harm is “injury or damage to the health of people, or damage to property or the environment,” and hazard is “potential source of harm.” Therefore, during the product safety evaluation, all hazards should be identified first, and then the risk caused by each hazard should be assessed quantitatively or qualitatively. Finally, appropriate technical and management measures should be implemented to reduce the risk to an acceptable level. Many methods are available for hazard analysis and risk assessment (HARA). The current mainstream hazard analysis methods or tools include bow-tie analysis (BTA), event tree analysis (ETA), and layer of protection analysis (LOPA). Moreover, some time-dependent approaches, such as Markov analysis, Petri nets, and Monte Carlo simulation, are suitable for capturing dynamic states and complex systems. However, as this paper focuses on ICT equipment safety assessment, the following three approaches are introduced because they are more suitable in practice: fault tree analysis (FTA); failure modes and effects analysis (FMEA); and hazard and operability studies (HAZOP).
Fault Tree Analysis (FTA)
Fault tree analysis (FTA) is a systematic, deductive, and hierarchical risk assessment method used to identify potential causes of system failures within safety engineering [16]. This analytical technique visualizes the pathways through which various subsystems or components can lead to a top-level failure event, using a tree-like structure of logical symbols that represent the interrelationship between failures, external factors, and human errors.
In the HBSE context, FTA can provide a rigorous means to dissect the large core switching fan-tray architecture design and its associated failure modes. By mapping out all or most conceivable failure scenarios, FTA aids in pinpointing critical control points where the control-based model can effectively mitigate risk. It enables the identification of both random hardware failures and systematic failures that may arise from hardware and software interactions, thereby offering a comprehensive view of potential hazards. The figure below shows the FTA for the thermal event of fire in the modular switching chassis.
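To make the gate logic of a fault tree concrete, the following minimal sketch evaluates a two-level tree with OR and AND gates. It assumes that all basic events are statistically independent, which real systems often violate. The event names loosely follow the fan tray scenario, and every probability is a made-up placeholder rather than data from the paper's fault tree.

```python
from math import prod


def or_gate(probabilities):
    """Probability that at least one input event occurs (independent events)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def and_gate(probabilities):
    """Probability that all input events occur (independent events)."""
    return prod(probabilities)


# Hypothetical per-demand probabilities, for illustration only.
BASIC_EVENTS = {
    "fan_internal_short": 1e-3,
    "tray_power_entry_fault": 5e-4,
    "mcu_fails_to_report": 1e-3,
    "i2c_link_failure": 1e-4,
    "cpu_or_pmbus_shutdown_failure": 2e-3,
}


def top_event_probability(events=BASIC_EVENTS):
    """Overheating proceeds if cooling is lost AND the control-based shutdown fails."""
    loss_of_cooling = or_gate([
        events["fan_internal_short"],
        events["tray_power_entry_fault"],
    ])
    shutdown_fails = or_gate([
        events["mcu_fails_to_report"],
        events["i2c_link_failure"],
        events["cpu_or_pmbus_shutdown_failure"],
    ])
    return and_gate([loss_of_cooling, shutdown_fails])
```

The AND gate at the top is where the control-based safeguard enters the tree: the lower the probability that the shutdown path fails, the lower the probability of the top event.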
Failure Modes and Effects Analysis (FMEA)
Failure modes and effects analysis (FMEA) is another popular engineering technique for identifying potential failure modes and evaluating their impact on system safety [17]. It prioritizes risks based on severity, occurrence, and detectability. Despite its efficacy, FMEA encounters challenges in complex, control-oriented systems like the large core switching fan-tray architecture. Specifically, it may not fully capture concurrent failures or the intricate interactions between hardware and software, which are critical in modern systems.
Nonetheless, FMEA is invaluable for creating a comprehensive inventory of possible failure modes for each component within the system, facilitating an in-depth analysis of their causes and effects. This process enables the identification of critical controls and safeguards to mitigate system failures. To address its limitations, FMEA can be integrated with other methodologies, such as FTA or simulation tools, to provide a more holistic understanding of system vulnerabilities, including those arising from hardware-software interplay and concurrent failures. Although FMEA has limitations in analyzing the control-oriented model, it remains integral to identifying failure modes and guiding effective mitigation strategies, both of which are crucial for hazard-based safety engineering.
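One common way to turn severity, occurrence, and detectability into a priority order is the risk priority number (RPN), the product of three ordinal ratings. The sketch below assumes 1-to-10 scales and uses invented worksheet rows; it is not the paper's worksheet, and other prioritization schemes exist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    component: str
    mode: str
    severity: int    # 1 (negligible) to 10 (most severe)
    occurrence: int  # 1 (remote) to 10 (frequent)
    detection: int   # 1 (almost certainly detected) to 10 (undetectable)

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def rank(modes):
    """Return the failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical worksheet rows, for illustration only.
WORKSHEET = [
    FailureMode("Fan", "Rotor stuck", severity=4, occurrence=5, detection=2),
    FailureMode("Fan", "Internal short", severity=9, occurrence=2, detection=3),
    FailureMode("Temperature sensor", "Reading stuck low",
                severity=8, occurrence=2, detection=7),
]

for fm in rank(WORKSHEET):
    print(f"{fm.rpn:4d}  {fm.component}: {fm.mode}")
```

Because each row is scored on its own, a ranking like this does not capture concurrent failures, which is the limitation noted above.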
Hazard and Operability Studies (HAZOP)
Hazard and operability studies (HAZOP) is a tool used to identify potential system hazards and operational issues that cause deviations (or failure points) from design objectives. It was initially used to analyze process control systems in chemical plants but has since extended to other types of systems, including complex control systems and software-intensive designs [18]. HAZOP is a qualitative hazard analysis technique based on specific guide words (GW) such as “more,” “less,” “no,” “reverse,” and “delay,” alongside various critical parameters (e.g., power, speed, temperature, pressure). This approach allows for the thorough and systematic identification of design flaws that could lead to hazards or operational issues early in the product development phase. Guide words are utilized at each node or function, serving as a catalyst for team members to identify any possible causes and consequences and determine whether existing safeguards protect the product well.
Table 2 provides an example that illustrates the application of HAZOP to the fan speed-up function. HAZOP can be applied to any safety-critical function.
In summary, the hazard analysis results above show that much hazard prevention depends on the related control functions being executed correctly. Introducing control-oriented safety analysis into the existing HBSE framework is therefore imperative to ensure comprehensive safety assessment in the new era.
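The guide-word mechanism can be sketched as a simple cross product of guide words and parameters, where each combination becomes a prompt for the review team. The guide words are the ones listed above; the node and parameter names are illustrative assumptions and do not reproduce Table 2.

```python
from itertools import product

# Guide words taken from the text above.
GUIDE_WORDS = ["no", "more", "less", "reverse", "delay"]


def deviations(node, parameters, guide_words=GUIDE_WORDS):
    """Build one worksheet row per (parameter, guide word) combination.

    Causes, consequences, and safeguards are left empty on purpose:
    filling them in is the job of the HAZOP team, not of the script.
    """
    return [
        {
            "node": node,
            "deviation": f"{guide_word.upper()} {parameter}",
            "causes": [],
            "consequences": [],
            "safeguards": [],
        }
        for parameter, guide_word in product(parameters, guide_words)
    ]


# Hypothetical node and parameters for a fan speed-up function.
rows = deviations("Fan speed-up", ["fan speed", "temperature reading"])
for row in rows:
    print(row["node"], "|", row["deviation"])
```

Not every generated combination is meaningful (for example, "REVERSE temperature reading"), and discarding those is part of the team's review.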
Introducing the Control-Oriented Model Into HBSE
Time–The Key Element for Control-Oriented Safety
The element of “time” is foundational for the control-oriented models and functional safety assessments [19-25], acting as a pivotal element in ensuring timely responses to hazardous events. Time factors such as fault-tolerant time interval (FTTI), fault detection time interval (FDTI), and diagnostic test intervals (DTI) are integral to designing safety functions or products that prevent hazardous accidents. The product or system must detect and respond to potential hazards within defined time limits to mitigate risks effectively. Some existing safety standards have defined and listed these time-related parameters. Key temporal factors include:
Fault tolerant time interval (FTTI): Originally defined by ISO 26262-1, FTTI represents the maximum allowable time between the occurrence of a fault and the point at which the system must detect and respond to the fault to prevent unsafe conditions. This interval is critical for safety applications and reflects the urgency and efficiency of the safety mechanisms activated.
Process safety time (PST): As outlined by IEC 61508‑4, PST refers to the time available to bring a process to a safe state before the hazardous event occurs. This interval is crucial in industrial control systems, where delays in response times can lead to significant safety incidents.
Fault handling time interval (FHTI): This metric quantifies the time taken to manage and mitigate a fault once detected, encompassing the processes of fault identification, isolation, and system recovery or failover to a safe state.
Fault detection time interval (FDTI): This interval measures the time from the onset of a fault to its detection by the system’s diagnostic mechanisms. Rapid fault detection is essential to minimize the exposure to potential hazards and initiate timely corrective actions.
Fault reaction time interval (FRTI): This denotes the time required for a system to react to a detected fault, implementing necessary measures to maintain safety. This interval is critical for ensuring systems can effectively counteract faults before they escalate into unsafe conditions.
Diagnostic test (time) interval: This refers to the scheduled or on-demand execution of diagnostic tests designed to detect latent faults within the system. The frequency and timing of these tests are vital for maintaining an ongoing assessment of system health and ensuring high safety availability.
Figure 4 provides a clear illustration of several time concepts related to control-oriented safety.
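A timing budget built from these intervals can be checked with a few lines of arithmetic. The sketch below assumes that the detection and reaction intervals simply add up and that their sum must not exceed the FTTI; the millisecond values are placeholders, not measurements from the fan tray design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingBudget:
    ftti_ms: float  # fault-tolerant time interval
    fdti_ms: float  # fault detection time interval
    frti_ms: float  # fault reaction time interval

    @property
    def response_ms(self) -> float:
        """Time from fault onset to completed reaction (detection + reaction)."""
        return self.fdti_ms + self.frti_ms

    @property
    def margin_ms(self) -> float:
        """Time left before the hazardous event could occur."""
        return self.ftti_ms - self.response_ms

    def is_met(self) -> bool:
        """True if the safe state is reached within the FTTI."""
        return self.margin_ms >= 0


# Hypothetical numbers: overheating becomes hazardous 5 s after cooling is lost.
budget = TimingBudget(ftti_ms=5000, fdti_ms=1000, frti_ms=2500)
print(budget.response_ms, budget.margin_ms, budget.is_met())
```

A diagnostic test interval longer than the detection budget would break this check, because a latent fault could then go unnoticed for longer than `fdti_ms`.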
“Time” Consideration in HBSE
In the current hazard-based safety engineering (HBSE) standard (i.e., IEC 62368-1), the element of “time” is considered only in a fragmented way when conducting hazard analysis and risk assessment (HARA). This inconsistency is evident in the definition and classification of different hazardous energy sources within the standard. While “time” is explicitly considered in the context of certain hazards, such as those associated with fire (power sources) and thermal risks (thermal energy sources), it is absent or not directly emphasized in the definitions and classifications of other hazard sources. These include electric shock hazards (electrical energy sources), the dissemination of and contact with hazardous substances, mechanical injury (mechanical energy sources), and radiation injury (radiation energy sources) [26-27].
Although some static energy sources, such as the surface sharpness of equipment, are difficult to relate to the concept of “time,” there is a clear opportunity to incorporate “time” more systematically into risk evaluations for the other, dynamic energy sources. This would involve acknowledging the temporal dynamics of hazard exposure, energy change, personal response, etc. Table 3 summarizes how the “time” element is considered in each energy source classification of IEC 62368-1, which also provides insight for extending and refining the existing energy source classification in future standard development and updates.
Comparisons Between Traditional HBSE and D-HBSE
As Figure 5 shows, the current HBSE framework does not fully account for the temporal dynamics of energy sources. It fails to capture the “state changes” that occur either due to autonomous changes in the energy sources over time or due to the enforced changes imposed by control models. This oversight can lead to an incomplete assessment of the dynamic characteristics of hazards.
The new dynamic hazard-based safety engineering (D-HBSE) is developed by introducing the control-oriented model, which acts as a safeguard. It forms a closed control loop by connecting energy transfer paths with signal transfer paths. D-HBSE enhances the original HBSE model because it allows for:
Continuous monitoring and adjustment: The control model can continuously monitor the state of the energy sources and adjust their operation to maintain safety, accounting for the temporal variability of hazards.
Predictive analysis: By incorporating time-based data and control model outputs, the D-HBSE can predict potential hazard states before they occur, enabling preemptive action.
Adaptability and flexibility: The control model enables the system to adapt to both anticipated and unforeseen changes over time, ensuring long-term safety and reliability.
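The monitoring and predictive aspects listed above can be illustrated with a deliberately simple control rule: extrapolate the temperature trend linearly, estimate the time remaining until a limit is reached, and compare it with the time the safeguard needs to act. The function names, the linear model, and all thresholds are assumptions made for this sketch; a real design would need a validated thermal model.

```python
def time_to_limit(samples, limit_c):
    """Estimate seconds until limit_c is reached, from the last two samples.

    samples is a list of (time_s, temperature_c) pairs in time order.
    Returns 0.0 if the limit is already reached and None if the
    temperature is not rising.
    """
    (t0, temp0), (t1, temp1) = samples[-2], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    if temp1 >= limit_c:
        return 0.0
    slope = (temp1 - temp0) / (t1 - t0)  # degrees C per second
    if slope <= 0:
        return None
    return (limit_c - temp1) / slope


def decide(samples, limit_c, reaction_time_s, warning_horizon_s):
    """Pick a control action from the predicted time to the limit."""
    remaining = time_to_limit(samples, limit_c)
    if remaining is None or remaining > warning_horizon_s:
        return "hold"
    if remaining <= reaction_time_s:
        # Too little time left for a gradual response: shut down now.
        return "shut_down"
    return "raise_fan_speed"


# Hypothetical readings: 40 C at t = 0 s and 50 C at t = 10 s, limit 90 C.
readings = [(0.0, 40.0), (10.0, 50.0)]
print(time_to_limit(readings, 90.0))
print(decide(readings, 90.0, reaction_time_s=5.0, warning_horizon_s=120.0))
```

The comparison against `reaction_time_s` is where the “time” element enters the decision: the same temperature trend leads to a different action depending on how quickly the safeguard can respond.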
To facilitate a clearer and more intuitive understanding of the features of the existing HBSE and D-HBSE, Table 4 provides a detailed comparison of their respective protection mechanisms. While HBSE offers a more diverse array of protection mechanisms, they are predominantly confined to physical forms, which are more passive and reactive. The control-oriented protection added by D-HBSE, on the other hand, is simpler, more direct, and proactive.
It is important to emphasize that dynamic hazard-based safety engineering (D-HBSE) does not seek to replace the existing HBSE framework. Rather, it is fully compatible with and inherits everything from the existing HBSE, and simply extends the scope by adding an additional type of protection mechanism. D-HBSE enhances the established HBSE by incorporating dynamic elements that are especially relevant to modern systems, which often involve complex interactions between hardware and software.
Guidelines for Implementing and Evaluating D-HBSE
This part provides guidelines for implementing control-based safety, discussed from both hardware and software perspectives. This dual perspective is essential because the integration and interaction between hardware and software are critical to the overall safety of the control-based model.
Hardware Design (Safety Mechanism) and SIL Calculation
Hardware-related safety mechanisms are a crucial aspect of hardware functional safety design and constitute a significant component of the overall safety strategy. Table 5 summarizes the content from IEC 61508-2 Annex A and ISO 26262-5 Annex D, outlining the safety mechanisms and diagnostic coverage rates for potential faults in different components. This provides a foundational basis for subsequent calculations of hardware probability metrics.
Figure 7 shows the schematic of the temperature sensing circuits, which are part of the fan tray controller board and serve as a safeguard against the fire hazard.
During the hardware safety development stage, implementing safety mechanisms in the hardware design is just one aspect of ensuring safety. It is also essential to quantify random hardware failures probabilistically to ensure that the residual risk associated with the safety function is acceptably low. Failure modes, effects, and diagnostic analysis (FMEDA) is a valuable tool for performing these quantitative calculations. Table 6 illustrates how FMEDA is used to calculate probabilistic hardware metrics. This paper selects the SFF (safe failure fraction) and PFH (probability of dangerous failure per hour) from IEC 61508 as the metrics; the SPFM (single point fault metric) and PMHF (probabilistic metric for random hardware failures) from ISO 26262 are similar and can also be used.
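The arithmetic behind an FMEDA summary can be sketched as follows. The sketch uses the usual IEC 61508 split of failure rates into safe, dangerous detected, and dangerous undetected parts, with SFF = (λS + λDD) / (λS + λDD + λDU), and approximates PFH for a single channel without redundancy by λDU. The component rows, failure rates, safe fractions, and diagnostic coverage values are invented for illustration and are not the entries of Table 6.

```python
from dataclasses import dataclass

FIT = 1e-9  # 1 FIT = one failure per 10^9 device-hours


@dataclass(frozen=True)
class FmedaRow:
    component: str
    failure_rate_fit: float     # total failure rate in FIT
    safe_fraction: float        # share of failures that are safe (0..1)
    diagnostic_coverage: float  # share of dangerous failures detected (0..1)


def summarize(rows):
    """Return (SFF, PFH per hour) for a single-channel architecture."""
    lam_s = sum(r.failure_rate_fit * r.safe_fraction for r in rows)
    lam_dd = sum(r.failure_rate_fit * (1 - r.safe_fraction) * r.diagnostic_coverage
                 for r in rows)
    lam_du = sum(r.failure_rate_fit * (1 - r.safe_fraction) * (1 - r.diagnostic_coverage)
                 for r in rows)
    sff = (lam_s + lam_dd) / (lam_s + lam_dd + lam_du)
    pfh = lam_du * FIT  # single-channel approximation: PFH ~ lambda_DU
    return sff, pfh


# Hypothetical rows for a temperature sensing path, for illustration only.
ROWS = [
    FmedaRow("Thermistor", 10.0, safe_fraction=0.5, diagnostic_coverage=0.9),
    FmedaRow("ADC input", 20.0, safe_fraction=0.5, diagnostic_coverage=0.6),
    FmedaRow("Microcontroller", 100.0, safe_fraction=0.5, diagnostic_coverage=0.9),
]

sff, pfh = summarize(ROWS)
print(f"SFF = {sff:.1%}, PFH = {pfh:.2e} per hour")
```

The resulting values would then be compared against the limits that IEC 61508 assigns to the target safety integrity level, which are not reproduced here.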
Software Design and Assessment
Software development should follow the V-model shown in Figure 8. The V-model is a best practice in the safety-critical software development lifecycle, emphasizing a methodical approach to developing electronic control systems. It delineates a process that begins with the establishment of system requirements and progressively drills down to more granular software requirements, architectural designs, and module designs, forming the descending limb of the “V.” This progression embodies the decomposition of requirements, with each step laying the groundwork for the subsequent phase, ensuring that development is aligned with the safety goals and the corresponding functional safety requirements.
As the lifecycle advances to the ascending limb of the “V,” the focus shifts towards verification and validation, mirroring the earlier stages of development with corresponding levels of testing. Unit testing examines the smallest parts of the application in isolation, followed by integration testing, where these parts are combined and evaluated as a whole. System testing then assesses the complete, integrated system against the defined requirements to ensure compliance. The process ends with validation, which ensures that the final product meets the intended safety goals and related requirements. Overall, the model emphasizes thorough testing and safety assurance from concept to completion.
In the V-model, the concepts of validation and verification are distinct yet frequently conflated. Validation is the process of evaluating software at the end of the development process to ensure it meets the (safety) requirements of the end user. Verification, on the other hand, occurs throughout the development process. It involves checking that the product is built correctly according to the specifications and design documents. Figure 9 illustrates the differences between verification and validation.
Conclusion
The rapid evolution of technology necessitates a reevaluation of product safety principles to establish a more encompassing framework. Upgrading HBSE to dynamic HBSE (D-HBSE) by integrating a control‑oriented model is crucial to maintaining the efficacy of safety standards for ICT equipment in light of technological advances.
This paper contributes in three significant ways. First, it proposes for the first time the concept of dynamic HBSE (D-HBSE) and develops a new three-block model that adds a feedback path to close the control loop, which makes the existing HBSE suitable for evaluating products with software-controlled safety functions. Second, although the authors previously explored how to integrate functional safety into HBSE [4], that work focused mainly on the rationale and assessment process and lacked an in-depth gap analysis from a technical design and practical perspective. This paper conducts a detailed and comprehensive comparison of the protective means (i.e., safeguards) of HBSE and D-HBSE, highlights that the “time” element is the key to the “dynamic” characteristic of D-HBSE, and identifies potential gaps and future extension directions for each energy source classification and definition listed in the existing HBSE standard. Last, it offers detailed guidelines for implementing and evaluating control-oriented safety functions within the D-HBSE framework, serving as a valuable resource for engineering design.
Limitations
This study, while offering insights into the integration of a control-oriented model with HBSE, recognizes its preliminary nature and identifies avenues for further research. First, applying control-oriented safety, a relatively novel concept, is challenging amid the increasing complexity of hardware-software fusion in product design. Traditional safety analysis methods like FTA and FMEA may not fully address these complexities, and incorporating advanced methods like systems-theoretic process analysis (STPA) into HBSE is a promising yet underexplored area. Second, the current assessment primarily references the IEC 61508 and ISO 26262 standards. Future research could extend to other industry-specific standards, such as IEC 60730-1, ISO 13849-1, and IEC 62061 [28-30], which may offer more streamlined evaluation approaches in the IEC 62368 context. Last, as the lines between (cyber)security and functional safety begin to blur, particularly with the increased use of remote-control functions in ICT equipment, integrating cybersecurity evaluations into HBSE frameworks remains an essential research topic, especially where safety-related data communication is concerned.
Acknowledgment
The authors are very grateful to the International Electrotechnical Commission (IEC) and International Organization for Standardization (ISO) for permission to reproduce information from their publications, including the IEC 62368, IEC 61508, and ISO 26262 series. IEC and ISO hold the copyright in all such extracts and have no responsibility for the placement and context in which the authors have reproduced them.
References
- Audio/video, Information and Communication Technology Equipment – Part 1: Safety Requirements, IEC 62368-1, Edition 3.0, 2018.
- Audio/video, Information and Communication Technology Equipment – Part 2: Explanatory information related to IEC 62368-1, IEC/TR 62368-2, Edition 3.0, 2019.
- Nancy G. Leveson. Engineering a safer world: Systems thinking applied to safety. The MIT Press, 2016.
- Shun Zhang and Haiwen Lu. Integrating Functional Safety into Hazard-Based Safety Engineering: Towards a Comprehensive Framework. 2023 IEEE International Symposium on Product Compliance Engineering – Asia (ISPCE-ASIA), Shanghai, China, 2023, pp. 1-8, doi: 10.1109/ISPCE-ASIA60405.2023.10365871.
- Lin Xie, et al. Performance analysis of safety barriers against cascading failures in a battery pack. Reliability Engineering & System Safety, 228 (2022).
- Yiliu Liu. Risk management of smart healthcare systems: Delimitation, state-of-arts, process, and perspectives. Journal of Patient Safety and Risk Management, 27.3 (2022): 129-148.
- Sergio Jimeno Altelarrea, et al. STPA enabled safety assessment in the architecting of complex systems. Safety and Reliability. Vol. 41. No. 4. Taylor & Francis, 2022.
- Ivo Friedberg, et al. STPA-SafeSec: Safety and security analysis for cyber-physical systems. Journal of information security and applications 34 (2017): 183-196.
- Aibo Zhang, et al. Investigation of the compressed air energy storage (CAES) system utilizing systems-theoretic process analysis (STPA) towards safe and sustainable energy supply. Renewable Energy 206 (2023): 1075-1085.
- David Marcos, et al. Functional safety BMS design methodology for automotive lithium-based batteries. Energies 14.21 (2021): 6942.
- Hatice Ceren Ates, et al. End-to-end design of wearable sensors. Nature Reviews Materials 7.11 (2022): 887-907.
- Yue Wang, et al. Privacy risk assessment of smart home system based on a STPA–FMEA method. Sensors 23.10 (2023): 4664.
- Marvin Rausand and Ingrid Bouwer Utne. Product safety–Principles and practices in a life cycle perspective. Safety Science 47.7 (2009): 939-947.
- Nancy G. Leveson. Rasmussen’s legacy: A paradigm change in engineering for safety. Applied ergonomics 59 (2017): 581-591.
- Safety aspects – Guidelines for their inclusion in standards, ISO/IEC Guide 51, Edition 3.0, 2014.
- Fault tree analysis (FTA), IEC 61025, Edition 2.0, 2006.
- Failure modes and effects analysis (FMEA and FMECA), IEC 60812, Edition 3.0, 2018.
- Hazard and operability studies (HAZOP studies) – Application guide, IEC 61882, Edition 2.0, 2016.
- Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 1: General requirements, IEC 61508-1, Edition 2.0, 2010.
- Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 2: Requirements for electrical/electronic/programmable electronic safety-related systems, IEC 61508-2, Edition 2.0, 2010.
- Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 3: Software requirements, IEC 61508-3, Edition 2.0, 2010.
- Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 4: Definitions and abbreviations, IEC 61508-4, Edition 2.0, 2010.
- Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 5: Examples of methods for the determination of safety integrity levels, IEC 61508-5, Edition 2.0, 2010.
- Road vehicles – Functional safety – Part 1: Vocabulary, ISO 26262-1, Edition 2.0, 2018.
- Road vehicles – Functional safety – Part 5: Product development at the hardware level, ISO 26262-5, Edition 2.0, 2018.
- Safety of laser products – Part 1: Equipment classification and requirements, IEC 60825-1, Edition 3.0, 2014.
- Safety of laser products – Part 2: Safety of optical fibre communication systems (OFCSs), IEC 60825-2, Edition 4.0, 2021.
- Automatic electrical controls – Part 1: General requirements, IEC 60730-1, Edition 6.0, 2022.
- Safety of machinery – Safety-related parts of control systems – Part 1: General principles for design, ISO 13849-1, Edition 3.0, 2015.
- Safety of machinery – Functional safety of safety-related control systems, IEC 62061, Edition 2.0, 2021.