Editor’s Note: The paper on which this article is based was originally presented at the 2020 IEEE International Symposium on Product Safety Engineering held virtually in November 2020. It is reprinted here with the gracious permission of the IEEE. Copyright 2020 IEEE.
Industrial human-robot collaboration (HRC) promises a more flexible production and more direct support for human workers [1]. In HRC applications, human and robot work in close vicinity or even in direct collaboration. Safety fences, which have traditionally been used to ensure the safety of human workers, are (at least partially) absent. Instead, sensor- and software-based safety measures, such as laser scanners, light curtains, velocity limitation, and collision detection, are used to ensure that the robot system does not pose any hazard to human workers. Safety flaws in the configuration of these safety measures can lead to hazards. Thus, a thorough safety validation is required. Furthermore, ISO 10218‑2, the safety standard for industrial robot systems, specifically states that prior to commissioning, a risk assessment must be conducted to identify and assess potential hazards [2].
The sooner a hazard is uncovered in the development process, the fewer corrective changes to the system have to be made later. Since early changes require smaller iterations in the development process and thus are less costly (see Figure 1), it is desirable to identify hazards as early as possible. In early development stages, there is usually no physical implementation available that could be used for this purpose. Instead, early development stages typically rely on simulation models, e.g., for planning the cell layout or optimizing the workflow. It would be beneficial to use these simulation models also for the early identification of potential hazards. However, to find hazards in simulation, one must overcome a major challenge: In many cases, hazards are hidden. This means that there are certain safety-critical flaws in the design of the system which may result in hazards, but only become manifest in specific situations. In a dynamic simulation, it can be very difficult to find simulation sequences that uncover these hazardous situations, especially when the simulation is highly detailed.
A recent promising approach to this problem is the concept of adaptive stress testing (AST) [3]. AST exposes hazards with a reinforcement learning agent that creates adversarial testing conditions. AST was successfully applied in several safety-critical domains such as aerospace engineering [4] and autonomous driving [5]. In this paper, we show how AST can be applied to find hazards in robot systems. We use a Monte Carlo Tree Search (MCTS) algorithm to control a virtual human model which we place in a simulation model of the robot system. The MCTS acts as an optimization algorithm that adapts human behavior to maximize a risk metric, thereby creating high-risk situations which are more likely to uncover hazards. In other words, the human model exposes hazards by learning to provoke hazardous situations in simulation. Although this approach cannot guarantee that all existing hazards are found, it can help to uncover hazards that would otherwise have been overlooked, especially those that only become apparent in very specific situations.
Related Work
Safety engineering typically relies on methods like “Hazard and Operability Analysis” (HAZOP) [6], “Failure Modes and Effects Analysis” (FMEA) [7], or “Systems-Theoretic Process Analysis” (STPA) [8] to identify hazards. These methods are semi-formal, that is, they define a certain hazard identification procedure but largely rely on human reasoning. They can be applied to a wide range of safety-critical systems.
There are also several novel approaches that are specifically aimed at robotics: Guiochet proposed the use of HAZOP-UML, a HAZOP extension that uses UML diagrams, for analysis of robot systems [9]. Marvel et al. have proposed a task-based method that supports risk assessment using an ontology of HRC tasks [10]. Awad et al. have developed a rule-based expert system for risk assessment of HRC workplaces [11]. Their tool allows the user to model the workplace using a model of products, processes, and resources (“PPR model”). The PPR model properties are mapped to hazards based on a set of pre-defined rules. The method “SAFER-HRC,” developed by Askarpour et al. [12]–[14] and Vicentini et al. [15], uses formal verification methods for safety verification of HRC systems.
While all of these methods are suitable to identify hazards in robot systems, they do have some limitations: semi-formal methods rely largely on human reasoning and domain-specific knowledge and thus can be difficult to apply to novel and complex systems. Formal and rule-based approaches require a specific system model like the formal language description from [12] or the PPR model from [11] which must be obtained specifically for the purpose of hazard identification. Furthermore, these models typically require significant modeling simplifications.
An alternative approach that avoids these problems is simulation-based safety testing. In the field of robotics, simulation-based safety testing is typically done on a component level, e.g., for testing safety-critical control code. Examples of this are seen in the works of Araiza-Illan et al. [16], [17], Bobka et al. [18], and Uriagereka et al. [19]. In contrast, the use of simulation-based testing to identify hazards on a system level is still relatively unexplored.
Proposed Approach
Objective, Assumptions, and Basic Idea
This paper explores a novel concept that uses simulation to find hazards in robot systems. As explained in the introduction, a major challenge is that in many cases hazards only manifest themselves in specific situations. In a dynamic simulation environment, the number of possible simulation sequences can be vast. Thus, it can be difficult to create specifically those simulation sequences that lead to situations where existing hazards are uncovered.
Our approach relies on the assumption that the behavior of the robot system is deterministic for a given human behavior. This means that if there are inherent hazards in the system, then there are certain human behaviors for which these hazards manifest themselves in the form of an unsafe state, that is, an accident or near-accident. This assumption leads to the basic idea behind our approach: to expose hazards by creating high-risk human behavior that provokes accidents. To achieve this, we draw on the concept of AST [3]: we use the MCTS algorithm from [3] to control a virtual human model which is placed in a simulation model of the robot system under test. By optimizing the behavior of the virtual human to maximize a risk metric, the algorithm provokes unsafe situations. As our proof of concept will show, this approach can significantly increase the chance of finding hazards in simulation.
Problem Formulation
Formally, the approach can be framed as a search problem where the goal is to find sequences of human actions that result in unsafe states. The search problem is described in a 5-tuple:
$$(S, U, A, \varphi, s_0) \tag{1}$$
where $S$ is a set of simulation states that describe the combined configuration of the human model and the robot system model (including not only the robot itself but also other safety-related components, e.g., sensors). $U$ is a user-defined subset of $S$ that includes unsafe states, that is, states that violate a certain safety condition formulated by the user. The set $A$ consists of the actions which can be performed by the human model in simulation. Note that in the following proof-of-concept example, $A$ is a set of simple human movement primitives. However, $A$ does not necessarily have to consist only of movements. It could also include other human actions that are relevant to the system under test, such as operator commands to the system. The function $\varphi$ is a transition function that returns the next state given the current state and a human action: $s' = \varphi(s, a)$. This function is implemented by the simulation, that is, the next state is obtained by simulating the interaction between the human and robot system for a given human action. Starting from the initial simulation state $s_0$, the goal is to find sequences of human actions $a_1, a_2, \ldots, a_n$ which, when simulated in interaction with the robot system, result in an unsafe simulation state $s \in U$. The difficulty is that $U$ is only known implicitly: While it is easy for the user to define certain high-level safety constraints (e.g., “all collisions with the robot must be avoided”), it is unknown what the specific system states are in which these constraints are violated, and which action sequences lead to them.
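To make the formulation concrete, the following Python sketch instantiates the 5-tuple on a toy problem. It is an illustration only, not the authors' implementation: the one-dimensional `State`, the three-element `ACTIONS` set, `ROBOT_POS`, and the exhaustive `find_unsafe_sequence` search are all assumptions made for this example. In the paper, the transition function is implemented by the simulator, and the set of unsafe states is given only implicitly through a safety condition, which the predicate `is_unsafe` mimics.

```python
from dataclasses import dataclass
from itertools import product

# Toy state (assumption): human position on a line, plus a robot-moving flag.
@dataclass(frozen=True)
class State:
    human_pos: int       # grid cells away from the starting point
    robot_moving: bool

ACTIONS = ("stay", "forward", "back")   # hypothetical action set A
ROBOT_POS = 3                           # hypothetical robot location
S0 = State(human_pos=0, robot_moving=True)

def phi(s: State, a: str) -> State:
    """Toy deterministic transition function s' = phi(s, a)."""
    delta = {"stay": 0, "forward": 1, "back": -1}[a]
    return State(max(0, s.human_pos + delta), s.robot_moving)

def is_unsafe(s: State) -> bool:
    """Implicit definition of U: human at the robot while the robot moves."""
    return s.robot_moving and s.human_pos == ROBOT_POS

def find_unsafe_sequence(max_len: int):
    """Brute-force search over action sequences; only feasible for toy problems."""
    for n in range(1, max_len + 1):
        for seq in product(ACTIONS, repeat=n):
            s = S0
            for a in seq:
                s = phi(s, a)
                if is_unsafe(s):
                    return seq
    return None

print(find_unsafe_sequence(4))  # ('forward', 'forward', 'forward')
```

Brute force works here because the toy problem has only a handful of sequences. With a realistic action set and a detailed simulation, enumerating sequences is no longer feasible, which is the motivation for a guided search.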
Search Procedure
We solve this search problem with an iterative search procedure as shown in Figure 2: the MCTS algorithm iteratively selects a human action which is then carried out by the human model in interaction with the robot system model. After each action, the current simulation state s is evaluated in a safety check to determine if an unsafe state s ∈ U is reached. Furthermore, a reward R is calculated and fed back to the MCTS algorithm. This reward is designed in a way that encourages dangerous behavior and thus accelerates the finding of hazards. If an unsafe state is reached, the simulation stops, and the user can examine the hazard by replaying the simulation sequence that has led to the unsafe state. The user can then eliminate the hazard by implementing appropriate safety measures and restart the search with an updated simulation model to find further hazards. If desired, this process can be repeated throughout the whole system design stage.
Note that the set U of unsafe states depends on the safety condition that is defined by the user. Depending on the context of the application, one might define conditions based on criteria like velocity and distance (e.g., “all contact between human and robot must be avoided while the robot is moving with a velocity greater than X”) or on collision characteristics like collision force and affected body part (e.g., “all collisions that subject body part X to a collision force greater than Y must be avoided”). For reasons of computational complexity, the following proof-of-concept example will use a simple velocity/distance criterion. In the future, we will also incorporate a collision force estimation into our method to allow for more sophisticated safety criteria.
Proof of Concept
This section presents our proof-of-concept example: We use the MCTS algorithm of [3] and the simulator CoppeliaSim (formerly known as V-REP [20]) to implement the search procedure from Figure 2. We then use this implementation to find hazards in an industrial robot cell. It should be noted that being a proof of concept, the presented implementation contains several simplifications which we will address in our future work.
Implementation
Human Model: We use a simple human model from CoppeliaSim and augment it with additional joints so that it can perform a set of basic motions (five walking- and six upper-body motions, amounting to an action space of 30 combined motions):
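$$A_{\text{Walking}} = \{\text{walk forward},\ \text{turn left } 45^\circ,\ \text{turn left } 90^\circ,\ \text{turn right } 45^\circ,\ \text{turn right } 90^\circ\} \tag{2}$$

$$A_{\text{UpperBody}} = \{\text{move body upright},\ \text{bend forward},\ \text{bend left},\ \text{bend right},\ \text{bend forward and right},\ \text{bend forward and left}\} \tag{3}$$

$$A = A_{\text{Walking}} \times A_{\text{UpperBody}} \tag{4}$$

$$|A| = 5 \cdot 6 = 30 \tag{5}$$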
Note that in our example, A does not include arm motions. Arm motion is quite complex and representing it via explicit actions would likely lead to an explosion of the search space. Instead, we use an octree based on a reachable arm workspace computation [21] to determine if the robot is within human reach. The parameters of the human model are shown in Table 1.
Parameter | Value | Source |
Body height | 1.78 m | Test person measurement |
Upper arm length l_U | 0.30 m | Test person measurement |
Lower arm length l_L | 0.31 m | Test person measurement |
Hand length l_H | 0.18 m | Test person measurement |
Walking speed | 1.6 m/s | Specified in [22] |
Max. angle of forward flexion | 55° | Derived from [22] |
Max. angle of lateral flexion | 35° | Specified in [23] |
Table 1: Human model parameters
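The combined action space described above can be enumerated as a Cartesian product. The short Python sketch below uses the motion names from the text as plain strings; the variable names are chosen here for illustration and do not come from the original implementation.

```python
from itertools import product

# Walking and upper-body motion primitives as listed in the text.
A_WALKING = (
    "walk forward",
    "turn left 45°", "turn left 90°",
    "turn right 45°", "turn right 90°",
)
A_UPPER_BODY = (
    "move body upright",
    "bend forward", "bend left", "bend right",
    "bend forward and right", "bend forward and left",
)

# Each combined action pairs one walking motion with one upper-body motion.
A = list(product(A_WALKING, A_UPPER_BODY))

print(len(A))  # 30
```

Because the size of the combined set is the product of the component sizes, adding explicit arm motions as a third factor would multiply the action count again, which illustrates why arm reach is handled separately via the workspace octree.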
Algorithm: To control the human model, we use the MCTS algorithm from [3]. For reasons of brevity, we only give a simplified explanation of the algorithm here. For a full explanation, we refer to [3]. The algorithm iteratively samples sequences of human actions from A and executes them in the simulation. In keeping with the terminology of [3], we call these action sequences episodes. After each action, it is checked if an unsafe state s ∈ U has been reached. If this is the case, or if a maximum number of actions is reached, the episode terminates. The simulation is then set back to the initial state s0 and a new episode begins. With each episode, the algorithm incrementally expands a search tree, in which the edges correspond to human actions and the nodes to simulator states.
We employ two variations of this algorithm: one basic version, which we call MCTS1, and one variation, which we call MCTS2. Whereas MCTS1 always starts its search at the initial simulation state s0, MCTS2 commits to the most promising action after a certain number of episodes and uses the resulting simulation state as a new starting point. This results in a more exploitative search behavior.
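The difference between the two variants can be illustrated with a small tree-search sketch in Python. This is a deliberately simplified stand-in and does not reproduce the algorithm of [3]: it is a plain UCT-style search (UCB1 selection, one expansion per episode, random rollout) on a hypothetical one-dimensional environment. The names `step`, `is_unsafe`, `reward`, `GOAL`, `HORIZON`, and the `commit_every` parameter are all assumptions made for this example.

```python
import math
import random

ACTIONS = (-1, 0, 1)    # toy human actions: step back, stay, step forward
GOAL, HORIZON = 4, 6    # toy unsafe position and episode length (hypothetical)

def step(pos, a):
    """Deterministic toy transition."""
    return pos + a

def is_unsafe(pos):
    return pos == GOAL

def reward(pos):
    """Toy risk metric: larger when closer to the unsafe position."""
    return 1.0 / (1.0 + abs(GOAL - pos))

class Node:
    def __init__(self):
        self.n = 0           # number of episodes through this node
        self.q = 0.0         # running mean of episode returns
        self.children = {}   # action -> Node

def ucb_action(node, rng, c=1.4):
    """Try each action once, then pick the action with the best UCB1 score."""
    untried = [a for a in ACTIONS if a not in node.children]
    if untried:
        return rng.choice(untried)
    return max(ACTIONS, key=lambda a: node.children[a].q
               + c * math.sqrt(math.log(node.n) / node.children[a].n))

def run_episode(root, prefix, rng):
    """Replay the committed prefix, descend the tree, roll out, back up the return."""
    pos, seq = 0, list(prefix)
    for a in prefix:
        pos = step(pos, a)
    path, node, ret, in_tree = [root], root, 0.0, True
    while len(seq) < HORIZON:
        if in_tree:
            a = ucb_action(node, rng)
            if a not in node.children:
                node.children[a] = Node()
                in_tree = False          # expand one node, then roll out randomly
            node = node.children[a]
            path.append(node)
        else:
            a = rng.choice(ACTIONS)
        pos = step(pos, a)
        seq.append(a)
        ret += reward(pos)
        if is_unsafe(pos):               # safety check after every action
            break
    for nd in path:
        nd.n += 1
        nd.q += (ret - nd.q) / nd.n
    return seq, is_unsafe(pos)

def search(max_episodes=200, commit_every=None, seed=0):
    """commit_every=None always restarts at the initial state (MCTS1-like);
    an integer commits to the best first action every that many episodes (MCTS2-like)."""
    rng = random.Random(seed)
    root, prefix = Node(), []
    for ep in range(1, max_episodes + 1):
        seq, found = run_episode(root, prefix, rng)
        if found:
            return ep, seq
        if (commit_every and ep % commit_every == 0
                and len(prefix) < HORIZON - 1 and root.children):
            best = max(root.children, key=lambda a: root.children[a].q)
            prefix.append(best)
            root = root.children[best]   # continue the search from the new start state
    return None, None
```

Calling `search()` corresponds to the restart-from-initial-state behavior, while `search(commit_every=20)` narrows the search around the currently best-looking branch. Both return the episode index and the action sequence that reached the toy unsafe state, or `(None, None)` if the episode budget runs out. Committing makes the search more exploitative, at the cost that an early wrong commitment cannot be undone.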
Reward: After each action, the algorithm receives a reward R. Based on the reward, a state-action value function is estimated which is used to adapt sampling of actions in future episodes. The reward should increase the chance of finding hazards by encouraging a more dangerous behavior of the virtual human. Thus, the occurrence of dangerous situations should be rewarded, whereas the occurrence of safe situations should be penalized. To quantify the level of danger that a situation holds, we define a safety index cS:
$$c_S = (d_{HR}^2 + 1) \cdot e^{-v_R} \tag{6}$$
where $d_{HR}$ is the human-robot distance and $v_R$ is the Cartesian velocity of the fastest robot joint. The value of $c_S$ is large for safe configurations (i.e., large distance, low speed). Since we want to encourage unsafe situations, we give the inverse
where k indicates the current step within the episode and n is the episode length. (Note that the reward structure differs from [3], where there is also a component that rewards the probability of actions. We changed this as we are interested in finding hazards independently of their probability.)
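The behavior of the safety index in Eq. (6) can be checked numerically. The Python sketch below evaluates it for two hypothetical configurations; the units (meters and meters per second) and the example values are assumptions, and the function name `safety_index` is chosen here for illustration. The sketch covers only the safety index itself, not how its inverse enters the per-step reward.

```python
import math

def safety_index(d_hr: float, v_r: float) -> float:
    """Safety index c_S = (d_HR^2 + 1) * exp(-v_R) from Eq. (6)."""
    return (d_hr ** 2 + 1.0) * math.exp(-v_r)

# Hypothetical configurations (distance in m, speed in m/s are assumed units).
far_slow = safety_index(2.0, 0.0)    # (4 + 1) * 1 = 5.0
near_fast = safety_index(0.0, 1.0)   # 1 * e^-1, roughly 0.368

print(far_slow, near_fast)
print(1.0 / far_slow, 1.0 / near_fast)  # the inverse is larger for the riskier case
```

The "+ 1" term keeps the index strictly positive even at zero distance, so its inverse stays finite when human and robot are in contact.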
Test Scenarios
As a basis for the proof-of-concept tests, we chose the industrial robot cell shown in Figure 3. This cell combines typical safety features of industrial robot systems: Safety fences, a laser scanner, and a light curtain. In the center of the cell, there is a U-shaped table on which the robot is mounted. The robot imitates a pick-and-place task between the two sides of the table. To intervene in the process, e.g., to refill parts, workers can approach the table either by walking through the laser scanner field or by passing through a light curtain at the back of the cell. Areas not monitored via laser scanner or light curtain are closed off by the fences. Upon detection of a worker, laser scanner and light curtain send a stop signal to the robot. Note that due to the response time of the sensors, the stop signal is delayed. Furthermore, the robot needs a certain stopping time to reach a standstill. The cell is designed to satisfy the following safety condition: “Contact between human and robot must not be possible unless the robot stands still.” Thus fences, laser scanner, and light curtain are configured in a way that even with the sensor delay and the robot stopping time, the worker cannot reach the robot before it has stopped [22], [24]. Since these safety measures should avoid any contact between human and robot while the robot is moving, the set of unsafe states U in our example is defined as follows:
$$U = \{\, s \mid v_R > 0,\ d_{HR} = 0 \,\} \tag{8}$$
By altering the original cell layout and deliberately introducing safety-critical design flaws, we created three test scenarios where unsafe states are possible, each scenario containing a specific collision hazard. The scenarios are shown and explained in Figure 3 and Table 2.
Test Scenario | Safety Flaw | Resulting Hazard |
Scenario 1: Reduced width of laser scanner zone | The width of the laser scanner protective field is reduced. Although the worker can still be detected by the laser scanner, the reduced field is too small to ensure that the robot stops completely before the worker can reach it. | A collision is possible if the worker approaches the table at the point where the robot path is closest and leans into the path as the robot passes (see Figure 3, Scenario 1). |
Scenario 2: Altered robot path and position | Position and path of the robot are altered in such a way that the robot’s elbow joint protrudes into the maintenance bay. Due to the protruding elbow joint, the distance between the light curtain and the robot is not sufficient anymore to stop the robot in time. | A collision is possible when the worker enters the maintenance bay (Figure 3, Scenario 2). |
Scenario 3: Partly removed safety fence | A part of the safety fence is removed. While the table itself is still closed off by the fence, the edge of the laser scanner field is not. | A collision can occur when the worker leans over the laser scanner field to reach around the remaining part of the safety fence (see Figure 3, Scenario 3). |
Table 2: Description of proof-of-concept test scenarios
Note that the movement of the robot, the sensor delays, and the robot stopping time make the scenario dynamic. Although the dynamic effects here are relatively simple, they show that the method is able to find hazards in dynamic simulations and not only in static environments.
Test Runs
Setup: Test runs are performed in CoppeliaSim with simulation timesteps of 50 ms. Each human action has a duration of four timesteps and each episode consists of eight actions. Test runs are conducted from two different starting points, one on the upper end of the cell for scenario 1 and one on the lower end of the cell for scenarios 2 and 3 (compare Figure 3 for the test scenarios and Figure 4 for examples of corresponding hazard situations). Although this may seem like a convenient simplification, it is justifiable from a practical perspective since a user would certainly select meaningful starting points and not place the human model at random. Each test scenario is performed with both MCTS1 and MCTS2. To show that our approach does indeed increase the chance of finding hazards, we conduct a random search for comparison in which episodes are assembled by sampling actions from a uniform distribution over A. For each combination of test scenario and algorithm, ten test runs are conducted with different random seeds. Each test run is limited to 200 episodes.
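The setup parameters imply a few quantities that help put the results into perspective. The following Python sketch derives them from the figures given above and in Table 1; the constant names are chosen here for illustration. The sequence count refers to full-length episodes and is therefore an upper bound, since episodes end early when an unsafe state is reached. The walking distance assumes, hypothetically, that every action is "walk forward" at the Table 1 walking speed.

```python
TIMESTEP_S = 0.050          # simulation timestep (50 ms)
STEPS_PER_ACTION = 4        # timesteps per human action
ACTIONS_PER_EPISODE = 8     # actions per episode
NUM_ACTIONS = 30            # size of the action space A
MAX_EPISODES = 200          # episode budget per test run
WALKING_SPEED = 1.6         # m/s, from Table 1

action_duration = TIMESTEP_S * STEPS_PER_ACTION            # simulated seconds per action
episode_duration = action_duration * ACTIONS_PER_EPISODE   # simulated seconds per episode
max_walk_distance = WALKING_SPEED * episode_duration       # meters, if only walking forward

num_sequences = NUM_ACTIONS ** ACTIONS_PER_EPISODE         # full-length action sequences
explored_fraction = MAX_EPISODES / num_sequences           # share covered by one test run

print(f"{action_duration:.1f} s per action, {episode_duration:.1f} s per episode")
print(f"{num_sequences:,} possible full-length sequences")
print(f"explored fraction per run: {explored_fraction:.2e}")
```

With 30 actions and eight steps there are 30^8 = 656,100,000,000 full-length sequences, so a budget of 200 episodes samples only a tiny fraction of them. Finding hazards within that budget therefore depends on the search being guided.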
Results: Results are shown in Table 3. The first row shows the success rates, i.e., in how many of the test runs the hazard was found. If no hazard is found within 200 episodes, a test run is considered unsuccessful. The second row shows the runtime, that is, the average number of episodes until the discovery of the hazard (unsuccessful test runs are counted with the maximum of 200 episodes). It can be seen that the two MCTS variants perform clearly better than the random search, both in terms of success rate and runtime, which indicates that the adaptation of human behavior does indeed increase the chances of finding hazards. However, it can also be seen that hazards can be missed. This is not only the case for the random search but also for both MCTS algorithms (although much less frequently). Meanwhile, comparing the MCTS variants with each other shows no clear advantage for either of them, especially given the small number of test scenarios. More tests will be conducted in the future to investigate potential differences in performance.
Metric | Algorithm | Scenario 1 | Scenario 2 | Scenario 3 |
Success rate | Random | 3/10 | 3/10 | 8/10 |
Success rate | MCTS1 | 10/10 | 8/10 | 10/10 |
Success rate | MCTS2 | 10/10 | 9/10 | 10/10 |
Avg. number of episodes | Random | 166.8 | 150.4 | 81.1 |
Avg. number of episodes | MCTS1 | 70.0 | 63.3 | 34.9 |
Avg. number of episodes | MCTS2 | 49.7 | 80.4 | 38.0 |
Table 3: Results of the test runs
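For a quick overall comparison, the figures in Table 3 can be aggregated across the three scenarios. The Python sketch below does this with the table values copied verbatim. The aggregation is an addition made here for illustration, not part of the original evaluation. Averaging the per-scenario episode counts with equal weight is reasonable only because every cell is based on the same number of runs (ten).

```python
RUNS_PER_CELL = 10

# Values copied from Table 3 (scenarios 1, 2, 3).
successes = {
    "Random": (3, 3, 8),
    "MCTS1": (10, 8, 10),
    "MCTS2": (10, 9, 10),
}
avg_episodes = {
    "Random": (166.8, 150.4, 81.1),
    "MCTS1": (70.0, 63.3, 34.9),
    "MCTS2": (49.7, 80.4, 38.0),
}

# Overall success rate and mean episode count per algorithm.
overall_rate = {k: sum(v) / (RUNS_PER_CELL * len(v)) for k, v in successes.items()}
mean_episodes = {k: sum(v) / len(v) for k, v in avg_episodes.items()}

for name in successes:
    print(f"{name}: {sum(successes[name])}/30 successful, "
          f"{mean_episodes[name]:.1f} episodes on average")
```

Aggregated this way, the random search succeeds in 14 of 30 runs, compared with 28 of 30 for MCTS1 and 29 of 30 for MCTS2. The two MCTS variants end up with nearly identical mean episode counts (about 56), which is consistent with the observation that neither shows a clear advantage.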
Discussion
As the proof of concept has shown, the method can identify hazards in a realistic, industry-like robot system. Compared to a random search, it finds hazards more quickly and with a higher success rate. However, since the method is still in a proof-of-concept phase, there are several limitations to its applicability, especially the simplistic human model. Furthermore, in its current implementation, the method can only find one hazard at a time. In a practical application, the user would have to eliminate the found hazard by updating safety measures and then repeat the search to find further hazards. While this avoids the problem of local minima (i.e., discovering the same hazard repeatedly), it is impractical. Another, more fundamental limitation comes from the fact that the method is based on falsification of safety conditions. This means it cannot give a safety guarantee; it can only find counterexamples, that is, situations where safety conditions are violated. Thus, it should be seen as an addition to existing methods rather than a replacement.
The major advantage of the method is that it can find hazards autonomously while reducing the required amount of prior knowledge about the system to a minimum. Furthermore, it can be easily integrated into common robot simulator models which are widely used and do not require building a system model specifically for hazard analysis. These properties are highly desirable for the analysis of novel and complex systems. Since the proof-of-concept implementation is relatively simple, the full extent of these advantages may not be visible yet. However, we believe that there is great potential in this approach and that it could provide a powerful, scalable, and flexible tool for testing various types of complex robot systems, not only in the industrial context.
Future Work
Currently, the method’s limitations mainly result from simplifications in modeling and implementation. In particular, the use of a static octree for the arm workspace rather than an articulated arm model limits the types of hazards that can be identified. This will be addressed by augmenting the reachability model with an articulated arm model. Moreover, a collision force estimation will be incorporated. This will allow the method to test systems not only against velocity- and distance-based safety criteria but also against collision force limits. Another aim is to enable a search for multiple hazards in one run. This will require adaptations to the MCTS to avoid convergence to local minima. To enable widespread practical application, it should also be investigated how the method can be implemented in other common robot simulators, for example, Visual Components or ProcessSimulate.
Conclusion
A simulation-based method for safety testing of robot systems was proposed and evaluated. The method uses a human model and Monte Carlo Tree Search to find unsafe system states in simulation, which enables an automated hazard identification and reduces the reliance on prior system knowledge. A proof of concept has shown promising results, but the current implementation is still relatively simple and requires further development. Since the method is based on falsification of safety conditions, it cannot give a safety guarantee. Thus, it should be seen as an addition to existing methods rather than a replacement.
Acknowledgments
This work was funded by the Ministry of Economics, Work and Housing of the State of Baden-Württemberg in the research project “RoboShield.” We thank the authors of [3] for sharing AST on GitHub, and Gabriel Zerrer for his assistance with the simulation.
References
- R. Müller, M. Vette, and O. Mailahn, “Process-oriented task assignment for assembly processes with human-robot interaction,” Procedia CIRP, vol. 44, pp. 210–215, 2016.
- “ISO 10218-2:2011 Robots and robotic devices – Safety requirements for industrial robots – Part 2: Robot systems and integration,” International Organization for Standardization, 2011.
- R. Lee, O. J. Mengshoel, A. Saksena, R. Gardner, D. Genin, J. Silbermann, M. Owen, and M. J. Kochenderfer, “Adaptive stress testing: Finding failure events with reinforcement learning,” arXiv preprint arXiv:1811.02188, 2018. Available at https://github.com/sisl/AdaptiveStressTesting.jl.
- R. Lee, M. J. Kochenderfer, O. J. Mengshoel, G. P. Brat, and M. P. Owen, “Adaptive stress testing of airborne collision avoidance systems,” 2015 IEEE/AIAA 34th Digital Avionics Systems Conference (DASC). IEEE, 2015, pp. 6C2–1.
- M. Koren, S. Alsaif, R. Lee, and M. J. Kochenderfer, “Adaptive stress testing for autonomous vehicles,” 2018 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2018, pp. 1–7.
- “IEC 61882:2016: Hazard and operability studies (HAZOP studies) – application guide,” International Electrotechnical Commission, 2016.
- “IEC 60812:2018 failure modes and effects analysis (FMEA and FMECA),” International Electrotechnical Commission, 2018.
- N. G. Leveson, Engineering a safer world: Systems thinking applied to safety. The MIT Press, 2016.
- J. Guiochet, “Hazard analysis of human-robot interactions with HAZOP-UML,” Safety Science, vol. 84, pp. 225 – 237, 2016.
- J. A. Marvel, J. Falco, and I. Marstio, “Characterizing task-based human-robot collaboration safety in manufacturing,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 45, no. 2, pp. 260–275, 2014.
- R. Awad, M. Fechter, and J. van Heerden, “Integrated risk assessment and safety consideration during design of HRC workplaces,” 22nd IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Sep. 2017, pp. 1–10.
- M. Askarpour, D. Mandrioli, M. Rossi, and F. Vicentini, “Safer-HRC: Safety analysis through formal verification in human-robot collaboration,” 35th International Conference SAFECOMP, 2016.
- M. Askarpour, D. Mandrioli, M. Rossi, and F. Vicentini, “Modeling operator behaviour in the safety analysis of collaborative robotic applications,” 36th International Conference SAFECOMP, 2017.
- M. Askarpour, D. Mandrioli, M. Rossi, and F. Vicentini, “A human-in-the-loop perspective for safety assessment in robotic applications,” 11th International Andrei P. Ershov Informatics Conference, 2017.
- F. Vicentini, M. Askarpour, M. G. Rossi, and D. Mandrioli, “Safety assessment of collaborative robotics through automated formal verification,” IEEE Transactions on Robotics, vol. 36, no. 1, pp. 42–61, 2019.
- D. Araiza-Illan, D. Western, A. Pipe, and K. Eder, “Model-based, coverage-driven verification and validation of code for robots in human-robot interactions,” arXiv preprint arXiv:1511.01354, 2015.
- D. Araiza-Illan, D. Western, A. G. Pipe, and K. Eder, “Systematic and realistic testing in simulation of control code for robots in collaborative human-robot interactions,” Annual Conference Towards Autonomous Robotic Systems. Springer, 2016, pp. 20–32.
- P. Bobka, T. Germann, J. K. Heyn, R. Gerbers, F. Dietrich, and K. Dröder, “Simulation platform to investigate safe operation of human-robot collaboration systems,” 6th CIRP Conference on Assembly Technologies and Systems (CATS), vol. 44, 2016, pp. 187 – 192.
- G. Juez Uriagereka, E. Amparan, C. Martinez, J. Martinez, A. Ibanez, M. Morelli, A. Radermacher, and H. Espinoza, “Design-time safety assessment of robotic systems using fault injection simulation in a model-driven approach,” 2019 ACM/IEEE 22nd International Conference on Model Driven Engineering Languages and Systems Companion (MODELS-C), 2019, pp. 577–586.
- E. Rohmer, S. P. Singh, and M. Freese, “V-rep: A versatile and scalable robot simulation framework,” 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, 2013, pp. 1321–1326.
- N. Klopcar and J. Lenarcic, “Kinematic model for determination of human arm reachable workspace,” Meccanica, vol. 40, pp. 203–219, 2005.
- “ISO 13855:2010 Safety of machinery – Positioning of safeguards with respect to the approach speeds of parts of the human body,” International Organization for Standardization, 2010.
- J. Medley, “Human anatomy fundamentals: Flexibility and joint limitations,” 2014, https://design.tutsplus.com/articles/human-anatomy-fundamentals-flexibility-and-joint-limitations--vector-25401.
- “ISO 13857:2019 Safety of machinery – Safety distances to prevent hazard zones being reached by upper and lower limbs,” International Organization for Standardization, 2019.