A Detailed Overview of Testing Requirements for Mainframes and Servers
This 2-part series of articles will focus on hardware compliance aspects of specific information technology electronics equipment which includes mainframes, server computers, and subcomponents. In Part 1 of this series, we will provide a technical overview of server components and subcomponents and discuss specifics regarding product safety regulations and testing.
Part 2 of this series will address additional areas of regulatory compliance, including electromagnetic compatibility and environmental concerns. We’ll also discuss how IT equipment is tested and certified to compliance standards for worldwide shipments.
The goal of this 2-part series is to provide our readers with a better understanding of the requirements for executing hardware compliance testing and certification, as well as the technical details of every compliance discipline.
Technical Overview of Server Computer and Subcomponents
Before diving into the details of each discipline of hardware compliance, it is important to understand the product being tested. This article focuses on the application of hardware compliance to information technology (IT) server computers and their subcomponents, such as processor drawers, input/output (I/O) drawers, cooling subsystems, cryptographic security cards, etc.
A maximally configured server computer with the front doors removed is shown in Figure 1. The mainframe is made up of many subcomponents that fall into one of the following three categories: 1) subcomponents that are designed and manufactured by the information technology (IT) company that will own the end product; 2) subcomponents designed in partnership with another company that owns the subcomponent and sells it to the IT company that will own the end product; or 3) completely off-the-shelf original equipment manufacturer (OEM) parts.
Figure 2 shows a breakdown of the subcomponents within a single rack air-cooled server.
The system in Figure 2 contains two processor drawers, three I/O drawers, two one-rack-unit (1U) servers that manage the built-in service network, and two Ethernet switches that support communications between subcomponents for the built-in service network. Each of these subcomponents contains anywhere from one to four power supply units (PSUs), which take single-phase input within a rated range of 200Vac to 240Vac RMS. In addition to the processors themselves, the processor drawers contain memory and I/O cards to communicate with either other processor drawers or I/O drawers.
The I/O drawers support multiple I/O cards that use different communication protocols (e.g., Ethernet, fiber optics, etc.) to communicate with the outside world. The server also contains four power distribution units (PDUs) that contain connectors where users can supply power to the system from their facility. The power provided to each of the four PDU inputs can be 200Vac to 240Vac single-phase, 380Vac to 415Vac three-phase wye, or 200Vac to 240Vac three-phase delta power. The PDUs convert the input to a 200Vac to 240Vac single-phase output, which is then distributed via internal cables to the PSUs. The PDUs are redundant, meaning that the server can run on half the PDUs, ensuring that the system continues to run if a PDU fails, a cable is unplugged or fails, or a power feed supplying the PDUs is lost.
An Overview of Regulatory Compliance
Typically, when describing regulatory hardware compliance, it is good to start with the result of hardware compliance work, that is, a compliance label with many certification marks as shown in Figure 3. Such compliance labels can be viewed as a “passport” that allows products to be sold around the world when they are determined to be compliant with local regulations. Just about any electronic product or its packaging includes one of these labels. Figure 3 shows an example of an IBM agency system-level compliance label with several of these marks.
Each of these small graphic symbols (marks) indicates that the product has been tested and certified to comply with specific country requirements in the areas of product safety (e.g., doesn’t exceed the current rating of a power cord), electromagnetic compatibility (e.g., doesn’t interfere with nearby devices), and environmental compliance (e.g., reduction of hazardous materials). The scope of certifications around the world is partially determined by voltage rating or power consumption, so the agency marks that appear on a compliance label are not the same for all products. In addition, marks on the label often need to be changed as regulators in various countries and jurisdictions change laws.
Marks can also be displayed differently for each product. Some products list agency marks on the packaging, in supplied documents (manuals), or via the product software/firmware (e.g., smartphones and other small electronic devices).
To legally display the marks shown on the compliance label, a product needs to successfully comply with specific regulations and standards. Most countries around the world regulate products for adherence to industry standards for product safety, electromagnetic compatibility (EMC), and environmental characteristics. Compliance testing laboratories perform tests required by regulatory agencies or industry standards as shown in Figure 4. Regulatory product certification for ship support requires that internal company testing and reports be submitted to external product safety and EMC agencies for full worldwide country certification. Some companies have the ability to self-certify while others use third-party companies for certification. For instance, the U.S. and the European Union (EU) allow some companies to self-certify EMC compliance, which can significantly shorten the certification process.
The job of hardware compliance is to ensure that products meet regulations that allow them to be marketed and sold around the world, but also to uphold the company’s reputation, protect it from litigation, and avoid fines. Compliance also helps to protect product service and installation professionals and end users from injury or death. Further, it helps to improve quality, enhance customer satisfaction, reduce the cost of product damage during transit, and protect the environment.
Integrating Compliance Testing Into the Product Development Process
Compliance testing needs to be integrated into the development stage of a product. Compliance testing is the gateway to approval or certification before products move on to the next phase of development and are eventually released to the market. While each company might approach compliance differently, whether through in-house compliance personnel or an independent third-party testing organization, it still must perform compliance testing.
Compliance personnel are often involved in hardware development discussions because they need to determine what needs to be tested. Compliance technical leads define the test and/or certification milestones and communicate them not only to the development engineers but, more importantly, to the executives who rely on test metrics. Pre-compliance evaluations may include software simulations and testing of early user hardware, all with the goal of identifying and eliminating hardware issues prior to the final stages of testing.
Even with pre-evaluations, products still need to go through the final compliance stage prior to release. This final stage is where the approvals and the certifications are obtained for the product so it can ship globally. The final stages of testing include minimum ship-level hardware testing (hardware being used by a potential client). Material declarations are obtained in this stage to ensure that the product meets environmental compliance requirements. Component vendor safety certifications are also obtained during this stage to determine whether there will be any issues prior to shipping. Once all the final stage compliance requirements are met, the product is allowed to go to market.
Evaluating hardware early in the design process and obtaining information on a given product is not only quite a bit of work but carries with it the cost associated with compliance testing. There is always a fine balance between the need to test, what to test, and when to test it. Ultimately, the decisions are defined by the compliance groups and developed based on experience and engineering judgment.
Another defining cost for compliance work is finding a balance between obtaining the minimum ship‑level hardware for testing while still securing the certifications and testing approvals needed to ship the hardware globally. Each compliance test has different requirements that drive the hardware needed for testing and its associated cost. Each of these aspects will be further defined in the rest of this article. Ultimately, EMC requires all configurations to be tested, while Product Safety testing and volatile organic compounds (VOC) emissions testing require a worst-case maximum configuration. However, VOC emissions testing needs brand-new hardware, while Product Safety testing can use hardware that has run-time hours on it. There is always a balance in the corporate world between the cost of the hardware and scheduling all the hardware compliance testing required to meet ship support dates.
The Cost of Compliance
The cost of compliance can be broken down into two cost types: capital and expense. Capital costs include necessary tools for testing, such as test chambers, software for regulatory tracking, or databases for tracking all the compliance work. There is also the capital cost associated with headcount. All compliance work must have proper staffing to meet ship support dates. When there are a lot of systems being released and compliance testing such as EMC must test each configuration, there needs to be the right number of staff to get all of this done.
Other costs include costs associated with test equipment calibration, especially when the compliance testing laboratories are ISO/IEC 17025:2017 accredited. This also includes the cost of the accreditation and all the activities associated with it. Some companies’ expenses can also be attributed to external compliance testing. Each company must determine what schedule and cost work best for their product release cycle.
Aligning Product Release Plans and Compliance Efforts
One might wonder how the compliance test schedule works within the development cycle and is still completed prior to the product’s release to the market. Compliance engineers learn of scheduled product launch dates from their company. With this information in hand, engineers can establish a staged testing schedule to align with those dates. In some cases, product release schedules are staged geographically, allowing test engineers to conduct required testing in phases.
For these reasons, compliance engineers need to be familiar with the regulatory activity by geography and know what tests are required for those geographies. Effective scheduling also requires internal coordination of the compliance testing sequence. For example, certain tests, such as VOC testing, must be conducted with brand-new hardware, so they must be scheduled before any other testing that uses the same equipment.
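The sequencing constraint described above can be sketched as a small dependency-ordering problem. This is only an illustration: the test names are hypothetical, and the only constraint taken from the article is that VOC testing must come before other tests that share the same hardware.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical test names. Each key maps to the tests that must finish
# before it can start on the shared hardware. Only the "VOC first"
# constraint comes from the article; the rest is illustrative.
prerequisites = {
    "voc_emissions": set(),
    "product_safety": {"voc_emissions"},
    "emc": {"voc_emissions"},
}


def schedule(prereqs):
    """Return one valid test order that respects every prerequisite."""
    return list(TopologicalSorter(prereqs).static_order())


order = schedule(prerequisites)
print(order)  # voc_emissions is always listed before the other tests
```

A real schedule would also account for lab availability, hardware delivery dates, and geography-specific ship dates, none of which are modeled here.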
Testing for Product Safety Compliance
Product safety hardware compliance testing is a crucial step in the development and manufacturing of any product that incorporates electronic components such as servers and subcomponents. Product safety testing ensures that products meet the required safety standards and regulations, protecting users from potential hazards and reducing the risk of liability for manufacturers.
Both servers and their subcomponents fall under the category of information and communication technology (ICT) equipment and are subject to meeting the requirements of multiple standards. As of the writing of this article, the product being certified (either the server or a subcomponent) typically needs to meet the requirements of IEC 60950-1, Information Technology Equipment – Safety – Part 1: General Requirements, as well as both the 2nd and 3rd Editions of IEC 62368-1, Audio/Video Information and Communication Technology Equipment – Part 1: Safety Requirements, with all amendments and country deviations. Compliance with more rigorous standards may be necessary to meet the requirements of a nationally recognized test laboratory (NRTL) such as UL, CSA, or TUV to facilitate regulatory approval in jurisdictions around the world. Even more rigorous testing requirements may be implemented by original equipment manufacturers (OEMs) to ensure the highest possible level of safety.
The specific tests required by each standard are similar, but testing limits can vary from standard to standard. To ensure compliance with all the standards, a superset of worst-case test limits is typically utilized for each test case; a server or subcomponent that meets the worst-case limits of all of the applicable standards is best positioned to meet the requirements in any regulatory jurisdiction.
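As a rough sketch of the superset approach, the strictest limit for each test case can be selected across all applicable standards. The standard names and numeric values below are placeholders chosen for illustration and are not taken from any standard; the `Limit` class and `strictest` function are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Limit:
    # "max": the measurement must not exceed the limit (lower is stricter).
    # "min": the measurement must reach at least the limit (higher is stricter).
    kind: str
    per_standard: dict


def strictest(limit):
    """Return the worst-case limit across all standards for one test case."""
    values = list(limit.per_standard.values())
    if limit.kind == "max":
        return min(values)
    if limit.kind == "min":
        return max(values)
    raise ValueError(f"unknown limit kind: {limit.kind!r}")


# Placeholder numbers only; real limits come from the applicable standards.
touch_temperature_c = Limit("max", {"standard_a": 70.0, "standard_b": 60.0})
test_voltage_v = Limit("min", {"standard_a": 1500.0, "standard_b": 2500.0})

print(strictest(touch_temperature_c))  # lowest allowed temperature wins
print(strictest(test_voltage_v))       # highest required test voltage wins
```

Separating "must not exceed" limits from "must reach at least" limits matters because the worst case points in opposite directions for the two kinds.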
Product safety compliance efforts begin during the earliest development stages of the server. Because of the potential costs and scheduling changes that can result from non-compliant hardware, it is incumbent on the product safety engineer to attend design meetings and to review both electrical and mechanical designs as early in the development process as possible, and provide feedback for design improvements that will ensure the final design passes all the requirements of the standards.
Some of the early work includes: 1) reviewing prints; 2) reviewing electrical schematics to ensure any power outputs are current-limited; 3) reviewing printed circuit board (PCB) layouts to ensure proper spacing of components (e.g., creepage and clearance distances); 4) reviewing 3D mechanical CAD models and/or early mechanically-good hardware for access to energized parts, hazardous moving parts, and sharp edges; 5) reviewing thermal simulation data to identify locations that may exceed touch temperature limits (potential burn hazards) or critical components that may exceed their operating limits and could result in smoke or fire; and 6) reviewing the overall grounding scheme of the server or subcomponent.
Product Configuration Considerations
There are many configurations offered for each server or server subcomponent. Customers can choose a desired I/O configuration, memory configuration, or processor configuration, as well as many other options that will result in different components being installed into their chosen customized system. Therefore, when hardware is available for safety tests, it is important that the correct configuration is selected for testing. Each test within the product safety standards requires that the product is tested in the most unfavorable scenario of normal use.
For server subcomponents, the safety engineer must consider the worst-case configuration for that subcomponent, which may not match the configuration for that same subcomponent when implemented within a fully configured server. The subcomponent may be over-tested (e.g., tested in a higher room ambient temperature, utilizing fan speeds that are suboptimal for each test, etc.) which provides some buffer against failure when that same subcomponent is installed in a server during system-level product safety testing. For server-level product safety testing, the maximum system configuration is selected for testing which includes the highest number of processor drawers, I/O drawers, PDUs, and server racks.
Types of Product Safety Testing
Some of the product safety tests that are required to be performed on a server or subcomponent include steady force, accessibility to electrical energy sources and safeguards, electric strength, capacitor discharge after disconnection of a connector, the resistance of the protective bonding system, prospective touch voltage, touch current measurements, sharp edge testing, accessibility to moving parts, stability testing, input testing, normal operating conditions temperature measurements, and simulated abnormal and fault conditions testing. The following sections provide details on each of those types of tests.
Steady Force Testing
Steady force testing requires the safety engineer to push on mechanical enclosures and barriers or parts mounted on a PCB with a specified force (e.g., between 10N and 250N, depending upon the location) to ensure that electrical insulation is not bridged or hazardous energy sources do not become accessible.
To assess accessibility to electrical energy sources and safeguards, the test engineer uses a test finger instrument and applies that to all user-accessible areas to determine if a part of a specific current, voltage, or power level can be touched. During this test, the engineer can remove any door, cover, or component that does not require a specialized tool to gain access. The same test finger instrument is used to evaluate accessibility to moving parts. Here, the test engineer determines if the instrument can access components such as a moving fan blade or pump motor.
Sharp Edge Testing
A sharp edge test determines if any sides, edges, or corners of the server or subcomponent are sharp enough to injure a user. A sharp-edge test instrument is used that applies a specific force to a tape head that is slid across the area of concern. Any location that cuts through two layers of tape on the tape head results in a test failure, and the manufacturing process must be reviewed and modified as necessary to ensure that the location is rounded or smoothed.
Stability testing requires the test engineer to place each rack of the server on a tilt table and then place the table at an angle of 10 degrees for one minute. Should the rack tip over on any side, it fails the test. The test engineer must ensure that the system is configured in the worst-case allowable configuration that may induce tilting (i.e., the most top-heavy configuration).
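Before hardware reaches the tilt table, a simplified static estimate can indicate which configuration is most at risk. The sketch below assumes a rigid rack on a flat incline and ignores casters, leveling feet, and cable loads; the function names and dimensions are hypothetical and do not replace the physical test.

```python
import math


def critical_tilt_deg(cg_height_mm, cg_to_edge_mm):
    """Tilt angle at which the center of gravity passes over the base edge.

    cg_height_mm:  height of the center of gravity above the floor
    cg_to_edge_mm: horizontal distance from the center of gravity to the
                   base edge the rack would pivot around
    """
    return math.degrees(math.atan2(cg_to_edge_mm, cg_height_mm))


def survives_tilt(cg_height_mm, cg_to_edge_mm, tilt_deg=10.0):
    """True if the static model predicts the rack stays upright at tilt_deg."""
    return critical_tilt_deg(cg_height_mm, cg_to_edge_mm) > tilt_deg


# Hypothetical racks with the same footprint but different load placement.
print(survives_tilt(cg_height_mm=1200, cg_to_edge_mm=300))  # lower center of gravity
print(survives_tilt(cg_height_mm=1800, cg_to_edge_mm=300))  # top-heavy configuration
```

The model makes the reason for choosing the most top-heavy configuration explicit: raising the center of gravity lowers the angle at which the rack tips.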
Capacitor discharge after disconnecting a connector requires the test engineer to measure the voltage present at the input pins of the PDU (at the system level) or power supply unit (PSU) (at the component level) after the connector is disconnected to ensure that the voltage reduces to a safe level within a given amount of time (e.g., 2 seconds).
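The discharge requirement can be approximated with a first-order RC decay, V(t) = V0 · exp(−t / RC). The sketch below is an estimate under stated assumptions: the 60 V safe level, the bleed resistance, and the capacitance are placeholder values, and a real input stage may not behave as a single RC pair. The starting voltage uses the peak of 240Vac plus the 10% tolerance mentioned in this article.

```python
import math


def voltage_after(v0, r_ohm, c_farad, t_s):
    """Voltage remaining on the capacitance t_s seconds after disconnection."""
    return v0 * math.exp(-t_s / (r_ohm * c_farad))


def discharge_time(v0, v_safe, r_ohm, c_farad):
    """Time in seconds for the voltage to decay from v0 to v_safe."""
    if v0 <= v_safe:
        return 0.0
    return r_ohm * c_farad * math.log(v0 / v_safe)


# Peak of a 264 Vac RMS input (240 Vac rated maximum plus 10% tolerance).
v_peak = 264.0 * math.sqrt(2)

# Placeholder values, not taken from any standard or real design.
V_SAFE = 60.0      # assumed safe voltage level
R_BLEED = 1.0e6    # assumed bleed resistance in ohms
C_INPUT = 1.0e-6   # assumed input capacitance in farads

t = discharge_time(v_peak, V_SAFE, R_BLEED, C_INPUT)
print(f"estimated discharge time: {t:.2f} s")
print("within 2 s" if t <= 2.0 else "exceeds 2 s")
```

The measured result on real hardware remains the deciding factor; an estimate like this only helps flag designs that are likely to be marginal.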
Electric Strength Testing
Electric strength testing requires the test engineer to apply a high voltage (typically around 2500V) across parts for one minute to test the effectiveness of the insulation (e.g., air gap, FR4, etc.). If there is a sudden breakdown of the insulation material that allows current to flow between the parts, the insulation poses a shock hazard and is deemed to have failed.
Resistance of the Protective Bonding System Testing
Testing the resistance of the protective bonding system requires the test engineer to apply a high current, double the current rating of the minimum required upstream breaker (e.g., the test current could be as high as 126A on a server), between the input connector ground pin of the PDU or PSU and the grounded chassis of each component, and to measure the resistance between those two points. The voltage drop is then calculated and must not exceed 2.5V. This ensures that, if a fault were to put voltage on the grounded system chassis, a path exists for the fault current to return through the input power cord and trip the upstream breaker, so that a user touching the server or subcomponent chassis would not be exposed to that voltage.
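The arithmetic behind this test is Ohm's law: the voltage drop is the test current multiplied by the measured bonding resistance. The sketch below assumes a 63A upstream breaker, inferred from the 126A example, and the function names are illustrative only.

```python
MAX_DROP_V = 2.5  # voltage drop limit quoted in this article


def bonding_test_current(breaker_rating_a, multiplier=2.0):
    """Test current: double the rating of the upstream breaker."""
    return breaker_rating_a * multiplier


def voltage_drop(test_current_a, resistance_ohm):
    """Voltage drop across the bonding path at the test current (V = I * R)."""
    return test_current_a * resistance_ohm


def max_allowed_resistance(test_current_a, max_drop_v=MAX_DROP_V):
    """Largest bonding resistance that still meets the voltage drop limit."""
    return max_drop_v / test_current_a


# Assumed 63 A breaker, which yields the 126 A test current cited above.
i_test = bonding_test_current(63.0)
print(f"test current: {i_test:.0f} A")
print(f"max resistance: {max_allowed_resistance(i_test) * 1000:.1f} milliohms")

# Hypothetical measured resistances, in ohms.
for r in (0.015, 0.025):
    drop = voltage_drop(i_test, r)
    print(f"{r * 1000:.0f} milliohms: {drop:.2f} V, pass={drop <= MAX_DROP_V}")
```

At 126A, the 2.5V limit leaves room for only about 20 milliohms along the entire bonding path, which is why every joint and fastener in the ground path matters.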
Prospective Touch Voltage, Touch Current, and Protective Conductor Current Testing
Prospective touch voltage and touch current are measurements of the voltage that would appear across, and the current that would flow through, the human body when a person touches the server or subcomponent and another grounded location that may be present in the data center. These measurements are made under normal and fault conditions, but the worst-case measurement is typically obtained when a fault simulating the loss of the ground conductor in the power cord is tested. In this scenario, any leakage current present on the chassis of the server or subcomponent now flows through the person. A touch current measurement network that simulates the impedance of the human body is used to obtain the measurement. The touch current and touch voltage obtained must be below the levels specified in the standard to be deemed a passing result.
Figure 5 shows an image of a safety engineer performing a touch current measurement.
Input testing ensures that the server or subcomponent does not exceed its rated input current. For this test, the system power is maximized. The maximum configuration is tested with the highest power I/O cards and memory DIMMs installed, and the cooling fans and pumps are set to their highest supported speeds. The system or subcomponent is then tested in the highest supported ambient temperature (e.g., 40°C), and an exerciser is executed on the system that simulates the high end of a customer workload.
The system can also operate in a condition known as N-mode. The power subsystem is designed for full redundancy, meaning that there are twice as many PDUs and PSUs as required such that if a failure happens in the field, the system will continue to run. N-mode is the minimum number of PDUs or PSUs required before functionality is lost and the system or subcomponent goes down.
Input measurements are made under normal and N-mode conditions at the ends of the rated voltage ranges, common voltages used in specific countries around the world, and tolerance voltages 10% above and 10% below the rated voltage ranges. Worst-case measurements are obtained during N-mode testing because the total power required to run the server or subcomponent is divided between a smaller number of PDUs or PSUs. The testing ensures that the measured current at all these voltage and configuration permutations does not exceed the input rating of the server or subcomponent.
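The voltage setpoints and the effect of N-mode can be sketched as follows. This is a simplified single-phase model that assumes a constant-power load shared equally between active units and a unity power factor; the total power, the country voltages, and the function names are hypothetical examples.

```python
def voltage_setpoints(rated_low, rated_high, country_voltages=(), tolerance=0.10):
    """Ends of the rated range, tolerance voltages, and in-range country voltages."""
    points = {
        round(rated_low * (1 - tolerance), 1),
        float(rated_low),
        float(rated_high),
        round(rated_high * (1 + tolerance), 1),
    }
    points.update(float(v) for v in country_voltages if rated_low <= v <= rated_high)
    return sorted(points)


def current_per_unit(total_power_w, voltage_v, active_units, power_factor=1.0):
    """Input current drawn by each active PDU or PSU (equal sharing assumed)."""
    return total_power_w / (voltage_v * power_factor * active_units)


def worst_case(total_power_w, setpoints, normal_units, n_mode_units):
    """Return (current_a, voltage_v, active_units) for the highest current."""
    cases = [
        (current_per_unit(total_power_w, v, n), v, n)
        for v in setpoints
        for n in (normal_units, n_mode_units)
    ]
    return max(cases)


setpoints = voltage_setpoints(200, 240, country_voltages=(208, 230))
print(setpoints)

# Hypothetical 10 kW system with four PDUs, two of which remain in N-mode.
current_a, voltage_v, units = worst_case(10_000, setpoints, normal_units=4, n_mode_units=2)
print(f"worst case: {current_a:.1f} A per unit at {voltage_v:.0f} V with {units} units")
```

Under these assumptions the highest current appears at the lowest tolerance voltage with the fewest active units, which matches the observation that worst-case measurements come from N-mode testing.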
Normal Operating Conditions Temperature Testing
Measurements of temperatures under normal operating conditions are made to ensure that components do not exceed their operational limits, which could result in smoke or fire incidents, and that locations that can be touched do not exceed temperature limits that could cause burns or cause a user to hurt or shock themselves due to an involuntary action where they pull their hand away from a hot location.
To perform this testing, thermocouples are attached to safety-critical components and common touch locations. The maximum configuration is then tested with the highest power I/O cards and memory DIMMs installed, and the system or subcomponent is tested in the highest supported ambient temperature (e.g., 40°C). The cooling fans and pumps are set to perform as they normally would in the current ambient condition. For subcomponents, they may be set to an even lower speed to provide buffer for when that subcomponent is tested at the system level. An exerciser that simulates the high end of a customer workload is executed on the system. Tests are executed at multiple voltage setpoints, including at the ends of the rated voltage ranges, common voltages used in specific countries around the world, and at tolerance voltages 10% above and 10% below the rated voltage ranges. Each test lasts a minimum of 1 hour or until temperatures on all thermocouples reach equilibrium.
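One way to decide when a temperature run may end is to check every thermocouple channel for a flat trend. The sketch below reads the duration rule as "at least one hour, and continuing until stable", and it assumes a stability criterion of no more than 1°C of change over the last 30 minutes; both the interpretation and the thresholds are assumptions, and the function and channel names are hypothetical.

```python
import math


def is_stable(series, window_s=1800.0, max_delta_c=1.0):
    """True if the readings in the trailing window vary by at most max_delta_c.

    series: list of (elapsed_seconds, temperature_c) in time order.
    """
    if not series:
        return False
    t_start, t_end = series[0][0], series[-1][0]
    if t_end - t_start < window_s:
        return False  # not enough data to judge a full window
    recent = [temp for t, temp in series if t >= t_end - window_s]
    return max(recent) - min(recent) <= max_delta_c


def may_end_test(channels, min_duration_s=3600.0, window_s=1800.0, max_delta_c=1.0):
    """True if every channel has run long enough and has stabilized."""
    if not channels:
        return False
    for series in channels.values():
        if not series or series[-1][0] - series[0][0] < min_duration_s:
            return False
        if not is_stable(series, window_s, max_delta_c):
            return False
    return True


# Synthetic warm-up curve sampled once a minute (illustrative data only).
def warm_up(duration_s):
    return [(t, 70.0 - 30.0 * math.exp(-t / 600.0)) for t in range(0, duration_s + 1, 60)]


print(may_end_test({"psu_inlet": warm_up(3600)}))  # still drifting in the last window
print(may_end_test({"psu_inlet": warm_up(5400)}))  # flat over the last window
```

Checking all channels together matters because a slow-responding component, such as a large heatsink, can still be warming after faster locations have settled.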
Abnormal and Fault Conditions Testing
Simulated abnormal and fault conditions testing is very similar to normal operating conditions temperature testing, except that a single abnormal or fault condition is introduced for each test, and the conditions are tested one at a time. Abnormal and fault conditions include fan failure, power supply failure, blocked ventilation, loss of cooling water flow, pressure testing, and reverse polarity testing.
An example of a blocked ventilation test is shown in Figure 6.
In addition to monitoring temperatures of all the thermocouple locations, the system or subcomponent behavior is monitored to record changes to fan speeds, a shutdown of components or the system, pressure increases, increases in current or power, etc. This testing determines if a safety concern could arise if there is a single fault, so it is acceptable for the system or subcomponent to shut down during testing if it shuts down in a way that does not introduce a safety concern.
In conclusion, product safety testing is a critical aspect of hardware compliance that evaluates a product’s electrical components to ensure they meet safety standards and regulations. By conducting product safety testing, manufacturers can ensure that their products are safe for use and that they meet the applicable safety standards. It also protects against foreseeable misuse, helping to ensure the safety of clients, support engineers, and anyone exposed to a product.
In this article, we’ve provided a technical overview of server components and subcomponents, the process of integrating compliance testing into the product development process, and details regarding the various types of product safety testing. In Part 2 of this series, we’ll address additional areas of regulatory compliance, including electromagnetic compatibility and environmental concerns. We’ll also discuss how IT equipment is tested and certified to compliance standards for worldwide shipments.
John Werner is a Senior Electromagnetic Compatibility/Product Safety Design Engineer with IBM in Poughkeepsie, NY.
Rebecca Morones is a Senior Product Environment Stewardship Engineer with IBM in Denver, CO.
Arkadiy Tsfasman is a Senior Technical Staff Member and Hardware Compliance Technical Architect with IBM in Poughkeepsie, NY.