✔️Checklists: Doing the Work Right in 🔌Power Stations, 🏥Hospitals, 🧑‍✈️Airlines & 📡Data Centres
Checklists
Before we dive into how to manage complexity in entities such as power stations and data centres, let’s take a step back and discuss the importance of checklists in such environments.
Pacific Gas and Electric Company has a great video about using checklists that references the medical profession, aviation and nuclear power stations. Here is the video:
Borrowing from the theme of the video, we will extend it to the use of checklists in a data centre, and to applying Internet of Things (IoT) devices to the task, such as those available from RealWear, Inc. and Xpitec (Pty) Ltd.
A History of Why Checklists Matter
A checklist compensates for the weaknesses of human memory and helps ensure consistency and completeness in carrying out tasks. Checklists came into prominence with pilots: the pilot’s checklist was first developed and used in 1935, when a serious accident hampered the adoption into the armed forces of a new aircraft (the predecessor to the famous Flying Fortress). The pilots sat down and put their heads together. What was needed was some way of making sure that everything was done and that nothing was overlooked.
The result was a pilot’s checklist. Four checklists were developed: take-off, flight, before landing, and after landing. The new aircraft was not “too much aeroplane for one man to fly”; it was simply too complex for any one man’s memory. These checklists for the pilot and co-pilot made sure that nothing was forgotten.
Original Boeing checklist for the Flying Fortress prototype
Data Centre Complexity
Data centres are complex entities. They are too much technology for one man to operate. They consist of multiple functions that combine to allow a data centre to operate and provide services. These functions are:
The data centre white space, where the Information Technology kit is located. The white space often consists of a raised floor, which plays an important part in the cooling function described below.
The power supply, which consists of utility power, standby generators, and uninterruptible power supplies (UPS). The standby generators require fuel management and the UPS need battery strings. The internal distribution of power is typically handled by power distribution units (PDU) connected to a busway system.
Cooling, where the primary function is to provide thermal management to the white space. The cooling systems consume power, and the rate at which this power is consumed is one of the important efficiency metrics for a data centre. The remaining power, which is used in the white space, is often known as the critical compute load.
The actual data centre site location and building shell. These are mostly building components, but an important consideration is the physical security of the whole data centre, which includes access to and surveillance of the site.
Fire detection and suppression systems. The suppression systems are invariably based on gas. Another important consideration is that the various areas of the data centre are segmented and that the walls and doors between these areas are fireproof.
Telecommunications and networks. Data centres are permanently connected to the Internet, and as such telecommunications and data centre networks are integral functions. The components include data centre networks (often using a spine and leaf architecture), the telco meet-me room, conduits and access manholes.
All these functions need regular maintenance and upkeep, and one of the operational tools for this is the checklist, used in the same manner as described above for medicine, aviation and nuclear power.
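The efficiency metric mentioned under cooling is commonly expressed as power usage effectiveness (PUE): the total power drawn by the facility divided by the power delivered to the critical compute load. The following is a minimal sketch of that calculation, assuming PUE is the kind of metric meant here. The function name and the power figures are hypothetical and chosen purely for illustration.

```python
def power_usage_effectiveness(total_facility_kw: float, critical_load_kw: float) -> float:
    """Return PUE = total facility power / critical compute (IT) load."""
    if critical_load_kw <= 0:
        raise ValueError("critical compute load must be positive")
    if total_facility_kw < critical_load_kw:
        raise ValueError("total facility power cannot be less than the critical load")
    return total_facility_kw / critical_load_kw


# Hypothetical readings: the whole site draws 1500 kW, of which 1000 kW
# reaches the IT kit in the white space.
pue = power_usage_effectiveness(1500.0, 1000.0)
print(f"PUE = {pue:.2f}")
```

A value closer to 1.0 means that less power is spent on cooling and other overheads for each unit of power used in the white space.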
Basic Data Centre Checklist
The following is a rudimentary checklist example associated with power.
The rating and weight are typically based on a scale from 1 to 5 and a score is thus achieved for the function. This score is then evaluated and categorized as follows:
Satisfactory: Components evaluated as adequate, appropriate and effective to provide reasonable assurance that data centre risks are being managed.
Low priority: A few specific component weaknesses were noted. Components evaluated as adequate, appropriate and effective to provide reasonable assurance that data centre risks are being managed.
Moderate: Numerous weaknesses were noted. Components evaluated are unlikely to provide assurance that data centre risks are being managed.
Critical: Components evaluated are inadequate, inappropriate or ineffective to provide reasonable assurance that data centre risks are being managed.
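To make the scoring step concrete, here is a rough sketch of how ratings and weights on a 1 to 5 scale might be combined into a score and mapped to the four categories above. The article does not specify the formula or the cut-off points, so the weighted average, the assumption that a higher rating is better, the percentage thresholds and the sample checklist items are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChecklistItem:
    description: str
    rating: int  # 1 to 5; ASSUMPTION: 5 is the best rating
    weight: int  # 1 to 5; ASSUMPTION: 5 is the most important item


def function_score(items: List[ChecklistItem]) -> float:
    """Weighted average rating as a percentage of the maximum rating (5)."""
    if not items:
        raise ValueError("checklist has no items")
    for item in items:
        if not (1 <= item.rating <= 5 and 1 <= item.weight <= 5):
            raise ValueError(f"rating and weight must be 1..5: {item.description}")
    total_weight = sum(item.weight for item in items)
    weighted = sum(item.rating * item.weight for item in items)
    return 100.0 * weighted / (5 * total_weight)


def categorize(score: float) -> str:
    """Map a percentage score to a category. Thresholds are hypothetical."""
    if score >= 90:
        return "Satisfactory"
    if score >= 75:
        return "Low priority"
    if score >= 50:
        return "Moderate"
    return "Critical"


# Hypothetical power checklist items.
power_items = [
    ChecklistItem("Standby generator fuel level checked", rating=5, weight=5),
    ChecklistItem("UPS battery strings inspected", rating=4, weight=4),
    ChecklistItem("PDU load readings recorded", rating=3, weight=2),
]

score = function_score(power_items)
print(f"{score:.1f}% -> {categorize(score)}")
```

With these sample values the weighted sum is 47 out of a possible 55, which the hypothetical thresholds place in the "Low priority" band. A real assessment would use whatever formula and bands the organisation has agreed on.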
Integrating IoT into the Data Centre
Each of the data centre components can be monitored by IoT sensors. These sensors provide metrics which an IoT platform uses to determine whether the data centre is operating within suitable parameters. This data becomes the core input to the data centre checklists, which need to be completed daily.
Many data centres rely on legacy SCADA systems for metrics, and there is a definite requirement to refresh these systems with next-generation IoT-based devices and platforms.
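As an illustration of how sensor metrics could feed a daily checklist, the sketch below compares readings against operating limits and pre-fills each checklist item as pass, fail or no data. It is not tied to any particular IoT platform: the metric names, the limits and the `prefill_checklist` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Limit:
    low: float
    high: float


# Hypothetical operating limits; a real site would set its own.
LIMITS = {
    "cold_aisle_temp_c": Limit(18.0, 27.0),
    "ups_battery_charge_pct": Limit(90.0, 100.0),
    "generator_fuel_pct": Limit(75.0, 100.0),
}


def prefill_checklist(readings: Dict[str, float]) -> Dict[str, str]:
    """Mark each monitored metric as 'pass', 'fail' or 'no data'."""
    results = {}
    for metric, limit in LIMITS.items():
        value = readings.get(metric)
        if value is None:
            results[metric] = "no data"  # sensor silent: needs a manual check
        elif limit.low <= value <= limit.high:
            results[metric] = "pass"
        else:
            results[metric] = "fail"
    return results


# Hypothetical readings; the generator fuel sensor has not reported.
readings = {"cold_aisle_temp_c": 24.5, "ups_battery_charge_pct": 82.0}
for metric, status in prefill_checklist(readings).items():
    print(f"{metric}: {status}")
```

Treating a missing reading as its own state, rather than as a pass, keeps a silent sensor from hiding a problem that an engineer would otherwise have checked by hand.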
The Adoption of Wearables
At present, many data centres rely on checklists being executed by data centre engineers using manual clipboards. Additionally, metrics are most often visualized and presented in Network Operations Centres (NOC) or Security Operations Centres (SOC).
Personally, as a professional working in data centre environments, I have spent countless hours completing assessments and checklists in a clumsy manner using Excel spreadsheets on a laptop. Besides not being automated, this approach makes it difficult, if not nearly impossible, to aggregate the collected data over time and construct high-level reports.
The introduction of IoT-based wearables allows data centre engineers to carry out operations hands-free while being presented with information and metrics in a heads-up display. These wearables allow:
Displaying and executing the checklist items via an integrated application in the heads-up display;
Recording of data centre components using video or pictures;
Oversight of engineers by skilled and experienced managers who can remotely engage and view operations using either Zoom or Teams;
Access to data centre equipment manuals via voice commands for display in the heads-up display; and
Direct viewing in the heads-up display of the component’s metrics obtained from the IoT platform.
Improved Operations
The impact of using IoT in a data centre as described is that when a checklist is executed and the result is evaluated, corrective action can be triggered immediately. This minimizes human error, and the result is improved data centre operations, which can be measured through improved availability statistics.
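The idea of triggering corrective action as soon as a checklist result is evaluated can be sketched as follows. This is only an illustration under stated assumptions: `evaluate_and_act` and `open_ticket` are hypothetical names, and a real deployment would call its own ticketing or alerting system instead of appending to a list.

```python
from typing import Callable, Dict, List


def evaluate_and_act(results: Dict[str, str],
                     on_failure: Callable[[str], None]) -> List[str]:
    """Call on_failure for every checklist item that did not pass."""
    failed = [item for item, status in results.items() if status != "pass"]
    for item in failed:
        on_failure(item)
    return failed


# Stand-in for a ticketing or alerting system (hypothetical).
tickets: List[str] = []


def open_ticket(item: str) -> None:
    tickets.append(f"Corrective action required: {item}")


# Hypothetical results from an executed checklist.
results = {
    "UPS battery strings inspected": "pass",
    "Generator fuel level above minimum": "fail",
}
failed = evaluate_and_act(results, open_ticket)
print(tickets)
```

Because the action fires inside the evaluation step, a failed item cannot sit unnoticed on a clipboard or in a spreadsheet until someone reviews it.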
Generic Deployments
Although the specific checklist use case in this article is data centres, checklists can equally be used in hospitals, planes and even power stations. Combined with correct standard operating procedures and proper handling of major incidents, any crisis can be addressed, even the current load shedding crisis in South Africa.
Ronald ensures that Internet inhabiting things are connected reliably online at Fusion Broadband South Africa - the leading specialized SD-WAN provider in South Africa.
This article was previously published on IoT For All.
Written by
Ronald Bartels
Driving SD-WAN Adoption in South Africa