Reducing Risk Through Strategic Project Delivery
Lessons from a Live Data Center
August 9, 2021
As with any facility, managing maintenance, improvements and new projects within a data center is critical to its continued operational success. Unlike most other assets, however, data centers do not benefit from inherent downtime: remaining online to deliver critical connectivity and service to customers is the primary focus of any operational strategy. Projects delivered in a “live” data center environment therefore introduce critical risk factors that can make or break your site’s reliability:
- Resilience risks
- Redundancy risks
- Service Level Agreements (SLA) compliance risks
- Site constraint risks
As the global leader in data center operations, CBRE’s Data Center Projects practice takes a holistic view of mitigating these risks while managing to your organization’s facilities goals.
Resilience Risks
In a data center, resilience describes how well the site can withstand an equipment failure, which can affect service delivery to varying degrees. To mitigate resilience risks, project teams should:
- Identify the existing resilience risks within the specific data center. Because most upgrade projects are performed on legacy or aging data centers, engaging with the operations team is paramount to fully understanding the effect the project will have on the data center.
- Review maintenance reports for all main mechanical and electrical infrastructure, and complete a detailed on-site survey to identify any inherent issues or resilience risks within the data center.
- In conjunction with the operations team and the client representative, complete a resilience register and incorporate it into the project execution plan. Update the register throughout the project, as it informs the approach to the project’s phased delivery.
- Set up weekly whiteboard sessions with the operations team to review the project’s four-week look-ahead against the resilience register, so that all stakeholders are aware of the risks at each phase of the project. Once the operations team are fully aware of the planned phasing of the works, they can take steps to reduce the effect of these resilience risks on data center operations.
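The resilience register and the weekly four-week look-ahead described above can be kept in a spreadsheet, but the underlying idea is simple enough to sketch in code. The sketch below is illustrative only: the field names (risk_id, asset, phase, mitigation, owner), the Task structure and the function four_week_look_ahead are hypothetical, and it assumes each register entry is tagged with the project phase during which the risk is live. It pairs every task starting in the next four weeks with the register entries for its phase, which is roughly the agenda for a whiteboard session.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RegisterEntry:
    """One row of a resilience register (field names are hypothetical)."""
    risk_id: str
    asset: str
    description: str
    phase: int        # project phase during which the risk is live
    mitigation: str
    owner: str


@dataclass
class Task:
    """A scheduled task from the project program (hypothetical fields)."""
    name: str
    start: date
    phase: int


def four_week_look_ahead(tasks, register, today, weeks=4):
    """Pair each task starting inside the look-ahead window with the
    register entries for its phase, in start-date order."""
    window_end = today + timedelta(weeks=weeks)
    agenda = []
    for task in sorted(tasks, key=lambda t: t.start):
        if today <= task.start < window_end:
            risks = [r for r in register if r.phase == task.phase]
            agenda.append((task, risks))
    return agenda


# Illustrative data only.
register = [
    RegisterEntry("R1", "UPS-A", "Batteries past design life", 2,
                  "Load-bank test before switchover", "Operations lead"),
    RegisterEntry("R2", "CRAH-03", "Single fan failure reduces cooling", 3,
                  "Stage temporary cooling unit", "Project manager"),
]
tasks = [
    Task("Replace switchgear panel 1", date(2021, 8, 16), 2),
    Task("Commission new CRAH", date(2021, 10, 4), 3),
]

agenda = four_week_look_ahead(tasks, register, today=date(2021, 8, 9))
for task, risks in agenda:
    print(task.start, task.name, [r.risk_id for r in risks])
```

Keying risks by phase keeps the register aligned with the phased delivery approach; a real register would likely also link risks to specific assets and tasks.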
Redundancy Risks
Redundancy is a practice and concept that establishes which systems should include either a component-level or full-system backup in case the system fails, requires maintenance or needs upgrades. Redundancy is typically required for power supply and cooling components, as they are crucial for maintaining system health, accessibility and reliability. Most data centers are designed to align with industry-standard “tier” levels that require specific levels of redundancy. The following are ways to best account for redundancy concerns:
- Work with key stakeholders, including the operations team and client representative, to complete a detailed review of the project and the planned or potential effects it could have on the redundancy levels of the data center. We suggest using a Gantt chart, aligned to the project program, that shows the level of redundancy available at every stage of the project. This ensures all parties are fully aware of the redundancy available at the data center during each phased step.
- Review the Gantt chart with all key stakeholders to confirm whether they are comfortable with the level of redundancy at each step, or whether additional temporary equipment is required to provide redundant capacity for the duration of the project.
- Review and agree on ancillary options, such as temporary roll-up capacity solutions, to ensure that the client and the operations team are comfortable with the levels of capacity, resilience and redundancy available to the data center throughout the project.
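The redundancy rows of such a Gantt chart reduce to a per-phase comparison of installed units against the units needed to carry the load (N). The sketch below assumes a simple count-based view (N, N+1, N+2 and so on) and an N+1 target; it does not model 2N architectures or independent distribution paths, and the Phase structure, helper names and chiller numbers are hypothetical. It flags the phases where temporary equipment would be needed to hold the target.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    required_units: int   # N: units needed to carry the critical load
    available_units: int  # units in service during this phase


def spare_units(phase):
    return phase.available_units - phase.required_units


def redundancy_label(phase):
    """Simple count-based label: 'below N', 'N', 'N+1', 'N+2', ..."""
    spare = spare_units(phase)
    if spare < 0:
        return "below N"
    return "N" if spare == 0 else f"N+{spare}"


def temporary_units_needed(phase, target_spare=1):
    """Temporary units required to restore the target spare capacity."""
    return max(0, target_spare - spare_units(phase))


# Illustrative cooling plant: three chillers carry the load, four installed.
phases = [
    Phase("Baseline", required_units=3, available_units=4),
    Phase("Phase 1: remove chiller 2", required_units=3, available_units=3),
    Phase("Phase 2: commission new chiller 2", required_units=3, available_units=4),
]

for p in phases:
    print(f"{p.name:<36} {redundancy_label(p):<8} temp units: {temporary_units_needed(p)}")
```

In this example, Phase 1 drops the plant to N, which is exactly the point at which stakeholders would decide whether a temporary roll-up unit is required.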
Service Level Agreement Compliance Risks
A data center SLA is a service level agreement that covers all key infrastructure elements and service metrics, such as power, temperature and network availability. Standard SLAs include:
- Power: an uptime commitment for redundant power (A+B power)
- Temperature: a commitment to maintain an average ambient room temperature
- Humidity: a commitment to maintain an average humidity within the data hall
- Bandwidth: a guaranteed uptime on network availability (offered by data center managers/owners to clients)
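Several of the SLAs listed above are defined as averages or uptime percentages, so it helps to be precise about how a breach is measured. The sketch below checks the average of each environmental metric over a measurement period against an agreed band and converts downtime into an availability percentage. The limits in SLA_LIMITS are placeholders rather than recommended values, and the function names are hypothetical; real thresholds and averaging periods come from each customer contract.

```python
from statistics import mean

# Placeholder limits; real values come from each customer's SLA.
SLA_LIMITS = {
    "temperature_c": (18.0, 27.0),
    "relative_humidity_pct": (20.0, 80.0),
}


def check_environment(readings, limits=SLA_LIMITS):
    """Compare the average of each metric over the SLA period with its band.

    readings maps a metric name to the sensor samples for that period.
    Returns {metric: (average, within_band)}.
    """
    results = {}
    for metric, (low, high) in limits.items():
        avg = mean(readings[metric])
        results[metric] = (avg, low <= avg <= high)
    return results


def availability_pct(period_s, downtime_s):
    """Uptime as a percentage of the measurement period."""
    return 100.0 * (period_s - downtime_s) / period_s


readings = {
    "temperature_c": [22.5, 23.0, 24.1, 29.4],
    "relative_humidity_pct": [45.0, 47.0, 50.0, 52.0],
}
print(check_environment(readings))
# 30-day month with 30 seconds of lost A+B power
print(f"{availability_pct(30 * 24 * 3600, 30):.5f}%")
```

Note that one temperature sample here exceeds the upper limit while the average (24.75 °C) stays within the band. Whether a short excursion counts as a breach therefore depends on how the SLA defines its averaging.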
The purpose of these SLAs is to ensure that the data center environment is kept at optimum levels to support the servers and computer equipment housed within it. In most cases, if an SLA is breached and the environment moves outside the agreed parameters, a penalty or commercial credit can be applied against the data center owner. The following items should be reviewed to ensure that SLAs continue to be met:
- Before starting a project, the project delivery team, in conjunction with the client and the operations team, should complete a detailed review of all SLAs associated with the data center. We recommend creating a cause-and-effect matrix for each SLA that could be affected over the course of the project. Add the matrix to the project execution plan and continue to update it throughout the project.
- Next, review the matrix with the client to confirm whether the affected SLAs can be suspended for a defined period during the project or whether additional temporary resources are required.
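A cause-and-effect matrix of this kind can be represented as project activities (rows) against SLAs (columns), with each cell recording the expected effect. The sketch below is one possible layout and is not a prescribed format. The activity names, the two effect levels ("at risk", "breach expected") and the exposures helper are hypothetical. It lists every non-empty cell, most severe first, which gives the list of items to take to the client for a decision.

```python
SLAS = ("power", "temperature", "humidity", "bandwidth")
SEVERITY = {"breach expected": 2, "at risk": 1}

# Rows are project activities, columns are SLAs; empty cells mean no
# expected effect. Entries are illustrative only.
matrix = {
    "Replace UPS-A battery string": {"power": "at risk"},
    "Swap CRAH-03 fan deck": {"temperature": "at risk", "humidity": "at risk"},
    "Re-route fiber tray in hall 2": {"bandwidth": "breach expected"},
    "Paint plant room floor": {},
}


def exposures(matrix):
    """List (activity, sla, effect) for every non-empty cell, most severe first."""
    rows = []
    for activity, effects in matrix.items():
        for sla, effect in effects.items():
            if sla not in SLAS:
                raise ValueError(f"unknown SLA {sla!r} for {activity!r}")
            rows.append((activity, sla, effect))
    # sorted() is stable, so ties keep the matrix's row order
    return sorted(rows, key=lambda row: -SEVERITY[row[2]])


for activity, sla, effect in exposures(matrix):
    # Each exposure needs a client decision: agree to suspend the SLA for a
    # defined period, or provide temporary resources to keep it whole.
    print(f"{sla:<12}{effect:<17}{activity}")
```

Keeping the matrix as data makes it straightforward to update throughout the project and to regenerate the client review list after each change.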
Site Constraint Risks
Data centers are normally aligned to industry standards for secure environments, which create significant challenges when a project requires deliveries and off-site personnel to complete a task. Follow these steps to ensure site protocols are maintained while executing the project:
Everyone entering the site should be aware of the key details associated with the specific data center. Therefore, at the pre-commencement stage, the project delivery team needs to engage with the client, operations and security representatives to complete a detailed review of site constraints. This meeting should cover the following elements:
- Access requirements for personnel to the facility
- Specific access requirements to restricted areas within the data center
- Site rules for personnel entering the facility
- Site access requirements for project deliveries, and restricted periods and their durations, such as change freezes. Data center change freezes are often vital for delivering service at key moments for supported businesses; for example, e-commerce businesses typically avoid major project work during the holiday shopping season.
- The planned preventative maintenance (PPM) schedule
- Incorporate the details from the meeting into the project induction for all personnel entering the site to execute works. This induction should provide all personnel with the rules, expectations and methods to perform their tasks within the agreed parameters of the live environment.
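Once change freezes and maintenance windows have been agreed at the pre-commencement review, each proposed delivery or work period can be checked against them before it is booked. The sketch below treats each restricted window as an inclusive date range and reports any window that overlaps a proposed period. The window names and dates in RESTRICTED_WINDOWS, and the conflicts function, are hypothetical examples rather than real site data.

```python
from datetime import date

# Hypothetical restricted windows agreed at the pre-commencement review
# (start and end dates inclusive).
RESTRICTED_WINDOWS = [
    ("Holiday change freeze", date(2021, 11, 22), date(2022, 1, 3)),
    ("Generator PPM", date(2021, 9, 13), date(2021, 9, 14)),
]


def conflicts(start, end, windows=RESTRICTED_WINDOWS):
    """Names of restricted windows that overlap the work period [start, end]."""
    if end < start:
        raise ValueError("end date is before start date")
    # Two inclusive ranges overlap when each starts no later than the other ends.
    return [name for name, w_start, w_end in windows
            if start <= w_end and w_start <= end]


print(conflicts(date(2021, 9, 14), date(2021, 9, 15)))
print(conflicts(date(2021, 10, 4), date(2021, 10, 8)))
```

A check like this complements the site induction rather than replacing it: any flagged date still needs to be resolved with the operations and security representatives.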
In data centers, additional expertise and rigor around risk mitigation are required to deliver a successful, holistic approach to risk management throughout the project delivery process. At CBRE, our data center project teams evaluate a client’s specific business model and operational platform to create project strategies that ensure successful delivery while mitigating risks and maximizing efficiencies.
Head of Projects, Data Centre Solutions, CBRE, Global Workplace Solutions