
High Availability For Building Maintenance And Security | Harry Aujla, SIOS Technology


Guest: Harry Aujla
Company: SIOS Technology Corp.
Show: Let’s Talk
Keyword: Building Management System (BMS)

All modern buildings and campuses incorporate some type of building management system (BMS) solution that controls or monitors the mechanical, electrical, and IT systems throughout the building. These could include things like ventilation and lighting, power, fire suppression systems, CCTV, and elevators.

Today BMS solutions have become software based, running on hardware that is designed and built with varying degrees of autonomy and intelligence. These solutions can either be hosted on site, or they could be hosted offsite at a geographically distant control center.

Ensuring the high availability of these systems is essential as it has a direct impact on the people living and working in the buildings. Loss of control of the systems could lead to dangerous situations and potential loss of life.

“When designing and planning BMS solutions, it’s important to ensure that high availability of these systems is also accounted for. If access to these systems is lost, there’s a very high risk of a loss of control to all of the building systems, as well as the associated probability of highly dangerous situations occurring and potential loss of life,” reminds Harry Aujla, EMEA Technical Director at SIOS Technology.

SIOS Technology creates software solutions that provide IT resilience for the critical applications businesses rely on. Its software is designed to let those applications automatically recover from infrastructure and application failures within minutes, without any loss of data.

A BMS solution is the aggregation point that ensures all of these critical systems integrate at the right level and at the right time, which is why maximizing systems resilience to minimize unplanned downtime should be a foundational element of its design and deployment.

The BMS solution landscape has evolved greatly in recent times, with increasing reliance on IT and the adoption of virtualization. Security and high availability of customers’ BMS data are essential to keeping BMS solutions safe on cloud platforms. With servers and storage no longer necessarily on site, embracing the cloud requires a leap of faith for BMS customers, but Aujla believes that customer confidence around cloud platforms is growing.

SIOS Technology approaches these challenges in a variety of ways, such as working with the customer to define the service level agreement (SLA): how much downtime can occur before there is a material and detrimental effect on continuous business operations.

As Aujla explains, a fault-tolerant availability solution costs substantially more than a high availability solution, so finding a balance between the required SLA and the total cost of ownership is fundamental to the success of a BMS solution. Equally important is testing the failover of BMS applications under conditions as close as possible to real-world scenarios.

About Harry Aujla: With over 20 years of experience in the IT business continuity sector, he adds a wealth of expertise for enterprises who are looking to build IT continuity strategies across their businesses. Consulting across many vertical sectors such as manufacturing, transportation, government, and more, he has been privileged to represent vendors who specialize in high availability, disaster recovery, and fault tolerant computing techniques. During his career, Harry has spent time in various roles such as technical consultant, salesperson, and technical trainer.

About SIOS Technology: SIOS Technology high availability and disaster recovery solutions ensure availability and eliminate data loss for critical Windows and Linux applications operating across physical, virtual, cloud, and hybrid cloud environments. SIOS clustering software is essential for any IT infrastructure with applications requiring a high degree of resiliency, ensuring uptime without sacrificing performance or data – protecting businesses from local failures and regional outages, planned and unplanned. Founded in 1999, SIOS Technology Corp. is headquartered in San Mateo, California, with offices worldwide.

The summary of the show is written by Emily Nicholls.

Here is the full unedited transcript of the show:

  • Swapnil Bhartiya: Hi, this is your host Swapnil Bhartiya and welcome to another episode of Let’s Talk. So today we have with us, once again, Harry Aujla, EMEA Technical Director at SIOS Technology. Harry, it’s great to have you back on the show.

Harry Aujla: Great to be here as well. Thanks.

  • Swapnil Bhartiya: Today’s topic is high availability, or HA, for building maintenance and security. What exactly do you mean by building maintenance and security?

Harry Aujla: All modern buildings and campuses today will incorporate some type of building management system, or BMS, solution that facilitates the control and monitoring of various mechanical, electrical, and IT systems throughout the building. These will typically include things like ventilation and lighting, power, fire suppression systems, security like CCTV, and it even extends to things like managing elevators and managing physical access control at turnstiles and other entry points. Anything of a mechanical or electrical nature is fed back into this wider BMS solution. So the BMS solution becomes a software-based solution, running on hardware that is designed and built with varying degrees of autonomy and intelligence. It can either be hosted on site, or it could be hosted offsite at a geographically distant control center, for example. And in addition to managing and monitoring these individual systems installed throughout the building and its perimeter, it’s also going to consider how these systems need to integrate with each other.

So let’s think of an example. Let’s imagine that a fire breaks out in a building. The first thing that’s going to happen is that the fire suppression system is going to shut down the ventilation systems, which helps prevent the spread of smoke and flames, and in some cases it might also initiate the startup of smoke evacuation fans. Then it’s also going to shut down the elevators, because we want to make sure that they get held on the ground floor to prevent people from using them. And it’s also going to ensure that the access control system is allowing people to safely evacuate the building. So the BMS solution is the aggregation point for the command and control of these seemingly disparate systems, ensuring that all the integrations occur at the right level and at the right time.
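The fire scenario is essentially an event-driven sequence of cross-system actions. The minimal Python sketch below models that sequence with a hypothetical in-memory `BuildingState`. It is not any vendor’s BMS API, and a real system would drive field controllers rather than flip booleans.

```python
from dataclasses import dataclass, field


@dataclass
class BuildingState:
    """Hypothetical in-memory stand-in for the subsystems a BMS coordinates."""
    ventilation_on: bool = True
    smoke_fans_on: bool = False
    elevator_floors: dict = field(default_factory=lambda: {"lift-1": 5, "lift-2": 3})
    elevators_in_service: bool = True
    exit_doors_released: bool = False
    log: list = field(default_factory=list)


def handle_fire_alarm(state: BuildingState, has_smoke_fans: bool = True) -> BuildingState:
    """Apply the fire-response sequence described in the interview, in order."""
    # 1. Shut down ventilation to help limit the spread of smoke and flames.
    state.ventilation_on = False
    state.log.append("ventilation off")

    # 2. Where installed, start the smoke evacuation fans.
    if has_smoke_fans:
        state.smoke_fans_on = True
        state.log.append("smoke fans on")

    # 3. Recall every elevator to the ground floor (0) and hold it out of service.
    for lift in state.elevator_floors:
        state.elevator_floors[lift] = 0
    state.elevators_in_service = False
    state.log.append("elevators recalled to ground floor")

    # 4. Release access-control points so people can evacuate.
    state.exit_doors_released = True
    state.log.append("exit doors released")
    return state
```

Keeping the whole sequence in one handler reflects the point being made: the value of the BMS lies in coordinating otherwise separate systems, which is also why its own availability matters.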

So as you can appreciate, the individual systems and the wider BMS solution are, by definition, mission critical in nature, and will have a direct impact on people living and working within the building. So the best practice, from speaking with many of the BMS integrators and BMS customers that we work with, is that when designing and planning BMS solutions, it’s important to ensure that high availability of these systems is also accounted for. If access to these systems is lost, there’s a very high risk of a loss of control to all of the building systems, as well as the associated probability of highly dangerous situations occurring and potential loss of life.

So it’s clearly evident that maximizing systems resilience to minimize unplanned downtime of BMS solutions should be a foundational element of the design and deployment.

  • Swapnil Bhartiya: So as you explain it’s obvious now why building maintenance and security is an area that really needs high availability. Can you also talk about why BMS is a good model for other critical applications when it comes to high availability and disaster recovery?

Harry Aujla: If you consider BMS as a distinct software industry sector, it’s traditionally extremely conservative in its approach to the adoption of new technology. It’s a very prescriptive and rules-oriented sector. There’s also a large volume of regulatory and compliance requirements that have to be considered, given the nature of the systems they integrate with.

It’s also a sector that’s become increasingly reliant on IT. In the early days you had many BMS customers who came from a non-IT background, so when the evolution from physical servers to virtual servers occurred, there was significant inertia around the adoption of virtualization. Today, virtualization is the standard go-to platform for many BMS solutions. But the sector, in common with many other sectors, is now on the cusp of another technical evolution. Customers are now looking at how the cloud is changing the operating landscape, and at the potential impact and benefits that can be leveraged from the adoption of this new deployment paradigm.

But again, given the nature of their business, there’s an understandable caution as to how they approach the cloud, particularly in terms of aspects such as security and high availability. The market is now sufficiently mature that many of the cloud vendors offer secure and redundant connections into their platforms. So there’s an implicit trust that customers’ BMS-related data is being securely transmitted to and from the cloud, and so that security check box can potentially be checked.

However, it’s a different case for high availability. Traditionally, BMS solutions were developed and evolved in the physical world, meaning that you could see and touch and feel the platforms where your BMS solutions were running. There’s that element of tangibility in knowing that your platforms are hosted on site.

Approaching the cloud world is quite different, as your servers and your storage are out there in the ether, where the physical location of your infrastructure is somewhat unknown. Cloud is very much a connection-based infrastructure rather than a physical one. So there’s a significant leap of faith that needs to happen for BMS customers to embrace the cloud. Once BMS customers fully understand how high availability works in the cloud, I think we’ll see a more significant migration of BMS solutions to cloud platforms.

Here at SIOS, one of our aims is to help educate BMS customers and show them that the high availability SLAs they are achieving on premises are also achievable in the cloud with the right technology set. And what’s really interesting, Swap, is that there are already signs of this happening right now. There’s an increasing number of BMS vendors that now offer cloud-based solutions, and customer confidence around cloud platforms is also growing. When you see the rapid growth and adoption at the consumer level of things like video doorbells, cameras, and cloud-connected home alarm systems, and consider the interconnectivity of these systems with each other, the wider Internet of Things story is certainly beginning to pervade the commercial B2B BMS sector.

So watching and monitoring what the BMS sector does with respect to availability and specifically availability in the cloud, can become a bellwether for other mission critical applications that may be considered as candidates for cloud migration.

  • Swapnil Bhartiya: Now if I ask you, what do BMS companies do for high availability and disaster recovery that other IT shops should do?

Harry Aujla: Yeah, that’s another great question. There are a couple of things that should be considered. First and foremost, I would say define your SLAs. Regardless of the industry sector, before customers embark on a high availability project, they should first define the business problems or challenges that they’re seeking to solve and the desired outcomes. This will generally distill into an availability service level agreement, or SLA, that defines how much downtime can be endured before it has a material and detrimental effect on continuous business operations.

So the SLA is typically measured or expressed in terms of a number of minutes or hours per year. The basic premise is that the more critical the operation, the less downtime you can endure, and the less downtime you can endure, the higher the availability SLA becomes. SLAs are presented as percentages. So if we say a BMS application is 99.9% available, this is described as three nines and equates to accepting or expecting approximately, I would say, about nine hours of unplanned downtime in a year. 99.99%, or four nines, equates to approximately 50 minutes of unplanned downtime per year, and this is what we generally describe as achieving high availability. And then lastly, 99.999%, or five nines of availability, equates to approximately five minutes of downtime per year, and this is generally described as achieving fault tolerance.
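The percentages translate into yearly downtime with simple arithmetic: allowed downtime = (1 − availability/100) × 525,600 minutes per year. A small Python sketch, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def annual_downtime_minutes(availability_pct: float) -> float:
    """Maximum unplanned downtime per year permitted by an availability SLA."""
    if not 0 < availability_pct <= 100:
        raise ValueError("availability must be in (0, 100]")
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * MINUTES_PER_YEAR


for label, pct in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
    minutes = annual_downtime_minutes(pct)
    print(f"{label:11} {pct}% -> {minutes:7.1f} min/yr ({minutes / 60:.2f} h)")
```

This gives about 525.6 minutes (roughly 8.8 hours) for three nines, about 52.6 minutes for four nines, and about 5.3 minutes for five nines, consistent with the approximate figures in the answer.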

But what needs to be considered here is that the higher up the availability stack you go, the higher the price points of these solutions become. So for example, if we’re building a fault-tolerant availability solution for our BMS application, this is going to sound great on paper, but in reality it’s going to be quite a costly exercise. Fault-tolerant availability solutions are extremely expensive, and will typically also result in solution or vendor lock-in due to the proprietary nature of the components and systems required for their operation, which then further increases the total cost of ownership.

So to mitigate this challenge, what many BMS customers do is take a more pragmatic approach, weighing the business requirements against the avoidance of any unnecessary costs. Often they end up adopting availability solutions that provide a four nines SLA. These solutions still provide an adequate level of high availability protection, but importantly, they can be offered at a more realistic and palatable price point, with associated reductions in acquisition costs, deployment complexity, and ongoing management and running costs.
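The pragmatic approach described here can be framed as choosing the lowest-cost option that still meets the required SLA. The sketch below uses hypothetical option names and placeholder cost figures (not real prices); only the selection logic is the point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilityOption:
    name: str
    availability_pct: float  # SLA the option is designed to deliver
    annual_tco: float        # acquisition, deployment, and running costs per year


def cheapest_meeting_sla(options: list, required_pct: float):
    """Return the lowest-TCO option whose SLA meets the requirement, or None."""
    eligible = [o for o in options if o.availability_pct >= required_pct]
    return min(eligible, key=lambda o: o.annual_tco, default=None)


# Hypothetical placeholder figures for illustration only; substitute real quotes.
options = [
    AvailabilityOption("standalone server", 99.9, 10_000),
    AvailabilityOption("HA cluster (four nines)", 99.99, 25_000),
    AvailabilityOption("fault-tolerant hardware (five nines)", 99.999, 120_000),
]

choice = cheapest_meeting_sla(options, required_pct=99.99)
print(choice.name if choice else "no option meets the SLA")
```

With a 99.99% requirement, this selects the four nines cluster rather than the far more expensive fault-tolerant option. A fuller model might also add the expected cost of downtime to each option’s TCO.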

So in summary, it’s important to weigh up the balance between the required SLA and the overall total cost of ownership of the solution. Beyond that, I would also say you’d want to ensure the success of your failover operations in a failover scenario. Due to the nature of the systems that a BMS solution is monitoring and running, it’s vital to ensure that the application failover and failback mechanisms are proven and dependable, with a high likelihood that they’re going to work as expected should the worst happen.

So the last thing we want to deal with in the event of a BMS application failure is a failure of the actual failover mechanism as well. That would leave you with significant disruption to business operations, and it carries an exceptional risk to business and organizational reputation. So the best practice in the BMS sector, and probably indeed in any sector, is to develop a business continuity plan, or playbook, that includes testing the failover functions on a regular basis. For some organizations, this could very well be a regulatory requirement.

So it’s important that this testing is as close as possible to the real-world scenario. When systems are unreliable or overly complex, IT teams may take shortcuts in their testing. So BMS users should look towards high availability solutions that allow for straightforward, automated failover and failback testing without the need for manual intervention or getting hands on keyboards. And that’s a strategy that I think many IT shops would do well to consider as well.
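Automated failover and failback testing can be scripted as a drill: fail the active node, verify the standby has taken over, restore the node, then switch back and verify again. The sketch below uses a toy in-memory `TwoNodeCluster` class, which is hypothetical; a real drill would invoke the HA product’s own tooling against real nodes.

```python
class TwoNodeCluster:
    """Toy active/standby pair (hypothetical model, for illustration only)."""

    def __init__(self, primary: str, standby: str):
        self.healthy = {primary: True, standby: True}
        self.active = primary

    def fail_node(self, node: str) -> None:
        """Simulate a node failure; if it was active, fail over automatically."""
        self.healthy[node] = False
        if node == self.active:
            candidates = [n for n, ok in self.healthy.items() if ok]
            if not candidates:
                raise RuntimeError("no healthy node to fail over to")
            self.active = candidates[0]

    def restore_node(self, node: str) -> None:
        self.healthy[node] = True

    def switchover(self, target: str) -> None:
        """Controlled move of the application to a healthy node (used for failback)."""
        if not self.healthy[target]:
            raise RuntimeError(f"{target} is not healthy")
        self.active = target

    def service_available(self) -> bool:
        return self.healthy[self.active]


def run_failover_drill(cluster: TwoNodeCluster) -> list:
    """Fail the active node, verify takeover, then fail back and verify again."""
    original = cluster.active
    results = []

    cluster.fail_node(original)
    took_over = cluster.active != original and cluster.service_available()
    results.append(f"failover to {cluster.active}: {'ok' if took_over else 'FAILED'}")

    cluster.restore_node(original)
    cluster.switchover(original)
    back = cluster.active == original and cluster.service_available()
    results.append(f"failback to {original}: {'ok' if back else 'FAILED'}")
    return results
```

Because the drill needs no manual steps, it can run on a schedule as part of the playbook, so the failover path gets exercised regularly rather than only during a real incident.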

And then the last thing I would say is to think about planned maintenance periods as well, and how these are managed. We can leverage high availability platforms to manage and minimize planned downtime for things like software patching and upgrades. We all recognize that the organizational overhead associated with the maintenance of software solutions is becoming more important and more frequent, to ensure the efficient running of an organization’s IT systems. Combine that with the ever-present threat of viruses, malware, ransomware, and other potential security breaches, with the associated reputational damage and significant remediation costs, and these can present an existential threat to a business.

Whether this is a BMS application, any associated databases, or even the underlying operating system, using a high availability cluster allows the idle cluster node to be patched and tested first, without interrupting production activities on the active node. If the patch works as expected, a controlled failover to the idle node can then be executed, and that node becomes active. The same changes are then applied to the previously active node, which is now idle, and upon successful patching, the application can fail back to the original node. This process can be achieved with minimal downtime when executed and managed in the correct order.
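The rolling patch sequence just described can be written down as four ordered steps. This is a hedged sketch over a hypothetical two-node `Cluster` model, and `smoke_test` stands in for whatever post-patch verification an organization actually runs.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical two-node model: node name -> installed patch level."""
    versions: dict
    active: str
    events: list = field(default_factory=list)

    def idle(self) -> str:
        return next(n for n in self.versions if n != self.active)


def rolling_patch(cluster: Cluster, new_version: str, smoke_test) -> None:
    """Patch the idle node, switch over, patch the other node, then fail back."""
    original_active = cluster.active

    # 1. Patch and test the idle node while production keeps running on the active node.
    first = cluster.idle()
    cluster.versions[first] = new_version
    if not smoke_test(first):
        raise RuntimeError(f"patch failed verification on {first}; active node untouched")
    cluster.events.append(f"patched {first}")

    # 2. Controlled failover: the patched node becomes active.
    cluster.active = first
    cluster.events.append(f"switched over to {first}")

    # 3. Patch the previously active node, which is now idle.
    cluster.versions[original_active] = new_version
    if not smoke_test(original_active):
        raise RuntimeError(f"patch failed verification on {original_active}")
    cluster.events.append(f"patched {original_active}")

    # 4. Fail back to the original node.
    cluster.active = original_active
    cluster.events.append(f"failed back to {original_active}")
```

The ordering matters: if verification fails in step 1, the function stops before any switchover, so production is still running on the unpatched active node. Rolling back the failed idle node is left out of this sketch.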

  • Swapnil Bhartiya: You also mentioned that a lot of BMS companies are running in the cloud as well. Are there any lessons to be learned for companies who are running their applications in the cloud?

Harry Aujla: Yeah, absolutely. If you think about it in general terms, IT environments vary from customer to customer. Some have large, complex physical estates encompassing heterogeneous OSes, applications, and databases. And, like you say, there are others who have been through a digital transformation, from a physical world to a virtual world and then to a cloud environment.

So regardless of what the environment actually looks like, whether it’s physical, virtual, or cloud, choosing a high availability solution that’s independent of the underlying platform, and independent of specific operating systems, applications, or databases, is going to be extremely beneficial for you, because it allows for a level of standardization in the deployment and operation of all the high availability needs across the entire infrastructure. The same can be said when you’re moving to the cloud, as it’s possible to maintain a consistent high availability experience after moving on-premises high availability solutions, along with their associated operating systems, applications, and databases. This further reduces the migration project costs and the ongoing total cost of ownership associated with them.

For customers who are embracing the cloud, the other thing I would add is to take a closer look at the high availability SLAs that some of the cloud vendors are offering. All of the cloud vendors today offer high availability features of varying usefulness and performance around their core platforms, but these do require closer scrutiny. When cloud vendors talk about high availability, they’re generally referring to the underlying cloud components: things like the virtual machine instances, the storage hardware, and the underlying networking. What they don’t typically address is the high availability requirements inside the cloud instances. So for example, if we have an instance running in the cloud where our BMS solution is running, and this instance for whatever reason happens to fail, the cloud vendor will acknowledge the failure of the instance and take the necessary actions to recover the instance and get the application up and running again.

So this works really well at an instance level, but what happens if you suffer an application software issue within the cloud instance? The cloud monitoring tools are not going to detect this type of failure, so they’re not going to take any action to help recover the application. And this becomes doubly important because, in addition to high availability, we also need to consider application performance. If you are running a BMS solution that, for example, works in a stateful manner, you’ll want to maintain good application performance. This is another reason why the cloud availability SLAs are somewhat insufficient for the business continuity needs of building management solutions.

So given the mission criticality of the BMS solution, as we’ve discussed previously, and taking into account application performance requirements, we need a way of monitoring application-level failures and orchestrating their recovery. What needs to be considered is a high availability clustering solution, like SIOS, that can address the application-level high availability needs, which can then contribute towards maintaining application performance.
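The gap described here, an instance that is up while the application inside it is not, is what application-level monitoring addresses. The following is a generic Python sketch of the idea, not how SIOS implements it. `check_health` and `recover` are hypothetical callables: for example, a TCP probe of the application’s port, and a service restart or a failover to another node.

```python
import socket
import time
from typing import Callable, Optional


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Basic liveness probe: can we open a TCP connection to the application?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def monitor_application(
    check_health: Callable[[], bool],
    recover: Callable[[], None],
    failure_threshold: int = 3,
    interval_s: float = 5.0,
    max_checks: Optional[int] = None,
) -> int:
    """Poll an application health check and trigger recovery after
    `failure_threshold` consecutive failures. Runs forever unless
    `max_checks` is set; returns the number of recoveries triggered."""
    consecutive_failures = 0
    recoveries = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if check_health():
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                recover()  # e.g. restart the service or fail over to another node
                recoveries += 1
                consecutive_failures = 0
        time.sleep(interval_s)
    return recoveries
```

Requiring several consecutive failures before acting is a common way to avoid reacting to a single transient blip; the threshold and interval values are assumptions to tune per application. For example, `monitor_application(lambda: tcp_port_open("127.0.0.1", 8080), restart_bms_service)` would probe a hypothetical port 8080 and call a hypothetical `restart_bms_service` function.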

  • Swapnil Bhartiya: Harry, once again, thank you so much for taking time out today to talk about BMS solutions and how, of course, everything is evolving in this cloud-centric, software- and IT-centric world. Thanks for your insights. And as usual, I’d love to have you back on the show. Thank you.

Harry Aujla: Thanks. Well it’s been a pleasure.