Most HA architectures are engineered to survive unexpected failures. They are not engineered for the maintenance window your team schedules every month — and that gap is where production goes dark.
The Guest: Dave Bermingham, Senior Technical Evangelist at SIOS Technology
The Bottom Line: HA architectures are designed around the question “what if the server crashes?” rather than “what if we intentionally take a node offline to patch it?” That design gap is the leading cause of enterprise outages, and it creates a no-win choice: delay patching and accumulate security debt, or rush through it and hope nothing breaks.
Speaking with TFiR, Dave Bermingham of SIOS Technology described the current state of high availability patching and explained why the anxiety most IT teams feel during maintenance windows is a direct symptom of an architectural problem that most HA solutions leave unaddressed.
WHAT IS THE HA PATCHING PROBLEM?
High availability architecture is built to answer one question: what happens when something fails unexpectedly? It is not built to answer a different but equally critical question: what happens when the team intentionally takes a node or application offline to apply a patch? Most enterprise HA deployments never close that gap — and every patch cycle exposes it.
“Patching is risky because you’re intentionally changing something in a system that people rely on to be stable. Most outages actually occur during maintenance, not during random failures. When you apply a patch, you’re introducing new code, new drivers, and possibly triggering a reboot. Sometimes these things interact with applications in ways that nobody expected.”
The result is a structural problem that looks like a people problem. IT teams approach patch day with anxiety, and that anxiety produces one of two outcomes: they delay patching — which compounds security risk — or they rush through it and accept the possibility that something will break. Neither outcome is acceptable for organizations running business-critical workloads.
“If the architecture does not include a clear maintenance workflow, you end up with a lot of anxiety around patching. People either delay it, which creates security risks, or they rush through it and hope nothing breaks. A good HA design should make patching feel routine rather than stressful.”
Broader Context: What SIOS Technology Does Differently
In the TFiR interview, Bermingham goes deeper on why not all HA solutions solve this problem equally. Hypervisor-level solutions — VMware HA, Hyper-V clustering — protect against physical host failures but still require the workload inside the VM to go offline for OS and application patching. Scheduled downtime remains unavoidable at that layer.
Application-level clustering, the approach SIOS Technology is built on, solves this at a different layer. Because the application is clustered across multiple nodes, with one node active and another on standby, teams can patch one node at a time using a standby-node-first workflow: patch the standby, validate it, fail the workload over, then patch the original node. From the user’s perspective, the application stays available throughout; the only interruption is the switchover itself, which typically lasts a few seconds.
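The standby-node-first workflow described above can be sketched as a short script. This is a minimal illustration only, not SIOS's implementation: the Node class, the rolling_patch function, and the apply_patch and validate callbacks are all hypothetical names standing in for whatever tooling a team actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical cluster node; not tied to any real clustering API."""
    name: str
    active: bool = False
    patched: bool = False


def rolling_patch(
    active: Node,
    standby: Node,
    apply_patch: Callable[[Node], None],
    validate: Callable[[Node], bool],
    log: List[str],
) -> bool:
    """Patch the standby, validate it, fail over, then patch the original node.

    Returns True if both nodes end up patched and validated,
    False if the run stops early or the final validation fails.
    """
    # Step 1: patch the standby while the active node keeps serving users.
    apply_patch(standby)
    log.append(f"patched {standby.name}")

    # Step 2: validate before any workload moves; stop here on failure,
    # leaving the workload on the untouched active node.
    if not validate(standby):
        log.append(f"validation failed on {standby.name}")
        return False

    # Step 3: fail the workload over (the only user-visible interruption).
    active.active, standby.active = False, True
    log.append(f"failed over {active.name} -> {standby.name}")

    # Step 4: patch the original node, which is now the standby.
    apply_patch(active)
    log.append(f"patched {active.name}")
    return validate(active)


# Stand-in callbacks for demonstration; real ones would call patch tooling.
def fake_patch(node: Node) -> None:
    node.patched = True


def fake_validate(node: Node) -> bool:
    return node.patched


steps: List[str] = []
node_a = Node("node-a", active=True)
node_b = Node("node-b")
succeeded = rolling_patch(node_a, node_b, fake_patch, fake_validate, steps)
```

The point of the ordering is that a bad patch is discovered on a node that is not serving users, so a failed validation leaves production untouched.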
Bermingham also covers configuration drift — the incremental divergence between nodes that creates silent failover failures — and makes the case that a documented, rehearsed patching playbook is the single highest-ROI improvement most IT teams can make before their next maintenance window.
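Configuration drift can be made visible with a simple comparison. The sketch below assumes each node's settings can be exported as a flat key-value inventory; the find_drift function and every setting name and value in the example are hypothetical, chosen only to illustrate the idea.

```python
from typing import Dict, Optional, Tuple

Inventory = Dict[str, str]
Drift = Dict[str, Tuple[Optional[str], Optional[str]]]


def find_drift(node_a: Inventory, node_b: Inventory) -> Drift:
    """Return every setting whose value differs between two nodes.

    A value of None means the setting is missing on that node.
    """
    drift: Drift = {}
    # Check the union of keys so settings present on only one node are caught.
    for key in sorted(set(node_a) | set(node_b)):
        value_a, value_b = node_a.get(key), node_b.get(key)
        if value_a != value_b:
            drift[key] = (value_a, value_b)
    return drift


# Hypothetical inventories for an active node and its standby.
active_inventory = {
    "kernel": "5.14.0-427",
    "app.max_connections": "500",
    "tls.min_version": "1.2",
}
standby_inventory = {
    "kernel": "5.14.0-427",
    "app.max_connections": "200",
}

differences = find_drift(active_inventory, standby_inventory)
for setting, (on_active, on_standby) in differences.items():
    print(f"{setting}: active={on_active!r} standby={on_standby!r}")
```

Running a check like this before each maintenance window, and treating any non-empty result as a blocker, is one way to catch divergence before a failover depends on the standby behaving like the active node.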
Watch the TFiR interview with Dave Bermingham.





