Guest: Philip Merry
Company: SIOS Technology
Show: Data Driven
Topic: Cloud Native

Enterprise cloud migrations are surging, but there’s a dangerous assumption lurking beneath the surface. IT teams often believe that moving workloads to AWS, Azure, or other major cloud providers automatically makes their applications highly available. After all, these platforms promise resilient infrastructure, right?

Not quite. And that misunderstanding is costing companies millions in unplanned downtime.

Philip Merry, Solutions Engineer at SIOS Technology, draws a critical distinction that every enterprise architect needs to understand: cloud providers deliver infrastructure resilience, but application availability remains squarely in your court.

“The systems, the virtual machines, the infrastructure that’s hosted in the cloud is highly available. It’s managed by the cloud,” Merry explains. “However, you’re ultimately responsible for your applications and the applications that run upon your services in the cloud.”

The crux of the issue? Cloud providers ensure that your virtual machines, storage devices, and networking components remain operational. But if your database crashes due to an application-level failure, or a disk fills up and chokes your ERP system, the cloud’s resilience won’t save you. Your application is down—and your business takes the hit.

This is where the shared responsibility model gets murky. Organizations spend heavily on multi-availability-zone and multi-region deployments, assuming they’re protected. But Merry warns that geographic distribution alone doesn’t guarantee seamless failover.

“If I have failover between the US East 1 and US West regions, I still have to make sure that my application is able to promptly come in service when switching over,” he says. “If my US West region doesn’t have all of the data that my US East region has, it doesn’t do me much good if I can bring the database in service over there.”

Two critical failure modes emerge in these scenarios: split-brain conditions and data inconsistency.

A split-brain occurs when both the primary and standby systems believe they’re active. In practical terms, this means two database instances might both think they’re the source of truth, leading to conflicting writes and potential data corruption. “Neither is in the role of accepting incoming replicated data,” Merry notes, “and so there’s the potential for data to not get copied from one system to another.”
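
One common guard against split-brain is a quorum rule: a node may take the active role only if it can reach a strict majority of cluster members. The sketch below illustrates that general idea. It is not a description of how SIOS LifeKeeper or any cloud provider implements it, and the member names (`db-east`, `db-west`, `witness`) are hypothetical.

```python
from typing import List, Set


def has_quorum(reachable: Set[str], members: Set[str]) -> bool:
    """True if the reachable members form a strict majority of the cluster."""
    return len(reachable & members) > len(members) // 2


def active_partitions(partitions: List[Set[str]], members: Set[str]) -> List[Set[str]]:
    """Return the partitions that would be allowed to host the active role."""
    return [p for p in partitions if has_quorum(p, members)]


# Hypothetical cluster: two database nodes plus a tie-breaking witness.
MEMBERS = {"db-east", "db-west", "witness"}

# A network partition isolates db-west from the other two members.
split = [{"db-east", "witness"}, {"db-west"}]
winners = active_partitions(split, MEMBERS)

# If every member is isolated, no partition holds a majority.
total_isolation = [{"db-east"}, {"db-west"}, {"witness"}]
nobody = active_partitions(total_isolation, MEMBERS)
```

Because two disjoint partitions cannot both hold a strict majority, at most one side can become active. The trade-off is that when no side has a majority, nothing is active, so the rule gives up some availability to protect the data.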

Data inconsistency, meanwhile, stems from replication lag and synchronization failures. When an application fails over to a secondary region, that region needs the most current data to maintain business continuity. But if replication hasn’t kept pace—whether due to network latency, write volume, or configuration issues—you’re failing over to stale data. Your recovery point objective just went out the window.
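
The relationship between replication lag and the recovery point objective (RPO) can be made concrete with a small check. This is a minimal sketch under stated assumptions: the function names, timestamps, and the 30-second RPO are all illustrative, and in a real system the two timestamps would come from whatever replication status the database exposes.

```python
from datetime import datetime, timedelta, timezone


def replication_lag(last_primary_commit: datetime, last_replica_apply: datetime) -> timedelta:
    """Time between the newest primary commit and the newest change applied on the replica."""
    return max(last_primary_commit - last_replica_apply, timedelta(0))


def within_rpo(lag: timedelta, rpo: timedelta) -> bool:
    """True if failing over now would lose no more data than the RPO allows."""
    return lag <= rpo


# Illustrative values only: the replica is 45 seconds behind the primary.
primary_commit = datetime(2024, 1, 1, 12, 0, 45, tzinfo=timezone.utc)
replica_apply = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
rpo = timedelta(seconds=30)  # assumed business requirement

lag = replication_lag(primary_commit, replica_apply)
ok_to_fail_over = within_rpo(lag, rpo)
```

With these example values the lag is 45 seconds against a 30-second RPO, so a failover at this moment would land on data older than the business has agreed to tolerate.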

Multi-region deployments do offer protection against catastrophic infrastructure failures like data center fires or natural disasters. But they don’t inherently solve application-level availability challenges. For that, you need application-aware high availability solutions.

“Cloud native tools can lack that deep application awareness to be able to have the knowledge of what all of those dependent services are,” Merry explains. A database doesn’t just need to start—it needs its storage volumes mounted, its virtual IP accessible, and potentially an SAP environment ready to manage transactions. A generic cloud tool might restart the database, but it won’t orchestrate the full dependency chain required for true operational readiness.
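
To illustrate what orchestrating a dependency chain means, the sketch below starts resources in dependency order and stops at the first failure. The resource names and the dependency graph are assumptions made for the example (an SAP instance on top of a database that needs storage and a virtual IP), and this is not how any particular HA product is implemented. It uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical dependency graph: each resource maps to what must be up first.
DEPENDS_ON: Dict[str, Set[str]] = {
    "storage_volume": set(),
    "virtual_ip": set(),
    "database": {"storage_volume", "virtual_ip"},
    "sap_instance": {"database"},
}


def start_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every resource comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def bring_in_service(depends_on: Dict[str, Set[str]],
                     start: Callable[[str], bool]) -> List[str]:
    """Start resources in dependency order; stop at the first one that fails."""
    started: List[str] = []
    for resource in start_order(depends_on):
        if not start(resource):
            break
        started.append(resource)
    return started


order = start_order(DEPENDS_ON)

# Simulate a failover in which the database itself fails to start.
partial = bring_in_service(DEPENDS_ON, lambda name: name != "database")
```

A tool that only restarts the database process is effectively skipping the ordering step: nothing guarantees that the storage and virtual IP are ready first, or that dependents such as the SAP layer are held back when the database fails.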

This is where specialized HA solutions like SIOS LifeKeeper come into play. These tools understand application dependencies and can orchestrate complex failover sequences that ensure not just that an application starts, but that it starts with all prerequisites met.

Merry’s advice for IT teams evaluating HA solutions? Look for ease of use and deep application awareness in tandem. “When you really need to interact with your high availability solution, it’s either going to be in a situation of downtime or a situation where you’re performing maintenance,” he says. “You don’t want any unexpected issues or errors to occur.”

The bottom line: cloud infrastructure resilience is table stakes. But if you’re running mission-critical applications—databases, ERP systems, custom business apps—you need a layer of application-aware protection that the cloud provider can’t give you. Without it, you’re one application crash away from costly downtime.

Cloud Migration’s Hidden Trap: Why Infrastructure Resilience Doesn’t Protect Your Apps | Philip Merry, SIOS Technology | TFiR
