SQLBits is coming to ICC Wales, and SIOS Technology will be there as a sponsor with a full presence on the expo floor and multiple technical sessions on multi-cloud SQL Server resilience. Most SQL Server shops pick one cloud and stay with it, but single-cloud dependency creates a single point of failure when Azure or AWS has a bad day.
In this exclusive interview with Swapnil Bhartiya at TFiR, Dave Bermingham, Senior Technical Evangelist at SIOS Technology, previews his upcoming presentation at SQLBits on “Building Resilient SQL Server HA/DR in a Multi-Cloud World.” Dave will deliver his technical session on the final day of the conference, Saturday, which is open to the public, demonstrating how to build SQL Server deployments that can fail over between cloud providers.
The Guest: Dave Bermingham, Senior Technical Evangelist at SIOS Technology
Key Takeaways
- Multi-cloud SQL Server architecture requires more than data replication—networking, DNS, application connectivity, and orchestration must all work seamlessly during failover
- AWS Multi-Cloud Interconnect offers a managed, pay-as-you-go private connection between cloud providers without long-term contracts
- Define your RTO and RPO before touching architecture—not every database needs real-time replication, and over-engineering drives up costs
- Testing is non-negotiable: your DR plan must be validated regularly so junior engineers can execute it under pressure
- SQL Server Standard Edition users can build multi-cloud failover cluster instances with SIOS DataKeeper, avoiding expensive Enterprise Edition licensing
***
Ahead of his presentation at SQLBits in ICC Wales, Dave discusses what multi-cloud SQL Server high availability and disaster recovery actually looks like in practice.
The Cloud Itself Can Be A Failure Domain
While many organizations talk about multi-cloud, fewer have fully committed to the approach. What’s driving interest is the recognition that cloud providers themselves can experience catastrophic failures. Even with multiple availability zones or regions, if you’re locked into a single provider, you’re exposed during large-scale incidents.
Q: What is driving organizations to deploy SQL Server across multiple cloud providers instead of sticking with a single cloud like Azure or AWS?
Dave Bermingham: “I think a lot of organizations are talking about multi-cloud, but in reality I haven’t seen that many fully go through with it yet. What they’re starting to recognize is that the cloud itself can be a failure domain. It doesn’t matter how many availability zones or regions you’re using: if there’s a larger issue with your single cloud provider, and it’s happened, it’s not imaginary, you’re exposed if you have everything in that one provider.”
Today’s most common approach is hybrid cloud: an on-premises data center connected to the cloud, with on-prem serving as the disaster recovery site. This gives organizations control and a layer of protection outside the cloud provider. But for organizations that don’t want to, or can’t, manage their own on-premises infrastructure at all, multi-cloud makes far more sense.
Dave Bermingham: “For organizations that don’t want or can’t manage their own on-prem infrastructure at all, that’s where multi-cloud really starts to make a lot of sense. It gives them a way to reduce the risk and avoid being completely tied to that single cloud provider while still deploying and staying fully in the cloud.”
Multi-Cloud SQL Server Architecture Options
In a traditional sense, multi-cloud is often deployed for disaster recovery scenarios—a standby environment in a different cloud provider that’s ready to be activated if something goes wrong. The challenge is getting the data there and recovering the application quickly and cleanly.
Q: Can you walk us through what a multi-cloud SQL Server architecture looks like? Are we talking Always On Availability Groups straight across AWS and Azure, or something totally different?
Dave Bermingham: “There are lots of ways to do that with SQL Server, starting with simple backup and restore, log shipping, or replication, but each one of those comes with different trade-offs in terms of recovery time objective and data loss. So if you’re looking for the very best RTO and RPO for your most business-critical applications, then yes, absolutely, technologies like Always On Availability Groups come into play in a multi-cloud scenario.”
For multi-cloud deployments, Distributed Availability Groups are often a better fit than standard Always On Availability Groups. A distributed AG keeps the cluster in each cloud independent of the other, which reduces bandwidth requirements between providers, cuts data egress costs, and minimizes dependencies between the two clusters.
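To make that concrete, here is a minimal sketch of how a distributed AG is defined on top of two pre-existing AGs, one per cloud. All names, listener URLs, and the contoso.com domain are hypothetical placeholders, and the two underlying AGs with their listeners are assumed to already exist.

```sql
-- Minimal distributed AG sketch spanning two clouds (hypothetical names/URLs).
-- Assumes availability groups [AG_Azure] and [AG_AWS] already exist, each with
-- a listener reachable across the inter-cloud link.
-- Run on the primary replica of the Azure-side AG:
CREATE AVAILABILITY GROUP [DAG_MultiCloud]
   WITH (DISTRIBUTED)
   AVAILABILITY GROUP ON
      'AG_Azure' WITH
      (
         LISTENER_URL = 'tcp://ag-azure-listener.contoso.com:5022',
         AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,  -- async across the cloud boundary
         FAILOVER_MODE = MANUAL,
         SEEDING_MODE = AUTOMATIC
      ),
      'AG_AWS' WITH
      (
         LISTENER_URL = 'tcp://ag-aws-listener.contoso.com:5022',
         AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
         FAILOVER_MODE = MANUAL,
         SEEDING_MODE = AUTOMATIC
      );
-- A matching ALTER AVAILABILITY GROUP [DAG_MultiCloud] JOIN statement, with the
-- same two AG specifications, then runs on the AWS-side AG's primary replica.
```

Because only the two listeners exchange log blocks across the link, the second cloud receives a single replication stream no matter how many replicas it runs locally.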
But all of this assumes you’re running SQL Server Enterprise Edition.
Dave Bermingham: “For organizations that are running Standard Edition, a comparable approach might be to use something like SIOS DataKeeper to build a SAN-less cluster. Using Standard Edition they can build a traditional SQL Server failover cluster instance that can span cloud providers without the shared storage requirement and without upgrading to Enterprise, so you get a very similar level of protection while still saving a significant amount of money on your SQL Server licensing.”
At the end of the day, the architecture isn’t one-size-fits-all. It comes down to your recovery time objective, your recovery point objective, and your budget.
The Real Technical Challenges: Beyond Data Replication
Most organizations start by focusing on getting data replicated—log shipping, availability groups, backup and restore. But data replication is just one piece of the puzzle. The real complexity is making sure everything comes up correctly on the other side.
Q: What are the biggest technical challenges when trying to fail over SQL Server from one cloud provider to another?
Dave Bermingham: “A lot of people start out focusing on getting the data replicated, which is important, but that’s really just one piece of it. The real complexity is making sure that everything comes up correctly on the other side. So that means you have to think about things like networking, DNS and the applications that are connecting to your database. You’ve got to make sure that the app is available and configured to connect to that database after failover and then, of course, the users—they need to know we’re running in a different cloud. How are they being redirected to this application that’s now running in an entirely new environment?”
When you talk about data replication, latency and data consistency are major factors. Once you’re crossing cloud boundaries, you’re almost always dealing with asynchronous replication. Organizations must understand the risk of data loss and what that means to the business.
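As one concrete way to quantify that exposure, the standard AG DMVs show how far each secondary is lagging; on an asynchronous replica, the send and redo queues approximate how much data an unplanned failover would lose. A sketch:

```sql
-- Run on the primary replica. For async secondaries, log_send_queue_size
-- approximates log not yet shipped (potential data loss), and redo_queue_size
-- is log shipped but not yet applied (adds to recovery time).
SELECT ag.name                  AS availability_group,
       ar.replica_server_name   AS replica,
       drs.log_send_queue_size  AS log_send_queue_kb,
       drs.redo_queue_size      AS redo_queue_kb,
       drs.last_commit_time
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar ON drs.replica_id = ar.replica_id
JOIN sys.availability_groups   AS ag ON drs.group_id   = ag.group_id
WHERE drs.is_local = 0;  -- remote (DR-side) replicas only
```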
Then there’s orchestration.
Dave Bermingham: “It’s one thing to have a copy of your data sitting in the other cloud, but it’s another thing entirely to have a clean failover in a predictable way when something breaks. And that’s where a lot of organizations can fall short. You need to have a solid DR plan that’s tested and automated as much as possible, and simple enough that even a junior engineer can execute it under pressure.”
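To sketch what the database-tier step of such a runbook can look like for the hypothetical distributed AG above: in a real outage, where the primary cloud is unreachable and replication was asynchronous, the failover is forced and some data loss is possible.

```sql
-- Illustrative DR-runbook step (hypothetical distributed AG name).
-- Run on the primary replica of the surviving, DR-side AG:
ALTER AVAILABILITY GROUP [DAG_MultiCloud] FORCE_FAILOVER_ALLOW_DATA_LOSS;

-- The database is only one line of the runbook. The remaining steps, which
-- should be equally scripted, repoint DNS or the application endpoint,
-- verify application connectivity, and redirect users to the new environment.
```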
Networking: VPNs, Private Connectivity, and AWS Multi-Cloud Interconnect
Networking is one of the more challenging parts of multi-cloud configuration, but it’s entirely doable. Organizations typically start with VPNs because they’re relatively easy to set up, but as things scale, they usually move toward private connectivity for better performance and predictability.
Q: How do organizations actually handle the networking complexity—VPN tunnels, latency, data transfer costs—when running SQL Server high availability and disaster recovery across clouds?
Dave Bermingham: “A lot of organizations might start with VPNs because they’re relatively easy to set up and can get you going quickly, but as things scale, they usually move toward private connectivity for better performance and predictability.”
There’s also a newer option that sits in the middle. At re:Invent last year, AWS introduced Multi-Cloud Interconnect: a managed, private connection between AWS and, initially, Google Cloud Platform, with connectivity between AWS and Azure coming online later this year. It’s delivered in a pay-as-you-go model.
Dave Bermingham: “It makes it a lot easier to build a redundant private network between cloud providers without long-term contracts and commitments. I think of it as a nice middle ground between the DIY VPNs and a fully provisioned private circuit.”
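Whichever link you choose, AG and distributed AG replication traffic travels over SQL Server's database mirroring endpoint, so that port has to be routable across it. A sketch of a typical endpoint definition:

```sql
-- Port 5022 is conventional, not mandatory; whatever port you pick must be
-- reachable from the other cloud over the VPN, Interconnect, or private circuit.
CREATE ENDPOINT [Hadr_endpoint]
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL)
    FOR DATABASE_MIRRORING (ROLE = ALL, ENCRYPTION = REQUIRED ALGORITHM AES);
-- Across clouds there is usually no shared Active Directory domain, so the
-- endpoints typically authenticate each other with certificates (omitted here).
```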
Latency is always a factor. Synchronous replication is typically used locally when clusters span availability zones in a single region. But when crossing cloud providers, asynchronous replication is the way to go.
One area people often underestimate is cost. Data egress between cloud providers can add up quickly. Organizations need to be intentional about how much data they’re moving and how often. Data compression and distributed AGs can minimize the amount of data being transmitted across connections.
Dave Bermingham: “You want to make sure you understand what you’re signing up for and how much that’s going to cost. At the end of the day, the goal is to keep the design as simple as possible and make sure it aligns with all your business requirements. You don’t want to over-engineer, and you don’t want to over-provision.”
First Steps Toward Multi-Cloud Resilience
If you’re currently running SQL Server in a single cloud and want to move toward multi-cloud resilience, the first step isn’t technical—it’s definitional.
Q: If someone is currently running SQL Server in a single cloud and wants to move towards multi-cloud resilience, what is the first concrete step they should take?
Dave Bermingham: “The first step before you do anything is to define what resilience actually means for your business. Before you touch the architecture, you need to understand your RTO, your RPO—how much downtime can you tolerate, how much data loss is acceptable? And once you have those answers, that will point you in the right direction. Not every database is going to be protected in the same way.”
Once you understand your RTO and RPO, establish a DR target in the cloud. That usually involves getting the networking in place, choosing a replication technology, getting replication working, and making sure you can actually recover the workload in the event of a disaster.
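For the simplest tier Dave mentioned, backup and restore, SQL Server 2022 can write backups directly to S3-compatible object storage, which makes a basic cross-cloud DR target inexpensive to stand up. A hedged sketch; the bucket, region, and key values are placeholders:

```sql
-- Requires SQL Server 2022 or later. The credential name must match the URL
-- prefix; bucket, region, and keys below are hypothetical placeholders.
CREATE CREDENTIAL [s3://s3.us-east-1.amazonaws.com/contoso-sql-dr]
   WITH IDENTITY = 'S3 Access Key',
   SECRET = '<AccessKeyID>:<SecretAccessKey>';

-- COMPRESSION pays off twice here: smaller backups and lower egress charges.
BACKUP DATABASE [SalesDB]
   TO URL = 's3://s3.us-east-1.amazonaws.com/contoso-sql-dr/SalesDB.bak'
   WITH COMPRESSION, STATS = 10;
```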
Once that’s in place, layer in as much automation and orchestration as possible.
But the biggest thing people underestimate is testing.
Dave Bermingham: “You need to test, test and test again. Validate it and then practice it on a regular basis. The last thing you want is to discover that your DR plan doesn’t work in the middle of an outage.”
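One concrete drill, sticking with the hypothetical distributed AG used above and condensing Microsoft's documented procedure: temporarily switch the link to synchronous commit, confirm the DR side is caught up, and only then practice the failover, so the test itself loses no data.

```sql
-- 1. Switch the distributed AG to synchronous commit for the test window:
ALTER AVAILABILITY GROUP [DAG_MultiCloud]
   MODIFY AVAILABILITY GROUP ON
      'AG_Azure' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT),
      'AG_AWS'   WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);

-- 2. Poll until the DR side reports SYNCHRONIZED:
SELECT ag.name, drs.synchronization_state_desc
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_groups AS ag ON drs.group_id = ag.group_id
WHERE ag.name = 'DAG_MultiCloud';

-- 3. Fail over on the DR side. Despite the command name, a synchronized
--    distributed AG fails over without data loss. (The full documented
--    procedure first demotes the current primary with SET (ROLE = SECONDARY).)
ALTER AVAILABILITY GROUP [DAG_MultiCloud] FORCE_FAILOVER_ALLOW_DATA_LOSS;

-- 4. After the drill, fail back and return AVAILABILITY_MODE to
--    ASYNCHRONOUS_COMMIT so cross-cloud latency doesn't throttle writes.
```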