Protecting Your Applications with Hammers

About the author: Jonathan Meltzer is Director, Product Management, at SIOS Technology. Jonathan has over 20 years of experience in product management and marketing for software and SaaS products that help customers manage, transform, and optimize their human capital and IT resources.

Let’s talk about applications, availability, and hammers. Yes, hammers: Big hammers, like technologies designed to fail over a high availability cluster to a remote site if disaster strikes your mission critical SAP or Oracle system, as well as smaller hammers designed for more precise jobs, like restarting a stalled background service if that’s all that’s causing that SAP landscape to appear unavailable.

Many organizations think only about the big hammers. They need to be sure that their mission critical systems will fail over and continue operating if the worst comes to pass. But most application availability issues don’t involve worst case scenarios, which means that the big hammer is not always the best tool to rely on when protecting application availability.

Clearly a range of tools is in order, and most of these will take the form of application-aware clustering technologies that can monitor and proactively respond to conditions arising in a given operational environment.

Enhancing the availability of your mission critical applications

We all know that innumerable small failures are occurring all the time in an IT environment. Having to drop everything to respond to all these small issues can be as taxing to an IT team as responding to a single major outage. There are application bugs, read and write errors, network collisions and data packet resends, CPU and memory issues. Many of these errors would not by themselves compromise access to an application. But they can accrue. They can become issues that will eventually mature into problems—at which point they may very well compromise application access for some, perhaps all, of your users.

Organizations need a way to watch for these smaller errors and to respond in an automated, application-appropriate manner. If an availability solution detects a stalled background process, for example, it might simply restart that process. If other processes depend on the stalled process, the solution needs to have the awareness to restart several processes, perhaps in a particular order.

Automating and synchronizing the multiple process restarts in this example requires a certain application awareness. In a mission-critical environment like a complex SAP S/4HANA landscape, many components rely on other components, so any monitoring and recovery tools that watch for these kinds of anomalies need to be aware of those dependencies to provide an automated response that is appropriate to the scenario. Whether the response is simple or complex, though, detecting and responding to these smaller issues is far less disruptive than a full failover would be. Restarting a background process might take only fractions of a second, and the majority of users interacting with the SAP system would likely not even notice a disruption.
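To make the idea of dependency-aware restarts concrete, here is a minimal Python sketch. It is not how any particular clustering product works; the process names and the `depends_on` map are hypothetical, and the function only computes an order in which the stalled process and everything that relies on it could be restarted.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def restart_order(depends_on, stalled):
    """Return the stalled process and all of its dependents, dependencies first.

    depends_on maps each process name to the processes it relies on.
    All names here are illustrative, not taken from a real product.
    """
    # Invert the map so we can walk from a process to whatever relies on it.
    dependents = {}
    for proc, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(proc)

    # Collect the stalled process plus every direct or indirect dependent.
    affected = set()
    stack = [stalled]
    while stack:
        current = stack.pop()
        if current in affected:
            continue
        affected.add(current)
        stack.extend(dependents.get(current, ()))

    # Order only the affected processes so that each one restarts
    # after the processes it relies on.
    subgraph = {
        proc: {dep for dep in depends_on.get(proc, ()) if dep in affected}
        for proc in affected
    }
    return list(TopologicalSorter(subgraph).static_order())


# Hypothetical dependency map for illustration only.
depends_on = {
    "message_server": [],
    "dispatcher": ["message_server"],
    "work_process": ["dispatcher"],
    "gateway": [],
}
order = restart_order(depends_on, "message_server")
print(order)  # ['message_server', 'dispatcher', 'work_process']
```

Note that the unrelated `gateway` process is left alone: only the stalled process and its dependents are touched, which is the point of using the smaller hammer.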

As it happens, the areas where mission-critical applications are likely to encounter these kinds of smaller errors have been mapped out by companies offering clustering software with application-aware monitoring and management tools (also known as application recovery kits or ARKs). The points of vulnerability vary from application to application, of course, so you will need to assess your application infrastructure to choose a clustering solution with the ARKs that your application infrastructure requires. ARKs integrate with the application orchestration and data replication functions of the clustering software to prevent the big hammer of failover from falling if the precision tools of the ARK can solve a problem more efficiently. At the same time, that integration also ensures that the big hammer slams down immediately if the ARK cannot solve the problem.
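The small-hammer-first behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real ARK interface: `local_restart` and `failover` are hypothetical callables supplied by the caller, and the limit of two local attempts is an arbitrary choice.

```python
from typing import Callable


def recover(
    local_restart: Callable[[], bool],
    failover: Callable[[], None],
    max_local_attempts: int = 2,
) -> str:
    """Try the precise local fix first; fail over only if it does not work.

    local_restart returns True when the application is healthy again.
    Both callables are placeholders for whatever a real solution would do.
    """
    for _ in range(max_local_attempts):
        if local_restart():
            return "recovered_locally"
    # Local recovery did not succeed, so bring down the big hammer.
    failover()
    return "failed_over"


# Illustrative stubs: the first restart fails, the second succeeds.
attempts = iter([False, True])
events = []
result = recover(lambda: next(attempts), lambda: events.append("failover"))
print(result)  # recovered_locally
```

Because the second local attempt succeeds in this example, `failover` is never called and `events` stays empty.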

Improving the availability of the merely important applications, too

The right-sized precision that makes ARKs useful in the big-hammer world of high availability failover clusters can deliver benefits in another scenario as well: It can help you improve the availability of those applications that are important but not so critical that they warrant the expense of a failover cluster infrastructure.

Cloud-based application protection tools are available on a SaaS basis that can monitor and proactively remedy issues arising on your important-but-not-critical applications. These solutions are not designed to ensure the 99.99% availability that an HA infrastructure can, but they can increase the uptime of your AWS-based applications by watching for issues and responding proactively. They can restart misbehaving application services, for example, and if that doesn’t work they can restart an entire cloud instance. They can correct many of the anomalies that might otherwise contribute to an application infrastructure becoming unstable or ultimately unavailable. They enhance the underlying availability of a cloud-based application, resulting in a more consistent experience of application availability than your team might experience in the absence of any protection.
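For a sense of scale, an availability percentage can be converted into a yearly downtime budget with simple arithmetic. The sketch below assumes a 365-day year and treats availability as a plain fraction of total time; the 99.9% comparison figure is added for illustration and does not come from any vendor's stated target.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the downtime per year that a given availability level allows."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


four_nines = round(downtime_minutes_per_year(99.99), 2)
three_nines = round(downtime_minutes_per_year(99.9), 2)
print(four_nines)   # 52.56 (minutes per year)
print(three_nines)  # 525.6 (minutes per year, just under nine hours)
```

In other words, 99.99% availability leaves less than an hour of downtime per year, which is why it typically calls for a failover cluster rather than restart-based remediation alone.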

The big hammer is important, but it needs to be saved for those true “break glass” moments, which are few and far between. By combining clustering, application awareness, and availability enhancement solutions, you gain a range of instruments, from small hammer to big hammer, so that you can rely on the right tool at the right time. You gain greater assurance of application availability, more effectively and across a wider set of applications, than you could with only the big hammer in your toolbox.
