Guest: Justin Cobbett
Company: Akamai
Show: TFiR: T3M
The velocity, the variety, and the sheer amount of data coming in are expanding the size of environments. To get observability running across an entire system, which may span multiple clouds or even hybrid clouds, companies are looking for out-of-the-box solutions. However, observability goes beyond buying a platform or a SaaS solution; it is a practice.
In this episode of TFiR: T3M, Akamai Product Marketing Manager Justin Cobbett shares his insights on observability, how the term has been used over the years, and what it means today.
Evolution of the term:
- Observability is a condition or a state of your environment. The term goes all the way back to the 1960s as an offshoot of control theory, where it describes how well the internal state of a complex system can be inferred from what you can see and measure from the outside.
- The Cloud Native Computing Foundation (CNCF) has its own category for observability, which includes monitoring, logging, tracing, and chaos engineering.
- Monitoring can tell you whether your server has a problem, whether the CPU is spiking, or something as simple as whether a service is up or down. Observability means seeing not just that something is working, but how it is working. You need to be able to collect data, process and correlate that data, perform root cause analysis, and drive incident response. All of that together in your platform is what makes up observability.
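To make the monitoring-versus-observability distinction concrete, here is a minimal Python sketch (not something shown in the episode). The Event record, the source names, and the 90% CPU threshold are all hypothetical. The monitoring check answers a single yes/no question, while the correlation step joins events from several systems on a shared request ID so that a naive root-cause guess becomes possible.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    source: str      # hypothetical system names, e.g. "api", "database"
    request_id: str  # shared ID that lets events from different systems be joined
    status: str      # "ok" or "error"
    message: str


def monitor_cpu(cpu_percent: float, threshold: float = 90.0) -> bool:
    """Monitoring: answers one yes/no question ('is the CPU spiking?')."""
    return cpu_percent > threshold


def correlate(events: list[Event]) -> dict[str, list[Event]]:
    """Observability step: group events from every source by request ID."""
    by_request: dict[str, list[Event]] = defaultdict(list)
    for event in events:
        by_request[event.request_id].append(event)
    return dict(by_request)


def first_failure(trail: list[Event]) -> Optional[Event]:
    """Naive root-cause guess: the earliest error in a request's trail.

    Assumes events are listed in the order they occurred.
    """
    for event in trail:
        if event.status == "error":
            return event
    return None


events = [
    Event("load_balancer", "req-1", "ok", "routed to api-2"),
    Event("database", "req-1", "error", "connection pool exhausted"),
    Event("api", "req-1", "error", "upstream timeout"),
    Event("api", "req-2", "ok", "200 OK"),
]

print(monitor_cpu(95.0))  # True: monitoring says *something* is wrong
trails = correlate(events)
print(first_failure(trails["req-1"]).source)  # database: where it went wrong
```

Real platforms correlate on standardized trace context (for example, OpenTelemetry trace IDs) and use far richer analysis; picking the earliest error is only a stand-in for root cause analysis.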
Observability and SIEM/SOAR/XDR:
- The following platforms, tools, and practices are components of observability, although they are not exclusively observability tools.
- A security information and event management (SIEM) tool helps collect and analyze log data from a wide range of sources, including servers, network equipment, and connected devices.
- A security orchestration, automation, and response (SOAR) platform automates incident response: who is supposed to do what, and which systems are supposed to be involved.
- An extended detection and response (XDR) tool enables correlation of data from different sources.
Traditional IT vs. DevOps vs. AI:
- Traditional IT refers to traditional development models and basic infrastructure: standing systems up, making sure they work, and growing them at a relatively modest scale.
- DevOps models were about rapid change. Not only are you getting a lot of data from big data and analytics workloads, but you are also developing more software, in more versions, and rolling it out much faster.
- Observability is a unifying umbrella for capturing information from disparate systems across an enterprise. AI is playing a big role in making sure that information is actually useful. Humans are removed from this step because we are simply too slow to keep up. Here, AI enables teams to do more with the data they have; it is pretty much the only way to rapidly and reliably identify the pattern behind a problem and create a course of action.
What’s ahead in the observability space:
- The focus will be on predicting problems in order to prevent outages, downtime, and poor user experiences. AI and observability are starting to merge into one.
- More out-of-the-box solutions that are also flexible.
- The ability to use platforms and set up observability packages without a degree in data science.
- Some companies might go straight to a function-as-a-service (FaaS) platform, often powered by Kubernetes on the back end, and adopt an observability platform that accommodates it.
- Greater familiarity with the observability tools emerging from CNCF projects.
- More chaos engineering, which assumes “Something is going to go wrong, so let’s plan it and see what the effects are.”
- Consumption-based pricing models are going to be key to wider adoption in small to medium-sized businesses.
Advice for companies looking to invest in observability:
- Evaluate your team: its skill set, cycles, and availability. Then decide how much you want to buy off-the-rack solutions versus rolling out your own tools. (Akamai builds its own solutions, which gives it observability into the platforms it creates so it can keep them up and available.)
- The culture is similar to zero trust: keep it in the back of your mind in everything you do. Think about how you can pull logs and telemetry from every step of the application process and funnel them to one location where, even if you don't have an observability platform yet, you will be able to use them to plan for observability.
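As an illustration of that advice, here is a minimal Python sketch assuming a hypothetical order-processing app; none of the names come from Akamai or the episode. Each step is wrapped in a decorator that emits one structured JSON record (trace ID, step name, status, duration), and every record is funneled to a single sink. The in-memory CENTRAL_SINK list is a stand-in for whatever log collector or storage you actually use.

```python
from __future__ import annotations

import functools
import json
import time
import uuid
from typing import Any, Callable

# Hypothetical central sink: an in-memory list standing in for a log
# collector, object store, or a future observability backend.
CENTRAL_SINK: list[str] = []


def instrumented(step: str) -> Callable:
    """Decorator that records one JSON telemetry event per call of a step."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, trace_id: str, **kwargs: Any) -> Any:
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                # Runs on success and on failure, so every step is recorded.
                CENTRAL_SINK.append(json.dumps({
                    "trace_id": trace_id,
                    "step": step,
                    "status": status,
                    "duration_ms": round((time.perf_counter() - start) * 1000, 3),
                }))
        return wrapper
    return decorator


@instrumented("validate_order")
def validate_order(order: dict) -> dict:
    if order.get("qty", 0) <= 0:
        raise ValueError("quantity must be positive")
    return order


@instrumented("charge_payment")
def charge_payment(order: dict) -> str:
    return f"charged {order['qty'] * order['price']:.2f}"


# One trace ID ties together every step of a single request.
trace_id = str(uuid.uuid4())
order = validate_order({"qty": 2, "price": 9.5}, trace_id=trace_id)
charge_payment(order, trace_id=trace_id)
for line in CENTRAL_SINK:
    print(line)
```

Because every record shares a consistent schema and a trace ID, the collected data can later be loaded into an observability platform without re-instrumenting the application.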
This summary was written by Camille Gregory.