Observability Paradox: It Adds Complexity To A System And Also Removes It

Guest: Rob Hirschfeld
Company: RackN
Show: TFiR: T3M

Observability has transformed in the last couple of years. It started from very simple log monitoring and tracing, but when containers were introduced, production systems became complicated. The actual running instances where those logs were generated were ephemeral, making it difficult for operators and developer teams to support and figure out what was broken. In this episode of TFiR: T3M, Rob Hirschfeld, Co-Founder and CEO of RackN, shares his insights on observability – the current trends in the market and where it’s headed.

To The Point Summary Of This Discussion

Developers are injecting observability points into their code, logging and tracing the data they want to collect, to produce better results.
Observability adds complexity to a system, but it also removes it. In a distributed system, for example, the challenge is tracking down where something ran. Observability adds the complexity of another infrastructure to collect this data, but in return you can do real-time trending and analysis. You no longer have to figure out where the code is running, where a trace is, whether the log still exists, or which server it was on. That is a tremendous win for developer, DevOps, and security teams across the board.
Challenges with observability: 1) It requires an additional platform, and 2) It requires developers and access to code that can be instrumented. A lot of companies, including RackN, are building Prometheus metrics and specs into their products to make it easy to attach to observability points.
DevOps is a challenge from an observability perspective because its tooling lacks the entry points for putting Prometheus metrics into systems. Most DevOps tools are not designed to output the kind of observability data that can be monitored. Part of the reason is that they are job- or task-specific, not process-based.
A great observability system for an application stack has good injection points: places where you emit data, places where you emit events, and places where you can hook in and track what is going on. Even if you don’t have access to Prometheus-type logs or real-time logs, you can still do a tremendous amount to improve the transparency and observability of a DevOps infrastructure platform.
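
As a rough illustration of the injection points described above, the sketch below wraps a task-style job so that each run records a run counter, a failure counter, and an accumulated duration, then renders them in the Prometheus text exposition format. It uses only the Python standard library. The `run_job` and `render_metrics` helpers, the metric names, and the `job_name` label are hypothetical and are not taken from RackN or any specific DevOps tool; a real deployment would more likely use a Prometheus client library.

```python
import time
from collections import defaultdict

# Hypothetical in-memory metric store, keyed by (metric name, job name).
counters = defaultdict(float)


def run_job(name, task):
    """Run a task and record run count, failures, and duration as metrics."""
    counters[("job_runs_total", name)] += 1
    started = time.monotonic()
    try:
        return task()
    except Exception:
        counters[("job_failures_total", name)] += 1
        raise
    finally:
        # Recorded on success and on failure alike.
        counters[("job_duration_seconds_sum", name)] += time.monotonic() - started


def render_metrics():
    """Render the counters as Prometheus text exposition lines."""
    lines = []
    for (metric, job), value in sorted(counters.items()):
        lines.append(f'{metric}{{job_name="{job}"}} {value}')
    return "\n".join(lines) + "\n"
```

The point of the wrapper is that a short-lived, task-specific job still leaves behind data that a monitoring platform can scrape or collect, even though the job itself is not a long-running process.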

What’s Ahead For Observability:

Logs and metrics will increasingly be designed to be fed into artificial intelligence for grooming. The goal: how do I make my systems more observable so that an AI can help diagnose and troubleshoot them?
This means more voluminous logs and more signals to sort through, in the hope that a machine learning model can pick up a trendline faster than a human can.
We’re going to start crossing into the range where observability drives a machine learning algorithm as a first pass, instead of a human as a first pass.
There will be a lot more strain on the platforms that support observability, logging, and monitoring.
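
To make the idea of an automated first pass concrete, here is a minimal sketch that flags outliers in a metric series before a human looks at it. It is an assumption-laden stand-in: a simple z-score check from the Python standard library, not a machine learning model, and the `first_pass_anomalies` function and its `threshold` parameter are hypothetical names chosen for illustration.

```python
import statistics


def first_pass_anomalies(samples, threshold=3.0):
    """Return indexes of samples more than `threshold` standard deviations from the mean."""
    if len(samples) < 2:
        return []  # Not enough data to estimate spread.
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)
    if stdev == 0:
        return []  # All samples identical, nothing stands out.
    return [i for i, x in enumerate(samples) if abs(x - mean) / stdev > threshold]
```

A human operator would then review only the flagged indexes rather than the full stream, which is the division of labor the discussion anticipates.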

How RackN Helps Its Customers With Observability:

RackN’s mission is to create a shared base of automation. With shared code, it can invest the time to do proper due diligence on security, cleanup, and resilience.
It provides information about job starts and stops, with detailed logging out of the gate. Those start and stop records provide hooks where triggers, events, or alerts can be fired.
Its Digital Rebar provides logs, normally generated statistics for tracking variance, the ability to throw events or raise alerts from within an action, and the ability to add event emissions to routine tasks. This additional layer of interaction with the system makes a tremendous difference in observability.
RackN has over 20 years of experience running and operating infrastructure, so the features in its product are a direct reflection of that hard-earned experience.
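
The pattern of job start and stop hooks plus events raised from inside an action can be sketched generically as follows. This is not Digital Rebar's API; the `EventBus` class, the `run_task` helper, and event names such as `job.start` and `job.stop` are hypothetical and exist only to show the shape of the idea.

```python
from collections import defaultdict


class EventBus:
    """Minimal publish/subscribe hub for job events."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_type, handler):
        """Register a handler to be called for a given event type."""
        self._handlers[event_type].append(handler)

    def emit(self, event_type, **payload):
        """Call every handler registered for the event type."""
        for handler in self._handlers.get(event_type, []):
            handler(payload)


def run_task(bus, name, action):
    """Run an action, emitting start and stop events around it.

    The action receives an `emit` callback so it can raise its own
    events or alerts while it is running.
    """
    bus.emit("job.start", job=name)
    try:
        result = action(lambda kind, **data: bus.emit(kind, job=name, **data))
    except Exception as exc:
        bus.emit("job.stop", job=name, ok=False, error=str(exc))
        raise
    bus.emit("job.stop", job=name, ok=True)
    return result
```

A routine task can then call `emit("disk.low", free_gb=2)` partway through its work, and any alerting handler registered with `bus.on("disk.low", ...)` will fire without the task knowing who is listening.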

This summary was written by Camille Gregory.
