Your Observability Bills Are Exposing an Architecture Problem | Eric Tschetter, Imply | TFiR

Guest: Eric Tschetter, Chief Architect at Imply

Telemetry volume is exploding, and the bills are proving it. OpenTelemetry has done something remarkable—it standardized how teams collect telemetry data across the entire stack. But in doing so, it’s exposed a harder problem: once all that data is flowing, where does it actually live, how long can you afford to keep it, and how do you query it without rebuilding everything from scratch?

Eric Tschetter, Chief Architect at Imply, sees this moment as a historical echo. The same evolution that transformed business intelligence—decoupling data storage from the tools that query it—is now coming to observability and security. The monolithic platforms that bundled collection, storage, and UI into one tightly coupled stack weren’t designed for this kind of scale, and the costs are making that clear.

The Data Collection Problem OpenTelemetry Actually Solved

Before OpenTelemetry, teams were locked into vendor-specific agents. You used Logstash if you were in the Elastic ecosystem, the Datadog agent if you were using Datadog, or Splunk forwarders if you were in that world. Each tool controlled how data was collected, and switching vendors meant ripping out your entire telemetry pipeline.

OpenTelemetry broke that pattern. It democratized the data acquisition layer, giving teams a standardized way to collect logs, metrics, and traces without locking themselves into a specific platform. But Tschetter points out that OpenTelemetry didn’t solve the next problem—it just made it more obvious. “You have that base layer of data collection from OpenTelemetry, but then where does it go?” he asks. “You’re still funneling into some tightly, vertically oriented stack that is specifically organized and oriented towards one UI, one way people are interacting with data.”

That tight coupling creates a new bottleneck. If your telemetry data flows into Datadog, only people using the Datadog UI can interact with it. If it’s going into Splunk, your security team might have access, but your SREs and business analysts don’t. The data becomes siloed—not because teams don’t want to share it, but because the architecture doesn’t allow it.

Why Business Intelligence Already Solved This Problem

Tschetter draws a direct parallel to how the BI world evolved. In business intelligence, there’s a clean separation between the interaction layer and the data layer. SQL databases created that separation, which is why you can use Tableau, Looker, or Power BI against the same data warehouse without rebuilding anything.

“The SQL database has created a very clean separation where you can have your Tableau, you can have your Looker, you can have your Power BI,” Tschetter explains. “There’s a whole suite of different UIs built to work with data warehouse in the business intelligence world that basically allow for different users and different personas to get value out of that data.”

That same pattern is exactly what teams need in observability. They want to land telemetry data once and then use it across multiple tools—Kibana for some teams, Splunk for others, Grafana for dashboards, and AI agents for anomaly detection. The current monolithic platforms force teams to choose one tool and lock everyone into that workflow.

The shift Tschetter is describing isn’t speculative. It’s already happening. Teams are realizing that the unified data lake promise only works if the data layer is decoupled from the query layer. Different data types also have different access patterns: with logs, you’re running correlation searches across data that looks unstructured but is full of internal regularity, such as repeated field names, timestamps, and common message templates. If the storage layer exploits that regularity, it can shrink the data while keeping it queryable.

How Compression Without Decompression Changes the Economics

One of the core innovations Imply is introducing with Imply Lumi is compression technology that shrinks log data significantly while maintaining its structure so it remains queryable without decompression. That might sound like a technical detail, but it fundamentally changes the economics of telemetry retention.

“When you can make the data smaller, you can retain more for less cost,” Tschetter says. “If you have the indexing structures to allow you to understand that data without needing to decompress it, you can skip over data that doesn’t matter for your search and only look at what’s necessary.”

Traditional systems decompress everything and then throw away what they don’t need. That’s inefficient at scale. If you can index compressed data and only process what’s relevant to the query, you reduce both storage costs and compute requirements. That’s critical as telemetry volumes grow, especially with AI workloads generating and consuming data at a faster rate.
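To make the skip-index idea concrete, here is a rough sketch in Python using the standard library’s `zlib`. This is an illustration of the general technique, not Imply’s actual format: logs are compressed in fixed-size blocks, each block keeps a small index (here a token set, standing in for real structures like Bloom filters or min/max column statistics), and a search decompresses only the blocks whose index could possibly match.

```python
import zlib

def build_blocks(lines, block_size=1000):
    """Compress log lines in blocks, keeping a small per-block index."""
    blocks = []
    for i in range(0, len(lines), block_size):
        chunk = lines[i:i + block_size]
        # Index: the distinct tokens in the block. A stand-in for real
        # skip structures such as Bloom filters or column statistics.
        tokens = set()
        for line in chunk:
            tokens.update(line.split())
        blocks.append({
            "index": tokens,
            "data": zlib.compress("\n".join(chunk).encode()),
        })
    return blocks

def search(blocks, term):
    """Return matching lines, decompressing only blocks that can match."""
    hits = []
    for block in blocks:
        if term not in block["index"]:
            continue  # skip: no decompression needed for this block
        for line in zlib.decompress(block["data"]).decode().split("\n"):
            if term in line:
                hits.append(line)
    return hits
```

In this toy version, a query for a rare token touches only the handful of blocks that contain it, which is the economic point: storage shrinks via compression, and compute shrinks because most compressed blocks are never opened.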

Why AI Will Accelerate the Decoupling Trend

Tschetter makes another historical comparison—this time to manufacturing automation. When robots replaced humans on assembly lines, the assembly line didn’t disappear. It evolved. Factories are now designed around how robots work, not how humans worked. The same shift is coming to data platforms.

“AI is fundamentally automating human workflows,” Tschetter explains. “LLM models are taking what humans are doing and automating it to a certain extent. Manufacturing automation changed the shape of assembly lines. AI and LLMs are going to impact the interaction with data in the same way.”

That means the data platform—the assembly line for telemetry—won’t go away, but it will have to change. AI agents will consume more data and generate more insights, which puts even more pressure on cost and storage. The only way to handle that is to have the lowest possible unit cost for storing data while maintaining the structures that let AI interact with it as efficiently as possible.

What an Observability Warehouse Actually Looks Like

Imply’s approach with Lumi is to build a log-oriented data layer that decouples storage from query. The product doesn’t dictate what tools teams use to interact with their data. Instead, it implements multiple query languages—SPL, KQL, SQL—so teams can keep using the workflows they already have.

“We don’t necessarily need to have an opinion about what the end user is actually using to interact with their data,” Tschetter says. “Are they using an AI agent? Are they using Splunk? Are they using Kibana? That’s not something we have to decide, as long as we can enable all of those options.”
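The many-dialects-over-one-store idea can be sketched in a few lines of Python. This is a deliberately tiny, hypothetical illustration, not Lumi’s SPL or KQL support: two toy front-end grammars (an SPL-style `field=value` and a KQL-style `field == "value"`) compile down to the same field/value predicate evaluated against one shared event store.

```python
import re

def parse_spl(query):
    """Toy SPL-style filter, e.g. 'status=500'."""
    field, value = query.split("=", 1)
    return field.strip(), value.strip()

def parse_kql(query):
    """Toy KQL-style filter, e.g. 'status == "500"'."""
    m = re.match(r'\s*(\w+)\s*==\s*"([^"]*)"\s*$', query)
    if not m:
        raise ValueError("unsupported query")
    return m.group(1), m.group(2)

def run(events, field, value):
    """The shared data layer: one predicate, regardless of dialect."""
    return [e for e in events if str(e.get(field)) == value]
```

Both parsers produce the same intermediate form, so the storage engine stays ignorant of which UI or dialect issued the query, which is the separation the quote above describes.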

That’s the core idea behind an observability warehouse. It’s not a replacement for existing tools. It’s a foundational layer that works with them. Teams can retain more telemetry data at lower cost, run interactive queries without changing dashboards or alerts, and enable AI-driven insights without vendor lock-in.

The shift is already underway. OpenTelemetry democratized data collection. The next step is democratizing data access—and that requires decoupling where telemetry data lives from how teams actually use it.
