AI Agents Are Breaking Observability — Snowflake’s Jeremy Burton on What Comes Next | TFiR

For a decade, observability has operated on a simple premise: find out what broke and why. Monitoring told you whether a system was running. Observability told you why it failed. That model, forged in the fires of the microservices revolution, worked reasonably well when systems were deterministic — when code either ran or it didn’t, when APIs returned results or threw errors, when a failure produced a signal you could write an alert for.

That era is ending. AI agents — autonomous software components that interact with large language models, query databases, execute code, and spawn sub-agents — do not fail the way traditional software fails. They run. They respond. They produce output. But that output can drift quietly and continuously, shaped by probabilistic LLM behavior, shifting context windows, and prompt variations that leave no error code, no stack trace, and no alert in any dashboard ever built. An agent can be fully operational and deeply wrong at the same time, and nobody will know until the damage is done.

The scale dimension compounds the problem. The microservices transition already triggered an order-of-magnitude increase in telemetry volume. AI agents — potentially thousands or tens of thousands within a single organization — are poised to dwarf that. Add the explosion of AI-generated code from tools like Claude Code and Codex, and the telemetry volumes entering production pipelines over the next few years will render the economics of legacy observability platforms structurally unviable. More data, more queries, same or higher cost per query: that math does not work.

The architectural assumptions baked into most observability tooling — proprietary index-based ingestion, siloed log and trace stores, tightly coupled and highly curated user interfaces — were built for a different era. As software engineers increasingly query telemetry not through dashboards but through coding agents, MCP servers, and CLI integrations inside environments like Cursor and Claude Code, the walled-garden model that Datadog, Dynatrace, and their peers spent a decade constructing is facing an existential disruption it was not designed to survive.

Snowflake’s observability unit, built on the foundation of the Observe acquisition, is betting that the answer is architectural: a platform that treats observability as fundamentally a data problem, built on open formats like Apache Iceberg, elastic compute and storage separation, scan-based query engines, and deep vertical integration with the underlying data platform, and designed from the ground up for the economics and scale demands of the agentic era.

The Guest: Jeremy Burton, General Manager, Observability Unit at Snowflake

Key Takeaways

  • AI agents introduce a new observability problem class — behavioral drift and response explainability — that legacy monitoring and observability tooling was never designed to detect.
  • Telemetry volumes are approaching an inflection point: by Burton’s account, Snowflake’s Observe platform already ingests a couple of petabytes of data per day and runs close to 300 million queries per day, and agentic workloads will push those numbers significantly higher.
  • The walled-garden, highly curated UI model pioneered by Datadog and Dynatrace is being disrupted by headless observability — developers and agents querying telemetry directly via MCP servers, CLI, and coding environments.
  • Apache Iceberg on S3 combined with OpenTelemetry is emerging as the open, lock-in-free architecture for enterprise telemetry storage, with any compatible query engine — including Observe — layered on top.
  • Burton’s bold prediction: within five years, the only credible observability platforms will be those in control of their own data platform, operating at scale, with the unit economics to match.

***

In this exclusive interview with Swapnil Bhartiya at TFiR, Jeremy Burton, General Manager of the Observability Unit at Snowflake, discusses the fundamental architectural shift required to observe AI agents in production, the collapse of the walled-garden observability model, the emergence of headless observability via MCP and CLI integrations, the role of Apache Iceberg and OpenTelemetry in eliminating proprietary lock-in, and why Snowflake’s data platform DNA gives it a structural advantage in the agentic observability era.

Why AI Agents Break Traditional Observability

Observability evolved through two distinct phases: monitoring — is it running? — and then the deeper question of why something failed. AI agents introduce a third category that neither discipline was built to handle: systems that are running, returning responses, producing no errors, but behaving differently than they did yesterday. This is a function of the probabilistic nature of large language models, and it represents a genuinely new problem domain for platform engineers and SRE teams.

Q: When AI and AI agents enter the picture, when does traditional observability fail — and what actually has to be rebuilt from scratch?

Jeremy Burton: “An agent is something which talks to an LLM as well as talks to a database. There are some similarities between agents and microservices, but there are some profound differences. There will be code in agents, and that code will talk to databases, as microservices did in days gone by. But the big new variable here is that they will also interact with an LLM. In the past, we’ve been used to an agent not working — it’s failed or not running. And so that’s a monitoring problem: is it working or not? And then the second thing would be, maybe the microservice is running, but it’s failing in a rather unusual way, or we’re seeing a pattern of behavior that’s unusual. And we’d use observability tooling to find out why. With agents, it’s actually another slight change, because an agent can be running, it can be not giving an error, but it may just be giving you a slightly different answer than it was giving you last week, last month, last year. And so it’s not broken, it’s just giving you a slightly different answer or behaving a slightly different way than it did in the past. That in some respects is the design point of an LLM. They’re probabilistic systems, not deterministic — as the prompt changes or the contextual data changes, they can give you slightly different answers, which again is maybe not an error per se, but could have quite a profound impact on the operation of your application. So in this new world, what we’ve also got to do from an observability standpoint is we’ve got to look at the prompt and response, we’ve got to run things like evals against these agents to understand how responses are drifting over time, and we’ve got to try to understand what is causing that response to drift. 
Some of the things are going to be the same — you’re still going to have problems with code, you’re still going to have problems with databases — but you’ve got this new category of problems which is: it’s behaving a little bit differently based on the inputs or the context we’re providing. Hey, what changed? Why is the answer drifting? That is certainly a new problem domain that we’ve not seen in the past.”
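The eval-based drift check Burton describes can be pictured with a minimal Python sketch. Everything here is an assumption made for illustration: the toy scoring rule (checking that required facts appear in a response), the function names, and the 0.1 threshold are hypothetical and do not describe how Observe or Snowflake implement evals.

```python
from statistics import mean


def eval_score(response: str, required_facts: list[str]) -> float:
    """Toy eval: fraction of required facts that appear in the response."""
    if not required_facts:
        return 1.0
    text = response.lower()
    hits = sum(1 for fact in required_facts if fact.lower() in text)
    return hits / len(required_facts)


def drift(baseline_scores: list[float], current_scores: list[float]) -> float:
    """Drop in mean eval score from the baseline window to the current one."""
    return mean(baseline_scores) - mean(current_scores)


def has_drifted(baseline_scores: list[float],
                current_scores: list[float],
                threshold: float = 0.1) -> bool:
    """Flag drift when the mean score falls by more than the threshold."""
    return drift(baseline_scores, current_scores) > threshold


# Hypothetical responses to the same question, sampled a week apart.
required = ["refund", "30 days"]
last_week = ["Refund is available within 30 days.",
             "You can get a refund in 30 days."]
this_week = ["Refund is available.", "Please contact support."]

baseline = [eval_score(r, required) for r in last_week]
current = [eval_score(r, required) for r in this_week]
print(has_drifted(baseline, current))
```

Neither week's responses would raise an error or a failed health check, which is the point of the passage: the agent stays up while its score against a fixed eval set falls.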

Scale as an Architectural Forcing Function

The microservices revolution already demonstrated that more distributed components produce more telemetry and more complex interaction graphs. AI agents, which may number in the thousands or tens of thousands within a single organization, multiply that dynamic further. Cost per query must fall even as total data volume grows, or the observability function becomes financially untenable at production scale.

Q: What is the order-of-magnitude change in telemetry volume that AI agents will drive, and what does that demand architecturally?

Jeremy Burton: “There could potentially be thousands, tens of thousands of these agents in one particular organization. We may have seen an order of magnitude increase in telemetry over the last few years. I think that’s going to continue, because what we know is that with more distributed components, you therefore get more telemetry, you get more interactions that you need to track. And if scale was a thing over the past few years, it’s going to be even more of a thing over the next few years. Number one, you have to have an observability platform architected for scale. I don’t just mean handling massive volumes of data — hundreds of terabytes a day, I think that’s going to be quite commonplace. The economics of querying that data have got to be an order of magnitude better than what we’ve seen in the past. If you’ve got more data and you’re going to run more queries, it stands to reason that the cost per query has to decrease over time. And so I think this new generation, observing AI agents, is really going to test product architecture. It’s going to test not just the ability to handle large volumes of data, it’s going to test the ability to query it efficiently.”
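The cost-per-query argument is simple arithmetic, sketched below in Python. The growth factors and spend figures are hypothetical placeholders, not numbers from the interview; the sketch only shows that if query volume grows faster than the budget, the per-query cost has to fall by the ratio of the two.

```python
def cost_per_query(total_spend: float, queries: int) -> float:
    """Average cost of one query for a given total spend."""
    return total_spend / queries


def required_cost_reduction(query_growth: float,
                            budget_growth: float = 1.0) -> float:
    """Factor by which per-query cost must fall to stay within budget.

    query_growth and budget_growth are multipliers, e.g. 10.0 for 10x.
    """
    if query_growth <= 0 or budget_growth <= 0:
        raise ValueError("growth factors must be positive")
    return query_growth / budget_growth


# Hypothetical: queries grow 10x while the observability budget doubles.
print(required_cost_reduction(10.0, 2.0))  # 5.0
```

With a flat budget the required reduction equals the query growth itself, which matches Burton's order-of-magnitude framing.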

Q: How does Snowflake’s architecture specifically address those scale and cost challenges?

Jeremy Burton: “When we first started out several years ago, we felt that observability needed a new architecture. Most tooling was built on a bespoke platform, custom built for logs or custom built for time series. And we were the first to really build for one of these new data platforms, Snowflake. The architecture of these new data platforms offered many advantages. They used the cheapest storage you could find — S3 — that’s a big benefit given the data volumes. Snowflake was quite unique in that it was not an index-based query engine. And so when you ingest data, you didn’t have to build these huge indexes. It was a scan-based query engine. Also, the separation of storage and compute — you can apply compute elastically to a problem depending on the query complexity. Those were some of the things which made us feel like maybe a commercial data platform with a new architecture is going to equip us for a new generation at a completely different scale. As we sit today, we do about close on 300 million queries a day. We ingest a couple of petabytes of data a day. And at least in our end, we’ve proven that we can operate at the kind of scale that today’s workloads demand.”

The Death of the Walled Garden: From Curated UIs to Headless Observability

For the past decade, the dominant observability model — exemplified by Datadog and Dynatrace — was a walled garden: a highly curated, tightly coupled user interface that guided developers through a structured troubleshooting workflow. Burton argues this model is being fundamentally disrupted by the shift to AI-native development environments, where engineers query telemetry not through dashboards but through coding agents, MCP servers, and CLI integrations — a paradigm he calls headless observability.

Q: How does Snowflake’s data platform DNA shape the observability thesis, and is the convergence in the market still unproven — or is this definitively where things are heading?

Jeremy Burton: “For the last decade you’ve had companies like Datadog and Dynatrace and so on — they’ve been gradually building out almost like a walled garden. It’s a highly curated set of user interfaces which automate the developer’s troubleshooting workflow. And it started quite narrow with monitoring, and it expanded to logs and traces, and then pipelines, feature flags, RUM. And so it’s gradually expanded, but it’s revolved around this quite tightly coupled and highly curated user interface. Our sense was that is going to get blown apart. I think a lot of the users, particularly software engineers, they’re not going to learn a tool, they’re not going to learn a highly curated interface where I click my way through the troubleshooting flow. They’re going to demand certain telemetry, they’re going to demand certain insights, and those insights are going to be served up dynamically to the software engineer wherever they may be — in Cursor, in code, in GPT or wherever. And so the walled-garden highly curated interface we think has been blown apart. And so what you’re left with then is the data. You need the volume of telemetry, but you also need context. If you’re trying to find a needle in a haystack, and you’ve got a hundred terabytes of data a day — rough and tough, a petabyte a week — it’s kind of hard to find the needle. But if you can curate that data, if you can provide context, then maybe you’re not searching through 100 terabytes, maybe you’re searching through a few gigabytes. We felt the crux of the problem with observability increasingly was going to be a data issue — managing the large data volume, curating it and providing context for it, and then the economics: how much is it costing me to do all of this? And that’s really what led us to think, okay, if we were a part of Snowflake, we could hugely optimize Snowflake for our workload. 
We could do way more vertical integration, we could get really industry-leading economics, and then we could transform this data so that we could provide really great context to the array of agents that were now querying it. If we provide more context to the agent, then they’re going to find the problem more quickly.”
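The effect of context on the needle-in-a-haystack problem can be sketched as partition pruning. This is an illustrative Python model under stated assumptions: the `Partition` layout (one partition per service per hour), the sizes, and the service names are hypothetical and are not a description of how Observe or Snowflake organize telemetry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    service: str
    hour: int        # hour of day, 0-23
    size_gb: float


def partitions_to_scan(partitions, service=None, hours=None):
    """Keep partitions matching the context; None means no constraint."""
    return [
        p for p in partitions
        if (service is None or p.service == service)
        and (hours is None or p.hour in hours)
    ]


def scanned_gb(partitions, **context) -> float:
    """Total data a scan-based engine would read for the given context."""
    return sum(p.size_gb for p in partitions_to_scan(partitions, **context))


# Hypothetical day of telemetry: 100 services x 24 hours x 40 GB each.
day = [Partition(f"svc{i}", h, 40.0) for i in range(100) for h in range(24)]

print(scanned_gb(day))                                        # 96000.0
print(scanned_gb(day, service="svc7", hours=range(13, 15)))   # 80.0
```

Without context the scan covers roughly 96 TB; knowing the service and a two-hour window cuts it to 80 GB in this toy layout, the same shape of reduction Burton describes.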

Q: What does headless observability look like in practice — and what does that mean for how Observe is building its platform?

Jeremy Burton: “Another prediction — in a few years’ time, most of the users who use Observe will never log in. They’re going to query their observability data from Cursor or from Claude Code or from their development environment or from GPT, and they’re going to use MCP or CLI plus skills, and most of the queries are going to come through a headless Observe. And so what’s key in that scenario is that we have the most accurate queries, that we have the fastest, and that we have the most cost efficient. And so we’re very focused on that, and we think that’s a data problem. That’s where we’re dedicating our time and effort. We never really set out to build a tool. We set out to build something which would solve the data problem — ingest the data, get the data in the right shape. And once you ingest the data and you get it in the right shape, then you can quickly and efficiently query it.”
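A headless query over MCP is, at the wire level, a JSON-RPC 2.0 request with the method `tools/call`. The Python sketch below builds such a request. The envelope follows the Model Context Protocol; the tool name `query_logs` and its arguments are hypothetical and are not taken from Observe's actual MCP server.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    payload = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(payload)


# Hypothetical tool and arguments, as a coding agent might issue them.
message = tool_call_request(
    "query_logs",
    {"service": "checkout", "status": 500, "last_minutes": 30},
)
print(message)
```

In this model the engineer never opens a dashboard: the coding agent sends the request, and the platform's job is to answer it accurately, quickly, and cheaply.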

AI-Generated Code and the New Telemetry Explosion

The observability scale problem is driven not only by AI agents at runtime but also by AI at the code generation layer. Tools like Claude Code and OpenAI Codex are producing more code, faster, than human engineers ever could. More code in production means more telemetry, and more telemetry means more queries. With both volumes rising, the unit economics of observability must keep improving or the cost curve becomes unsustainable.

Q: How does AI-generated code change the telemetry volume equation, and what does Snowflake’s position in data infrastructure enable?

Jeremy Burton: “If you look at the volume of code we’re generating — things like Claude Code, Codex — most of the code that is going to go into production over the next few years is not going to be written by a human. It’s going to be written by an agent. And there’s going to be a lot more of it, which means there’s going to be a lot more telemetry, which means that in theory there’s going to be a lot more cost. And we feel like we can keep that cost per gigabyte decreasing as data volumes increase because of the architecture of the platform. Now, Observe, as part of Snowflake, we obviously benefit from owning the data platform and we can start to do more vertical integration and optimizing Snowflake better for our workload. So I’ll make a bold statement: I think five years from now, the only credible observability platforms will be ones that are in control of their own data platform and are operating at scale, because I think it’s the only way you can get the unit economics to work.”

The New Observability Workflows: From Error Detection to Explainability

The integration of observability data with AI coding agents is already producing workflows that would have seemed implausible twelve months ago — pulling JIRA tickets, correlating them with telemetry from Observe, identifying the responsible code, and generating pull request suggestions, all within a single agentic loop. But the deeper shift is conceptual: observability must now extend beyond failure detection and root cause analysis into a new domain of response explainability, tracking why an LLM-driven system is behaving differently than it did previously.

Q: What do new agentic observability workflows actually look like in practice today?

Jeremy Burton: “The good thing with the observability tooling now — with the headless tooling, I can get access to Observe data through an MCP server, I can pull telemetry back and then I can say, hey, here’s an error I found in production, a 500 error, or here’s some error in the logging. Hey, coding agent, go find me the offending lines of code, explain to me what that code does and why not go ahead and suggest a PR for me. The bad news is you’ve got more code than you ever thought possible and you’ve got more telemetry than you ever thought possible. The good news is it’s never been easier to query that data. And actually you can have the code that generated the telemetry alongside the telemetry itself. So that’s almost a nirvana state. And we’re starting to see these amazing workflows where developers are saying, hey, go pull that JIRA for me, look at and go investigate what’s in that JIRA with Observe, and then go examine the code and explain to me the code that is responsible. These are workflows that would have seemed like rocket science just a year ago, but are actually very real today.”

Q: What is the new normal for observability when the system is not broken — it’s just different?

Jeremy Burton: “It’s not enough to just know whether something’s broken or why something is broken. That’s typically what observability has been for the last decade — monitoring was: what’s broken? Observability was: why did it break? There is definitely something now around: explain to me the response that the system is giving. It’s not broken. It’s not a failure condition, it’s not an error condition. But explain to me why this response is different to what it was yesterday and help me understand why that response is different. Is there a different prompt? Is there different context being given? And so some of these issues that we’re going to see in production are just a lot more subtle. It’s not an obvious 500 error from an API. It’s not even something’s broken in the user journey. It could just be a subtlety in the way that an LLM is responding based on a different prompt or a different amount of context. And so we’ve now got to get into that, which is a whole new area.”
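The questions Burton poses ("Is there a different prompt? Is there different context being given?") amount to diffing the inputs of two runs. A minimal Python sketch follows, assuming each run is recorded as a dictionary of text fields such as `prompt` and `context`. The field names and sample values are hypothetical.

```python
import difflib


def explain_change(before: dict, after: dict) -> dict:
    """Return a unified diff for each field whose value changed between runs."""
    changes = {}
    for field in sorted(set(before) | set(after)):
        old, new = before.get(field, ""), after.get(field, "")
        if old != new:
            changes[field] = list(difflib.unified_diff(
                old.splitlines(), new.splitlines(),
                fromfile=f"{field}@before", tofile=f"{field}@after",
                lineterm="",
            ))
    return changes


# Hypothetical recorded inputs for the same question on two days.
yesterday = {"prompt": "Answer briefly.", "context": "Policy v1"}
today = {"prompt": "Answer briefly.", "context": "Policy v2"}

for field, diff in explain_change(yesterday, today).items():
    print(field)
    print("\n".join(diff))
```

Here the prompt is unchanged and only the context differs, so the explanation for a changed answer points at the retrieved context rather than at the code or the prompt template.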

The Agent Identity and Security Problem Blocking Production Readiness

Despite significant enthusiasm and a wave of promising prototypes, enterprises are finding it difficult to move agentic AI workloads from demo to production. The primary obstacles are not technical capability gaps — they are edge cases, agent identity management, and unresolved security questions about what an agent with human-equivalent credentials can access, discover, or exploit at machine speed.

Q: What is actually holding enterprises back from moving AI agents from prototype to production?

Jeremy Burton: “It’s quite easy in fact to build an agent which gives a great demo and is at the beta phase or the prototyping phase. I think there’s a lot of promising prototypes. But folks are finding it quite difficult to get that agent to be kind of quote, production ready. There’s a lot of edge cases. There’s still a lot of uncertainty about agent identity, for example. I know this agent in theory is just a digital human and therefore is subject to the same permissioning that a human would be subject to. But the power of this agent is potentially an order of magnitude greater than a human because it doesn’t sleep and it can methodically look at a system or a set of documents and it could find things potentially that a human could never find. So what I’ve seen at least is a lot of hesitancy about: is this ready for prime time, and have I covered all the edge cases to make this a production offering versus a great demo? And there are still some areas of great uncertainty around security. Even if an agent has got the same security credentials as a human, there’s still uncertainty about what it may be able to discover or find or get access to that a human maybe never would, because we’re fallible as humans and we have to sleep and we’re not methodical and so on and so forth.”

OpenTelemetry, Apache Iceberg, and the End of Proprietary Lock-In

A structural shift is underway in how enterprise telemetry data is stored and accessed. The combination of OpenTelemetry as the collection standard, Apache Iceberg as the open table format, and S3 as the commodity storage layer is dismantling the proprietary lock-in that has characterized observability platforms for years. Large enterprises are increasingly opting to store telemetry in their own S3 buckets using Iceberg tables, with any compatible query engine — including Observe — layered on top, rather than committing to a vendor-controlled data store.

Q: What is the customer reaction to the unified approach — and how are the largest enterprises thinking about the architecture?

Jeremy Burton: “We’ve tended to focus on large organizations. We think that in those organizations they feel the pain firsthand of having many, many silos of data — their time series data in a very different place than their tracing, several different logging environments, and a quite separate silo probably emerging around just looking at the data from LLMs. So the idea that you can bring this data together is compelling because when they get into troubleshooting, they see the disconnects and the inability to correlate data. The nice thing about observability is that people historically have not kept data for more than a few days because it was so expensive. So there’s no data migration. And with the larger organizations, what is becoming quite compelling is maybe not ingesting data into a proprietary Observe data store, but ingesting it into an S3 bucket and having the telemetry stored in Iceberg tables. For the longest time in the observability world, folks have felt locked in because they had a proprietary agent that would send data to a proprietary data store. And I think that whole world is changing. OpenTelemetry is going to be the collector. It’s going to send data into maybe the customer’s S3 bucket with an Iceberg-based data structure, and then Observe — or for that matter any Iceberg-compatible query engine — is going to be layered on top of that. So I think it’s quite compelling to put the data in one place, to store it in an open format, and be able to query it with something that supports that open Iceberg table format.”
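To make the collector-to-open-table flow concrete, the Python sketch below flattens a simplified log record into a row with a day column that an Iceberg table could be partitioned on. The record shape is a loose, hypothetical simplification of an OpenTelemetry log record, and the column names are illustrative; no Iceberg or OpenTelemetry library is used, and nothing is written to S3.

```python
from datetime import datetime, timezone


def to_row(record: dict) -> dict:
    """Flatten a simplified log record into a columnar-friendly row."""
    seconds = record["time_unix_nano"] // 1_000_000_000
    ts = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return {
        "ts": ts.isoformat(),
        "day": ts.date().isoformat(),  # candidate partition column
        "service": record.get("resource", {}).get("service.name", "unknown"),
        "severity": record.get("severity_text", ""),
        "body": record.get("body", ""),
    }


# Hypothetical record as a collector might emit it.
record = {
    "time_unix_nano": 1_700_000_000_000_000_000,
    "severity_text": "ERROR",
    "body": "payment gateway returned 500",
    "resource": {"service.name": "checkout"},
}
print(to_row(record))
```

Because the row carries no vendor-specific structure, any engine that can read the open table format could query it, which is the lock-in argument Burton makes.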

Q: How are the largest enterprises thinking about the AI front end — the AI SRE layer — differently from smaller organizations?

Jeremy Burton: “There are companies that really like the idea of Observe’s AI SRE sitting on top of that data and helping people troubleshoot. But I would also tell you that the larger organizations right now are probably even more enamored with the headless AI SRE. Give me the MCP server or give me the CLI plus skills and allow me to plug Observe data into my broader incident management workflow. And I think a lot of the larger enterprises are thinking of constructing that larger enterprise workflow themselves. So that’s maybe the initial feedback: the Iceberg back end and then the AI front end. And people are thinking about the front end slightly differently depending on their size.”
