
Why OpenTelemetry Is Now the Observability Standard for Cloud Native and AI Workloads | Chris Aniszczyk, CNCF | TFiR


Instrumentation fragmentation is costing engineering teams vendor flexibility and making AI workload observability nearly impossible to implement consistently. Every major cloud provider and observability vendor is now converging on a single open standard, and teams still running proprietary or homegrown instrumentation are accumulating technical debt with no clear migration path. The window to standardize before AI agent deployments scale is closing.

In this interview on TFiR, Chris Aniszczyk, CTO at CNCF, breaks down why OpenTelemetry has reached critical mass across the industry, how it enables vendor optionality for enterprise teams, and what the community is building right now to extend OTel for AI and inference workloads.

Guest: Chris Aniszczyk, CTO at CNCF
Show: TFiR

Here is what every platform engineer and observability practitioner needs to know.

Technical Deep Dive

Q: Has OpenTelemetry reached the same foundational status as Kubernetes in cloud native infrastructure?

Chris Aniszczyk, CTO at CNCF, says OpenTelemetry has reached the same inflection point Kubernetes hit years ago. Every major observability vendor, including Splunk, Datadog, Grafana, and Honeycomb, now supports OTel by default, and Amazon recently announced that CloudWatch supports OTel natively. Almost every major programming language has an OTel SDK, and all major hyperscalers have adopted it, signaling that the standard has moved beyond early adoption into default infrastructure.

“It is the Kubernetes of the observability world.” — Chris Aniszczyk, CTO, CNCF

Q: How does OpenTelemetry give enterprise teams vendor optionality without full re-instrumentation?

Aniszczyk explains that when teams instrument their applications using OTel internally, they gain the ability to route telemetry to different vendors for different purposes, such as Datadog for one use case and Grafana for another, without changing the instrumentation layer. This mirrors what Kubernetes did for compute portability: it does not make migration effortless, but it makes switching significantly easier than with proprietary agents. The choice is structural, not just theoretical.

“If internally we are instrumenting our applications using OTel, then maybe we could use Datadog for this, Grafana for this. It gives them a strong optionality and choice.” — Chris Aniszczyk, CTO, CNCF
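
The routing idea can be sketched in a few lines of plain Python. This is an illustration only, not the OpenTelemetry SDK or Collector API: Span, FanOutPipeline, and the two backend lists are hypothetical stand-ins. It assumes a setup where application code emits a signal once and a separate routing layer decides which backends receive it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Span:
    """Hypothetical vendor-neutral span record (not the OTel SDK type)."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


# An exporter is any callable that accepts a span. Real exporters would send
# data over the network; these stand-ins only collect spans in memory.
Exporter = Callable[[Span], None]
Filter = Callable[[Span], bool]


class FanOutPipeline:
    """Routes each span to every exporter whose filter accepts it."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Filter, Exporter]] = []

    def add_route(self, accepts: Filter, exporter: Exporter) -> None:
        self._routes.append((accepts, exporter))

    def emit(self, span: Span) -> int:
        """Deliver the span and return how many backends received it."""
        delivered = 0
        for accepts, exporter in self._routes:
            if accepts(span):
                exporter(span)
                delivered += 1
        return delivered


# Hypothetical backends: vendor_a receives everything, vendor_b only
# spans tagged with the payments team.
vendor_a: List[Span] = []
vendor_b: List[Span] = []

pipeline = FanOutPipeline()
pipeline.add_route(lambda s: True, vendor_a.append)
pipeline.add_route(lambda s: s.attributes.get("team") == "payments", vendor_b.append)

# Application code emits once and never names a vendor.
pipeline.emit(Span("checkout", {"team": "payments"}))
pipeline.emit(Span("search", {"team": "catalog"}))

print(len(vendor_a), len(vendor_b))
```

Changing a vendor in this sketch means editing the add_route calls, while the emit calls in application code stay untouched, which is the kind of optionality described above.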

Q: How are regulated industries with legacy homegrown observability stacks using OpenTelemetry?

Aniszczyk notes that many regulated industries with long-established stacks built their own observability tooling years ago. These teams are now modifying those homegrown systems to emit OTel-compatible data, which gives them a modernization foothold without requiring a full rip-and-replace. Over time, OTel emission creates a viable pathway to eventually decommission the proprietary solution entirely.

“A lot of regulated industries that have been around for a while, they have homegrown solutions, and they have now modified those to go emit OTel related data. That gives them a pathway eventually to move off of that homegrown solution.” — Chris Aniszczyk, CTO, CNCF
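
As a rough illustration of what modifying a homegrown system to emit OTel data might involve, the sketch below maps a made-up legacy timing record onto the general shape of an OTLP/JSON trace payload. The legacy field names (operation, host, start_epoch_ms, duration_ms), the legacy.host attribute key, and the legacy-bridge scope name are all assumptions for the example. A real bridge would also carry over existing correlation IDs instead of generating fresh ones.

```python
import json
import secrets
from typing import List


def legacy_to_otlp_span(record: dict) -> dict:
    """Map a hypothetical homegrown timing record onto an OTLP/JSON-style span."""
    start_ns = int(record["start_epoch_ms"]) * 1_000_000
    end_ns = start_ns + int(record["duration_ms"]) * 1_000_000
    return {
        # Fresh random IDs for illustration only: 16-byte trace ID, 8-byte span ID.
        "traceId": secrets.token_hex(16),
        "spanId": secrets.token_hex(8),
        "name": record["operation"],
        # 64-bit timestamps are written as strings in the JSON encoding.
        "startTimeUnixNano": str(start_ns),
        "endTimeUnixNano": str(end_ns),
        "attributes": [
            {"key": "legacy.host", "value": {"stringValue": record["host"]}},
        ],
    }


def wrap_payload(service_name: str, spans: List[dict]) -> dict:
    """Wrap spans in the resource/scope envelope used by OTLP trace payloads."""
    return {
        "resourceSpans": [
            {
                "resource": {
                    "attributes": [
                        {"key": "service.name", "value": {"stringValue": service_name}},
                    ]
                },
                "scopeSpans": [
                    {"scope": {"name": "legacy-bridge"}, "spans": spans},
                ],
            }
        ]
    }


# Hypothetical record as a homegrown monitoring system might store it.
legacy_record = {
    "operation": "settle_batch",
    "host": "mainframe-01",
    "start_epoch_ms": 1700000000000,
    "duration_ms": 250,
}

payload = wrap_payload("legacy-ledger", [legacy_to_otlp_span(legacy_record)])
print(json.dumps(payload, indent=2))
```

The point of the sketch is that the legacy system keeps collecting data the way it always has; only the output format changes, which is what opens the later migration path.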

Q: What does OpenTelemetry contribution velocity data actually show about its adoption trajectory?

According to Aniszczyk, CNCF publishes an open source Project Velocity report, and OTel ranks number two in the entire CNCF portfolio by contribution velocity, behind only Kubernetes. Globally, OTel ranks in the top 20 to 30 open source projects by contributions. This means adoption is not just being driven by vendor support announcements; a large volume of contributors are actively showing up to build and maintain the project.

“OTel is literally number two behind Kubernetes for contribution velocity. It’s not just people using it and vendors supporting it. A lot of people are showing up.” — Chris Aniszczyk, CTO, CNCF

Q: Do AI workloads require a new observability pillar beyond logs, metrics, and traces?

Aniszczyk argues that AI workloads do not require an entirely new observability pillar. AI agents still produce logs, require metrics to understand behavior, and need tracing to follow call chains across APIs and databases, the same requirements as traditional microservices. What is needed is additional metadata support, such as which model was used, which prompt was invoked, and inference-specific attributes, which are extensions on top of the existing OTel model rather than replacements for it.

“I don’t think AI workloads necessitate a new pillar, it’s a new use case. You’re going to produce logs, you’re going to need metrics, you’re going to have to have traceability. Just like you need for traditional microservices, you need all those same things.” — Chris Aniszczyk, CTO, CNCF
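
To make the "new use case, not a new pillar" point concrete, the sketch below builds an ordinary HTTP span and an LLM-call span with the same helper. It uses plain dictionaries, not the OpenTelemetry SDK. The gen_ai.* keys follow the naming pattern of the OpenTelemetry generative AI semantic conventions, which are still evolving, so treat the exact names as assumptions. The app.prompt.id key, the build_span helper, and the model name are hypothetical.

```python
from typing import Dict, Union

AttrValue = Union[str, int, float, bool]


def build_span(name: str, started: float, ended: float,
               attributes: Dict[str, AttrValue]) -> dict:
    """Build a minimal span-like record; the shape is identical for any workload."""
    return {
        "name": name,
        "duration_ms": round((ended - started) * 1000, 3),
        "attributes": dict(attributes),
    }


def record_http_call(method: str, started: float, ended: float) -> dict:
    """A traditional microservice span: one attribute describing the request."""
    return build_span(f"HTTP {method}", started, ended,
                      {"http.request.method": method})


def record_llm_call(model: str, prompt_id: str, input_tokens: int,
                    output_tokens: int, started: float, ended: float) -> dict:
    """An LLM span: the same structure, with model and prompt metadata added."""
    attributes: Dict[str, AttrValue] = {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "app.prompt.id": prompt_id,  # hypothetical application-defined key
    }
    return build_span(f"chat {model}", started, ended, attributes)


# Fixed timestamps (in seconds) keep the example deterministic.
http_span = record_http_call("GET", 10.0, 10.05)
llm_span = record_llm_call("example-model", "support-triage-v3", 412, 96, 10.0, 10.25)

# Both records have the same top-level keys; only the attributes differ.
print(sorted(http_span) == sorted(llm_span))
print(llm_span["attributes"]["gen_ai.request.model"])
```

Nothing about the span structure changes for the AI case. The model, prompt, and token counts travel as extra attributes on an otherwise ordinary trace record.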

Q: What open source projects are extending OpenTelemetry for AI and LLM inference workloads?

Aniszczyk identifies two active efforts. The first is OpenLLMetry, a project that extends OTel to support LLM-specific instrumentation. The second is Open Inference, which adds support for inference-based workloads by extending OTel with metadata around models and inference traceability. Both are either community-driven or startup-led, and Aniszczyk expects this work to be pushed upstream into the core OTel project over the next six to twelve months.

“There’s a couple efforts. One of them is called OpenLLMetry. Another one is called Open Inference, which is extending OTel to support inference based workloads. That work’s already being done.” — Chris Aniszczyk, CTO, CNCF

Q: How does the OpenTelemetry community model support long-term evolution into new use cases like AI?

Aniszczyk draws a direct parallel to how Linux and Kubernetes evolved beyond their original design scope. Linux was never intended for mobile devices or spacecraft. Kubernetes was never designed for edge deployments. Both expanded because the community demanded it and showed up to build it. Aniszczyk applies the same logic to OTel: once a critical mass of vendor support and user demand exists, the community naturally extends the project to support emerging workloads.

“Linux was never meant to go into phones or into space, but here we are. OTel is being stretched by its community and once you have enough critical mass, things naturally evolve to support.” — Chris Aniszczyk, CTO, CNCF

Resources & Documentation

  • CNCF, Cloud Native Computing Foundation, home of OpenTelemetry and the CNCF Project Velocity Report
  • OpenTelemetry, official documentation, SDKs, and specification for the OTel standard
  • OpenLLMetry, open source project extending OpenTelemetry for LLM and AI agent instrumentation
  • Open Inference, community effort extending OTel to support inference-based workload metadata and model traceability

***

Full Raw Transcript

Swapnil Bhartiya: If you look at OTel, I talk to open source teams a lot, of course, and they also use it. If you look at this graduation, would you consider that when it comes to observability, OTel has kind of hit the Kubernetes or Linux kernel moment, that it’s not just a technology, it’s the foundation of observability?

Chris Aniszczyk: It is the Kubernetes of the observability world. You look at all the traditional observability vendors, your Splunks, Datadogs, your newer age ones, Grafanas, Honeycombs, they all support OTel by default. Almost every major programming language has SDKs for OTel. The big hyperscalers have it. Amazon even last month announced that CloudWatch supports OTel natively now. So these are all big signals that it’s everywhere. And I talk to a lot of our end users in CNCF and they actually love OpenTelemetry because basically it allows them to do two things. One, it gives them a little bit more choice of how to choose vendors potentially. Like, oh, if internally we are instrumenting our applications using OTel, then maybe we could use Datadog for this, Grafana for this. It gives them a strong optionality and choice, which CNCF is all about. Basically what Kubernetes did, like, hey, you could run on Google Cloud or you could run on your own private cloud. That level of choice is now offered for the observability part of your stack. And is it super easy to move between things? No, not necessarily. Always a bit of work. Just like Kubernetes, you can’t magically move to different clouds. You got to do a little bit of work, but it’s significantly easier. The other thing I learned from some of our end users is a lot of people, especially with older stacks, a lot of regulated industries that have been around for a while, they have homegrown solutions, right? And they built their own kind of observability and they have now modified those to go emit OTel related data. So that helped them modernize a little bit and gives them maybe a pathway eventually to move off of that homegrown solution. So yeah, to me it’s like OTel is the Kubernetes of the observability world and it’s reflected in the data, not only adoption, but you look at the commits, the contributions.
We have this open source Project Velocity report that we produce in CNCF. OTel is literally number two behind Kubernetes for contribution velocity and so on. So it’s not just people using it and vendors supporting it. A lot of people are showing up and I think OTel ranks as like a top 20 or 30 open source project worldwide in terms of contributions. It’s number two in CNCF. But even worldwide it’s hugely.

Swapnil Bhartiya: In all the interviews and discussions I have had, OpenTelemetry keeps coming up. It doesn’t matter what the company is. Now, earlier you mentioned that initially there were three pillars, then the fourth pillar was there.

Chris Aniszczyk: Yeah, exactly.

Swapnil Bhartiya: Now AI is there, AI workloads are there, and we hear that observability is very, very important here. How is OpenTelemetry evolving, or how will it evolve, for AI workloads?

Chris Aniszczyk: Yeah, I don’t think AI workloads necessitate like a new pillar, it’s like a new use case. Right. Because I think at the end of the day you’ll have things like agents. Obviously you’re going to produce a lot of logs, you’re going to need metrics to figure out what the hell agents are doing. You’re going to have to have traceability. So like, hey, my agent kicked off an API call, hit a database. Just like you need for traditional microservices, you need all those same things. There needs to be some improvements and maybe modifications to OTel to support extra metadata around like maybe which model was used, which prompt and like other things that are fully supported. And there actually are efforts out there that are either being done by the community or startups. There’s a couple efforts. One of them is called OpenLLMetry, kind of like a fun spin on LLM. Another one is called Open Inference, which is extending OTel to support inference based workloads. This mostly is like metadata on models and all that traceability. So that work’s already being done. I think over the next six to 12 months you’re going to see more of that work be pushed upstream, basically working with the OTel process and community to support these workloads. And any healthy project over time evolves with the community that shows up. It’s just like, I use the famous example, Linux was never meant to go into phones or into space, but here we are today. Kubernetes was never meant to go into edge devices or space either. And here we are today. OTel is being stretched by its community and I think once you have enough critical mass of support across vendors and users wanting to see it there, things naturally evolve to support.
