As infrastructure becomes increasingly complex, one thing remains clear: when networks break, everything else suffers. But in today’s cloud-native world, network engineers are in short supply. DevOps engineers and SREs now shoulder more of the responsibility for ensuring uptime, often without the deep networking expertise traditionally needed to diagnose and resolve issues.
That’s the problem Kentik is aiming to solve with its new Cause Analysis feature. Chris O’Brien, Senior Director of Product Management at Kentik, joined TFiR to explain how AI is closing the observability gap and empowering any engineer — not just network veterans — to pinpoint and understand root causes in real time.
“The network still underpins everything,” said O’Brien. “But while infrastructure is scaling fast, networking teams are staying the same size or shrinking. Something has to fill that gap.”
Using AI as a powerful tool, not as an operator
Cause Analysis, launched as part of Kentik’s network observability platform, is designed to do just that. The feature uses AI to analyze traffic data, detect anomalies, and explain what’s going on in clear, human-readable language. Whether the symptom is packet loss, unexpected latency, or a traffic spike, Cause Analysis is meant to surface what is happening and why, from a single click.
Traditionally, diagnosing these issues required a network engineer to sift through traffic data, identify capacity bottlenecks, and filter traffic flows to find the source. That’s time-consuming, highly specialized work. But now, said O’Brien, “You click a single button, and it finds and highlights those spikes, tells you what contributed to them, and explains it like a seasoned network engineer would — including ports, protocols, and autonomous systems.”
The heavy lifting is done by AI models trained to query data, interpret results, and summarize findings. Kentik’s approach combines statistical analysis with large language models to generate concise, actionable reports — not just raw data. And the system is designed to be transparent: users can review and tweak the underlying queries if needed.
“The AI isn’t running your network,” O’Brien emphasized. “But it helps teams make sense of complex data quickly. It augments human expertise, it doesn’t replace it.”
That distinction is key for teams who fear black-box automation. In Kentik’s model, the machine does the grunt work, surfacing likely root causes, while the human provides the judgment.
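To make the workflow described above concrete, here is a minimal Python sketch of the two steps a tool of this kind has to perform: find the intervals where traffic jumps, then rank which port, protocol, and autonomous system account for the extra bytes. It is an illustration only and not Kentik’s implementation. The record layout, the function names, the median-based spike threshold, and the sample data are all assumptions made for this example, and the final `describe` step uses a plain string template where the real product uses a large language model.

```python
from collections import defaultdict
from statistics import median

# Hypothetical flow record: (minute, dst_port, protocol, src_asn, bytes).
Flow = tuple[int, int, str, int, int]


def totals_per_minute(flows: list[Flow]) -> dict[int, int]:
    """Sum bytes across all flows for each minute."""
    totals: dict[int, int] = defaultdict(int)
    for minute, _port, _proto, _asn, nbytes in flows:
        totals[minute] += nbytes
    return dict(totals)


def find_spikes(totals: dict[int, int], factor: float = 2.0) -> list[int]:
    """Flag minutes whose total exceeds `factor` times the median total."""
    baseline = median(totals.values())
    return sorted(m for m, v in totals.items() if v > factor * baseline)


def top_contributors(flows: list[Flow], spike_minutes: list[int], limit: int = 3):
    """Rank (port, protocol, asn) by extra bytes per minute inside the spike."""
    spike = set(spike_minutes)
    if not spike:
        return []
    all_minutes = {f[0] for f in flows}
    n_in = len(spike)
    n_out = max(1, len(all_minutes - spike))  # avoid dividing by zero
    inside: dict[tuple, int] = defaultdict(int)
    outside: dict[tuple, int] = defaultdict(int)
    for minute, port, proto, asn, nbytes in flows:
        bucket = inside if minute in spike else outside
        bucket[(port, proto, asn)] += nbytes
    deltas = {
        key: inside.get(key, 0) / n_in - outside.get(key, 0) / n_out
        for key in set(inside) | set(outside)
    }
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:limit]


def describe(spike_minutes: list[int], contributors) -> str:
    """Template stand-in for the natural-language summary step."""
    if not contributors:
        return "No traffic spike was found."
    (port, proto, asn), delta = contributors[0]
    return (f"Traffic rose during minutes {spike_minutes}; the largest contributor "
            f"was {proto} port {port} from AS{asn}, adding about {delta:.0f} "
            f"bytes per minute.")


def sample_flows() -> list[Flow]:
    """Made-up data: steady web and DNS traffic plus a burst in minutes 3 and 4."""
    flows: list[Flow] = []
    for minute in range(6):
        flows.append((minute, 443, "TCP", 64500, 1000))
        flows.append((minute, 53, "UDP", 64501, 100))
        if minute in (3, 4):
            flows.append((minute, 8080, "TCP", 64502, 5000))
    return flows


if __name__ == "__main__":
    flows = sample_flows()
    spikes = find_spikes(totals_per_minute(flows))
    print(describe(spikes, top_contributors(flows, spikes)))
```

Comparing per-minute averages inside and outside the spike keeps steady background traffic, such as the constant port 443 flows in the sample data, from being ranked as a cause. A production system would need a more robust change-point method and many more dimensions than this sketch covers.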
Integration with your stack
The platform is also built for integration. Cause Analysis can be triggered within Kentik’s Data Explorer with a single click, and data is exportable through its Firehose capability. There’s also a growing emphasis on natural language querying — allowing operators to ask questions in plain English and receive structured insights in return.
One standout integration is with ServiceNow. Help desk staff working a ticket can call on a ServiceNow agent, which can in turn communicate with agents from other vendors, with Kentik supplying the network observability piece. The aim is faster time to resolution for help desk teams and tighter feedback loops between infrastructure and operations.
With companies like Dropbox, Zoom, and ServiceNow among Kentik’s customers, the use cases are as diverse as the networks they manage. But the common thread is clear: as infrastructure scales and skilled networking talent becomes scarce, teams need tools like Cause Analysis to fill the gap. Such tools open up network visibility to non-specialists and shorten the time it takes to go from alert to answer.
“We’re not just giving you a graph,” said O’Brien. “We’re telling you what changed, why it changed, and what you can do about it.”
For DevOps and SREs navigating the challenges of modern, distributed systems, Kentik’s AI-powered capabilities aren’t just nice to have — they’re rapidly becoming mission-critical.
Edited Transcript
Swapnil Bhartiya: As networks become more complex and distributed, understanding why things break and how to fix them has become a critical challenge, but network expertise is in short supply, especially among DevOps and SRE teams who are tasked with maintaining uptime. That’s where Kentik’s new Cause Analysis comes in, powered by AI. It is designed to automatically pinpoint the root cause of network issues so teams can resolve incidents faster without needing a CCNA. Joining me today is Chris O’Brien, Senior Director of Product Management at Kentik, to talk about how this innovation is helping every engineer become a network expert. Chris, it’s great to have you on the show.
Chris O’Brien: Thanks very much.
Swapnil Bhartiya: Of course, we have seen DevOps and SRE roles become mainstream in recent years, but network engineers are still relatively rare. Why do you think that is, and what’s driving that shift?
Chris O’Brien: Network engineering was hot for a long time. Networks are still super important—they’re foundational. They’re working, and a lot of the industry is focused more on the forefront, which is things like SRE, DevOps, and application development in the cloud. The fact remains, though, that the network underlies all of those things, and we need that network to work well, at scale, and be performant. This produces a situation where networking teams are staying the same size or getting smaller, but have a growing number of responsibilities and a growing scale of infrastructure. I think it’s really the proliferation of cloud architectures and microservices architectures that are driving a lot of that focus on the app, SRE, and DevOps side.
Swapnil Bhartiya: Kentik’s new Cause Analysis feature promises to identify and explain network issues automatically. Can you walk us through how it works and what role AI plays in the process?
Chris O’Brien: When running a network, one of the most common problems is overloaded capacity, which causes packet loss and slow performance. For users, this means poor application performance and delays or unresponsiveness. What a network engineer has to figure out is what traffic is consuming that limited capacity. We just released a new capability called Cause Analysis that helps you understand the main contributors to an increase, decrease, or spike in traffic. Our customers often get requests from DevOps, SRE, or senior support teams asking if there’s a networking problem and, if so, what’s causing the high capacity usage. Historically, you needed an expert who understands the network and traffic. They would look at overall traffic, see increasing capacity usage, zoom in, and start filtering to understand what’s driving the traffic. This takes a lot of iteration and time, and it requires expertise. All of that delays both the fix and the SRE and DevOps teams’ understanding of what’s going on and whether it’s something they can resolve. Cause Analysis identifies those change points in capacity usage, tells you what is causing them and what the primary contributors to that traffic change are, and explains it both in human language, as if an expert network engineer did that work, and in technical specifics like port, protocol, and autonomous systems.
Swapnil Bhartiya: Can you also emphasize the role of AI? You touched on it, but since everyone talks about AI these days, can you specify that?
Chris O’Brien: AI is pretty important in a lot of the technology we’re building right now. Deciding how to run the queries, how to divide up the traffic, where to look, and then working out what share of the increase or decrease each contributor accounts for: that decision-making process and repeated querying of data is something the AI leads. Once a conclusion is reached, the AI, rather than just delivering technical fields like TCP port, gives you a natural language explanation. That comes from large language models: two or three sentences describing what’s going on.
Swapnil Bhartiya: With AI doing most of the heavy lifting, does that mean teams can operate with less hands-on networking expertise, or do you see it more as an assistant to the experts, meaning you still need networking expertise?
Chris O’Brien: When we talk to our customers, it’s clear the challenge is an increase in demand from all parts of the business to do more on the network—more capabilities and greater scale—but with the same or smaller teams. Something has to make up that gap. That’s where I see AI helping out. AI is not running your network; we still require human experts who know how to run the network. But augmenting with increasingly powerful and intelligent capabilities from your systems and tools is really the only way to get there.
Swapnil Bhartiya: What type of companies are using Kentik today, and what are the biggest network-related challenges they turn to you for help with?
Chris O’Brien: We started about a decade ago with traffic analysis and became a world leader in it. We work with a lot of the largest companies in the world—Dropbox, ServiceNow, Zoom, and many carriers fit into that category.
Swapnil Bhartiya: Network data is often siloed and hard to interpret. How does Cause Analysis integrate into existing observability stacks or workflows?
Chris O’Brien: There are a number of things we do. First, our data is all exportable via a capability we call Firehose, so we can export at scale. You can also query the data via AI. Recently, we’ve introduced a capability where you can query via natural language, so agents can work with each other to come to a conclusion without a fancy, formatted protocol. They can interact with natural language, which I think will be an accelerator. We also announced a partnership with ServiceNow, so folks in Help Desk and other groups using ServiceNow can look at a ticket they’re trying to resolve, call on a ServiceNow agent to help, and that agent can communicate with agents from other vendors with expertise in different areas. We’re providing the network observability piece of that. There are a number of avenues, and we’re thinking about MCP and agent-to-agent methods for the future, but there’s quite a bit we support today.
Swapnil Bhartiya: Latency, packet loss, routing loops—these can be hard to diagnose under pressure. What’s the learning curve like for teams starting with Kentik’s Cause Analysis?
Chris O’Brien: There’s a lot you can learn about Kentik, but Cause Analysis shows up in the product in a couple of ways. The simplest is anytime you’re looking at a chart in our Data Explorer, there’s a button that says “Analyze” with a Cause Analysis option. You click one button, and in that chart, you’ll see it finds and highlights periods of increase or decrease, enumerates them in a list, and gives you a natural language explanation of what’s contributing to that. It’s a really easy on-ramp. A lot of the reason folks use our tools is that they can dig deep and ask any question of the network, but here we’re trying to accelerate that with a one-click workflow.
Swapnil Bhartiya: As AI gets more involved in infrastructure operations, how do you ensure that the insights it surfaces are explainable and trustworthy?
Chris O’Brien: There are two pieces to that. First, we have to recognize AI is not running anyone’s network right now. If you start thinking that way, it’s just too early and will cause problems. We think about augmenting the user and focusing on the best way for those two to interact to get the most out of the machine and the human behind the console. Often, the machine does a lot of the legwork, and the human does a lot of the reasoning. With Cause Analysis, we give you those explanations after multiple queries and assessments of the results, and then we highlight what we think is the right answer. If you want to drill into that, you can look at the exact query that was run, see the results, and adjust or tweak the query to ensure your interpretation matches the computer’s, but without having to run the 10 or 20 queries yourself. I think that’s super important.
Swapnil Bhartiya: Chris, thank you for joining me and walking us through how AI is helping teams close the network visibility gap. It’s exciting to see tools like Cause Analysis empowering engineers across disciplines, not just networking pros. And for those watching, stay tuned for more conversations on how AI and observability are shaping the future of infrastructure and reliability. Thanks for watching and see you in the next video.
Chris O’Brien: Thank you.





