Why Production Operations—Not Coding—Is the Real Killer App for AI Agents

Randy Bias argues production operations is massively underserved by AI agents. Learn why Mirantis is using Claude and MCP for automated Kubernetes troubleshooting instead of just coding assistance.

By Monika Chauhan 2 days ago

Guest: Randy Bias
Company: Mirantis
Show Name: The Agentic Enterprise
Topic: AI Infrastructure

The AI agent gold rush is on, but everyone’s digging in the wrong place. While enterprises pile resources into AI-assisted coding tools, their production systems are crying out for help. Randy Bias, VP of Strategy & Technology at Mirantis, is calling out the industry’s blind spot: production operations is massively underserved when it comes to agentic AI—and that’s where the real value lies.

The Coding Obsession Is Masking a Bigger Problem

Walk into any enterprise tech conversation about AI agents and you’ll hear about GitHub Copilot, coding assistants, and developer productivity gains. But Bias has a provocative question: what happens after the code is written?

“People are focused on AI coding, but the reality is there are far more problems with production,” Bias explains in a recent TFiR interview. “What do you do once your code is actually running in production? How do you make sure it continues to run? How do you deal with problems and failures?”

This isn’t just philosophical musing. Mirantis reports active customer conversations about deploying AI agents and Model Context Protocol (MCP) servers specifically to troubleshoot production problems in Kubernetes clusters. The company is demonstrating proof-of-concept work that uses Claude with MCP servers to automate triage of production issues in real time.

Why Operations Teams Are Drowning

The math is simple but often ignored: code spends a tiny fraction of its lifecycle being written and the vast majority running in production. Yet investment in AI tooling is inverted, heavily favoring the development phase over operations.

Modern production environments present challenges that are perfect for AI agents. Kubernetes clusters generate massive amounts of telemetry data. Microservices architectures create complex dependency chains where failures cascade unpredictably. On-call engineers face alert fatigue while trying to correlate signals across multiple monitoring systems.

“The area of operations is seriously underserved and underutilized for agents,” Bias notes. This represents a fundamental mismatch between where the problems exist and where the AI solutions are being deployed.

What Automated Production Triage Actually Looks Like

Mirantis is publishing proof-of-concept demonstrations on its T Zero (t0) blog showing how this works in practice. The approach combines Claude’s reasoning capabilities with MCP servers that can introspect running Kubernetes systems in real time.

When production issues occur—a pod crash loop, performance degradation, failed deployment—AI agents can automatically investigate using the same tools and data sources that human operators use. They can query logs, check resource utilization, examine recent configuration changes, and correlate events across the cluster.
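To make the investigation step concrete, here is a minimal Python sketch of evidence gathering for a single crash-looping pod. It is not Mirantis's implementation: the `PodSnapshot` fields, the `triage` function, and the scoring rules are all hypothetical, and the fixed rules only stand in for the reasoning a model such as Claude would do over data returned by MCP tools.

```python
from dataclasses import dataclass, field


@dataclass
class PodSnapshot:
    """Hypothetical, simplified view of the data an agent's tools might return."""
    name: str
    restart_count: int
    last_exit_reason: str  # e.g. "OOMKilled" or "Error"; empty if never exited
    memory_used_mib: int
    memory_limit_mib: int
    recent_log_lines: list[str] = field(default_factory=list)
    config_changed_recently: bool = False


def triage(pod: PodSnapshot) -> list[str]:
    """Return root-cause hypotheses, most strongly supported first.

    The integer scores are illustrative assumptions, not a real ranking model.
    """
    scored: list[tuple[int, str]] = []

    # Resource utilization: a confirmed OOM kill outranks a near-limit reading.
    if pod.last_exit_reason == "OOMKilled":
        scored.append((3, "container exceeded its memory limit"))
    elif pod.memory_limit_mib and pod.memory_used_mib / pod.memory_limit_mib > 0.9:
        scored.append((2, "memory usage is close to the limit"))

    # Logs: look for a signal that a downstream dependency is unavailable.
    if any("connection refused" in line.lower() for line in pod.recent_log_lines):
        scored.append((2, "a dependency is unreachable"))

    # Recent configuration changes only matter if the pod has been restarting.
    if pod.config_changed_recently and pod.restart_count > 0:
        scored.append((1, "a recent configuration change may be involved"))

    # Stable sort: highest score first, ties keep insertion order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in scored]
```

Calling `triage` on a snapshot with an `OOMKilled` exit, a "connection refused" log line, and a recent config change would return all three hypotheses, with the memory limit listed first. In a real agent the model would decide which tools to call next based on what it found, rather than walking a fixed checklist.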

The key difference from traditional monitoring and alerting is the agent’s ability to reason about what it finds. Rather than just triggering alerts based on predefined thresholds, agents can follow investigative workflows, form hypotheses about root causes, and even propose remediation steps.

This is fundamentally different from AI-assisted coding, where a developer initiates the interaction. In production operations, the system itself drives agent behavior through events—outages, anomalies, deployment triggers. The agents operate autonomously in response to what’s happening in the production environment.
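The event-driven pattern described above can be sketched as a small router that maps production events to agent handlers. This is an illustrative assumption about how such a system might be wired, not a description of Mirantis's tooling: `EventRouter`, the event dictionary shape, and the event type names are all hypothetical.

```python
from collections import defaultdict
from typing import Callable

# A handler takes an event dictionary and returns a short summary of what it did.
Handler = Callable[[dict], str]


class EventRouter:
    """Hypothetical dispatcher: production events, not a human prompt, start the work."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, event_type: str, handler: Handler) -> None:
        """Register a handler for one event type (e.g. an outage or a deployment)."""
        self._handlers[event_type].append(handler)

    def dispatch(self, event: dict) -> list[str]:
        """Run every handler registered for the event's type, in registration order.

        Events with no registered handler produce an empty list.
        """
        return [handler(event) for handler in self._handlers.get(event["type"], [])]
```

A handler registered with `router.on("pod_crash_loop", ...)` would run whenever the monitoring pipeline dispatches an event of that type, which is where an agent's investigation would begin. Contrast this with a coding assistant, where nothing happens until a developer types a request.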

The Broader Pattern Emerging

Bias’s argument reflects a broader realization about where AI agents deliver the most value: in domains with high operational complexity, large volumes of real-time data, and urgent time-to-resolution requirements.

Production operations checks all these boxes. SRE teams are already comfortable with automation and infrastructure-as-code practices, making them natural early adopters of agentic AI. The problems they face—incident response, capacity planning, performance optimization—involve exactly the kind of data analysis and pattern recognition that large language models excel at.

Perhaps most importantly, the return on investment is immediate and measurable. When an AI agent helps reduce mean-time-to-resolution for production incidents, organizations can quantify the business impact in terms of reduced downtime and operational efficiency.

What Enterprises Should Do Now

For organizations exploring agentic AI, Bias’s message is clear: don’t ignore operations. While coding assistants deliver productivity gains, production operations represents a potentially larger opportunity with more immediate business impact.

Mirantis is betting on this thesis with its MCP AdaptiveOps framework, which includes services specifically designed to help enterprises deploy AI agents for operational use cases. The approach focuses on using open standards like MCP and general-purpose agents rather than building custom solutions from scratch.

The proof-of-concept work Mirantis is publishing demonstrates that the technology is ready. What’s needed now is for enterprises to shift their thinking about where AI agents should be deployed. The exciting innovations aren’t just happening in IDEs and development environments—they’re happening in production, where the stakes are highest and the problems are most urgent.

The Underserved Opportunity

As the agentic AI wave accelerates, production operations stands out as an area where demand far exceeds current solutions. Development teams have their coding assistants. Now it’s time for operations teams to get the AI support they desperately need.

The systems are already running. The problems are already happening. The data is already being generated. What’s been missing are AI agents smart enough to make sense of it all—and bold enough to act autonomously when production problems strike.

Bias and Mirantis are making the case that this is where enterprises should focus their agentic AI efforts. The question isn’t whether AI agents will transform production operations. It’s whether your organization will lead that transformation or scramble to catch up.
