AI-generated code is flooding production environments at 3x the historical rate, and GitOps patterns, with infrastructure-as-code stored in Git repositories, create a natural pathway for AI to manage deployments autonomously. If configuration is already version-controlled YAML, AI can modify it directly, and Argo CD syncs it to clusters automatically. The question isn’t whether AI can manage infrastructure; it’s how platform teams define the guardrails.
The Guest: Hong Wang, Co-founder and CEO at Akuity
The Bottom Line
- GitOps infrastructure-as-code makes AI infrastructure management a natural fit—configuration stored in Git can be modified by AI and synced automatically by Argo CD, with human-defined runbooks ensuring deterministic results
***
Speaking with TFiR, Hong Wang of Akuity described the current state of AI integration into GitOps workflows and explained how AI-powered SRE capabilities are shifting platform engineering responsibilities from manual troubleshooting to defining automation guardrails.
What Is the Relationship Between AI and GitOps?
Wang drew a parallel between Kubernetes and Linux: just as Linux became the operating system for individual machines, Kubernetes has become the operating system for clusters. GitOps—treating infrastructure configuration as version-controlled code stored in Git—provides the interface layer that AI can naturally interact with.
Hong Wang: “What is AI really good at right now? Writing code. What is code? Code is GitOps—our YAML, our Kubernetes configuration. By adopting the GitOps concept, which is very common now, it’s already in the repository, already in code. That means it’s providing a very solid and direct way for AI to start managing the infrastructure or managing the application lifecycle. You just change the YAML in Git, and Argo CD will do the job to sync it and help you deploy it properly with automation. That’s already a validated pattern. It’s all, in fact, a plug-in for AI.”
The infrastructure-as-code approach removes the need for AI to learn proprietary APIs, CLI tools, or UI workflows. Instead, AI modifies YAML files in Git repositories (the same interface human operators use), and GitOps controllers like Argo CD handle deployment reconciliation automatically. This creates a clean separation: AI focuses on decision-making (what should change), while Argo CD reconciles the live cluster state to the desired state (how the change is applied).
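For concreteness, here is what that interface looks like in practice: a minimal Argo CD Application manifest pointing a cluster at a Git path. The repository URL, path, and namespaces are illustrative placeholders rather than anything from Akuity's setup; once an Application like this exists, deploying a change means editing YAML in the referenced repo, whoever (or whatever) the author is.

```yaml
# Minimal Argo CD Application: points the cluster at a Git path and
# keeps the live state in sync with whatever is committed there.
# Repo URL, path, and namespaces below are illustrative placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/platform-config.git
    targetRevision: main
    path: apps/payments-service   # the YAML an AI agent would edit
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert manual drift back to the Git-defined state
```

The `automated` sync policy is what closes the loop: Argo CD continuously applies whatever is committed, so the AI's only job is producing a correct commit.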
Broader Context: AI-Powered SRE Capabilities
Wang explained that platform engineering and SRE teams spend the majority of their time troubleshooting production issues by analyzing logs, events, and Kubernetes manifests. AI changes this equation by digesting vast amounts of operational data and identifying root causes autonomously—but only when guided by human-defined runbooks.
Hong Wang: “We’re talking about Kubernetes users—platform teams, SRE teams. Their day-to-day life is running your application and babysitting your application, ensuring everything is running fine. What is my job if I’m an SRE? Every day I’m staring at logs, events, manifests. I try to figure out when something’s broken—what is the root cause, how do I fix it. With AI’s help, we actually built something to solve that demand. In Argo CD and Kargo, we added AI runbook and AI SRE capability. Rather than the human zooming in on all the information, we let the AI digest all the information, get the conclusion, and also factor in input from the user as the runbook.”
The runbook concept is critical: instead of allowing AI to improvise solutions, platform teams define symptom-solution patterns in human-readable format. For example: “Symptom: image pull backoff error from Docker registry A. Solution: override registry A to registry B.” When Argo CD detects the symptom, AI applies the human-certified solution automatically.
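The interview does not show Akuity's actual runbook format, but a hypothetical entry for Wang's example might read something like this sketch (the field names are invented for illustration, not a real Akuity or Argo CD schema):

```yaml
# Hypothetical runbook entry; field names are illustrative only and do not
# reflect Akuity's actual schema. It encodes the symptom-solution pair Wang
# describes: an ImagePullBackOff against registry A is resolved by rewriting
# image references to registry B.
runbooks:
  - name: registry-a-image-pull-backoff
    symptom: >
      Pods stuck in ImagePullBackOff with errors pulling from
      registry-a.example.com
    solution: >
      Rewrite image references from registry-a.example.com to
      registry-b.example.com in the affected manifests and commit the
      change to Git so Argo CD can sync it.
    mode: human-in-the-loop   # or "automated" once the team trusts the fix
```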
Wang noted that Akuity supports two operational modes: full automation (AI takes action autonomously) and human-in-the-loop (AI surfaces the recommended action and waits for approval). The choice depends on risk tolerance and operational maturity. High-confidence, well-documented issues can run autonomously; novel or high-stakes scenarios require human judgment.
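The same trade-off already exists in plain Argo CD sync policies, which makes for a useful analogy (this is standard Argo CD behavior, not Akuity's AI SRE configuration): without an `automated` sync policy, every change committed to Git waits for an explicit `argocd app sync`, an approval step by another name; with it, changes apply as soon as they land on the tracked branch.

```yaml
# Standard Argo CD sync policy, shown only as an analogy for the two modes:
# manual sync is effectively human-in-the-loop, automated sync is full automation.
spec:
  # Human-in-the-loop: no automated policy, changes wait for "argocd app sync"
  syncPolicy: {}

  # Full automation: uncomment to apply every committed change immediately
  # syncPolicy:
  #   automated:
  #     prune: true
  #     selfHeal: true
```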
The broader trend Wang identified: AI is fundamentally shifting SRE responsibilities from reactive troubleshooting to proactive guardrail definition. Instead of manually fixing the same issues repeatedly, platform teams define runbooks once, and AI handles subsequent occurrences automatically. This mirrors the Linux-Kubernetes parallel: just as Linux abstracted hardware differences, AI is abstracting operational toil—freeing SRE teams to focus on architecture, reliability patterns, and system design rather than log archaeology.
Watch the full TFiR interview with Hong Wang here.