
Why AI-Generated Code Needs a Cloud Sandbox to Be Trustworthy | Waldemar Hummer, LocalStack | TFiR

AI agents generate cloud code faster than teams can review it. Waldemar Hummer, CTO at LocalStack, explains sandboxed validation and spec-driven development.

By Monika Chauhan 1 day ago


Agentic development tools can generate and deploy infrastructure code faster than any team can manually review it. Static analysis catches known misconfigurations, but some problems only surface at runtime, once the infrastructure has actually been deployed. Teams that skip a sandboxed validation layer are shipping trust assumptions, not tested code.

In this interview on TFiR, Waldemar Hummer, Co-founder and CTO at LocalStack, walks through why the shift to agentic development makes local cloud sandboxing a quality baseline, how spec-driven development reduces agentic drift, and what LocalStack’s Cloud Pods feature enables for CI reproducibility and team collaboration.

Guest: Waldemar Hummer, Co-founder and CTO at LocalStack
Show: TFiR

Here is what every platform engineer and developer building with AI agents needs to know.

Technical Deep Dive

Q: Is AI-generated code quality a model problem or a workflow problem?

Waldemar Hummer, Co-founder and CTO at LocalStack, argues it is primarily a workflow problem. Any model in use today is the worst version teams will see from here on, so model capability is expected to keep improving. The real issue is that engineering processes, including code review, feature planning, and open source contribution models, were all designed around human contributors and have not yet adapted to agentic realities.

“Any model that we see is the worst version of a model that we see from now and into the future.” — Waldemar Hummer, Co-founder and CTO, LocalStack

Q: What engineering processes need to change most urgently because of agentic AI development?

Hummer identifies code review, feature planning, and prioritization as processes that were built for human contributors and now need rapid adaptation. The shift requires much more diligence around trusting AI-generated code, enforcing strict quality gates, and maintaining a tightly knit testing and quality assurance loop. He frames this as one of the largest single drivers of change in the software industry.

“We need to fundamentally rethink the way how software is being built, because a lot of processes have been built around humans.” — Waldemar Hummer, Co-founder and CTO, LocalStack

Q: Why do AI agents produce sloppy or unreliable code when given broad prompts?

Hummer agrees with host Swapnil Bhartiya’s analogy: asking an agent to handle an entire architecture at once is like asking a junior developer to design the whole system in one pass. Without incremental constraints and guardrails at each step, agents have too much room to get consequential decisions wrong. The fix is to scope each task tightly so the agent has less surface area within which to go wrong.

“To burn the ocean, basically.” — Waldemar Hummer, Co-founder and CTO, LocalStack

Q: What is spec-driven development and why does it reduce agentic drift?

Spec-driven development means expressing requirements precisely in human language before handing a task to an agent, rather than letting the agent interpret a broad goal. Hummer notes this constrains the agent’s decision space so there is less room to produce unintended behavior. The practice shifts the developer’s role toward specifying the what precisely, while the agent determines the how.

“You need to be very diligent in terms of asking the what and then how it’s going to be figured out by the agent.” — Waldemar Hummer, Co-founder and CTO, LocalStack
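
One way to picture "specifying the what" is to write a task's goal alongside acceptance criteria that can be checked after the agent has done its work. The sketch below is only an illustration of that idea, not a LocalStack feature or a method described in the interview; the names `TaskSpec` and `Criterion`, the bucket example, and the shape of the `agent_result` dictionary are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical shape for "deployed state": a plain dict collected after a deploy.
State = Dict[str, Any]


@dataclass
class Criterion:
    """One checkable statement of what must be true; it says nothing about how."""
    name: str
    check: Callable[[State], bool]


@dataclass
class TaskSpec:
    """A tightly scoped task: a one-sentence goal plus explicit acceptance criteria."""
    goal: str
    criteria: List[Criterion] = field(default_factory=list)

    def unmet(self, state: State) -> List[str]:
        """Return the names of the criteria that the given state does not satisfy."""
        return [c.name for c in self.criteria if not c.check(state)]


spec = TaskSpec(
    goal="Create one private S3 bucket for upload staging",
    criteria=[
        Criterion("exactly one bucket", lambda s: len(s.get("buckets", [])) == 1),
        Criterion("bucket is private", lambda s: s.get("public_access") is False),
    ],
)

# Illustrative state, as if gathered after the agent's change was deployed.
agent_result = {"buckets": ["upload-staging"], "public_access": True}
print(spec.unmet(agent_result))  # ['bucket is private']
```

The narrower the goal and the more explicit the criteria, the less the agent has to guess, and the easier it is to tell whether the result matches what was asked.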

Q: How does LocalStack function as a sandbox for AI-generated cloud code?

LocalStack provides a local container-based emulation of cloud services that agents and developers can deploy against without touching a real cloud environment. Hummer describes it as a sandbox where developers can run runtime checks after deployment, verifying whether ports that should be closed are open or whether security misconfigurations exist in the deployment. This is a fundamentally different validation approach from static analysis of infrastructure-as-code files.

“I think LocalStack is the ultimate sandbox, if you will.” — Waldemar Hummer, Co-founder and CTO, LocalStack
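
A runtime check of the kind Hummer describes could look roughly like the following sketch. It assumes infrastructure has already been deployed against a LocalStack container on its default local endpoint (`http://localhost:4566`), that the EC2 security group API is available in that setup, and that `boto3` is installed. The function names, the dummy credentials, and the sample data are illustrative assumptions, not something taken from the interview.

```python
from typing import Dict, Iterable, List, Tuple

OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def find_open_ingress(security_groups: Iterable[Dict],
                      forbidden_ports: Iterable[int]) -> List[Tuple[str, int]]:
    """Return (group_id, port) pairs where a forbidden port is open to any address.

    `security_groups` follows the shape of the `SecurityGroups` list in an
    EC2 DescribeSecurityGroups response.
    """
    forbidden = sorted(set(forbidden_ports))
    findings = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            cidrs = {r.get("CidrIp") for r in perm.get("IpRanges", [])}
            cidrs |= {r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])}
            if not cidrs & OPEN_CIDRS:
                continue
            proto = perm.get("IpProtocol")
            if proto == "-1":
                # "-1" means all traffic; FromPort/ToPort are then absent.
                low, high = 0, 65535
            elif proto in ("tcp", "udp"):
                low, high = perm.get("FromPort", 0), perm.get("ToPort", 65535)
            else:
                continue  # e.g. ICMP, where the range is not a port range
            findings.extend((group["GroupId"], p) for p in forbidden if low <= p <= high)
    return findings


def fetch_security_groups(endpoint_url: str = "http://localhost:4566") -> List[Dict]:
    """Read security groups from a running LocalStack container (needs boto3)."""
    import boto3  # imported lazily so the pure check above has no dependencies

    ec2 = boto3.client(
        "ec2",
        endpoint_url=endpoint_url,
        region_name="us-east-1",
        aws_access_key_id="test",       # dummy credentials (assumption)
        aws_secret_access_key="test",
    )
    return ec2.describe_security_groups()["SecurityGroups"]


# Made-up sample data in the response shape, so the check runs without a container.
sample = [
    {"GroupId": "sg-1", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-2", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "10.0.0.0/8"}]}]},
]
print(find_open_ingress(sample, [22, 3389]))  # [('sg-1', 22)]
```

In a pipeline, `find_open_ingress(fetch_security_groups(), [22, 3389])` would run after the deploy step, and a non-empty result would fail the build.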

Q: What are the limitations of static analysis for infrastructure-as-code validation?

Hummer notes that static analysis of a Terraform script can flag known misconfiguration patterns, but it struggles to surface the full range of runtime problems that only appear once infrastructure is actually deployed. Misconfigurations that do not match a known rule pattern can go undetected. Runtime validation against a local sandbox surfaces behavior that static tools cannot predict from the script alone.

“It can be hard to spot the exact misconfiguration there, so you need some static code analysis tools or something like that.” — Waldemar Hummer, Co-founder and CTO, LocalStack
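
A toy example shows how a pattern-based rule can miss a misconfiguration that becomes visible once values are resolved. This is a deliberately simplified sketch: the rule, the `${var.NAME}` substitution, and the resource dictionaries are hypothetical, and real scanners are considerably more capable than this single check.

```python
OPEN_CIDR = "0.0.0.0/0"


def static_rule_flags(resource: dict) -> bool:
    """Toy static rule: flag only a literal open CIDR written in the config."""
    return OPEN_CIDR in resource.get("cidr_blocks", [])


def resolve(resource: dict, variables: dict) -> dict:
    """Stand-in for deployment: replace ${var.NAME} references with their values."""
    resolved = []
    for value in resource.get("cidr_blocks", []):
        if value.startswith("${var.") and value.endswith("}"):
            value = variables[value[len("${var."):-1]]
        resolved.append(value)
    return {**resource, "cidr_blocks": resolved}


literal = {"port": 22, "cidr_blocks": ["0.0.0.0/0"]}
indirect = {"port": 22, "cidr_blocks": ["${var.admin_cidr}"]}
variables = {"admin_cidr": "0.0.0.0/0"}

print(static_rule_flags(literal))                       # True: matches the known pattern
print(static_rule_flags(indirect))                      # False: the pattern is hidden
print(static_rule_flags(resolve(indirect, variables)))  # True: visible once resolved
```

The same open port exists in both configurations, but only the deployed, resolved state makes it visible to this rule, which is the gap a sandbox deployment is meant to close.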

Q: What are Cloud Pods in LocalStack and how do they improve CI reproducibility?

Cloud Pods are persistent snapshots of a LocalStack container state that can be saved and shared across team members. Hummer compares the capability to taking a full snapshot of an AWS account and moving it elsewhere, something not practically achievable in a real cloud environment. When a test fails in a CI pipeline, engineers can persist the container state at the point of failure, pull down the snapshot locally, and reproduce the exact conditions for debugging.

“You can just persist the state, pull down the snapshot and reproduce everything locally so you have the full insights and debuggability.” — Waldemar Hummer, Co-founder and CTO, LocalStack
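
The CI workflow described above might be wired up along these lines. The sketch assumes the LocalStack CLI is installed and authenticated on the CI runner, and that its `localstack pod save` and `localstack pod load` subcommands behave as they do at the time of writing; check the current Cloud Pods documentation before relying on them. The pod name and the helper functions are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def pod_save_cmd(pod_name: str) -> List[str]:
    """Command that snapshots the running LocalStack state as a Cloud Pod."""
    return ["localstack", "pod", "save", pod_name]


def pod_load_cmd(pod_name: str) -> List[str]:
    """Command that restores a Cloud Pod into a local LocalStack container."""
    return ["localstack", "pod", "load", pod_name]


def run_tests_with_snapshot(test_cmd: Sequence[str], pod_name: str,
                            runner: Runner = run) -> int:
    """Run the test command; if it fails, persist the sandbox state for debugging."""
    exit_code = runner(test_cmd)
    if exit_code != 0:
        runner(pod_save_cmd(pod_name))
    return exit_code


if __name__ == "__main__":
    # Hypothetical pod name; a real pipeline would likely include the build ID.
    raise SystemExit(run_tests_with_snapshot(["pytest", "-q"], "ci-failure-example"))
```

An engineer investigating the failure would then start LocalStack locally and run the command produced by `pod_load_cmd("ci-failure-example")` to get the same state the test saw.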

Q: How does introducing LocalStack create a cultural shift in how developers approach testing?

Hummer observes that introducing LocalStack into a team’s workflow tends to change how developers think about testing: it becomes a continuous activity that starts early rather than a compliance checkbox. It also pushes teams to architect applications with testable units in mind, particularly microservices with clear boundaries between services. The tool creates structural incentives to build for testability from the start rather than retrofitting tests later.

“If we introduce a tool like LocalStack, it results in a cultural shift, how people think about testing, testing early in the process.” — Waldemar Hummer, Co-founder and CTO, LocalStack

Q: Why will precision in prompting and specification become a core developer skill?

Hummer argues that ambiguous inputs produce ambiguous outputs from any model, and as agents take on more execution responsibility, the cost of an imprecise prompt compounds quickly. The developer who can specify requirements with surgical precision constrains the agent’s decision space and consistently gets more reliable output. He frames this precision as a skill that will differentiate effective engineers going forward.

“Being very precise about expressing exactly what you need and want is going to be a superpower in the future.” — Waldemar Hummer, Co-founder and CTO, LocalStack

Resources & Documentation

LocalStack, local cloud emulation platform for developing and testing AWS applications without a live cloud environment
LocalStack Cloud Pods, persistent container snapshots enabling shareable, reproducible cloud state across teams and CI pipelines

***

Full Raw Transcript

Swapnil Bhartiya: It is interesting, we hear the word slop a lot, but I mean, we have been in this industry for so long. When Docker came, you know, a lot of the criticism was against containers: oh, containers were always part of the Linux kernel, there is nothing new happening. Then OpenStack came and then Kubernetes came. So we have seen that thing. But do you also think that the sloppiness is just a phase as technologies mature? Of course, Anthropic’s, you know, MCP has changed things. They moved a lot of control to users. So do you think this is just a phase and things will get better, or do you feel otherwise? I mean, look at EVs, right? When the EVs came, you could barely go 50 miles; now you can go 400 miles. You have, you know, sports cars that are EVs. So what are you seeing based on your own experience in the space?

Waldemar Hummer: Yeah, so first of all, I think any model that we see is the worst version of a model that we see from now and into the future. Right. So models are just going to get better from here onwards, right? I think that’s a general trend, we can say. So if we’re already quite excited and impressed to see Claude Code and some of the other models, then it’s only going to get better from here. Right? So that’s one observation. The second one is that we need to fundamentally rethink the way how software is being built, because a lot of processes have been built around humans. Code review, even things like open source, feature planning, prioritization. All of these basic principles of engineering and product development now, almost overnight, need to shift and adapt to this new reality. Which means we need to be much more diligent again on being able to trust the code that’s being generated by agents, have very strict quality gates and make sure that we have a very tightly knit testing and quality assurance loop. So in my view, this is maybe one of the biggest drivers of change in the industry. And again, it speaks very much to our storyline: test frequently, test early, quality assurance is important. And with that comes a certain efficiency that you want to give to these models. They want to have a sandbox that they can test against. And I know the whole industry is talking about sandboxes these days, and I think LocalStack is the ultimate sandbox, if you will. The cloud sandbox [unclear].

Swapnil Bhartiya: One thing: I come from a fiction and writing background, or filmmaking background, and if I ask even my junior assistant or writer to write the whole story, that story is going to be a disaster. So we always, you know, write one scene, one chapter. With the AI also, you know, when I do something, instead of telling it to do the whole code, it’s in chunks and stages. Then you can also review easily, then you can control. I think the sloppiness comes in when you ask a junior developer to build the whole architecture.

Waldemar Hummer: To burn the ocean, basically.

Swapnil Bhartiya: Exactly.

Waldemar Hummer: Yeah, 100%. So I think there’s this incremental progress in making sure that you have guardrails at every step along the way. A lot of people talk about these days about spec driven development, for example. Right. So it becomes much more a matter of specifying exactly what you want, you know, maybe in human language. And then the model has less wiggle room to do things wrong. Right. So you need to be very diligent in terms of asking the what and then how it’s going to be figured out by the agent, basically.

Swapnil Bhartiya: This may be a totally bad analogy. I cook a lot of Indian food. If you walk up to me and say, hey, make me something spicy, I have no idea. I’ll make something you won’t even tolerate, you know. So you have to tell me, hey, I want something with chicken, I want this. Just give me the ingredients. I mean, I’m biased because I’m a heavy user of AI. I have built my own servers and stuff like that. So I do know its limitations and its power. That’s where the sloppiness comes from. With AI, it’s just the way we had the whole DevOps, you know, DevSecOps. I think the same thing is going to happen. It has to become part of the process.

Waldemar Hummer: Yeah.

Swapnil Bhartiya: You cannot just expect it to, as you say, fill the ocean like that.

Waldemar Hummer: And spicy might mean something very different to you than to me, for example. Right. So I think being very precise about expressing exactly what you need and want is going to be a superpower in the future, I think. Right.

Swapnil Bhartiya: The reason I asked these questions was that sometimes the right tools bring the right cultural change. For example, if you’re driving stick shift versus automatic, the car also dictates your driving style. So sometimes tools like LocalStack bring that cultural change because they force developers to work in that way. So can you talk about how LocalStack is actually becoming a catalyst, so that instead of users looking at best practices or compliance as a checkmark, it becomes part of the practice and process, and the sloppiness is removed by the tool itself?

Waldemar Hummer: That is a great question. I think the first part is that what we often see, if we introduce a tool like LocalStack, it results in kind of a cultural shift, how people think about, again, testing, testing early in the process. Also how to architect your application to have testable units, for example, microservices with clear boundaries towards the other services, and make sure that you have units that are easily testable with that local sandbox. That’s the first part. The second part is that now, with a tool like LocalStack, let’s say you want to validate or verify your infrastructure-as-code configuration, right? Previous approaches were looking at, maybe, this is your Terraform script, and they did some validation based on what is defined in the script, if you had some misconfigurations or something in there. But it can be hard to spot the exact misconfiguration there, right? So you need some static code analysis tools or something like that. And with LocalStack, you can just say, I’m deploying my infrastructure against the container, against the sandbox, and then I run some runtime checks: are there any ports that have been opened that shouldn’t be? Are there any security holes in my deployment? It’s becoming much more of a trial and error situation where you can then verify your deployments in the sandbox as opposed to just doing static analysis. So I think those are some changes that we see.
Also, from a collaboration point of view, we have a feature that we call Cloud Pods. It’s basically taking a persistent snapshot of your container, and you can easily share that with your team members. So what that allows you to do is almost the equivalent of taking a full snapshot of an AWS account and then moving it somewhere else, which of course would be difficult to achieve in the real cloud. But you can do this easily with LocalStack, which makes things much more reproducible. If you have a failing test in your CI pipeline, for example, you can just persist the state, pull down the snapshot and reproduce everything locally so you have the full insights and debuggability. So I think, to your point, it just enables fundamentally new ways of working that are just not easily possible with the real cloud environment. And it also forces you to think a bit about how to adjust your dev environment in the best possible way to optimize for these quality outcomes.
