Guest: Simone Morellato (LinkedIn)
Company: vCluster Labs
Show Name: An Eye on AI
Topic: Kubernetes
GPU resources are the bottleneck in AI development, but the real crisis isn’t availability—it’s utilization. Most organizations waste 80% of their GPU capacity because Kubernetes workloads lock entire GPUs even when idle. Simone Morellato, Customer Success Lead at vCluster, reveals how multi-tenancy at the Kubernetes layer solves this expensive problem.
📹 Going on record for 2026? We're recording the TFiR Prediction Series through mid-February. If you have a bold take on where AI Infrastructure, Cloud Native, or Enterprise IT is heading—we want to hear it. Reserve your slot.
The GPU sharing challenge is simple but costly. When a pod attaches to a GPU on platforms like Nvidia DGX, that entire GPU—including gigabytes of memory—becomes locked to that single workload. Even if the workload sits idle, the GPU remains unavailable to other teams. The result? Organizations running at 20% utilization on million-dollar systems.
“Once a pod workload gets attached to a GPU, that GPU and all the memory gets assigned to that workload,” Morellato explains. “There is no way to share it. Even if that workload is not doing anything, it just locks this GPU until it’s done.”
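The locking behavior Morellato describes follows from how Kubernetes handles GPUs: a pod requests whole devices through the NVIDIA device plugin's extended resource, and the scheduler assigns them exclusively for the pod's lifetime. A minimal sketch (pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: training-job            # illustrative name
spec:
  containers:
  - name: trainer
    image: nvcr.io/nvidia/pytorch:24.08-py3   # illustrative image tag
    resources:
      limits:
        nvidia.com/gpu: 1      # whole-GPU grant: the device and all its
                               # memory belong to this pod until it
                               # terminates, even if the process is idle
```

Note that `nvidia.com/gpu` is requested in whole units and cannot be overcommitted by the scheduler, which is exactly why an idle workload still blocks the device for everyone else.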
This becomes particularly problematic for organizations that can only acquire limited GPU systems. By Morellato's account, Nvidia ships approximately 40 of these DGX platforms per year, meaning only about 40 entities worldwide get access. When these platforms arrive, organizations desperately need to share them across multiple teams—but traditional Kubernetes setups make this nearly impossible.
vCluster’s approach focuses on multi-tenancy at the Kubernetes layer, enabling flexible GPU sharing across teams while maintaining proper isolation. The platform allows organizations to move GPUs dynamically between teams based on actual usage, lifting utilization from roughly 20% to 80%.
Morellato uses a striking analogy: “You get a Ferrari, and usually a Ferrari is one seat. That’s kind of what DGX is like—a big system, big expense, super powerful, but one seat. Imagine vCluster is like adding 100 seats to that Ferrari so everybody can go faster.”
The innovation extends beyond simple sharing. vCluster offers a spectrum of isolation levels depending on use case. Teams within the same organization might only need control plane isolation, while service providers sharing resources between different companies require full isolation across networking, storage, policies, and users.
This flexibility proves crucial for service providers building GPU-as-a-Service platforms. They purchase expensive DGX systems and must share them across multiple customers, each requiring complete segregation from others. vCluster’s configurable isolation levels let providers choose the right security posture for each scenario.
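The "full isolation" end of that spectrum typically layers standard Kubernetes primitives under each tenant's slice of the host cluster. As one hedged sketch of the networking piece, a deny-by-default policy in the namespace backing a virtual cluster keeps one customer's pods from reaching another's (names are illustrative, not vCluster's actual generated objects):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-same-namespace-only   # illustrative
  namespace: tenant-a                # host namespace backing one tenant
spec:
  podSelector: {}                    # applies to every pod in this namespace
  policyTypes: [Ingress]
  ingress:
  - from:
    - podSelector: {}                # an empty podSelector with no namespaceSelector
                                     # matches only pods in the policy's own
                                     # namespace, blocking cross-tenant traffic
```

Storage, quota, and user isolation follow the same pattern with per-tenant ResourceQuota, LimitRange, and RBAC objects, which is what lets a provider dial the security posture up or down per scenario.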
The summer innovation sprint that produced these capabilities was driven by real-world pressure. Organizations weren’t just requesting better GPU sharing—they needed their million-dollar investments to behave as dynamically as cloud resources, where users click a button and receive a cluster within minutes.
For AI infrastructure teams, the implications are significant. Instead of one team monopolizing a DGX system, 100 teams can access the same resources. Instead of waiting months for GPU allocation, teams can scale up and down based on actual workload needs. The technology transforms expensive, rigid hardware into flexible, cloud-like infrastructure.
As GPU scarcity continues driving AI development costs, technologies that maximize utilization become strategic advantages. vCluster’s multi-tenancy approach doesn’t solve the GPU shortage, but it does ensure organizations extract maximum value from the GPUs they can acquire.