Guest: Saiyam Pathak (LinkedIn)
Company: vCluster Labs
Show Name: KubeStruck
Topic: Kubernetes, Cloud Native
GPU utilization is the silent budget killer in AI infrastructure. Teams spend millions on NVIDIA hardware, only to watch GPUs sit idle because traditional Kubernetes setups can’t efficiently share resources across teams—or they sacrifice security for utilization. Saiyam Pathak, Head of Developer Relations at vCluster, cuts through this dilemma with a solution that addresses both problems: virtual clusters with private nodes and intelligent autoscaling.
The GPU Utilization Problem Nobody Talks About
In bare metal environments, the math is brutal. You have a fixed pool of physical nodes, some CPU and some GPU. Each team wants its own Kubernetes cluster for isolation, but a production-grade cluster needs a minimum of three to four nodes, if only for a highly available control plane. With a dozen machines, that caps you at three or four clusters, however many teams are asking.
“Bare metal capacity is limited, and you cannot create that many Kubernetes clusters,” Pathak explains.
The traditional approach forces an impossible choice: either overprovision clusters and waste expensive GPU resources, or under-isolate workloads and create security nightmares. Neither option works when GPU costs run into millions of dollars.
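How bad is the waste in practice? One way to find out is to compare allocatable GPUs against what running pods actually request. The sketch below is illustrative and not part of vCluster; it assumes kubeconfig access to the cluster and that nodes advertise GPUs through the NVIDIA device plugin’s standard nvidia.com/gpu extended resource.

```go
// gpu_audit.go: a minimal sketch that compares allocatable GPUs to
// requested GPUs across a cluster. Assumes the NVIDIA device plugin
// exposes GPUs as the "nvidia.com/gpu" extended resource.
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

const gpuResource corev1.ResourceName = "nvidia.com/gpu"

func main() {
	// Load the local kubeconfig using the default loading rules.
	cfg, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
		clientcmd.NewDefaultClientConfigLoadingRules(),
		&clientcmd.ConfigOverrides{}).ClientConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()

	// Sum allocatable GPUs across all nodes.
	var allocatable int64
	nodes, err := clientset.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, n := range nodes.Items {
		if q, ok := n.Status.Allocatable[gpuResource]; ok {
			allocatable += q.Value()
		}
	}

	// Sum GPUs requested by running pods in every namespace.
	var requested int64
	pods, err := clientset.CoreV1().Pods("").List(ctx, metav1.ListOptions{
		FieldSelector: "status.phase=Running",
	})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		for _, c := range p.Spec.Containers {
			if q, ok := c.Resources.Requests[gpuResource]; ok {
				requested += q.Value()
			}
		}
	}

	util := 0.0
	if allocatable > 0 {
		util = 100 * float64(requested) / float64(allocatable)
	}
	fmt.Printf("GPUs allocatable: %d, requested: %d (%.0f%% utilized)\n",
		allocatable, requested, util)
}
```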
vCluster’s Multi-Tenancy Approach
vCluster’s solution flips the model entirely. Instead of spinning up multiple physical Kubernetes clusters, teams create one unified cluster from all bare metal hardware—combining CPU and GPU nodes—and then provision virtual clusters for each team.
“What we want to do is capture the entire multi-tenancy spectrum,” Pathak says.
The 2025 releases achieved exactly that. Previously, vCluster supported only shared nodes. Now it spans the full tenancy spectrum: shared nodes for basic isolation, private nodes for maximum security, and hosted control planes with physical node joins for teams that need both.
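On the host side, each tenant’s “cluster” is just a control-plane workload running in a namespace, which is what makes the density possible. A quick way to see the model is to list those control-plane pods. The app=vcluster and release labels below match the Helm chart defaults but may differ for your install, so treat the selector as an assumption.

```go
// list_vclusters.go: a sketch that lists virtual cluster control planes
// running as ordinary pods on the shared host cluster. The "app=vcluster"
// selector and "release" label are Helm chart defaults and may need
// adjusting for your install.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
		clientcmd.NewDefaultClientConfigLoadingRules(),
		&clientcmd.ConfigOverrides{}).ClientConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	// Every tenant "cluster" is just a pod (plus storage) in a namespace.
	pods, err := clientset.CoreV1().Pods("").List(context.Background(),
		metav1.ListOptions{LabelSelector: "app=vcluster"})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		fmt.Printf("virtual cluster %q in namespace %q on node %q\n",
			p.Labels["release"], p.Namespace, p.Spec.NodeName)
	}
}
```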
The Private Node Breakthrough
Private nodes represent a fundamental shift. The control plane pod runs on the host cluster, but teams can join physical nodes outside the host cluster to their virtual clusters.
“This brings complete isolation,” Pathak notes. “For those who want maximum security and a newer layer in the tenancy spectrum.”
More innovative still, vCluster integrates Karpenter for bare metal environments. “vCluster is the only solution out there that provides Karpenter for bare metal or any Kubernetes cluster,” Pathak says. This enables intelligent autoscaling that frees up GPU nodes when they’re not being used and provisions them when workloads demand it.
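Karpenter’s job is to watch for unschedulable pods and provision or consolidate nodes to match. The sketch below compresses that loop to its bare essence for GPU nodes; it is a conceptual illustration, not Karpenter’s or vCluster’s code, and provisionGPUNode and releaseNode are hypothetical hooks standing in for whatever provisions machines in your environment.

```go
// autoscale_sketch.go: the essence of what a node autoscaler like
// Karpenter does for GPU nodes, reduced to one reconcile function.
// Conceptual sketch only; provisionGPUNode and releaseNode are
// hypothetical hooks, not real Karpenter APIs.
package main

import "fmt"

type Node struct {
	Name          string
	GPUs          int // allocatable GPUs on this node
	GPUsRequested int // GPUs requested by pods bound to this node
}

type PendingPod struct {
	Name string
	GPUs int // GPUs the unschedulable pod is asking for
}

// reconcile provisions capacity for pending GPU pods and releases
// GPU nodes that no workload is using.
func reconcile(nodes []Node, pending []PendingPod) {
	// 1. Sum unmet GPU demand from pods the scheduler could not place.
	demand := 0
	for _, p := range pending {
		demand += p.GPUs
	}
	// 2. Subtract free capacity on nodes that already run GPU work.
	for _, n := range nodes {
		if n.GPUsRequested > 0 {
			demand -= n.GPUs - n.GPUsRequested
		}
	}
	// 3. Idle GPU nodes either absorb remaining demand or are handed
	// back to the pool for other tenants to claim.
	for _, n := range nodes {
		if n.GPUs > 0 && n.GPUsRequested == 0 {
			if demand > 0 {
				demand -= n.GPUs
			} else {
				releaseNode(n.Name)
			}
		}
	}
	// 4. Whatever demand is still unmet needs new hardware.
	if demand > 0 {
		provisionGPUNode(demand)
	}
}

func provisionGPUNode(gpus int) { fmt.Printf("provision node(s) for %d GPU(s)\n", gpus) }
func releaseNode(name string)   { fmt.Printf("release idle GPU node %s\n", name) }

func main() {
	nodes := []Node{
		{Name: "gpu-a", GPUs: 8, GPUsRequested: 0}, // fully idle
		{Name: "gpu-b", GPUs: 8, GPUsRequested: 6},
	}
	// A pending job needs 4 GPUs: gpu-b has 2 free, so gpu-a is kept
	// to absorb the rest instead of being released.
	reconcile(nodes, []PendingPod{{Name: "train-job-0", GPUs: 4}})
	// With nothing pending, the idle node is handed back.
	reconcile(nodes, nil)
}
```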
The NVIDIA DGX Partnership
The partnership with NVIDIA DGX addresses a deeper challenge: cloud GPUs and bare metal GPUs aren’t identical. The architectures differ in networking layers and operational characteristics.
“Setting up NVIDIA drivers and making sure the GPUs are available to be consumed by pods is also challenging,” Pathak explains.
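Once the drivers and device plugin are in place, pods consume GPUs through the nvidia.com/gpu extended resource rather than touching the hardware directly. Here is a minimal sketch of that contract, built with the typed Kubernetes API; the image, pod name, and GPU count are placeholders.

```go
// gpu_pod.go: a minimal sketch of how a pod asks for a GPU once the
// NVIDIA device plugin advertises "nvidia.com/gpu" on the node. The
// image and GPU count are placeholders.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	pod := corev1.Pod{
		TypeMeta:   metav1.TypeMeta{APIVersion: "v1", Kind: "Pod"},
		ObjectMeta: metav1.ObjectMeta{Name: "cuda-smoke-test"},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			Containers: []corev1.Container{{
				Name:    "cuda",
				Image:   "nvidia/cuda:12.4.1-base-ubuntu22.04",
				Command: []string{"nvidia-smi"},
				Resources: corev1.ResourceRequirements{
					// Extended resources are requested via limits; the
					// scheduler will only place this pod on a node with
					// a free GPU.
					Limits: corev1.ResourceList{
						"nvidia.com/gpu": resource.MustParse("1"),
					},
				},
			}},
		},
	}
	out, err := yaml.Marshal(pod)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // pipe to `kubectl apply -f -` to run it
}
```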
Beyond driver complexity, standard Kubernetes schedulers fall short for AI workloads: they place pods one at a time, while distributed training needs all of a job’s workers scheduled together or not at all. Batch workloads and gang scheduling therefore require specialized schedulers like Kueue or Run:AI (which NVIDIA has acquired). vCluster’s integration ensures these schedulers work natively within the virtual cluster environment.
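The core rule those schedulers enforce is all-or-nothing admission: a job starts only when every worker in the gang can get its GPUs at once, because a partial placement strands GPUs on workers waiting forever for their peers. The sketch below shows just that rule; it is not how Kueue or Run:AI implement it.

```go
// gang_sketch.go: the all-or-nothing rule behind gang scheduling,
// reduced to an admission check. Real schedulers such as Kueue also
// handle queue ordering and fairness; this only shows the rule.
package main

import "fmt"

type Job struct {
	Name       string
	Replicas   int // pods that must start together
	GPUsPerPod int
}

// admit returns how many GPUs the job consumes, or false if the whole
// gang cannot be placed at once.
func admit(j Job, freeGPUs int) (int, bool) {
	need := j.Replicas * j.GPUsPerPod
	if need > freeGPUs {
		return 0, false // queue the job; never start a partial gang
	}
	return need, true
}

func main() {
	free := 16
	queue := []Job{
		{Name: "llm-pretrain", Replicas: 4, GPUsPerPod: 8}, // needs 32: must wait
		{Name: "finetune", Replicas: 2, GPUsPerPod: 4},     // needs 8: fits
	}
	for _, j := range queue {
		if used, ok := admit(j, free); ok {
			free -= used
			fmt.Printf("%s admitted (%d GPUs, %d left)\n", j.Name, used, free)
		} else {
			fmt.Printf("%s queued: gang of %d x %d GPUs does not fit\n",
				j.Name, j.Replicas, j.GPUsPerPod)
		}
	}
}
```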
“It’s not just a Kubernetes cluster—it’s a Kubernetes cluster that can actually run and scale your AI workloads,” Pathak emphasizes.
The building blocks—vCluster, DGX hardware, specialized schedulers, and Karpenter integration—come together as a complete platform rather than a Frankenstein assembly of disconnected tools.
Why This Matters for AI Infrastructure
The implications extend beyond cost savings. Maximum GPU utilization means faster experimentation cycles for data science teams. Strong isolation means security teams can approve AI workloads without lengthy review processes. Autoscaling means infrastructure teams aren’t constantly firefighting resource constraints.
For organizations running AI workloads on bare metal—whether for data sovereignty, performance, or cost reasons—vCluster’s approach represents a practical path forward. It solves the utilization-versus-isolation paradox that has plagued Kubernetes-based AI platforms since teams began running production ML workloads.