Kubernetes has become the de facto orchestration layer for cloud-native infrastructure, but AI and machine learning workloads expose a fundamental limitation: static GPU allocation. Traditional device plugins lock GPUs into rigid configurations—passthrough, vGPU, or MIG—requiring manual intervention to switch modes. For enterprises scaling AI infrastructure, this kills velocity and wastes expensive hardware.
Dynamic resource allocation (DRA) changes the game. By treating GPU configurations as flexible, on-demand resources rather than static assignments, DRA enables Kubernetes to dynamically provision GPUs based on workload requirements. NVIDIA’s decision to donate its DRA driver to the open source community signals a critical shift in how GPU clouds will be architected.
The Guest: Ryan Hallisey, Maintainer, KubeVirt
Key Takeaways
- Dynamic resource allocation (DRA) replaces static device plugins, enabling flexible GPU provisioning for AI/ML workloads
- NVIDIA donated its DRA driver to the KubeVirt community to accelerate open source development and adoption
- KubeVirt is expanding beyond KVM to support Hyper-V and Cloud Hypervisor, positioning itself as a universal virtualization API layer
- NUMA topology awareness is critical for performance-sensitive AI workloads in virtualized Kubernetes environments
- Multi-tenant GPU clouds commonly use KubeVirt as a tenancy layer, running nested Kubernetes clusters inside VMs
***
In a recent TFiR interview, Swapnil Bhartiya spoke with Ryan Hallisey, Maintainer, KubeVirt, about the evolution of GPU allocation in Kubernetes, NVIDIA’s DRA driver donation to the open source community, and KubeVirt’s path to CNCF graduation.
What Is KubeVirt?
KubeVirt extends Kubernetes to support virtualized workloads, enabling enterprises to run VMs and containers on a unified control plane. Hallisey described KubeVirt as a virtualization API layer built on Kubernetes’ extensibility.
Q: What is KubeVirt and what role is it playing in modern infrastructure?
Ryan Hallisey: “What’s great about Kubernetes is that it’s extensible. It allows us to build a lot—it’s a platform we can build many things on top of. KubeVirt came along and took this idea of virtualization, which has been around for a long time, and thought, ‘We can actually run virtualization on top of this platform—on top of Kubernetes.’ Essentially, KubeVirt is an add-on—a virtualization layer that integrates with Kubernetes. You can run your traditional virtualized applications on Kubernetes. It’s a way to have one shared control plane for both your containers and your VMs.”
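The shared control plane Hallisey describes can be illustrated with a minimal KubeVirt manifest: a VM is declared and managed with the same kubectl workflow as any other Kubernetes object. This is a sketch; the VM name and disk image are illustrative.

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: demo-vm                # illustrative name
spec:
  running: true                # start the VM as soon as it is created
  template:
    spec:
      domain:
        devices:
          disks:
          - name: containerdisk
            disk:
              bus: virtio
        resources:
          requests:
            memory: 1Gi
      volumes:
      - name: containerdisk
        containerDisk:
          image: quay.io/containerdisks/fedora:latest  # example container disk image
```

Applied with `kubectl apply -f`, the VM is scheduled and supervised by the same cluster that runs the organization's pods.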
Dynamic Resource Allocation (DRA) vs. Device Plugins
Hallisey explained how DRA addresses the rigidity of traditional Kubernetes device plugins, which statically allocate GPUs and other accelerators. DRA enables dynamic, workload-driven provisioning.
Q: What exactly is DRA, and what driver are we talking about?
Ryan Hallisey: “This is NVIDIA’s GPU DRA driver. DRA is a way we can allocate devices in Kubernetes—a way to surface devices to the scheduler and say, ‘Hey, these devices are available, and we want to allocate them to a pod.’ For further context, this is built on top of the device plugin framework, which has been around for a long time. The aspect that’s different is that the device plugin framework was static. You would allocate a device for one type of allocation—like passthrough—and it would stay in that state until you had to go in and manually fix it. One of the big advancements with DRA is that we don’t have to do this. We can look at these devices and say, ‘How do we need to change this device? Do we need passthrough? Do we want to use it as a vGPU? Do we want a MIG device?’ All these different configurations can be defined as patterns in the recipes we request. This allows for much more versatile AI and machine learning workloads.”
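The "recipes" Hallisey mentions map onto the DRA API objects: a workload declares a ResourceClaim (usually via a template) against a device class, and the driver configures the GPU to match at allocation time. A minimal sketch follows, assuming NVIDIA's driver registers the `gpu.nvidia.com` device class; the exact API group version and field names depend on the Kubernetes release.

```yaml
# Template describing the GPU a pod wants; the DRA driver
# satisfies it dynamically instead of from a static pool.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com   # class registered by NVIDIA's DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload            # illustrative name
spec:
  containers:
  - name: ctr
    image: nvcr.io/nvidia/cuda:12.4.0-base-ubuntu22.04  # example image
    resources:
      claims:
      - name: gpu               # reference the claim below
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
```

Because the claim is resolved at scheduling time, the same physical GPU can be served as passthrough, vGPU, or a MIG slice for different claims, rather than being locked into one mode by a device plugin.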
NVIDIA’s DRA Driver Donation to the Community
NVIDIA’s decision to contribute its DRA driver to the open source community reflects a broader strategy to accelerate GPU adoption in cloud-native environments. Hallisey emphasized the importance of open governance and collaborative development.
Q: What does it mean for NVIDIA to contribute the DRA driver to the foundation?
Ryan Hallisey: “It’s been a very exciting announcement from NVIDIA to donate the DRA driver to the community. It’s something that has been a labor of love for many people. What it means for the community is that this driver will be used by many people. NVIDIA doesn’t need to be the only maintainer. We don’t need to be the only ones gating changes and fixing issues. Everyone should be able to use it, develop it, and contribute to it, and it should have its own thriving open-source community using these GPUs and developing the driver.”
KubeVirt’s Expansion Beyond KVM
While KubeVirt initially focused on libvirt, QEMU, and KVM, the project is evolving to support multiple hypervisors, including Microsoft Hyper-V and Cloud Hypervisor. This positions KubeVirt as a universal virtualization abstraction layer.
Q: Does supporting hypervisors beyond KVM mean KubeVirt is aiming to become the universal standard for virtualized workloads?
Ryan Hallisey: “When the project was created, the scope was meant to focus on libvirt, QEMU, and KVM—getting it to a point where we could actually run virtualization using those open-source libraries on Kubernetes. There are use cases where people want to use different hypervisors. We’ve talked with Microsoft about Hyper-V support and have even had conversations with the community about Cloud Hypervisor. The way I would explain where it’s going is that KubeVirt is an API layer—a virtualization API layer—that can be generalized to support other types of hypervisors. We’re thinking a lot about extending it beyond just being a tool that runs KVM VMs to actually supporting other hypervisors—kind of like how libvirt became the library for managing virtual machines and bringing all these pieces together. This could be a space that KubeVirt occupies in the Kubernetes ecosystem. We say Kubernetes is like Linux. KubeVirt could eventually be like libvirt.”
NUMA Topology Awareness for AI Workloads
Performance-sensitive AI and machine learning workloads require precise alignment of compute, memory, and GPU resources. NUMA topology awareness ensures optimal device placement in virtualized Kubernetes environments.
Q: Can you talk about NUMA topology awareness?
Ryan Hallisey: “With many upcoming devices being released, there are going to be some challenges. This is a really important performance feature that we need to address. We need to ensure NUMA alignment of our devices if they are to perform at the level required for these sensitive AI workloads. You would want to do this anyway, but it is extremely important if you’re going to have any adoption at all. So, we need to be able to support NUMA alignment and enable topology awareness. This is something that is being worked on—it’s a work in progress, and we expect it to be delivered in the next KubeVirt release.”
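In KubeVirt today, NUMA alignment for a VM is expressed through CPU pinning, hugepages, and guest NUMA passthrough, which together keep a guest's vCPUs, memory, and assigned devices on the same host NUMA node. A sketch of such a spec, with illustrative sizes:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: numa-aligned-vm         # illustrative name
spec:
  domain:
    cpu:
      cores: 8
      dedicatedCpuPlacement: true    # pin vCPUs to dedicated host CPUs
      numa:
        guestMappingPassthrough: {}  # mirror the host NUMA topology into the guest
    memory:
      hugepages:
        pageSize: 2Mi                # hugepages are required for NUMA passthrough
    resources:
      requests:
        memory: 8Gi
```

Extending this alignment to cover DRA-allocated GPUs—so the scheduler places the device on the same NUMA node as the pinned CPUs and memory—is the work in progress Hallisey refers to.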
Multi-Tenant GPU Cloud Architectures
KubeVirt enables multi-tenant GPU clouds by providing a virtualization layer that isolates workloads while maintaining Kubernetes orchestration. Hallisey described common deployment patterns for public and private GPU cloud providers.
Q: What is the role of KubeVirt in AI-driven workloads?
Ryan Hallisey: “It depends on your use case. Most commonly, what I see is that when people are building their clouds—if you’re looking to build a GPU cloud, whether public or private—they use KubeVirt to help with their tenancy layer. They’ll create a Kubernetes layer, run KubeVirt, and then give their tenants VMs. Inside those VMs, they’ll run Kubernetes within the virtualization layer, with pods using those GPUs. That’s a common use case for tenancy. Another use case is that some people have a serverless model and need a kernel to wrap around their workload. You can run a KubeVirt VM, where the workload runs inside the VM. It could be a traditional application, or it may be for security reasons that you want to use a VM.”
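The tenancy pattern Hallisey describes hinges on attaching GPUs to KubeVirt VMs, which then serve as nodes of a tenant's nested Kubernetes cluster. A sketch of the GPU attachment, assuming the host exposes the device under an illustrative resource name:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: tenant-gpu-node         # illustrative name
spec:
  domain:
    devices:
      gpus:
      - name: gpu0
        deviceName: nvidia.com/TU104GL_Tesla_T4  # illustrative; must match a host device resource name
    resources:
      requests:
        memory: 16Gi
```

Inside the VM, the tenant installs its own kubelet and GPU driver stack, so pods in the nested cluster consume the passed-through GPU while the provider's outer cluster enforces isolation between tenants.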
KubeVirt’s Path to CNCF Graduation
KubeVirt has aligned its release cycle with Kubernetes, reached v1, and achieved wide production adoption. The project has applied for CNCF graduation and is expected to graduate within the next one or two KubeCons.
Q: How far are you from graduation, and what maturity level do you want to achieve?
Ryan Hallisey: “We’ve been thinking about getting to graduation for a while. We’ve identified the requirements and have been working on them for about a year. KubeVirt has aligned its releases with Kubernetes, reached v1, achieved wide adoption, and is stable. We have extensive testing in place, and many organizations are using it in production. I think it’s just a matter of time at this point. We’ve already applied for graduation through the CNCF, and I believe we’re first or second in line. I think it’s coming soon—I don’t know the exact date, but it’s something we expect to happen within the next one or two KubeCons.”
Hallisey emphasized the project’s focus on stability, scaling, and performance measurement, noting that KubeVirt publishes expected scale and performance metrics with every release to help enterprises plan upgrades.
Ryan Hallisey: “Many of the areas we’re excited to expand into include scaling and performance. We take measurements every release, where we say, ‘Here’s the expected scale and performance per release.’ This is extremely valuable if you’re upgrading KubeVirt. These are the kinds of things Kubernetes does—we do them as well. We’ve been really focused on stability and ensuring that people using this in production continue to get the level of stability they’ve already had.”