
Enhancing Kubernetes resource management with Node Resource Interface (NRI)


Author: Feruzjon Muyassarov, Cloud Orchestration Software Engineer, Intel
Bio: Feruzjon is a Cloud Orchestration Software Engineer at Intel. Currently his focus area is resource management in Kubernetes.


As the Kubernetes ecosystem has evolved, the need for a more efficient and customizable way to manage resources has become increasingly apparent. The need for plugins to manage resources and manipulate low-level OCI container properties for custom setups has been raised by multiple companies. Several out-of-band solutions manipulated container kernel parameters until containerd maintainers decided to create an interface integrated directly into the runtime. This led to the creation of the Node Resource Interface (NRI), an extension mechanism for container runtimes that allows for the integration of custom resource management plugins. The project was initially started by containerd engineers within the containerd project in 2021. As it garnered attention in the open-source community, engineers from companies like Apple, Intel, IBM, Red Hat and Google joined forces to evolve the project into its current, revamped state.

This article will delve into the problem of resource management in Kubernetes, why it’s a significant issue, and how the NRI can help to solve it. We’ll explore what the NRI is, how it works, and its practical applications. We’ll also look at how to deploy NRI in your cluster and examine a couple of resource management NRI plugins in detail. Whether you’re a plugin developer or a Kubernetes user looking to optimize resource utilization, this article will provide valuable insights into the benefits and capabilities of the NRI.

In its initial versions, Kubernetes primarily worked with Docker Engine. However, as the ecosystem started to grow, Kubernetes compatibility expanded to incorporate various container runtimes, where Docker was just one of several options. Alternatives such as containerd and CRI-O emerged, prompting the development of the Container Runtime Interface (CRI) to facilitate integration of Kubernetes with other runtimes beyond Docker. Similarly, new interfaces sprang up to extend Kubernetes to support other types of solutions. You’ve likely encountered terms like Container Network Interface (CNI) and Container Storage Interface (CSI) in Kubernetes. Simply put, these interfaces enable Kubernetes to use a wide variety of networks, storage solutions, and container runtimes, without jamming those solutions straight into Kubernetes’ code base. These interfaces aren’t superstars on their own – they’re more like extension points, letting you pick your favorite storage, network, or runtime solution and hook it up to Kubernetes. Remember how complicated and cumbersome it was to maintain those solutions in Kubernetes before different plugin mechanisms/interfaces were introduced and code split out?

What is Node Resource Interface (NRI)?

NRI works much like other Kubernetes extension interfaces, but it is built directly into the container runtimes and behaves the same across both containerd- and CRI-O-based clusters.

Essentially, it is a framework that allows you to plug in custom logic for adjusting various container parameters in your workloads. One thing NRI enables is plugging custom resource management algorithms into Kubernetes through the runtimes. All plugin developers must adhere to the NRI specification to ensure compatibility. You might be wondering, “why bother with resource management plugins when Kubernetes already handles it?” Well, here’s the scoop: Kubernetes tends to oversimplify node hardware resources, especially in cloud environments where hardware control is limited. However, if you’re in control of the hardware, custom resource allocation algorithms (commonly known as plugins) that possess a deeper understanding of the hardware’s architecture and, more crucially, can allocate container resources with greater efficiency can result in lower latencies, higher throughput, and better isolation with the same set of resources.

What are the practical applications of NRI and its plugins?

NRI alone doesn’t directly impact containers, so merely enabling it within the runtime will not have any effect on applications. However, NRI plugins can adjust containers at various lifecycle events. This flexibility to adjust the container spec opens up a realm of possibilities. For instance, resource management NRI plugins with a deeper understanding of node hardware architecture can perform tasks like restricting a container to a subset of CPU cores to run on and to specific memory regions to allocate memory from, grouping workloads to minimize interference (often referred to as the noisy neighbor problem), and optimizing latency/bandwidth between CPUs and accelerator devices (commonly known as the Non-Uniform Memory Access (NUMA) problem).
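To make the idea concrete, here is a small, self-contained Python sketch of the kind of placement decision such a plugin makes. This is not a real NRI plugin (those are typically written in Go against the NRI API and receive container lifecycle events from the runtime); the topology values and the `place_container` helper are hypothetical, purely to illustrate keeping a container’s CPUs and memory on the same NUMA node:

```python
# Illustrative sketch only: models the placement decision a resource
# management plugin might make, not the NRI plugin machinery itself.
# The topology below is hypothetical.
NUMA_TOPOLOGY = {
    0: {"cpus": [0, 1, 2, 3], "free_cpus": [0, 1]},
    1: {"cpus": [4, 5, 6, 7], "free_cpus": [4, 5, 6, 7]},
}

def place_container(requested_cpus, topology):
    """Pick a NUMA node with enough free CPUs; return a cpuset adjustment."""
    for node_id, node in sorted(topology.items()):
        if len(node["free_cpus"]) >= requested_cpus:
            cpus = node["free_cpus"][:requested_cpus]
            node["free_cpus"] = node["free_cpus"][requested_cpus:]
            # Pin both CPUs and memory to the same NUMA node so memory
            # accesses stay local, avoiding cross-node NUMA latency.
            return {"cpuset.cpus": ",".join(map(str, cpus)),
                    "cpuset.mems": str(node_id)}
    return None  # no single node fits; a real plugin would widen the pool

print(place_container(2, NUMA_TOPOLOGY))
# prints {'cpuset.cpus': '0,1', 'cpuset.mems': '0'}
```

A real plugin would return an equivalent adjustment (cpuset CPUs and memory nodes) to the runtime, which applies it to the container’s OCI spec before the container starts.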

NRI project owners have developed a few reference plugins focused on more advanced resource management algorithms compared to those existing in the kubelet, as well as other research and experimental work in the areas of granular memory and swap management, setting limits on the number of processes or open files in containers, and more. To facilitate community collaboration and the sharing of these plugins, they’ve established a GitHub project called nri-plugins, serving as a repository for community-maintained plugins. If you’re keen on optimizing resource utilization in your cluster, chances are these existing plugins already meet your needs. However, you’re not restricted to using only these plugins; you can also develop your own custom NRI plugin, which will work the same way on all NRI-enabled container runtimes.

How to deploy NRI plugins in your cluster?

Though presently disabled by default, NRI is integrated into container runtimes and is expected to become enabled by default soon. To utilize NRI, you must explicitly enable it within the runtime. Now, you might be wondering about the mechanism for deploying those plugins. The answer lies in the fact that all community-maintained plugins operate as Kubernetes DaemonSets. Despite running as Kubernetes applications themselves, these plugins possess the capabilities to modify node-level configurations. The following diagram illustrates the involvement of NRI and its plugins throughout the lifecycle of a pod/container.
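As an example, on containerd 1.7+ NRI is enabled through the runtime configuration. The exact file location, keys, and defaults may differ between runtime versions and distributions, so treat the snippet below as illustrative and consult your runtime’s documentation:

```toml
# /etc/containerd/config.toml
[plugins."io.containerd.nri.v1.nri"]
  # NRI currently ships disabled; flip this to enable it.
  disable = false
  # Default locations for pre-installed plugins and the NRI socket.
  plugin_path = "/opt/nri/plugins"
  socket_path = "/var/run/nri/nri.sock"
```

On CRI-O, NRI is similarly enabled via the `[crio.nri]` section of the CRI-O configuration (`enable_nri = true`). After restarting the runtime, community-maintained plugins from the nri-plugins repository can be deployed as DaemonSets, for example using the Helm charts provided in that repository.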

Pod creation request flow when NRI is enabled and an NRI plugin is in use.

Exploring NRI plugins

As mentioned above, the NRI framework integrates custom resource management algorithms (a.k.a. plugins) into Kubernetes. Without those plugins, Kubernetes resource management logic remains the same, even if NRI is enabled in the runtime. NRI doesn’t dictate what can be achieved with custom plugins, and the community around it has started developing various NRI plugins to address some of the hot issues like resource management. Next, we’ll explore some of the existing NRI plugins and the problems they solve.

Let’s delve a bit deeper into a couple of resource management NRI plugins, namely Topology-Aware and Balloons. Topology-Aware is specifically designed for resource management and offers zero-configuration setup. When activated, this plugin identifies the hardware topology, constructs an architecture tree, and then assigns or reassigns all containers on that node to CPUs in a way that optimizes the locality of memory, storage, and other devices.

Allocating CPUs close to the relevant devices helps effectively isolate workloads from one another and mitigate interference (commonly known as the noisy neighbor problem) on shared resources. As modern CPU architectures evolve, their internal topologies grow ever more intricate, and achieving optimal resource isolation requires an increasingly detailed and granular approach to resource allocation. The illustration below shows how the Topology-Aware plugin constructs CPU pools, starting from the smallest region (CPU cores closest to a memory controller) and expanding to encompass the entire system’s CPUs, based on detailed topology information that the Linux kernel provides.

The Topology-Aware plugin tries to allocate container CPU resources starting from the smallest region (CPU cores closest to a memory controller) and expanding to encompass the entire system’s CPUs.
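The pool-building idea can be sketched in a few lines of Python. This is a deliberately simplified model with made-up pool names and sizes; the real plugin derives its pool tree from sysfs topology data and accounts for many more resource types than CPU count:

```python
# Simplified model of topology-aware CPU pools: pools form a hierarchy
# from single NUMA nodes up to the whole system, and a container is
# placed into the smallest pool that can satisfy its CPU request.
POOLS = [  # (name, number of CPUs) -- hypothetical two-socket machine
    ("numa-0", 4),
    ("numa-1", 4),
    ("socket-0", 8),   # spans numa-0 and numa-1
    ("system", 16),    # the whole machine
]

def pick_pool(requested_cpus, pools):
    """Return the name of the smallest pool that fits the request."""
    for name, size in sorted(pools, key=lambda p: p[1]):
        if size >= requested_cpus:
            return name
    return None  # request exceeds the machine's capacity

print(pick_pool(3, POOLS))   # prints numa-0
print(pick_pool(6, POOLS))   # prints socket-0
print(pick_pool(32, POOLS))  # prints None
```

Choosing the smallest fitting pool keeps a container’s CPUs as close as possible to one memory controller, which is exactly the locality property the paragraph above describes.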

Balloons, another resource management policy plugin, offers users an extensive set of configuration options for resource allocation. These tuning options make the Balloons plugin usable even in very complex and large setups. For instance, as a user of the Balloons plugin, you can mitigate interference between different classes of workloads (such as various applications) by grouping them together, as shown in Picture 3. Furthermore, you have the flexibility to define detailed properties of each balloon individually. For example:

  • the size of the balloon (i.e., the CPU pool; see Picture 4) where containers can run;
  • whether a set of CPUs should span across multi-socket systems to optimize performance or not;
  • minimum and maximum frequencies for CPU cores and uncore;
  • whether the pool is permitted to grow dynamically, and more.
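For illustration, a Balloons configuration defining two such balloon types might look roughly like the following. The field names are based on the nri-plugins project’s configuration custom resource, but the schema may differ between releases, so consult the nri-plugins documentation for the authoritative version:

```yaml
# Illustrative Balloons policy configuration (field names may vary by release)
apiVersion: config.nri/v1alpha1
kind: BalloonsPolicy
metadata:
  name: default
  namespace: kube-system
spec:
  balloonTypes:
    - name: low-latency
      minCPUs: 2            # lower and upper bounds on balloon (CPU pool) size
      maxCPUs: 8
      namespaces:
        - latency-critical  # workloads from this namespace land in this balloon
    - name: batch
      minCPUs: 1
      maxCPUs: 4
      namespaces:
        - batch-jobs
```

Each balloon type becomes a separate CPU pool, so containers from the two namespaces never share CPU cores, which is how the workload grouping shown in the pictures below is achieved.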

Assigning CPU frequencies for different classes of workloads. Each balloon or pool of CPUs has its specific CPU frequency that is best for the application running on those CPUs.

CPU resource partitioning for different classes of workloads.

Conclusion

The evolution of Kubernetes has highlighted the need for efficient resource management solutions. Recognizing this, NRI was developed as an extension mechanism within container runtimes. NRI allows for the integration of custom logic, enabling adjustment of various container parameters in your workloads to achieve resource utilization efficiency. Thanks to NRI, developers can create plugins tailored to their specific hardware architecture and workload requirements.

Get Involved

If you’re interested in participating in the NRI project or contributing to plugin development, please refer to the respective contribution guides provided in each document.

  1. NRI contributing guidelines
  2. NRI plugins contributing guide

To learn more about Kubernetes and the cloud native ecosystem, join us at KubeCon + CloudNativeCon Europe in Paris from March 19-22.