If you’ve administered or used shared Kubernetes clusters, you probably have some stories and possibly some scars from the process. The initial Kubernetes design was fairly minimal when it came to multi-tenancy (Role-Based Access Control wasn’t even GA until Kubernetes 1.8), and while things have improved over time, many multi-tenancy challenges remain.
Let’s start off by looking at some of the challenges with operating shared clusters, and later we’ll share some open source tools that can help.
The multi-tenancy models are both lacking
Typically teams use one of two models when deciding how to share Kubernetes clusters. The first is namespace-based isolation, where teams operate in shared clusters and are restricted to one or more namespaces. The second option is what we’ll refer to as cluster-based isolation, where teams or even individuals have their own clusters that are not shared with other tenants. Both of these approaches have advantages and limitations.
With namespace-based isolation, tenants share clusters, which cuts down on the number of clusters admins have to manage. Tenants are isolated to their namespaces using tools like Role-Based Access Control (RBAC) and network policies. For some applications, this approach works fine, but it falls down when teams need to manage cluster-scoped (global) resources, which exist outside of any namespace. If your team ships global objects like Custom Resource Definitions (CRDs) alongside its applications, you have to rely on someone with broader access in the cluster to manage them.
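As a sketch of how that isolation is typically wired up, a namespaced Role and RoleBinding like the following grant a team broad rights inside its own namespace only (the `team-a` namespace and group name here are hypothetical):

```yaml
# Role granting broad access to common workload resources,
# but only inside the team-a namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: team-a-developer
  namespace: team-a
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["pods", "services", "configmaps", "deployments", "jobs"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# Bind the Role to the (hypothetical) team-a group
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-developer-binding
  namespace: team-a
subjects:
  - kind: Group
    name: team-a
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: team-a-developer
  apiGroup: rbac.authorization.k8s.io
```

Because a Role is itself a namespaced object, it can never grant access to cluster-scoped resources like CRDs, which is exactly the limitation described above.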
Using cluster-based isolation has its own pitfalls. While it avoids some of the headaches of locking down a shared cluster, it creates others. Cluster sprawl is a big problem for many teams: it makes environments harder to manage and adds cost. As the number of clusters grows, it becomes harder to keep track of what’s actually being used, so resources sit idle and are wasted, which can even impact our planet.
Choosing between these two models can feel like picking your poison. Neither is ideal for many use cases.
Isolation is difficult
If your organization uses namespace-based isolation, you may run into difficulties isolating workloads from one another. Two of the primary tools for isolation are RBAC and network policies.
RBAC in Kubernetes is powerful but can also be complicated to manage. Teams in shared clusters may have many roles and role bindings to manage. It can also be difficult for the cluster administrators to know which permissions specific applications need. Many organizations would like to operate based on the principle of least privilege, but in some cases even the application developers may not know which APIs their apps need access to. This can result in lots of trial and error to create the appropriate permissions.
Network policies share many of the same management issues as RBAC, and the difficulty only increases for more complex Kubernetes networking setups.
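For illustration, a common starting point is a policy that denies all ingress to a tenant’s pods except traffic from within the same namespace (a minimal sketch; the `team-a` namespace is hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace-only
  namespace: team-a
spec:
  podSelector: {}        # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}   # only allow traffic from pods in this same namespace
```

Even a simple baseline like this has to be repeated in every tenant namespace, and each exception (shared ingress controllers, monitoring agents, cross-team services) adds another rule to reason about.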
If hard multi-tenancy is required, meaning users from different companies are operating software in a shared Kubernetes cluster, things get even trickier. For more detail on hard vs. soft multi-tenancy and how to tackle isolation and access control in either environment, Daniel Thiry’s “Kubernetes Multi-Tenancy Best Practices Guide” is definitely worth checking out.
Controlling costs and resources is also hard
Managing costs and resources is likely to be a significant challenge, whichever multi-tenancy model you adopt. There’s no built-in tooling for managing Kubernetes costs, and it’s not something you’ll want to roll yourself. Also, many clusters run in one of the major cloud providers, and there are entire companies built on the fact that cloud provider bills can be inscrutable.
Managing resources is made more complex by the fact that quotas for resources like CPU and memory can only be assigned for individual namespaces. There’s no way to set an overall quota for the resources that a user or team can consume in a cluster.
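A per-namespace quota looks roughly like this (the namespace and values are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"       # total CPU requested by all pods in this namespace
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
```

If a team owns five namespaces, you need five such objects, and nothing in core Kubernetes aggregates them into a single team-level budget.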
And if your organization is using cluster-based isolation, you have the additional complexity of many more clusters to manage both costs and resources for. More clusters, more problems.
Open source tools that can help
There are a number of tools that can help with various aspects of multi-tenancy pain, but here are a few we recommend.
vcluster is a tool we open sourced at Loft Labs to allow anyone to create virtual Kubernetes clusters with ease. A virtual cluster runs inside of a namespace on a shared host cluster but appears to the users as if it’s a full-blown, dedicated cluster. This is achieved by running a Kubernetes API server and some other tools inside the namespace on the host cluster. Users connect to the API server of the virtual cluster to deploy workloads and run kubectl commands, but the pods they create run on the underlying host cluster.
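In practice this is driven by the vcluster CLI; the commands below are a sketch, and the exact flags may vary by version:

```shell
# Create a virtual cluster inside the "team-a" namespace of the host cluster
vcluster create my-vcluster --namespace team-a

# Connect to the virtual cluster's API server, then use kubectl as usual
vcluster connect my-vcluster --namespace team-a
kubectl get namespaces   # shows the virtual cluster's namespaces, not the host's
```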
By default, users are admins in the virtual cluster, allowing them to manage any global objects like CRDs that their applications may depend on. This lets users do their work without asking platform teams for additional access, while avoiding cluster sprawl at the same time. Creating virtual clusters with vcluster is also fast, which allows your users to experiment rapidly and discard unused clusters. This makes vcluster a great tool for spinning up ephemeral environments in Kubernetes.
While vcluster is currently the most popular solution for virtual Kubernetes clusters, the Kubernetes multi-tenancy working group has created an alternative called Cluster API Provider Nested. We expect to see much more innovation around virtual clusters in the next few years.
Kubecost is a tool for controlling your Kubernetes spend. The Kubecost cost models and a plugin for kubectl are open source, allowing teams to track their Kubernetes spend by service, namespace, labels, and more. The open source Kubecost also integrates with billing APIs for AWS, GCP, and Azure. There’s even an open source tool called Cluster Turndown for scaling clusters up and down based on schedules or other criteria.
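As an example, the open source kubectl plugin can be installed via krew and used to break down spend (a sketch; see the Kubecost docs for the current set of subcommands):

```shell
kubectl krew install cost
kubectl cost namespace   # approximate spend broken down per namespace
```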
kiosk is a multi-tenancy extension for Kubernetes that we also created at Loft Labs, and it fills in some of the multi-tenancy gaps we’ve discussed. kiosk is focused on making it easier to provide self-service access to clusters for developers, and we know that self-service provisioning can reduce developers’ cycle times and increase their happiness. kiosk lets platform teams define templates that are applied to newly created namespaces and even set resource quotas for users that span multiple namespaces.
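As an illustration, kiosk models this with CRDs such as Account and AccountQuota; the sketch below follows the kiosk docs at the time of writing, and the exact API group and fields may differ across versions:

```yaml
apiVersion: config.kiosk.sh/v1alpha1
kind: AccountQuota
metadata:
  name: team-a-quota
spec:
  account: team-a       # quota applies across all namespaces owned by this account
  quota:
    hard:
      limits.cpu: "10"
      limits.memory: 20Gi
```

Unlike a plain ResourceQuota, this caps the account’s total usage across all of its namespaces at once.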
Another popular open source tool focused on providing self-service access to clusters is Capsule. Capsule creates a new primitive called a Tenant that allows teams to manage things like RBAC, network policies, and resource quotas at that higher level of abstraction.
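A Tenant is declared as a cluster-scoped custom resource; the sketch below is based on the Capsule docs, and field names may vary between API versions:

```yaml
apiVersion: capsule.clastix.io/v1beta1
kind: Tenant
metadata:
  name: team-a
spec:
  owners:
    - name: alice     # hypothetical tenant owner
      kind: User
```

Tenant owners can then create their own namespaces within the tenant, with Capsule enforcing the tenant-level policies across all of them.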
While managing multi-tenant Kubernetes clusters could leave some scars in the past, the pain involved is driving more and more innovation. The ecosystem of tools around multi-tenancy has grown a lot recently, and we expect to see more and more tools and commercial products focused on reducing this pain.