Enterprise AI adoption is accelerating, but infrastructure complexity remains a major bottleneck. Mirantis is addressing this challenge head-on with its AI Factory Reference Architecture—a comprehensive blueprint designed to get AI workloads operational in days rather than months.
The Mainframe Revival: Why AI Infrastructure is Different
“We’ve moved back to the world of the mainframe, where applications are tightly coupled to the whole infrastructure stack,” explains Shaun O’Meara, CTO at Mirantis. This fundamental shift means traditional enterprise data center approaches don’t apply to AI workloads.
Randy Bias, VP of Open Source Strategy, draws parallels to supercomputing: “These systems are very different from typical enterprise data centers but have a lot in common with traditional supercomputing or high-performance computing systems.” The goal is aggregating multiple GPUs—8, 16, 24, or 32 units—to function as a single system with contiguous memory.
Beyond GPU Hardware: The Complete Stack Challenge
While vendors like Nvidia provide GPU reference architectures, they leave critical gaps. “They leave the delivery of multi-tenancy, multi-cluster setups to the vendors,” notes O’Meara. The AI Factory Reference Architecture addresses these gaps by encompassing the entire ecosystem: DNS, identity management, and all supporting services needed for cloud providers to effectively sell GPU access.
The complexity extends beyond initial deployment. “Hardware is delivered racked and stacked, and then it can take months to sometimes years to get this hardware actually viable and working,” O’Meara explains. The reference architecture provides an opinionated, deployable solution to accelerate this timeline.
Kubernetes as the Super Control Plane
Central to the architecture is using Kubernetes not just as a scheduler, but as a “super control plane.” This approach maintains flexibility while providing standardization. “We’re using Kubernetes as the super control plane here,” O’Meara emphasizes, “but we’re not just doing Kubernetes. We still have OpenStack as part of the offering.”
The platform’s composability allows organizations to swap components as needed while maintaining governance and control. This flexibility is crucial as AI frameworks evolve rapidly, requiring infrastructure that can adapt quickly to support new innovations.
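To make the composability idea concrete, the sketch below models a platform stack as a declarative definition in which some components can be swapped while governed components stay pinned. This is an illustrative sketch only: the component names, the `SWAPPABLE` set, and the `swap_component` helper are assumptions for illustration, not Mirantis's actual API.

```python
# Illustrative sketch of a composable, declaratively defined platform stack.
# All component names and the swap policy here are hypothetical.

DEFAULT_STACK = {
    "scheduler": "kubernetes",      # the "super control plane"
    "virtualization": "openstack",  # OpenStack remains part of the offering
    "identity": "example-idp",
    "observability": "example-monitoring",
}

# Components an operator may swap while keeping governance and control intact.
SWAPPABLE = {"identity", "observability"}

def swap_component(stack, component, replacement):
    """Return a new stack with one component replaced, refusing swaps of
    components that are pinned for governance reasons."""
    if component not in stack:
        raise KeyError(f"unknown component: {component}")
    if component not in SWAPPABLE:
        raise ValueError(f"{component} is pinned and cannot be swapped")
    new_stack = dict(stack)
    new_stack[component] = replacement
    return new_stack

# Example: swap the observability tool without touching the control plane.
custom = swap_component(DEFAULT_STACK, "observability", "another-monitoring-stack")
```

The design point is that swapping happens against a declared definition, so governance rules can be enforced at the moment of change rather than discovered after deployment.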
Making Infrastructure Invisible for Data Scientists
Following the cloud computing model that made infrastructure disappear for developers, the AI factory aims to do the same for data scientists. “Data scientists care even less about the infrastructure,” Bias observes. “They need to get from data to trained models and then deploy their applications, and they just want the infrastructure to disappear.”
The curated catalog includes CI/CD, observability, and security tools, enabling teams to compose their own stacks rather than being forced into rigid configurations. This approach reduces friction while providing standardization and guardrails.
Enterprise-Grade Security and Compliance
Drawing from their extensive experience with financial institutions and government organizations, Mirantis ensures the architecture meets stringent compliance requirements. “We’re really comfortable and familiar with their regulatory and governmental compliance requirements,” Bias explains.
The architecture supports deployment across on-premises, hybrid, and edge environments while maintaining consistency and security. The declarative approach allows organizations to define environment requirements and validate that what is delivered matches expectations.
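One way to picture this declarative validation is a desired-state diff: compare the environment an organization declared against what was actually delivered. The sketch below is a minimal illustration under assumed field names; the spec keys and the `diff_environment` helper are hypothetical, not part of the published architecture.

```python
# Hypothetical declarative environment check: compare a declared environment
# spec against the delivered environment and report mismatches.

def diff_environment(declared, delivered):
    """Return a list of human-readable mismatches between the declared
    spec and the delivered environment."""
    mismatches = []
    for key, want in declared.items():
        have = delivered.get(key)
        if have != want:
            mismatches.append(f"{key}: declared {want!r}, delivered {have!r}")
    return mismatches

declared = {"gpus_per_node": 8, "interconnect": "infiniband", "k8s_version": "1.30"}
delivered = {"gpus_per_node": 8, "interconnect": "ethernet", "k8s_version": "1.30"}

problems = diff_environment(declared, delivered)
# One mismatch: the interconnect differs from the declared spec.
```

Because the check runs against the same declaration used to request the environment, the validation step catches delivery gaps before workloads land on the hardware.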
Looking Forward: Agentic AI and Beyond
The reference architecture is designed to evolve with the rapidly changing AI landscape. Future developments include agentic AI capabilities with enhanced security and governance features. “We’re starting to look at what it means for an enterprise to adopt agentic systems, agentic workflows,” Bias reveals.
The 80-page living document continues to grow as Mirantis incorporates new learnings and technologies. As O’Meara puts it, “Open source is not just about code; it’s about the sharing of knowledge.”