When Matthew Shaxted talks about AI infrastructure, he speaks from the vantage point of a decade-long journey that began in the rarefied world of national lab supercomputing and has evolved into today’s fractured, fast-moving AI landscape. As Co-Founder and CEO of Parallel Works, Shaxted has watched high-performance computing (HPC) morph into cloud-based AI systems — and now, into a world where private AI deployments, GPU-as-a-service, and hybrid orchestration are becoming the norm.
Parallel Works’ latest offering is an AI control plane built to unify AI infrastructure orchestration across both modern and legacy environments. The platform adds native Kubernetes support and GPU-as-a-service integrations, with a focus on solving a challenge that’s only getting harder: running AI workloads across diverse, often incompatible infrastructure without spiraling costs, complexity, or compliance risk.
From Supercomputers to Private AI
Parallel Works spun out of Argonne National Laboratory, where Shaxted and Co-Founder Mike Wilde were building tools to run large-scale simulations on the kind of “big iron” supercomputers that can fill entire buildings. Back then, their goal was to democratize HPC for industry — putting a user-friendly interface on top of batch schedulers and tightly coupled compute, storage, and networking systems.
But the last decade has upended how large-scale compute gets delivered. “Six years ago, you could start getting the same performance in the cloud as you’d see on a Cray with InfiniBand,” Shaxted recalls. That marked the turning point when HPC workloads began moving to hyperscalers. Soon after, enterprises started adopting hybrid models — combining on-prem HPC with cloud-based AI training services like SageMaker, Vertex AI, and Azure ML Studio.
Today, many organizations are hitting another inflection point. “Once you reach a certain level of maturity, you start asking whether it’s time to purchase your own accelerator system,” Shaxted says. “That’s the transition into private AI.” This shift requires not only new capital investment but also the expertise to run GPU-heavy systems at high utilization.
Kubernetes Without the Headaches
The Parallel Works control plane is designed for two core stakeholders: the infrastructure teams managing compute environments and the end users who need to access them. For infrastructure teams, the challenge is rarely about running a single system — most environments today mix multiple clouds, on-prem clusters, virtualized workloads, and batch schedulers.
Shaxted says Kubernetes is part of the answer, but adopting it at scale brings its own complexity, especially when teams have to manage multiple clusters across cloud and on-prem footprints. Parallel Works addresses this by centralizing key functions — unified namespaces, group management, usage tracking, and chargeback — while integrating Kubernetes into the broader infrastructure mix. “Rarely does an organization start with 100% greenfield Kubernetes,” he notes. “It’s usually part of a migration path.”
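To make the chargeback idea concrete: in a Kubernetes setting, per-team attribution often starts with labeled namespaces. The sketch below uses the official Kubernetes Python client; the label keys and values are illustrative assumptions, not Parallel Works’ actual schema.

```python
# Minimal sketch: create a team namespace with chargeback labels
# using the official Kubernetes Python client. Label keys/values
# are invented for illustration.
from kubernetes import client, config

config.load_kube_config()  # reads the local kubeconfig

v1 = client.CoreV1Api()
namespace = client.V1Namespace(
    metadata=client.V1ObjectMeta(
        name="ml-research",
        labels={
            "team": "ml-research",        # group management
            "cost-center": "cc-4021",     # chargeback attribution
            "usage-tracking": "enabled",  # hypothetical tracking flag
        },
    )
)
v1.create_namespace(namespace)
```

Usage-metering tooling can then roll pod consumption up by those labels, which is what makes chargeback workable across many clusters.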
For end users, the goal is to have a single, uniform interface and API to access resources — whether they’re running in Google Cloud, a niche GPU provider, or a legacy HPC batch scheduler. “We want people to be productive on day one without having to learn every vendor’s console,” Shaxted explains.
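Parallel Works hasn’t published its API, but the pattern Shaxted describes resembles a single submission facade over heterogeneous backends. A minimal, entirely hypothetical sketch:

```python
# Hypothetical sketch of a uniform submission interface over
# heterogeneous backends (Kubernetes clusters, HPC batch schedulers).
# All names are illustrative, not Parallel Works' actual API.
from abc import ABC, abstractmethod


class Backend(ABC):
    @abstractmethod
    def submit(self, image: str, command: str, gpus: int) -> str:
        """Run a workload and return a job ID."""


class KubernetesBackend(Backend):
    def submit(self, image, command, gpus):
        # Would create a Job object via the Kubernetes API here.
        return "k8s-job-123"


class SlurmBackend(Backend):
    def submit(self, image, command, gpus):
        # Would generate and sbatch a batch script here.
        return "slurm-456"


def run_anywhere(backend: Backend, image: str, command: str, gpus: int = 1) -> str:
    """End users call one function; backend details stay hidden."""
    return backend.submit(image, command, gpus)


job_id = run_anywhere(KubernetesBackend(), "pytorch/pytorch:latest",
                      "python train.py", gpus=4)
```

The design choice is the familiar adapter pattern: users target one interface, and infrastructure teams swap or add backends without retraining anyone.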
GPU-as-a-Service, Simplified
AI hardware is more accessible than ever — developers can run models on a desktop GPU or rent high-end accelerators by the hour. But operationalizing that capability across a team is where the real friction begins. Parallel Works streamlines integration with GPU-as-a-service and “neo-cloud” providers, letting teams connect new resources with just a few clicks.
Shaxted shares a recent example: spinning up a 100-node A100 cluster from an aggregator, plugging it into the control plane, and running Jupyter Notebooks and LLM deployments in under five minutes. The company is building a partner ecosystem to validate and certify providers, ensuring that any connected resource works seamlessly with Parallel Works’ orchestration capabilities.
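The connect-then-launch flow he describes might look something like the following sketch, in which every class, method, and endpoint is invented for illustration rather than drawn from the actual product:

```python
# Hypothetical sketch: register an aggregator-provided GPU cluster
# with a control plane, then start a Jupyter session on it.
import time


class ControlPlane:
    def __init__(self):
        self.resources = {}

    def connect_resource(self, name: str, endpoint: str, gpu_type: str, nodes: int):
        """Register an external GPU cluster (e.g., from an aggregator)."""
        self.resources[name] = {"endpoint": endpoint, "gpu": gpu_type, "nodes": nodes}

    def launch_notebook(self, resource: str, gpus: int = 1) -> str:
        """Schedule a Jupyter session on the named resource."""
        assert resource in self.resources, "resource must be connected first"
        return f"https://{resource}.example.com/jupyter/{int(time.time())}"


cp = ControlPlane()
cp.connect_resource("aggregator-a100", "https://api.gpu-aggregator.example",
                    "A100", 100)
print(cp.launch_notebook("aggregator-a100", gpus=8))
```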
Driving Utilization and Cost Efficiency
For many customers, the ROI case comes down to utilization. Without usage tracking and chargeback mechanisms, expensive GPU nodes can sit idle or be over-provisioned for lightweight tasks. Shaxted argues that visibility changes user behavior: “When end users can see what they’re spending, they start right-sizing their workloads.”
By providing fine-grained usage metrics across all infrastructure types — not just Kubernetes — Parallel Works helps organizations maximize return on their AI hardware investments from day one.
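A simple way to picture chargeback is as an aggregation of GPU-hour records into per-team cost. The sketch below assumes a record format and hourly rates purely for illustration:

```python
# Minimal sketch of GPU-hour chargeback: roll raw usage records
# up into per-team cost. Record fields and rates are assumptions.
from collections import defaultdict

RATE_PER_GPU_HOUR = {"A100": 2.50, "H100": 4.00}  # assumed prices, USD

usage_records = [
    {"team": "ml-research", "gpu": "A100", "gpu_hours": 120.0},
    {"team": "ml-research", "gpu": "H100", "gpu_hours": 16.5},
    {"team": "forecasting", "gpu": "A100", "gpu_hours": 40.0},
]

charges = defaultdict(float)
for rec in usage_records:
    charges[rec["team"]] += rec["gpu_hours"] * RATE_PER_GPU_HOUR[rec["gpu"]]

for team, cost in sorted(charges.items()):
    print(f"{team}: ${cost:,.2f}")  # visibility drives right-sizing
```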
Sovereign AI and Policy-Driven Placement
Geopolitical shifts, trade restrictions, and regional data privacy laws are pushing AI teams toward sovereign AI strategies. That often means running workloads in specific jurisdictions — and sometimes on multiple regional clouds and local data centers. Shaxted says this makes hybrid and multi-cloud orchestration even more critical.
Parallel Works is building policy controls that let infrastructure teams dictate where workloads can run, based on user groups and compliance requirements. For end users, the underlying location becomes invisible; they simply submit tasks, and the system ensures they execute in allowed regions.
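In pseudocode terms, such a policy layer can reduce to filtering placement targets by a group’s allowed regions before dispatch, along these lines (policy schema invented for illustration):

```python
# Hypothetical sketch of policy-driven placement: infrastructure
# teams map user groups to allowed regions; the scheduler filters
# candidate targets before dispatch.
PLACEMENT_POLICIES = {
    "eu-health-data": {"allowed_regions": {"eu-west-1", "eu-central-1"}},
    "general": {"allowed_regions": {"us-east-1", "eu-west-1", "ap-south-1"}},
}

AVAILABLE_TARGETS = [
    {"name": "onprem-frankfurt", "region": "eu-central-1"},
    {"name": "cloud-virginia", "region": "us-east-1"},
]


def eligible_targets(user_group: str):
    """Return only the targets a user's group may run in."""
    allowed = PLACEMENT_POLICIES[user_group]["allowed_regions"]
    return [t for t in AVAILABLE_TARGETS if t["region"] in allowed]


# An EU-restricted user never sees the US target; placement stays invisible.
print(eligible_targets("eu-health-data"))
```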
The Next 18 Months
Shaxted sees two major transitions underway: the ongoing shift from managed AI services to private AI infrastructure, and the expansion of AI workloads to the edge. In both cases, he believes orchestration will be the key differentiator. “We want to make it a uniform experience, whether you’re running in the cloud, in your data center, or across every store you own,” he says.
In other words, the future of AI infrastructure management may be less about where the compute lives — and more about the policies, visibility, and control planes that make it all work together.