Enterprise AI has a $100 billion problem: every new chip, every model breakthrough, every inference optimization creates potential vendor lock-in. Without a neutral abstraction layer, organizations risk rebuilding entire AI stacks every six months as hardware and software architectures evolve at unprecedented speed.
PyTorch has become that layer—the de facto standard preventing catastrophic fragmentation across training, inference, and agentic AI workloads.
The Guest: Mark Collier, Executive Director at The PyTorch Foundation
Key Takeaways
- PyTorch has become the critical abstraction layer: No AI chip ships without PyTorch support—Nvidia, AMD, Intel, ARM, and startups all test against PyTorch before launch
- Inference overtook training in market size: By end of 2025, the inference market became larger than training as AI moved from R&D to production revenue-generating applications
- Multi-project foundation strategy: PyTorch Foundation expanded to vLLM (production inference), Helion (GPU kernel accessibility), SafeTensors (model security), DeepSpeed, and TorchServe
- Hardware-software co-evolution: Continuous integration systems test every PyTorch commit against latest accelerator hardware from foundation members
- Three pillars of AI infrastructure: Training (PyTorch creates the “brain”), Inference (vLLM serves models in production), Agents (utilize models via protocols like MCP)
***
[expander_maker]
In this exclusive interview with Swapnil Bhartiya at TFiR, Mark Collier, Executive Director at The PyTorch Foundation, discusses the critical role of open source AI infrastructure in preventing vendor lock-in, the shift from training-focused to inference-dominated enterprise AI, and how PyTorch Foundation’s multi-project strategy addresses the full AI production lifecycle.
PyTorch as the Hardware Abstraction Layer: Why No Chip Ships Without It
The AI hardware market is experiencing explosive innovation—new accelerator architectures from Nvidia, AMD, Intel, IBM, ARM-based startups, and specialized inference chips launch monthly. Without a neutral abstraction layer, enterprises would face impossible integration costs and perpetual vendor lock-in.
Q: How does PyTorch prevent AI infrastructure fragmentation across competing hardware vendors?
Mark Collier: “One of the things that is so interesting to me is when Jensen gets on stage and launches a new chip or a new architecture, what I hear is PyTorch, PyTorch, PyTorch, and vLLM. For any new chip, the path to market—that very expensive, very hard-to-get GPU—means it actually can’t do any real work without PyTorch. PyTorch is the most essential open source project in the world when it comes to AI because nobody will even think about shipping a new chip without having it fully tested on PyTorch.”
The PyTorch Foundation coordinates massive-scale continuous integration testing, validating every code commit against the latest hardware from Nvidia, AMD, and other foundation members. This ensures hardware and software evolve in lockstep despite both changing architecturally at unprecedented speed.
Q: What makes this hardware-software co-evolution different from previous computing eras like cloud infrastructure?
Mark Collier: “This is the first time I’ve ever seen hardware and software both evolving and changing architecturally at the same time, and certainly at a rate we’ve never seen before. How are we possibly going to handle that without working together, collaborating, and leveraging open source?”
Collier draws from his experience leading the OpenStack Foundation during the private cloud era, noting that while OpenStack dealt with hardware diversity, the pace of change was far slower and architectures remained relatively stable.
The Inference Revolution: Why Production AI Overtook Training in Economic Value
While training large language models generates headlines with billion-dollar GPU clusters, inference—running AI models in production to generate business value—has become the larger and more critical market.
Q: Why has inference become more economically important than training for enterprise AI?
Mark Collier: “We’ve had this shift where, a few years ago, the training market was much bigger, and now, by the end of this year, the inference market will be bigger. As AI goes from a promising concept that might be useful someday to something that’s actually delivering real value today, you see the importance of inference.”
The shift reflects AI’s maturation from research to production. Training runs can be restarted if they fail; inference downtime means customer-facing applications go offline and revenue stops.
Q: How does vLLM address the unique requirements of production AI inference?
Mark Collier: “vLLM is by far the most popular tool for running these AI models in production, which is the inference stage. With training, if a training run fails, you usually just restart it, whereas when you’re talking about inference, you’re talking about production. If it’s J.P. Morgan or some serious enterprise use case, and your inference goes down, your customers are offline, and you’re losing revenue. The inference challenge is even more difficult because there’s a lot more at stake.”
vLLM ships new versions on “day zero” when new model architectures launch—often within 24 hours—enabling enterprises to adopt model innovations (which can reduce costs or improve performance dramatically) without rewriting inference infrastructure.
Multi-Project Foundation Strategy: Beyond PyTorch Core
The PyTorch Foundation has expanded from a single-project foundation to a multi-project ecosystem addressing training, inference, GPU accessibility, and security across the AI lifecycle.
Q: Why did PyTorch Foundation expand to include projects beyond the core PyTorch framework?
Mark Collier: “It takes more to solve this set of problems than just PyTorch. One of the things we’re excited about is that we have two new projects that are now part of the PyTorch Foundation. Helion is a layer of abstraction above Triton—it’s a way of making it more accessible to program and to get value out of expensive GPUs without having to be quite as deep an expert.”
Helion, contributed by Meta (PyTorch’s original creator), provides a domain-specific language that compiles down to Triton, which compiles to GPU kernels. This makes low-level GPU optimization accessible to engineers without deep hardware expertise.
Q: How does SafeTensors address AI model security and provenance concerns?
Mark Collier: “SafeTensors is really about security, coming from Hugging Face. When you download model weights, historically there was a format called Pickle that could allow arbitrary code to execute—the worst thing you could say to a security researcher. SafeTensors can validate that the weights you’re downloading and that the model itself are what they claim to be and have not been tampered with.”
By providing a neutral home for SafeTensors at the foundation, the project becomes a cross-vendor standard rather than a single-company format, critical for enterprise adoption where supply chain security is paramount.
The Three Pillars of AI Infrastructure: Training, Inference, and Agents
As agentic AI moves from research concept to production deployment, the relationship between training, inference, and agent frameworks has become clearer—and more interdependent.
Q: How do AI agents relate to the PyTorch training and vLLM inference layers?
Mark Collier: “The AI world has three pillars: training, inference, and agents. They’re not separate domains, but pillars that work together. When you have a coding agent or any long-running agent, it is calling tools and connecting to data, but what it’s connecting to is models. Those models were trained by PyTorch. When the agent needs a brain, the brain is created by PyTorch.”
Collier emphasizes that agents don’t replace models or inference infrastructure—they build on top. The agent orchestrates workflows, but every intelligent decision flows through inference calls to models trained on PyTorch.
Q: How are model architectures evolving specifically for agentic workloads?
Mark Collier: “Models and inference architectures do need to change and evolve for the world of agents. Particularly when we think about production infrastructure at scale, it’s a different world when the user of your application is not a person, it’s an agent. There are all kinds of corner cases being discovered as we go along.”
This includes new considerations around “agent experience” (AX)—designing APIs and systems knowing the consumer is an autonomous agent rather than a human, which affects security models, rate limiting, error handling, and cost optimization.
From Research to Revenue: AI’s Market Fit Moment
After years of hype cycles, 2025 marked a turning point where AI coding agents proved genuine economic value at enterprise scale.
Q: What changed in 2025 that proved AI’s economic viability beyond hype?
Mark Collier: “Historically, in the tech industry, we’ve had this bad habit of building a technology and trying to find a market for it. There was valid criticism of AI over the past couple of years—asking whether we were doing it again. This year, with Claude Code around Christmas time and Opus 4.5, the capability went to another level. That’s a real turning point. This massive increase in investment is in search of a return, but coding agents are the first true product-market fit, in terms of economically valuable output that people want to accelerate aggressively.”
The shift from “solution looking for a problem” to genuine ROI has accelerated enterprise adoption, changing the composition of the PyTorch community from academic researchers to production engineers at financial services firms, product companies, and enterprise software vendors.
Watch the full TFiR interview with Mark Collier here.
[/expander_maker]





