Why AI API Costs Force a Self-Hosted Model Strategy | Rob Hirschfeld, RackN | TFiR

Token costs from third-party AI APIs scale faster than most engineering budgets anticipate. Once AI-driven productivity is embedded in daily workflows, the per-employee cost of external API calls stops being an experimental line item and becomes a permanent operational expense. Whether to host models internally is no longer a matter of architectural preference. It is a financial question that will be asked sooner or later.

In this interview on TFiR, Rob Hirschfeld, CEO at RackN, walks through the cost math, data security rationale, and infrastructure mindset every platform team needs before the CFO resets the conversation.

Guest: Rob Hirschfeld, CEO at RackN
Show: TFiR

Here is what every platform engineer and infrastructure architect needs to know.

Technical Deep Dive

Q: When should a company stop using AI APIs and run its own model instead?

Rob Hirschfeld, CEO at RackN, argues there are very few legitimate reasons to stay exclusively with external AI APIs long-term. The primary exceptions are organizations that lack the expertise to manage model hosting, do not want to build that expertise, or require access to the absolute latest bleeding-edge models at all times. Hirschfeld notes that in practice, those edge cases rarely hold up under scrutiny, because always running the highest-end model for every task is both expensive and unnecessary for most workloads.

“You don’t want to always be using the highest end model for your tasks. It’s an expensive waste of tokens. It doesn’t produce better results.” — Rob Hirschfeld, CEO, RackN

Q: What is the actual cost math between AI API token spend and self-hosted GPU infrastructure?

Hirschfeld frames the comparison directly: if an organization is spending $10,000 per employee on API tokens, a $5,000 GPU for that employee costs half as much. He does not say what period the $10,000 covers, and the comparison leaves out power, hosting, and operations, so it works as a back-of-the-envelope argument rather than a full cost model. It rests on his point that many AI workloads are background, linear, or repetitive tasks that do not need the largest external models. Once token spend is annualized and set against hardware cost, the financial case for self-hosted infrastructure is easy to evaluate.

“Can we really be spending $10,000 per employee on token costs? Why don’t I just buy a $5,000 GPU for that employee and cut their costs in half? That’s a very easy equation to make.” — Rob Hirschfeld, CEO, RackN
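The arithmetic behind that comparison can be sketched in a few lines of Python. This is a minimal illustration, not Hirschfeld's model: it assumes the $10,000 figure is annual, that the GPU fully replaces that API spend (the interview actually argues for a hybrid setup), and it uses a placeholder of zero for running costs. The names `CostAssumptions`, `cumulative_costs`, and `break_even_years` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CostAssumptions:
    """Inputs for a rough API-versus-GPU comparison. All values are assumptions."""
    api_spend_per_year: float               # token spend per employee, assumed annual
    gpu_purchase_price: float               # one-time hardware cost per employee
    gpu_running_cost_per_year: float = 0.0  # power, hosting, operations (placeholder)


def cumulative_costs(a: CostAssumptions, years: int) -> Tuple[float, float]:
    """Return (api_total, self_hosted_total) after the given number of years."""
    api_total = a.api_spend_per_year * years
    self_hosted_total = a.gpu_purchase_price + a.gpu_running_cost_per_year * years
    return api_total, self_hosted_total


def break_even_years(a: CostAssumptions) -> Optional[float]:
    """Years until the GPU purchase is recovered, or None if it never is."""
    yearly_saving = a.api_spend_per_year - a.gpu_running_cost_per_year
    if yearly_saving <= 0:
        return None
    return a.gpu_purchase_price / yearly_saving


# The two figures quoted in the interview; everything else is assumed.
figures = CostAssumptions(api_spend_per_year=10_000, gpu_purchase_price=5_000)
api_total, hosted_total = cumulative_costs(figures, years=1)
print(f"Year 1: API ${api_total:,.0f} vs self-hosted ${hosted_total:,.0f}")
print(f"Break-even after {break_even_years(figures):.1f} years")
```

Setting `gpu_running_cost_per_year` to a realistic value for a given site shows how quickly the break-even point moves, which is the number a CFO is likely to ask about.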

Q: Why is sending data across your firewall to an external AI API an infrastructure risk?

Hirschfeld identifies data egress through firewalls as a foundational reason to plan for hybrid AI infrastructure now. Routing sensitive or internal data to an external API endpoint creates persistent exposure that compounds as AI usage scales across teams and systems. The recommendation is to build the hybrid model hosting capability before it becomes urgent, not in response to a security or compliance incident.

“You don’t want to always be sending your data across your firewalls or off of your systems. It makes a lot more sense to be prepared today to have that hybrid environment.” — Rob Hirschfeld, CEO, RackN

Q: How should organizations think about background and linear AI workloads in their infrastructure planning?

Hirschfeld distinguishes between high-visibility interactive AI tasks and the growing volume of background or automated AI workloads that run continuously as part of production systems. These background workloads do not require frontier models and are well-suited for self-hosted inference. As organizations mature their AI usage, this category of workload becomes a permanent part of what Hirschfeld calls the digital workforce, and infrastructure must be sized and positioned to absorb it cost-effectively.

“You’re going to absorb the workloads that are much more background or linear or just part of your background system, because those are also going to be part of your new digital workforce.” — Rob Hirschfeld, CEO, RackN
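One way to picture the hybrid split is a small routing policy that decides, per workload, whether a request stays on self-hosted inference or goes to an external API. The sketch below is purely illustrative: the `Workload` fields, the `route` function, and the rule ordering (internal data always stays local) are assumptions made for this example, not something described in the interview or provided by any RackN product.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    SELF_HOSTED = "self_hosted"
    EXTERNAL_API = "external_api"


@dataclass
class Workload:
    name: str
    needs_frontier_model: bool    # task genuinely requires the newest model
    contains_internal_data: bool  # payload should not leave the firewall


def route(w: Workload) -> Target:
    """Pick a hosting target under the hypothetical policy described above."""
    # Assumed rule: internal data stays inside the firewall, whatever else applies.
    if w.contains_internal_data:
        return Target.SELF_HOSTED
    # Send work out only when it actually needs a frontier model.
    if w.needs_frontier_model:
        return Target.EXTERNAL_API
    # Background, linear, and repetitive jobs default to local inference.
    return Target.SELF_HOSTED


jobs = [
    Workload("nightly log summarization", needs_frontier_model=False, contains_internal_data=True),
    Workload("public docs translation", needs_frontier_model=False, contains_internal_data=False),
    Workload("open-ended research assistant", needs_frontier_model=True, contains_internal_data=False),
]
for job in jobs:
    print(f"{job.name} -> {route(job).value}")
```

Under these assumed rules only the third job leaves the building, which mirrors the argument that frontier models stay available while routine work is absorbed locally.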

Q: Where are most organizations on the AI adoption curve today, and why does that matter for infrastructure decisions?

Hirschfeld characterizes enterprise AI adoption as still on an exponential curve: teams are still learning how to use the tools, and AI productivity is not yet fully baked in. The implication is that current token spend likely understates what it will be once that productivity is embedded in daily work. Infrastructure teams that wait for usage to plateau before planning self-hosted capacity risk falling behind on both cost and security posture.

“AI workloads and AI productivity are not even fully baked in yet. We’re still in an exponential curve where people are learning how to use these tools.” — Rob Hirschfeld, CEO, RackN

Resources & Documentation

  • RackN, infrastructure automation platform for managing AI and hybrid workload environments at scale

***

Full Raw Transcript

Swapnil Bhartiya: Now, let’s just flip the script. When should a company absolutely not try to run their own model? What are the red flags that say stick with the API?

Rob Hirschfeld: I’m not sure that I have any reasons that you wouldn’t do it, you know, because those are, these are. These are places where unless you just don’t have the expertise or you don’t want to maintain the expertise, or you’re trying to use the most bleeding edge models all of the time. Right. Those would be the few use cases I would see, but in my experience, those aren’t real use cases. So you don’t want to always be using the highest end model for your tasks. It’s an expensive waste of tokens. It doesn’t produce better results. You don’t want to always be sending your data across your firewalls or off of your systems. Right.

It makes a lot more sense to be prepared today to have that hybrid environment and that you will do some type of hosting, because I guarantee you that if you’re not asking that question right now, your CFO is going to be asking that question when they get the bill after a couple of months. They’re going to all be excited and cheering that your engineers and teams are getting productivity boosts. That’s fantastic. It’s about to be baked in. Then you’re going to be coming back to say, this is great, but can we really, really be spending $10,000 per employee on token costs? Why don’t I just buy a $5,000 GPU for that employee and then cut their costs in half? That’s a very easy equation to make.

And you have to get into the mindset of AI workloads and AI productivity are not even fully baked in yet. We’re still in an exponential curve where people are learning how to use these tools. And you have to be prepared that you have to, you know, you know, provide access to the greatest latest models. Definitely need to do that and be preparing that. You’re going to absorb the workloads that are much more background or linear or just part of your background system, because those are also going to be part of your new digital workforce.
