Ready for bare metal Kubernetes? Let's ask Rob Hirschfeld

At major open-source and cloud-native events like KubeCon + CloudNativeCon, the best person to connect with and get a pulse of the industry is Rob Hirschfeld, Co-Founder and CEO of RackN. Known for his deep insights into Kubernetes and infrastructure, Hirschfeld shared his observations about the event’s energy, ecosystem, and emerging trends.

Hirschfeld noticed the excitement among smaller vendors compared to the steadier pace of more established companies. Reflecting on the event’s diverse themes, he highlighted how Kubernetes has grown to encompass areas like developer experience, platform engineering, and observability.

One standout trend he emphasized was the growing interest in bare metal, noting that while Kubernetes was originally designed with virtual machines in mind, enterprises increasingly explore bare metal for performance, efficiency, and AI/GPU workloads.

While there were no dedicated talks on bare metal Kubernetes, he noticed that many talks mentioned bare metal in passing. “There’s a huge upsurge in interest in bare metal,” said Hirschfeld. “I would suspect it’s big enough that it might even pull us into the show.”

RackN specializes in bare metal and Infrastructure as Code (IaC), so this trend is an encouraging signal for the company. The challenges of running Kubernetes on bare metal, including the need for real lifecycle control and day-two operations, are home turf for RackN: its Digital Rebar software automates bare metal infrastructure, helping enterprises manage their own servers.

Hirschfeld also touched on broader challenges in the cloud-native ecosystem, such as managing AI (artificial intelligence) clusters and standardizing processes. He noted the impact of Broadcom’s acquisition of VMware and the difficulty of switching hypervisors, suggesting that companies should invest more in Kubernetes to maintain flexibility.

Guest: Rob Hirschfeld (LinkedIn)
Company: RackN
Show: Let’s Talk

Questions discussed

What kind of energy and discussions are you seeing at the conference?
How do you see RackN and bare metal in relation to Kubernetes?
What are the hot topics at the conference, particularly regarding AI and platform engineering?
What is RackN doing in terms of Kubernetes and generative AI (GenAI)?
What impact has Broadcom’s acquisition of VMware had on discussions at the conference?

[read more]

Unedited Transcript (Note: the text is AI-generated and has not been edited or reviewed. It may contain errors, including incorrect names. It’s provided here under a Creative Commons license (CC BY 4.0) to be used by bloggers, journalists and analysts for creating their own content.)

Swapnil Bhartiya: Hi. This is Swapnil Bhartiya. We are here at KubeCon + CloudNativeCon in Salt Lake City, Utah. And today we have with us, once again, Rob Hirschfeld, co-founder and CEO of RackN, and Rob is good to have you back on the show in person again.

Rob Hirschfeld: It’s exciting. There’s a lot of energy here, so it’s fun to get involved and see what’s going on.

Swapnil Bhartiya: It’s the first day, but still only half a day. People just finished lunch. What kind of energy are you seeing, what kind of discussions are you seeing, what kind of concerns are you seeing?

Rob Hirschfeld: So one of the things that’s been fascinating to me is I’ve had a chance to walk the whole floor, which takes a bit of time. I’ve worn out one pair of walking shoes already. And the vibe in the smaller booths, the gold level (they sort of split the floor between the big vendors and the rest of the smaller vendors), is actually really exciting. There’s a lot of things going on. It’s actually a little more sedate in the more spread-out diamond and platinum level over here, in part because I think people know what those vendors do. So we’re 10 years in. The vendors that have been at the top of this pyramid are really well locked in. They’re well-established brands. People know what they’re doing, and so there’s a lot of interest in all the other pieces that are coming around it. The ecosystem here is huge, and there is a challenge, as you walk around, of having to figure out what people do and what’s going on. There’s a ton of sort of niche components that are getting added into Kubernetes. The big vendors are trying to consolidate some of that play, but there’s a lot of motion in how people are using Kubernetes. And ultimately, that’s the theme of the show: how do you use Kubernetes? Developer experience, the platform engineering pieces, the pipelining, monitoring, observability, all of those are really what I would call above-the-line Kubernetes concerns, and that’s sort of echoed in how I see the conference being built right now.

Swapnil Bhartiya: Where do you see RackN in that thread, because you folks focus a lot on bare metal? How do you look at Kubernetes? How real is bare metal and Kubernetes coexistence?

Rob Hirschfeld: So the interest in bare metal is on a huge upsurge. I would suspect it’s big enough that it might even pull us into the show. Normally, people don’t think of the infrastructure side of the show at all, but what we’re seeing across the board in talks is bare metal coming up, sort of as a parenthetical in a talk. So there aren’t any real talks about bare metal, but there’s a lot of talks that mention bare metal, and from our customers’ perspective, there’s incredible drive around how you do bare metal. For the people listening, what RackN does is we write software that automates bare metal infrastructure. It’s software. So we’re helping enterprises run their own infrastructure. They do it themselves. They use the software. They own the servers. It’s that type of thing. But when you’re looking at Kubernetes on metal, that is incredibly hard. Kubernetes was really designed to be run in virtual machines, on clouds, and so when companies are increasingly turning around and saying, I want to run this without the virtualization layer, in part because of what Broadcom has done with licensing, in part just for efficiencies or getting AI GPU work done, which is also a big driver for bare metal, they’re starting to realize the need for real lifecycle control, day-two operations, and ongoing integrations into how the clusters are managed. All of those are real concerns. They’re not top of mind at the moment here, because people are so used to doing it in the clouds, where they have APIs for it, but there’s a lot of demand for people to be able to do that work themselves.

Swapnil Bhartiya: Have you had any discussions around bare metal here so far? As you said, you might get pulled in. So what kind of patterns, what kind of interest are you seeing, where, you know, one day we’ll see the RackN booth as well?

Rob Hirschfeld: So what we see is a lot of interest in people who have a Kubernetes distro. We would never do a Kubernetes distro. That’s an enterprise-level decision. It’s a developer-level decision. And this is one of the challenges that we see a lot in how people build enterprise IT teams: the people doing operations and infrastructure are often very separate from the people who are using and consuming Kubernetes, who are building their applications. And so in order for the bare metal pieces to work, you really have to bring along those operations teams in a way that they haven’t been brought along before. So just throwing a whole bunch of Kubernetes terminology at them, or making everything work inside of Kubernetes, which has been sort of the default, doesn’t translate well to teams that aren’t used to doing YAML and GitOps and CI/CD pipelines. You really have to be able to look at the operational side very differently. And so that’s part of what would bring us in as a booth: helping do that translation layer, because this is really a developer audience.

Swapnil Bhartiya: One of the hottest topics in recent times has been AI and Gen AI; even at the last two KubeCons there were a lot of keynotes on it. I’m also hearing a lot about platform engineering this time as well. What are the topics, not the ones that have been around for a while, where you’re seeing not just talk but real discussion and excitement? What are those topics?

Rob Hirschfeld: So AI is still the dominant topic from a discussion perspective, and what’s interesting there, and we should talk about it, is some of the things that they’re adding into Kubernetes to deal with AI. So part of what we saw is a lot of enthusiasm for AI. What you have to understand is there’s a lot of enthusiasm, but Kubernetes itself doesn’t map fully onto AI needs, right? The keynotes were specifically talking about the challenges of oversubscription or overscheduling or conflicts of work getting the right resources. How do you map GPUs into Kubernetes workloads? These are things that actually aren’t completely answered yet. So there’s a lot of conversations about, how do I manage jobs and queues and workloads and things like that. You mentioned platform engineering. There is a lot of talk about platform engineering, but it’s still nascent. They’re still defining platform engineering in a lot of cases. How to do platform engineering, I don’t think, is as much a topic here, and it’s actually a challenge. A lot of times this show goes to the what: what project it is, what we’re doing. They don’t spend a lot of time talking about the how, and I do think that’s a real miss in some of how Kubernetes is structured from an education perspective, and what people need. So giving people tools for platform engineering is great. But platform engineering requires discipline and controls and investment in teams and expertise and collaboration. Those are things that companies have to work through, or just throwing in a developer portal is not going to actually help developers at all. They’re going to find it very limited. The same thing is true with a lot of these AI workloads. What we hear over and over again in the AI sessions is just how difficult it is to build that AI cluster, to get it up and running, to keep it running, to handle dropouts and errors. So it’s not just a matter of, oh, I have Kubernetes, now my AI cluster problems are solved.
There’s a lot of operational concerns that people have to be aware of, and we hear over and over again that it takes months to bring AI clusters up because of all that complexity, and then keeping them running is absolutely important. The investment in these clusters is huge. Even if you’re not buying the servers, if you’re just renting the servers, and you spend an extra week or two weeks or a month getting your clusters going on that infrastructure, or if you bought the servers and it takes you quarters to even begin getting an ROI from them, it’s simply unacceptable. So we do see people wanting to talk about that acceleration, with a little deer-in-the-headlights look of, like, yeah, I’ve got the servers, I’ve got the infrastructure, it’s not running yet. And that’s a real challenge here that I think the community isn’t talking about in quite the same way. They’re trying to learn.

Swapnil Bhartiya: But people need help to do that work. How, or if, can RackN provide that help?

Rob Hirschfeld: We do provide that help quite a bit. So behind the scenes, some of the largest AI clusters are driven by RackN and RackN automation. And the reality is that between an AI cluster and the type of enterprise clusters that we deliver globally, that software, that automation, is completely portable. So when you’re looking at an AI cluster, expanding and using the GPUs is definitely an added piece to it, but the proven roads, the way we’ve been able to completely standardize and have standard processes behind all of the automation that goes into this, is a key component. So that’s really critical. The other thing that we see is, at the end of the day, you don’t want to do AI one way and virtualization another way, and applications and Kubernetes a third way. So it’s really important to find standardized ways to make this stuff work, to have routine processes, to have partners who can just sort of lift you through that. The reality in the cloud nowadays is that a lot of people don’t have the bare metal expertise to even understand how they should be running these systems. And there’s a lot of knowledge there.

Swapnil Bhartiya: Talk a bit about what you’re seeing in terms of Kubernetes and Gen AI. And of course, I would like to hear what RackN is doing in this, because everything has to be in the context of RackN.

Rob Hirschfeld: And here is my favorite topic. But the idea here, of course, is we see Kubernetes very much as the foundation for Gen AI workloads. So when we talk about Gen AI, there’s inferencing, and there’s also actual model building, and both of them are going to have Kubernetes as a core control layer. So, you know, we really do see that regardless of what type of cluster you’re going to build, what use case you have, Kubernetes is going to be a big part of that. But, and there is a big but here, the real resource constraint is becoming GPUs. And, you know, Kubernetes is still catching up on how you share, manage, find, control, and inventory those GPUs. How do you keep them patched? How do you keep them ready to run? All of those things are emerging within the Kubernetes community. As people look at using Kubernetes as their AI control plane, which we expect people to do, and we already see that being the default decision, we still have to work through those problems. We still have to build patterns. It’s one of the things that we’re doing here, right? I have several people from our team here, including my CTO, Greg Althaus, and that’s what we’re listening for very carefully: what problems do you have keeping these clusters running? How do we help you tune a cluster so that you get more performance out of it, so that we reduce the learning curve? How do we reduce those speed bumps? And that’s the type of sharing and information that really does accelerate things for people, so as you look at using Kubernetes in those use cases, that really comes forward. Another place that we are seeing real acceleration in interest is virtualization on top of Kubernetes. And so we do see quite a bit of demand on the bare metal side of, wait a second, I want to run my virtualization platform on top of Kubernetes instead of under Kubernetes.
And that also takes a lot of emerging technology, right? Networking technology, storage technology, just how you build and manage the virtual machines on top of Kubernetes. There’s some really good projects for that, and we’re watching those things progress. It’s definitely something that companies should be planning for as they go forward, just like planning to use Kubernetes as their AI engine. But then putting all those things together becomes a really delicate piece. And, you know, I can’t stress enough that learning from other people’s knowledge here is really important. That’s what a conference like this is about, and it’s one of those things where we’re not seeing as much sharing yet, just because there isn’t as much experience in the field yet to actually do this work.

Swapnil Bhartiya: As you said, this is one of your favorite topics, but there’s another of your favorite topics, and you may know what I’m going to talk about. Did you feel any shock waves from the whole VMware and Broadcom acquisition and the way licensing has changed? Or are you like, no, there was no reception here? Or do you hear a lot of discussion where people are looking at alternatives and you are engaging with them, and you’re like, hey, this is what RackN does?

Rob Hirschfeld: It’s not an understatement to call the Broadcom shock wave seismic. And the interesting thing is, I don’t think they’ve all the way, so we are definitely having a lot of conversations. We’ve built, actually, a lot of really good material to help people evaluate alternatives. One of the nice things is RackN doesn’t have a hypervisor. Just like we don’t make servers, we don’t make a hypervisor, we don’t do a container platform. So we partner with the vendors in those spaces. We do a lot of VMware installs, but we install other platforms as well. And so customers come to us as a trusted advisor on which things they should consider. And the amazing fact of the matter here is that it’s very difficult to switch hypervisors. If you’re using VMware, it’s very hard to move to an alternative like Nutanix, or Proxmox, which we recommend for open source hypervisors quite a bit. And the reason is that those are architecturally different. Your operations teams have to learn new skills, have to work with different things. They actually buy different servers. They buy different equipment. And so what we found is it’s much harder for companies to change direction, to embrace a different hypervisor, just because they want to avoid VMware. And that’s why Kubernetes, as the skip over virtualization, has been such an attractive thing. Kubernetes is much more agnostic about the underlying infrastructure, and if you invest more in Kubernetes, you have more flexibility about eliminating or changing that virtualization layer. So what we expect, and we really see a lot of movement in, is companies trying to limit or lock their VMware footprint as is. So they’re not planning new purchases, new acquisitions, but the infrastructure they have, they’re gonna keep running, because it’s designed for VMware. You can’t just take the hardware you have, throw out VMware and bring in an alternate vendor.
You can’t even easily migrate it to Kubernetes. It’s locked in on that footprint. So it’s really net new purchases. It’s really going-forward decisions on how people are working, and so they have more time than they might think to make those decisions, but they do have to make very assertive decisions about going forward: new platforms will look like this. Overall, the decision is, if I just invest more in the direction I already have, which is Kubernetes, then I’m going to have a winning strategy for a VMware migration.

Swapnil Bhartiya: Rob, once again, it was great seeing you in person again, and thanks for the great insight. And as usual, I look forward to chatting with you folks again. Thank you.

Rob Hirschfeld: Thank you. I appreciate the questions about the show. There’s so much going on here.

[/read]
