
Bringing AWS S3 Compatible Object Storage To Your Own Cloud | AB Periasamy


MinIO is a provider of open-source, high-performance object storage software. MinIO's AWS S3-compatible object storage can run anywhere customers want, enabling them to build their own data infrastructure. In this special issue of Let’s Talk Kubernetes, for KubeCon NA 2021, we sat down with AB Periasamy, CEO and Co-Founder of MinIO, to talk about the company.

Here are some of the topics we covered.

  • Intro to the company
  • What exactly MinIO offers
  • We discussed the scope and demand of object storage in the Kubernetes world, considering that Kubernetes is now used in so many places, including at the edge.
  • What new challenges do these new use cases bring to object storage and MinIO, and how is MinIO tackling those challenges?
  • How different is object storage in cloud computing from, let’s say, the traditional or legacy IT world?
  • Even the object storage space in the cloud-native world is crowded, so how does MinIO differentiate itself from the rest?
  • We also talked about KubeCon and what the company’s goal for the event was.

Guest: AB Periasamy (LinkedIn, Twitter)
Company: MinIO

Swapnil Bhartiya: Hi, this is Swapnil Bhartiya, and welcome to a special edition of TFiR: Let’s talk about Kubernetes for KubeCon and CloudNativeCon. And my next guest is, once again, Anand Babu Periasamy, Co-Founder and CEO of MinIO. Anand, it’s great to have you back on the show.

AB Periasamy: Great to be here. Thank you, Swapnil.

Swapnil Bhartiya: I remember the last time we talked was almost a year ago, and a lot of water flows through the cloud-native Kubernetes world in one year, so we should make our interactions more frequent. That’s one. Number two: we did talk about MinIO, but I just want to refresh our viewers’ memories. What is the company all about?

AB Periasamy: The company is all about object storage. It’s exactly like Amazon S3, but MinIO is software, it’s open source, and it runs everywhere. The idea behind MinIO is this: AWS S3 is only inside AWS, but the data is all over the globe, and wherever there is data, we need to be there. So we gave customers an object storage that is API-compatible with Amazon S3, so they can build their own data infrastructure.

Swapnil Bhartiya: Are you offering a kind of managed object storage? Just give us that… Okay, you want object storage available outside of AWS, through the same API, but what exactly are you offering?

AB Periasamy: Yeah. What customers really want is similar to Kubernetes, right? If there is the cloud, then why do I need Kubernetes? Customers are going toward this approach because they want control over the technology and the data. They want to keep the data, and they want to keep the software technology that powers their infrastructure stack. And they have pretty much converged on Kubernetes as the infrastructure API of choice, whether it’s VMware Tanzu, OpenShift, SUSE Rancher, or any of the public cloud offerings: GKE, AKS, EKS. Once Kubernetes becomes the infrastructure API of choice, they roll out MinIO as software containers. It’s no different from how you would run Redis, MongoDB, or Cassandra for your metadata. But for your blob data, the persistence layer, they roll out MinIO as a software container on any of these public clouds or the private cloud, all the way to the Edge.

So what we provide them is just the software, and they roll it out as part of their infrastructure stack. Certainly, you can also use MinIO as a managed service on the public cloud. Then the operations part, like when to push security updates and so on, becomes easier, because we get to guide the customers. But in essence, it’s software, and customers are in control of their data and technology.

Swapnil Bhartiya: You were talking about going all the way to the Edge, so I also want to understand a bit about the scope and demand of object storage in the Kubernetes world, because Kubernetes is becoming as ubiquitous as Linux these days. Everybody’s trying to use it, and there are some exciting use cases, so talk about the scope.

AB Periasamy: Kubernetes itself became important as applications became containerized. Once your software stack is containerized and distributed as software containers, like it used to be with RPMs and [inaudible 00:03:25] packages in the past, you have to manage the containers, and that’s where an orchestration platform is necessary and Kubernetes becomes the choice. Once that comes into the picture, MinIO is basically the data persistence layer. When it comes to the Edge, for example, they are using pretty much the same software stack: Kubernetes and MinIO containers rolled out to the Edge. Why have they adopted Kubernetes and containers even for the Edge? It is really about autonomous, self-managed remote locations. They don’t have enterprise IT there, right? And it’s easy to push software updates as containers. In many of these situations, the data is being generated at the Edge locations, whether it’s the automobile industry, industrial IoT, or healthcare. The data is not produced in the cloud. Data is produced outside the cloud.

And in these Edge locations, for example with data collected from vehicles or industrial IoT, they process the data there. But again, we find the same software stack there: Kubernetes, MinIO, and data processing tools like Druid or Kubeflow. It’s pretty much the same software stack, just isolated and deployed as containers.

Swapnil Bhartiya: We live in a multi-cloud, hybrid-cloud world where applications can run anywhere. Data is what actually matters. So a lot of folks have to move data around [inaudible 00:04:56]. It could even be between AWS and your on-prem environment, or whatever other cloud you’re running. Plus, if it makes sense in this context, I also want to understand what you can do with object storage when we talk about building data lakes and data warehouses. So talk about that challenge as well.

AB Periasamy: Yeah. The data lake is increasingly becoming the heart of every business. Sometimes they call it data infrastructure or a data warehouse. There are many names for it, but it’s basically a data repository where they capture all their data. And why would you capture it? Because you want to process it and turn it into information. The technology to process this data, like machine learning, has become so common, cheap, and easy to adopt that companies from small to large can take advantage of it. But the closest thing the industry had for large-scale data infrastructure was Hadoop and Hadoop HDFS. That architecture paved the way, but it was very hard to operate, and it was fundamentally incompatible with the cloud ecosystem. Look at running Hadoop on Kubernetes: it wouldn’t run. How about the public cloud? The public cloud did not adopt Hadoop in the first place. There was an early play, but look at everything from Snowflake to EMR to every modern infrastructure out there: they’re fundamentally built on object storage.

What customers started doing was looking at the AWS model, and they really liked it. The heart of that idea is disaggregating storage and compute, which is the antithesis of the Hadoop model of co-locating data and compute. Hadoop’s model has aged because it was created at a time when networks were slow, drives were slow, and that was the closest thing you had for large-scale infrastructure. But in today’s times, AWS gave the blueprint for building modern data infrastructure. So what customers are doing is ripping out the traditional data warehouse and the Hadoop HDFS-based ecosystem, and adopting MinIO as the data lake persistence layer. We are basically the modern-day HDFS.

And the layer above the MinIO stack doesn’t look like the traditional MapReduce stack. It’s actually quite different. It’s Presto and Trino… Trino and Presto are pretty much the same… From Presto and Spark to Druid, pretty much all the modern distributed databases, whether it’s Kafka, Elasticsearch, or Cassandra, everybody has gone to object storage as the native backend. Even the old ones, from Vertica and Teradata to Splunk, have all adopted object storage as the foundation. Whether it’s hybrid cloud or multi-cloud, everybody has adopted this model. And MinIO basically becomes the one portable data lake that you can deploy on public cloud or private cloud.

And the question of MinIO versus Amazon S3 boils down to this: Amazon S3 is only available inside AWS. If you want to build a data lake on Azure, Google Cloud, on-prem, or anywhere else, the ecosystem now demands the S3 API, and MinIO is pretty much the choice. Whether you’re looking at Kubeflow or any of these modern technologies, they’re fundamentally built on MinIO. They are more compatible with MinIO than with Amazon S3. And when they start building, we are already there. They start with us pretty much from the early days of building their data infrastructure.

Swapnil Bhartiya: If we look at cloud native… or, let’s keep our focus on storage: how different is it from the traditional world, which would also mean that the challenges and the opportunities are different?

AB Periasamy: Yeah, when we started, MinIO’s ideas were quite radical, right? The traditional storage industry dismissed us. And today, from EMC to NetApp to every one of these players, on paper they want to look like MinIO. They talk about how they want to win developer mindshare. They want to be open source and container- and Kubernetes-friendly. But what they missed, and what was quite clear to us from day one, was that it was a land grab, and the land grab is about getting the application ecosystem and then the data footprint. It’s a very sticky problem, right? And by the time they started talking about Kubernetes, we had the market share.

And if you look at the stats, MinIO is seeing about 1.1 million Docker pulls a day. All these applications are now built on MinIO, and once users get comfortable with something that really works well for them, they don’t want to leave. The difference between MinIO and competing solutions is that they came from the appliance world, and for them containers meant physical containers, right? They did not understand this market well, and now that they’re coming into it, they’re taking the same old technology stack and making it run on Kubernetes, and that doesn’t really work. You have to go back to the drawing board and build something native for modern infrastructure. MinIO actually gained an advantage from starting late: we were able to build it for modern infrastructure without any baggage.

Swapnil Bhartiya: You’re absolutely right that you folks came at the right time, but the cloud-native space is busy and crowded, and storage is no different. So can you talk about some of the advantages, the edge (no pun intended) you have over others, so that you stand out in that crowded space?

AB Periasamy: The technical advantages are quite easy to describe. Look at performance: MinIO is the fastest object storage out there. We quite easily saturate a 100-gigabit NIC with NVMe drives on these machines, and these are commodity Supermicro, Dell, and HPE servers that you can get from anywhere. If you load MinIO on them, the bottleneck often turns out to be the 100-gigabit network. And if you put 10, 20, 50 of them, racks of them, together, the network switches still can’t keep up with how fast we can move data. These commodity machines can run circles around the highest-end appliances from the traditional vendors. So performance is clearly a differentiating factor. And it’s not just about AI/ML-type workloads. Even for snapshots and copy data management, or just archival workloads, no one wants their backups to take forever to complete. Performance today is table stakes, and it’s important. Otherwise, Snowflake-type applications wouldn’t be possible on AWS S3. And MinIO is the fastest one.

But to me personally, the most important distinction is simplicity, and it matters more the larger the infrastructure you want to build. If you have to hire more people to manage a complex infrastructure, you are not scaling at all. These large deployments of MinIO are often managed by one person who isn’t even managing MinIO full time. Simplicity is actually the biggest differentiator. Then you can keep adding on top of that in terms of S3 compatibility and end-to-end features. Our focus has always been to do one thing really, really well, and that happens to be object storage with the S3 API. When users try it, they see that fine craftsmanship and attention to detail firsthand.

In fact, our own CMO, when he joined us… Sometime last week, he told me that when he first heard about MinIO, he went and tried it for himself. As a CMO, he was able to run it in his home environment, and he saw how easy it was. Most of the time, for customers and all these communities, it takes just minutes to deploy MinIO, and they see that it just works. That’s actually a huge differentiating factor. Traditional object storage systems are so hard that you need PhD-level expertise to deploy and operate them. That’s not the case for MinIO. If you just learned JavaScript two weeks ago and started writing a web application, you’ll find deploying MinIO easier than running Nginx or Node.js. That led to massive adoption, and as the data grew, we grew with it. Ease of use at scale is what I would say is actually the biggest differentiating factor.

Swapnil Bhartiya: Excellent. Now you folks are going to be at KubeCon. Tell me, what are you showing at KubeCon?

AB Periasamy: Kubernetes has continued to play a dominant role for us. With the operator model maturing in Kubernetes, nowadays we find customers using the MinIO Operator on all the clouds, and with all the new features we launched, the operator has reached quite a bit of maturity. I would say it’s kind of boring if you’re a Kubernetes expert; you probably know everything we are going to talk about. This is really for newer entrants into this space who are still learning about operators and what an operator can do. What the industry is catching up to is how important the Kubernetes operator model is for them. It basically eliminates the need to hire DevOps engineers with significant infrastructure automation expertise. The operator allows us to productize that for customers, so they don’t need to depend on DevOps engineers with considerable knowledge of how to operate MinIO.

With the operator model maturing, it’s more about education this time. Everything we did, unlike others, we actually released to the community so they could experience it. Then we get feedback, and every week we keep making improvements. There’s nothing we hold back; we don’t do it in closed rooms and then come out with an announcement for people to start trying. We actually do it out in the open, and the community evolves along with us. So from the MinIO community’s point of view, it’s going to be quite boring, because they are already using all these features in production. But I think the community outside of MinIO has a lot to catch up on: the maturity of the Kubernetes operator model, how they can build stateful services and data stores, not just MinIO but Cassandra, Kafka, or Redis, how they can run them all on Kubernetes, and the benefits of doing so. We have a lot of education to do.

Swapnil Bhartiya: Anand, thank you so much for taking time out today to talk not only about MinIO, but also to share your insights on object storage in the cloud-native Kubernetes world. As usual, I would love to have you back on the show, but let’s make sure that this time it’s not another year… let’s talk more frequently. Thank you.

AB Periasamy: I would love to be back soon.