Summary: At the Open Source Summit a few weeks ago, IBM and the Linux Foundation announced the public availability of Machine Learning eXchange (MLX). Fast forward to KubeCon NA 2021: IBM open-sourced ModelMesh, a core component of Watson. We sat down with Animesh Singh, CTO of Watson Data and AI Open Source Platform at IBM, to dive into these announcements and updates.
Topics we covered in this show include:
- Animesh explained more about MLX, what it does and how it will help data scientists and data teams manage the lifecycle of their data and AI artifacts.
- What other AI/ML open source projects is IBM involved with at the Linux Foundation?
- At KubeCon, IBM made several announcements, including moving KFServing out of Kubeflow and renaming it KServe, now an independent project on GitHub.
- IBM also announced that it is releasing ModelMesh, the core Watson component for model serving and management, as an open source project. He explained the governance of both projects.
- We also discussed whether, just as we once talked about digital transformation as the core of modern business, it is now time to talk about AI/ML as a core part of the software strategy of modern businesses. Is it fair to say the modern world runs on AI/ML?
Topics: KubeCon NA 2021, AI/ML
About Animesh Singh: As CTO for Watson AI/ML Open Technology at IBM, Animesh Singh serves as Watson Distinguished Engineer and Executive, and is responsible for Watson AI/ML Platform open technology strategy, architecture and execution, delivering the joint IBM Watson and Red Hat technical roadmap and products, working with Watson customers and partners, and leading IBM's leadership and engagement in the Linux Foundation Data and AI, Trusted AI (AI fairness, robustness and explainability) and MLOps (Kubeflow, ML pipelines, ML serving) communities. He also drove strategy and execution for Kubeflow and OpenDataHub, and for products like IBM Cloud Private for Data, Watson OpenScale and Watson Machine Learning. Prior to that, he led the launch of IBM Developer as well as the first IBM public cloud offering, Bluemix (Cloud Foundry), and launched initiatives around Kubernetes and Istio.
About IBM: IBM is a global leader in hybrid cloud and AI, serving clients in more than 170 countries. More than 2,800 clients use our hybrid cloud platform to accelerate their digital transformation journeys and, in total, more than 30,000 of them have turned to IBM to unlock value from their data. Guided by principles of trust, transparency and support for a more inclusive society, IBM also is committed to being a responsible steward of technology and a force for good in the world.
Swapnil Bhartiya: Hi, this is your host, Swapnil Bhartiya, and welcome to a special edition of Let’s Talk About AI/ML for KubeCon. Today we have with us, once again, Animesh Singh, CTO of Watson Data and AI Open Source Platform at IBM. Animesh, it’s good to see you after such a long time.
Animesh Singh: Thanks. Great to meet you as well Swapnil. I think last time it was at the Open Source Summit in Vancouver. It has been quite a while, but glad to see you again.
Swapnil Bhartiya: At the Open Source Summit last week, IBM and the Linux Foundation announced the public availability of Machine Learning eXchange, or MLX. So I want to talk about that. Of course, this is KubeCon, so there are many things to talk about, but let’s start with MLX. What is it?
Animesh Singh: Yeah, Machine Learning eXchange, right? So this is something which we announced at the Open Source Summit in Seattle, North America, a couple of weeks ago, as you mentioned. And it’s focused on being that one-stop shop for all your data and AI artifacts. That includes your data sets, models, pipelines and notebooks. One of the things we have noticed is, when you look at the ML and AI lifecycle, the three most important pillars are data sets, models, and pipelines. You essentially start with a data set and you end with a model, and in between lies that huge data and AI lifecycle, where you go through all the different processes, like data ingestion, data cleaning, feature engineering, and then the machine learning part of it, which is running distributed training and hyperparameter optimization, validating your trained models, and deploying them in production.
So pipelines become very, very important and become that third pillar that actually takes you from data to models. So Machine Learning eXchange essentially brings all these artifacts, the data sets, models, pipelines and notebooks, into one single place. And it not only acts as a marketplace or a central catalog for all these artifacts, it also gives you an execution engine under the covers, right? And by virtue of having that execution engine, you get some added capabilities. If it is pipelines, you can run those pipelines on top of Machine Learning eXchange. If it is data sets, you can download those data sets onto your clusters. If it is models, you can deploy those models, right? So by virtue of being integrated with an execution engine under the covers, you get the capability of a central catalog, eliminating the duplication of different teams recreating these assets again and again in different silos, but you also get these added services because of the execution engine. So that was the announcement, Swapnil.
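To make the data-to-model lifecycle Animesh describes concrete, here is a toy sketch of those stages (ingestion, cleaning, feature engineering, training, validation) as plain Python functions. This is purely illustrative: a real MLX or Kubeflow pipeline runs each stage as a containerized step, and all data values and the "training" logic below are made up.

```python
# Toy sketch of the data-to-model lifecycle that MLX pipelines capture.
# Each stage is a plain function here; in a real pipeline each would be
# a separate, containerized step. All values are hypothetical.

def ingest():
    """Data ingestion: pull raw records (here, fake housing rows)."""
    return [{"size": 50, "price": 150}, {"size": 80, "price": 240},
            {"size": None, "price": 300}]

def clean(rows):
    """Data cleaning: drop records with missing fields."""
    return [r for r in rows if r["size"] is not None]

def featurize(rows):
    """Feature engineering: turn records into (feature, target) pairs."""
    return [(r["size"], r["price"]) for r in rows]

def train(samples):
    """Stand-in 'training': average price-per-size over the samples."""
    return sum(p / s for s, p in samples) / len(samples)

def validate(model, samples):
    """Model validation: check predictions are within 50% of truth."""
    return all(abs(model * s - p) / p < 0.5 for s, p in samples)

model = train(featurize(clean(ingest())))
print(round(model, 2), validate(model, featurize(clean(ingest()))))
```

The point of a pipeline engine is that each of these stages becomes a reusable, catalogued artifact rather than ad hoc code duplicated across teams.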
Swapnil Bhartiya: So, as you alluded to, MLX is part of LF AI & Data, and there are other projects within the foundation as well. Can you also talk about IBM’s involvement with other projects within the AI and Data foundation?
Animesh Singh: Oh, sure. So one of the key things for IBM, not only as part of our strategy but also because we have been one of the leaders in the artificial intelligence and machine learning space, is that we have consciously been working towards doing AI in an ethical and trusted way. And that has been one of the areas where IBM has been most focused within Linux Foundation AI & Data. In fact, the reason we initially joined LF AI & Data was that we wanted to get together with all these other companies and define not only the principles for building AI in a trusted and ethical way, but also drive some tools and technologies under the aegis of the foundation, so that participating member companies can work with us. Right?
So a lot of these companies are essentially working with us, including Microsoft, DARPA and others like Ericsson and AT&T, where we are building principles for how to build AI in an ethical way, and also building tools and technology. As part of that, IBM contributed three open source projects: AI Fairness 360, which is focused on bias detection and mitigation; Adversarial Robustness Toolbox, which detects adversarial attacks against your models and figures out if they are vulnerable to such attacks; and AI Explainability 360, which explains your model predictions. So these are the three major projects in the trusted AI space where we are engaging.
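As a toy illustration of the kind of bias metric a toolkit like AI Fairness 360 reports, here is disparate impact (the ratio of favorable-outcome rates between groups) in plain Python. Note this is not the AIF360 API itself; the group labels, outcomes, and the 0.8 rule-of-thumb threshold are stand-ins for illustration.

```python
# Toy sketch of "disparate impact": the ratio of favorable-outcome rates
# between an unprivileged and a privileged group. AI Fairness 360
# computes this (and many richer metrics) on real data sets.

def favorable_rate(outcomes):
    """Fraction of favorable (1) outcomes in a group."""
    return sum(outcomes) / len(outcomes)

def disparate_impact(unprivileged, privileged):
    """A ratio close to 1.0 suggests parity; a common rule of thumb
    flags values below 0.8 as potentially biased."""
    return favorable_rate(unprivileged) / favorable_rate(privileged)

# Hypothetical loan-approval outcomes (1 = approved, 0 = denied)
group_a = [1, 0, 1, 0, 0, 1, 0, 0]   # unprivileged group: 3/8 approved
group_b = [1, 1, 1, 0, 1, 1, 0, 1]   # privileged group: 6/8 approved

di = disparate_impact(group_a, group_b)
print(round(di, 3))  # 0.5, well below the 0.8 rule-of-thumb threshold
```

Metrics like this are what "monitoring trusted AI" means in practice: they can be computed continuously on a model's live predictions, not just at training time.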
Another area where IBM is actively engaged is a project called Egeria, which is in the data set metadata and governance space. Because we believe you cannot build trusted and ethical AI unless you have very strong data governance and data metadata capabilities. Egeria as a project plays into that space, and we are working with the larger community, with companies like ING and others, to create that central data set metadata project. So yeah, those are some of the key projects for us: AI Fairness 360, AI Explainability 360, Adversarial Robustness Toolbox, et cetera.
Swapnil Bhartiya: Excellent. We were just talking about the foundation and the Open Source Summit, and this week it’s KubeCon. Once again, IBM has a presence there, and once again you are making an announcement at this event as well. So let’s talk about what you are announcing at KubeCon, and what it is all about.
Animesh Singh: Yes, it’s KubeCon this week, and I am very, very excited. I have a session essentially talking about this topic, and I am leveraging that session to make the announcement. There are two parts to it. First, as you might be aware, IBM has been one of the leaders in the Kubeflow community, and there was a project in Kubeflow called KFServing, which was focused on serving machine learning models in production using serverless technologies. So the first part is that this project has been renamed to KServe and is moving out of Kubeflow into its own independent GitHub organization. The project has grown and has become central to a lot of other companies which are working with us on it, like, for example, Bloomberg, Seldon, Nvidia and others.
So that is the first part. But more importantly, the major announcement is that we are moving the core of Watson for model serving and management, which is called ModelMesh, into open source. And while we are moving it into open source, we are also combining it with KServe, to create that one single standard for serving machine learning models in production on top of Kubernetes. So that essentially is the announcement: the core of Watson, called ModelMesh, which is focused on serving machine learning models in production, is being contributed to open source by IBM and combined with the KServe project, to create that one single standard for machine learning models in production.
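For context on what deploying a model with KServe looks like, here is a minimal InferenceService manifest of the kind the project uses; the model name and storage URI below are hypothetical placeholders, not anything from this interview.

```yaml
# Hypothetical KServe InferenceService: deploys a scikit-learn model
# stored in a cloud bucket behind a serverless, autoscaled endpoint.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-sklearn-model        # hypothetical name
spec:
  predictor:
    sklearn:
      storageUri: "gs://example-bucket/model"   # hypothetical location
```

Applying a manifest like this with `kubectl apply -f` is what "serving models using serverless technologies" means concretely: KServe provisions the endpoint, routing, and autoscaling for you.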
Swapnil Bhartiya: Yeah. First of all, as you said, you’re taking the core of Watson and releasing it as open source as ModelMesh. So I want to understand a bit about the technical and governance aspects of it. When you open source it, where will the project be hosted?
Animesh Singh: ModelMesh has been running in production for Watson for many years. A lot of the Watson APIs, like Watson Natural Language Understanding and Watson Speech to Text, are all running on top of it. One of the major functionalities of Watson ModelMesh has been extreme scalability, so it can serve hundreds of thousands of models in production. Essentially it acts as a distributed LRU cache, where it intelligently optimizes and places thousands of models on top of Kubernetes clusters that are fairly modest in size. And a lot of the algorithms built into it are designed in that fashion, around how to optimize compute resources for serving that many models, typically with millisecond response times. So that’s the focus and the technical aspect of this project.
Now, when we are moving this project out into open source, in terms of the governance, we are combining the project with KServe. KServe essentially is a project which started in the Kubeflow community. Just around two to three weeks ago, we worked with Google, and Google has been very gracious in terms of moving that project out of Kubeflow, and it has its own independent GitHub organization now. So KServe is being governed according to the same principles and techniques we were using before, that is, multi-vendor participation in terms of the steering committee and the bi-weekly meetings, and ModelMesh is just joining that particular project.
Now, our goal would be that somewhere in the fourth quarter, I’m hoping it’s the fourth quarter, we move this combination of the two projects as a single entity into the Linux Foundation. But that’s TBD. Right now the announcement is focused on Watson ModelMesh moving into open source and joining with KServe. And our goal with that is to create that one single standard for model serving on Kubernetes.
Swapnil Bhartiya: Excellent. So there are a lot of updates that will be coming, and I’ll be keeping an eye on that. But if I may ask: there’s the KServe community, and then there are so many other communities, there is no single community. What kind of community are you looking at building or supporting with ModelMesh and the combination of these projects? Can you talk about that at this point, or will you look at it when the project finds a home?
Animesh Singh: Definitely. I mean, as somebody said, Swapnil, open source exists so that every problem has to be solved only once. And that has been the intention, right? That we do not solve this problem of serving machine learning models in production many times over. Right now, the majority of the companies and enterprises we work with, and even on public cloud, have chosen Kubernetes as the underlying cloud operating system. And there has been a strong need for emerging standards on top of Kubernetes for serving machine learning models in production. Not only serving machine learning models in production, but being able to monitor them and generate those trusted AI metrics we talked about. Once your machine learning models are running, are they giving predictions which are unbiased? Can you explain the model predictions? So there’s this whole aspect of monitoring and metrics that goes with the model serving platform itself as well.
Right? So our goal with this set of projects and its community is essentially to create that single standardized layer on top of Kubernetes for serving and monitoring machine learning models in production. Now, one of the things which has already happened is that KServe started a protocol, what we call the V2 protocol for machine learning inferencing, which has already started becoming a standard. Nvidia’s Triton Inference Server essentially follows that standard for the model prediction and inferencing API, what we call the V2 protocol. Seldon’s MLServer also follows that particular standard. So, if you are writing applications which talk to these V2 protocol APIs, you are essentially aligned not only with KServe and ModelMesh, but with Nvidia’s Triton Inference Server, Seldon’s MLServer, and Facebook’s TorchServe for PyTorch.
Right? So TorchServe, as an engine for serving PyTorch models, is also aligning around that standard and exposes the V2 protocol APIs for model inferencing. So the standardization effort already started heavily on that side with KServe. And now, with ModelMesh coming in, it is filling the gap that you need to serve not only, I would say, hundreds of models in production, but hundreds of thousands of models in production. Because what we realized over the course of the last year and a half was that more and more models started moving into production. So at some level, the KServe architecture, which is natively tied to Kubernetes, started imposing some limitations in terms of scalability. Because when you map a model on a per-container basis, there are limits on how many containers you can run on a cluster and how many IP addresses can be assigned on those clusters.
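As a rough sketch of what that standardization buys you, a V2 inference protocol request body can be built the same way no matter which compliant server sits behind the endpoint. The model name, tensor name, shape, and feature values below are purely illustrative.

```python
import json

# Illustrative V2 inference protocol request body. A client would POST
# this JSON to an endpoint like /v2/models/<model-name>/infer, whether
# the backend is KServe, Nvidia's Triton, or Seldon's MLServer.
request_body = {
    "inputs": [
        {
            "name": "input-0",           # tensor name (model-specific)
            "shape": [1, 4],             # a batch of one, four features
            "datatype": "FP32",          # V2 type string for float32
            "data": [[6.8, 2.8, 4.8, 1.4]],
        }
    ]
}

payload = json.dumps(request_body)
print(payload)
```

Because the request and response shapes are standardized, an application written against this payload format does not need to change when the serving backend is swapped out.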
To address a lot of those limitations, ModelMesh is coming in and filling that big gap. It’s still running on Kubernetes, but it’s not mapping a model to a container. And by virtue of using its distributed LRU cache and the algorithms under the covers, it’s able to serve hundreds of thousands of these models on a similar compute cluster.
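The LRU idea behind ModelMesh, keeping only recently used models loaded and evicting the coldest one when capacity runs out, can be sketched in single-node Python. ModelMesh itself does this across a whole cluster with far smarter placement and routing; the capacity and model names below are made up for illustration.

```python
from collections import OrderedDict

class ModelLRUCache:
    """Toy single-node sketch of an LRU cache for loaded models.
    ModelMesh applies the same idea across a Kubernetes cluster."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._loaded = OrderedDict()  # model name -> loaded "weights"

    def get(self, name):
        """Serve a model, loading it (and evicting the coldest) if needed."""
        if name in self._loaded:
            self._loaded.move_to_end(name)  # mark as most recently used
        else:
            if len(self._loaded) >= self.capacity:
                self._loaded.popitem(last=False)  # evict coldest model
            self._loaded[name] = f"weights-of-{name}"  # stand-in for loading
        return self._loaded[name]

cache = ModelLRUCache(capacity=2)
cache.get("sentiment-en")
cache.get("speech-to-text")
cache.get("sentiment-en")      # refreshes its recency
cache.get("entity-extractor")  # evicts speech-to-text, the coldest model
print(list(cache._loaded))     # ['sentiment-en', 'entity-extractor']
```

This is why a cluster with a fixed number of containers can present far more models than it can hold in memory at once: hot models stay resident with millisecond response times, while cold ones are paged in on demand.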
Swapnil Bhartiya: Since we are here at KubeCon, if you look at the world today: it was running on Linux, then open source, and now everything is using Kubernetes in one capacity or another. And then we talk about the whole cloud stack, whether it’s security or any other aspect; of course, we can use AI/ML in a lot of other ways as well. But AI/ML is going to play a much bigger role than it used to. So, will it be wrong to say that, just the way at some point we said, hey, you need a digital transformation or cloud strategy, you need a software strategy or you cannot run a business, you should now have an AI/ML strategy as well if you want to be a successful business in modern times? Because that touches everything, from security to whatever metrics you’re collecting, whether to get better insights or to build an AI/ML-based solution itself, like Tesla cars or whatever it is. It’s just a high-level question I’m asking you: what is the role of AI/ML in the modern world when you’re building your stack?
Animesh Singh: You’re definitely right. In fact, I think it’s fair to say that the majority of organizations have already started taking steps in that direction. You can see it, just in general, in the job postings around, in the number of companies which are building an AI and machine learning based strategy and, by virtue of that strategy, the underlying platform to create and serve these models in production. But also the whole data science capability, where they are now placing a lot of emphasis: hey, yes, we had a lot of data, but there was no data engineering around it, we were not creating data lakes, we were not doing feature engineering, we were not doing data cleansing. Because this data is a source of a lot of knowledge, and we can actually harness it with the technologies which are now available, to make meaningful predictions for our businesses, which can help grow the business. Right?
So I think that trend is already on, and more and more it’s catching up. And now you see another layer which is coming up: the traditional cloud functions, the traditional IT functions. For example, you were talking about security and DevOps. We are seeing a lot of companies now investing in an AI-driven security route. How do you use AI to make your security portfolio more robust, stronger, more predictive in nature? How do you use AI to build DevOps pipelines, so that they can predict when your data center machines are going to fail, when the network bandwidth is going to clog, or when those machines will run out of capacity? So they are leveraging AI in automation, they are leveraging AI in DevOps, they are leveraging AI in security. I think the trend is: until now, there was a huge juggernaut of tools and technologies for how to build AI. And now we are seeing how to infuse AI, which is how to use AI in the other layers of the organization and the technologies we have.
Swapnil Bhartiya: Animesh, thank you so much for taking the time out today to talk about all this. I look at this as a journey of the projects that are happening at IBM, and as you mentioned, those projects will be taking shape soon, so I’ll keep an eye on that. And I would love to have you back on the show when the KServe and ModelMesh combination is finalized. But thanks for your time today, and I look forward to our next conversation.
Animesh Singh: Thanks, thanks a lot, Swapnil. Thanks for having me here. It’s always exciting to talk to you about this journey we are taking in open source with a lot of these AI projects. And as a parting note, I would like to say that our focus within IBM has always been around building AI in a trusted and ethical way. A lot of these projects which you are seeing, whether it’s Machine Learning eXchange, which focuses a lot on data lineage, data governance and having certified artifacts, or ModelMesh and KServe, bringing in capabilities around not only serving machine learning models in production, but also being able to explain model predictions and detect whether the model predictions are biased against a particular group; we have always been very conscious of doing these things, and creating AI, in a trusted and ethical way. So, thanks for having me here.