
How Experiment Versioning Is Going To Solve Big Problems In The AI/ML World


Guest: Dmitry Petrov
Company: Iterative
Show: 2022 Prediction Series

Iterative is building a machine learning (ML) platform on top of the software development stack. Using several open-source tools, the company extends traditional software development tools with capabilities such as data version control (DVC), continuous machine learning (CML), and more. Iterative recently announced a new idea called experiment versioning, which Dmitry Petrov, co-founder and CEO of Iterative, explains as recording all of the changes you make during the modeling process. As he puts it, “When you have a model in production, for example, you need to know what set of hyperparameters you used for the model, what source code was used, and what version of the data set was used to produce this model.” Petrov adds, “You can always go back and change something, retrain the model, and have a whole lineage between your data, code, hyperparameters, and model. That’s the idea.”

But why did the company come up with experiment versioning now, and what problem does it solve? To answer these questions, Petrov points to problems on both the modeling and the production sides of machine learning. On the modeling side, he explains, people running experiments “cannot understand why they didn’t get the same result and why it’s not reproducible. This breaks the connection between the pieces.” On the production side, many customers told the company, “All right, we use some experiment tracking tools, we have models stored in these tools, but they live outside of our code and we have to use a separate set of tools and APIs to store models, to get models.” Those same customers wanted a GitHub-based approach, so that models and applications could go through a single CI/CD pipeline and everything would be on the same page.

This new methodology improves the lives of ML engineers and data scientists in two ways. First, it simplifies collaboration with other departments: when a model is built this way, it is already versioned and in Git. According to Petrov, “You just tell your DevOps, ‘This is my model. This is my Git checksum or Git tag.’ It simplifies how you collaborate in the team.” Second, because of the distributed nature of Git, ML teams get better control over their experiments: they can run hundreds of experiments per day on a local machine (or in the cloud) and share only the few that matter with the rest of the team.

The summary of the show was written by Jack Wallen.


Swapnil Bhartiya: Hi. This is Swapnil Bhartiya, and welcome to Let’s Talk. Today we have with us once again Dmitry Petrov, co-founder and CEO of Iterative. Dmitry, it’s great to have you back on the show.

Dmitry Petrov: Hi. Thank you for inviting me to the show.

Swapnil Bhartiya: Today, we are going to talk about experiment versioning. You have been on this show a couple of times, but before we get into that, just to remind our viewers, quickly tell us what Iterative is all about and whether there is anything new since we last talked.

Dmitry Petrov: At Iterative, we are building a machine learning platform, but we are building it in a slightly special way. We build the platform on top of the software development stack, which means on top of GitHub, GitLab, Bitbucket, CI/CD systems, and Git itself. This is our approach to building an ML platform. We have a bunch of open-source tools that extend traditional software development tools, like DVC (Data Version Control), CML (Continuous Machine Learning), et cetera.

Swapnil Bhartiya: You folks recently announced experiment versioning. I want to go a bit deeper: how do you define experiment versioning? From my perspective, it’s more or less tracking whatever changes people are making and whatever experiments they’re running, including all the metrics you’re collecting, in addition to what is running in production. Please explain what experiment versioning is and why it matters for data scientists or engineers.

Dmitry Petrov: The idea of experiment versioning is to record all the changes you make during the modeling process. When you have a model in production, for example, you need to know what set of hyperparameters you used for the model, what source code was used, what version of the data set, what exact data was used to produce this model. You can always go back, change something, retrain the model, and have a whole lineage between your data, code, hyperparameters, and model. That’s the idea.

What is new is that with experiment versioning we extend the current set of tools. The current tools are usually called experiment tracking. They are based on the idea of having a service, SaaS or a database, with a set of hyperparameters, metrics, and some links to the source code. During modeling, you put this information in a central place and get a table of these experiments.

What we found is that, first of all, it’s a useful approach. Data scientists really like it, but the connection between the experiment and the source code keeps breaking, so people lose the link between the source code and the experiment. That’s the problem experiment versioning solves: we put all the information under Git control, so you have a complete snapshot of your experiments.
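To make the lineage idea concrete, here is a minimal Python sketch of what a versioned experiment record could hold: the code version, the data version, the hyperparameters, and the resulting metrics. It is not DVC's implementation or file format; `ExperimentRecord`, `content_hash`, and the commit ID are hypothetical names chosen for illustration, and in a real setup the code version would come from Git.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


def content_hash(data: bytes) -> str:
    """Short SHA-256 digest that identifies a piece of content."""
    return hashlib.sha256(data).hexdigest()[:12]


@dataclass(frozen=True)
class ExperimentRecord:
    code_version: str  # hypothetical: a Git commit ID in a real setup
    data_version: str  # content hash of the training data
    params: dict = field(default_factory=dict)   # hyperparameters
    metrics: dict = field(default_factory=dict)  # results recorded after training

    def experiment_id(self) -> str:
        # Derived only from the inputs (code, data, hyperparameters), so the
        # same inputs always yield the same ID and any change yields a new one.
        inputs = {"code": self.code_version, "data": self.data_version, "params": self.params}
        return content_hash(json.dumps(inputs, sort_keys=True).encode())

    def to_json(self) -> str:
        # Plain-text form that can be committed to Git next to the code.
        record = asdict(self)
        record["id"] = self.experiment_id()
        return json.dumps(record, sort_keys=True, indent=2)


dataset = b"x,y\n1,2\n3,4\n"
run = ExperimentRecord(
    code_version="a1b2c3d",  # hypothetical commit ID
    data_version=content_hash(dataset),
    params={"learning_rate": 0.01, "epochs": 10},
)
print(run.to_json())
```

Because the ID depends only on code, data, and hyperparameters, retraining with a changed hyperparameter or a different data file produces a new ID, while the metrics stay attached to the experiment that produced them.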

Swapnil Bhartiya: If I’m not wrong, if you compare to … Of course, there are already some tools that help track experiments. You did mention some of their shortcomings, but in general, can you talk about how this improves the whole workflow for engineers? Also, most of the work you folks do is based on feedback and needs from the community. What kind of demand was there, where people were struggling with existing tools and you said, “Hey, let’s solve this problem”?

Dmitry Petrov: There are two sets of problems: one on the modeling side and another on the production side. On the modeling side, with the existing toolset you can track your hyperparameters, metrics, and links to the source code, but in many cases people forget to commit code. They make some code changes, run experiments, and record metrics and hyperparameters, but the code is not recorded and the connection is lost.

When they come back to the experiment, they cannot understand why they didn’t get the same result and why it’s not reproducible. This breaks the connection between the pieces. When you codify the entire experiment in Git, there is no way to lose the connection. Everything is versioned. We just use regular best practices from software engineering to manage these experiments.

Another set of problems is production-related. A lot of our customers were saying, “All right, we use some experiment tracking tools. We have models stored in these tools, but they live outside of our code, and we have to use a separate set of tools and APIs to store models and to get models.” Their CI/CD process becomes really special; it’s different from software development tools. Basically, they have one CI process for application releases and another set of continuous integration pipelines for model deployment.

They’re saying, “We need the same approach. We need a GitHub-based approach where everything is managed through Git and GitHub, so we can have a single CI/CD pipeline for both application deployment and model deployment.” Using Git and GitHub to manage models and experiments puts everything on the same page. Your model becomes the same kind of thing as your application, so you can unify and simplify your model deployment process: it works the same way as the deployment process for applications.
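The modeling-side failure described above, metrics recorded for code that was never committed, can be caught with a simple check before an experiment is logged. The sketch below is an illustration under stated assumptions: it compares in-memory snapshots of file contents instead of calling Git, and `snapshot`, `uncommitted_changes`, and `record_experiment` are hypothetical helpers. In a real repository, `git status --porcelain` reports the same kind of modified, deleted, or untracked files.

```python
import hashlib


def snapshot(files: dict[str, bytes]) -> dict[str, str]:
    """Map each file path to a SHA-256 digest of its content."""
    return {path: hashlib.sha256(body).hexdigest() for path, body in files.items()}


def uncommitted_changes(committed: dict[str, bytes], working: dict[str, bytes]) -> list[str]:
    """Paths that were modified, added, or deleted relative to the last commit."""
    before, after = snapshot(committed), snapshot(working)
    return sorted(p for p in before.keys() | after.keys() if before.get(p) != after.get(p))


def record_experiment(committed, working, params, metrics):
    """Refuse to record results for code that is not in version control."""
    dirty = uncommitted_changes(committed, working)
    if dirty:
        raise RuntimeError("uncommitted changes: " + ", ".join(dirty))
    return {"code": snapshot(committed), "params": params, "metrics": metrics}


committed = {"train.py": b"lr = 0.01\n"}
working = {"train.py": b"lr = 0.02\n"}  # edited but never committed
print(uncommitted_changes(committed, working))  # ['train.py']
```

Failing loudly here, rather than logging the run anyway, is what keeps the link between the recorded metrics and the exact code that produced them.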

Swapnil Bhartiya: If I’m not wrong, this is also more or less treating experiments as code, because you are bringing in GitHub. If that is the case, the second part is: who is going to benefit? I could be totally wrong. Of course, in today’s world, when you talk about CI/CD pipelines, you’re not talking about operators and developers separately or [inaudible 00:07:16], but it does look like this mostly solves the problem for data scientists, data engineers, or the developer side, and less for the operator side. Can you talk about both aspects? One is treating experiments as code, and the second is, who is the actual beneficiary of this new feature?

Dmitry Petrov: With our approach, the major innovation is to codify ML experiments. Basically, we are saying experiments can be managed as code. We use this general design pattern, if you wish, which is very popular in the DevOps world. Now, with codified ML experiments, you can version and manage them through the regular Git, version control experience.

Who benefits from this approach? All the departments that collaborate together. If you codify your experiments and put them under a version control system, everyone involved in the data modeling process is on the same page, because Git becomes the centralized place for your models and your experiments.

Machine learning engineers know where a model is stored and how it is versioned, DevOps folks know how to deploy this model and work with it, and application engineers know how to use this model in the application. It puts everyone on the same page, and collaboration in the team becomes more efficient.

Swapnil Bhartiya: How does this functionality change the life of data scientists or data engineers? You have talked about not needing the other tools, but I want to understand how it improves their quality of life, so that they can focus on solving much bigger problems rather than patching and plugging together too many tools just to track their experiments.

Dmitry Petrov: There are two aspects to how it improves the lives of ML engineers and data scientists. First of all, when you use experiment versioning together with Git, of course, it simplifies your collaboration with other departments, with DevOps, for example, who are responsible for deployment. If your model is built this way, it’s already versioned, it’s already in Git. You just tell your DevOps, “This is my model. This is my Git checksum or Git tag, so please use it.” It simplifies the way you collaborate in the team. That’s the first aspect.

Second, the distributed nature of Git gives you better control of experiments. Now I can run hundreds of experiments per day on my machine, or maybe in the cloud, have a clear picture of what happened on my machine, and then select particular experiments and share them with my team members. That’s a big change compared to many other approaches, where you have a master table with hundreds of experiments and everyone struggles to find the right one.

The distributed nature of Git helps you collaborate better with the team: it separates my hundreds of experiments, which might be too much for my team, from the two or three experiments I really want to share with the team. That’s how distributed version control helps you collaborate. (silence)
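Keeping hundreds of experiments local and sharing only two or three amounts to a ranking step over the local runs. This is a hypothetical illustration, not DVC code: `select_to_share` and the synthetic accuracy scores are made up. In DVC itself, local experiments are listed with `dvc exp show` and individual experiments can be shared with `dvc exp push`.

```python
def select_to_share(experiments: list[dict], metric: str, k: int = 3,
                    higher_is_better: bool = True) -> list[dict]:
    """Rank local experiments by one metric and return the best k.

    Runs that did not record the metric are skipped; everything not
    returned stays local instead of going into a shared master table.
    """
    scored = [e for e in experiments if metric in e["metrics"]]
    scored.sort(key=lambda e: e["metrics"][metric], reverse=higher_is_better)
    return scored[:k]


# A day's worth of local runs with synthetic (made-up) scores.
local_runs = [
    {"name": f"exp-{i}", "metrics": {"accuracy": (i * 37 % 100) / 100}}
    for i in range(100)
]
shared = select_to_share(local_runs, "accuracy")
print([e["name"] for e in shared])
```

The selection rule is deliberately simple; in practice a team might filter on several metrics or on parameters before deciding what is worth pushing to the shared remote.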

For us, right now, a big focus is managing data sets together with labels. Data management was always the primary focus of our toolset, and now we see that people need more granular control, not only over data changes but also over label changes. We are working on label management capabilities in our toolset.

This is actually a prerequisite for a new way of building models, usually called data-centric AI, where people iterate on models not just by changing the code or the model architecture, but also by changing the data, the labels, et cetera. That’s one big push, a big priority for the near future.

The second big priority is improving user experience. A lot of our tools work really well on the workflow and automation side, because most of our tools are command-line tools and they integrate very well with other tools. It’s really easy to make them part of your automation, a building block of your AI platform, but people still need a better UI and a better user experience. We are working really hard on this, to provide the best UI possible for data scientists and for engineers involved in data projects.
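Versioning labels together with data, as in the data-centric AI workflow described above, means that relabeling a sample should produce a new dataset version even when the raw data is unchanged. A minimal sketch under that assumption follows; `dataset_version` and `label_changes` are hypothetical helpers, not part of Iterative's label-management tooling, and the sample bytes are placeholders.

```python
import hashlib
import json


def dataset_version(samples: dict[str, bytes], labels: dict[str, str]) -> str:
    """Version ID that covers both raw samples and their labels.

    Labels for IDs that are not in `samples` are ignored.
    """
    h = hashlib.sha256()
    for sample_id in sorted(samples):
        entry = {
            "id": sample_id,
            "data": hashlib.sha256(samples[sample_id]).hexdigest(),
            "label": labels.get(sample_id),  # None means "not labeled yet"
        }
        h.update(json.dumps(entry, sort_keys=True).encode() + b"\n")
    return h.hexdigest()[:12]


def label_changes(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple]:
    """Per-sample label differences between two versions: id -> (old, new)."""
    return {
        k: (old.get(k), new.get(k))
        for k in sorted(old.keys() | new.keys())
        if old.get(k) != new.get(k)
    }


samples = {"img-001": b"<pixels 1>", "img-002": b"<pixels 2>"}
labels_v1 = {"img-001": "cat", "img-002": "cat"}
labels_v2 = {"img-001": "cat", "img-002": "dog"}  # one sample relabeled

print(dataset_version(samples, labels_v1) != dataset_version(samples, labels_v2))  # True
print(label_changes(labels_v1, labels_v2))  # {'img-002': ('cat', 'dog')}
```

Hashing each sample's data and label together gives the granular control mentioned in the interview: a model trained on a given version can be traced back to the exact labels it saw, and a label-only diff shows what changed between iterations.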

Swapnil Bhartiya: Dmitry, thank you so much for taking the time today to talk about experiment versioning, and also about how it solves actual problems for data scientists and data engineers. Thanks for sharing those insights. I would love to have you back on the show, hopefully this year. If not, then certainly next year. Thank you.

Dmitry Petrov: Thank you. Thank you. Thank you for having me.