Alluxio recently joined the Presto Software Foundation. Alluxio CEO Steven Mih appeared on our YouTube show Let’s Talk. Here is a lightly edited version of that interview.
Swapnil Bhartiya: Let’s start with a bit about the Presto Software Foundation. What is it about? What does it do?
Steven Mih: The Presto Software Foundation is a project hosted under the Linux Foundation. It was created last year by companies like Facebook, Twitter, Alibaba and Uber. Alluxio is an open source project that is commonly used with Presto, the open source distributed SQL query engine, as well as other projects like Spark and TensorFlow. We support all these different frameworks. And since this was a foundation that was open to all, we decided to join it as one of the companies involved in that foundation.
Swapnil Bhartiya: If you look at the goals of the foundation, what value does Alluxio bring to it?
Steven Mih: Linux Foundation projects are all about open source and about helping grow the communities of these projects. With the Presto Foundation being hosted under the Linux Foundation, we work in an open source way to help develop the community and increase the adoption of the Presto project.
Alluxio is often used under Presto, so the value we bring is around accelerating data access for Presto. We recently released a developer preview that allows users to transform the data into the format that Presto is looking for. So we’re pretty excited about those things, and we’ll be talking about that at PrestoCon, which is coming up at the end of March.
Swapnil Bhartiya: Can you also explain how people, companies, and developers are using Alluxio with Presto, and give examples of some of the major use cases that you can talk about openly?
Steven Mih: One of the big use cases comes from the fact that Presto is designed to query anything, anywhere. It has connectors to different data sources, which can be in remote places. That’s where Alluxio comes in: it is co-installed with the Presto workers, which makes that remote data available locally. The result is extremely high performance.
Today, customers are often running multi-cloud or hybrid environments, and they have data in different sources. There could be data on-prem that can’t necessarily move to the cloud yet, or vice versa. There may be S3 buckets somewhere that they need access to. Alluxio makes all of that seamless for Presto users.
Swapnil Bhartiya: Can you elaborate on that a bit?
Steven Mih: You can now have a much more local and higher-performing system because the data is cached locally to the Presto clusters. For data in remote places, it means that the data infrastructure becomes a lot simpler. Without Alluxio alongside Presto, you’d have to copy that data and create different silos. Those copies need to be synchronized and maintained, so users end up with a pretty big data wrangling challenge. We call the combination the PAS stack: Presto, Alluxio and S3. That stack is becoming much more common now. Users can add S3 to it, or add HDFS in remote places, and it all operates as if the data were local, with very high performance.
On top of that, we’ve added even more to this in our developer preview. We’ve added a catalog service as well as transform operations and we are really excited about how that adds to the picture.
Swapnil Bhartiya: And as you said, PrestoCon is coming up. Is it co-hosted with some other Linux Foundation event or is it an individual conference?
Steven Mih: This one is a standalone, one-day conference. It’s happening on March 24, 2020 at the San Mateo Marriott San Francisco Airport. It’s a conference for developers and for people who are working with data. Anyone can attend for the day to hear talks from companies like Facebook about how they use Presto at scale. We’re sponsoring the show, so we’ll have a talk about the PAS stack with Presto and Alluxio. It’s a really great event for networking and for learning how other companies are using Presto.
About Alluxio: Alluxio is the developer of open source data orchestration software for the cloud. Alluxio moves data closer to AI and machine learning compute frameworks in any cloud across clusters, regions, clouds and countries, providing memory-speed data access to files and objects. Intelligent data tiering and data management deliver consistent high performance to customers in financial services, high tech, retail and telecommunications.
Alluxio is in production use today at seven out of the top ten internet companies. Venture-backed by Andreessen Horowitz and Seven Seas Partners, Alluxio was founded at UC Berkeley’s AMPLab by the creators of the Tachyon open source project.