
How Open Source Alluxio Is Democratizing Data Orchestration


Alluxio is one of the many leading open-source projects and companies (including Spark and Mesosphere) that emerged from UC Berkeley Labs. Haoyuan (H.Y.) Li, Founder, Chairman and CTO of Alluxio, sat down with Swapnil Bhartiya, Editor-in-Chief of TFIR, to discuss how Alluxio is providing new ways for organizations to manage data at scale with its data orchestration platform.

Alluxio’s data orchestration layer has increased efficiency fourfold, so companies are finding that work that used to take a year now takes three months.

For many enterprise companies, the path to the cloud starts with an intermediate step of a hybrid cloud approach, Li said.  He also sees widespread enterprise adoption of a multi-cloud strategy.

Customers have learned from experience to be wary of being locked into one vendor, so it is common to see a stack that spans multiple cloud vendors, with enterprise companies using more than one public cloud at the same time.

Hybrid Cloud

Enterprise companies have put a lot of investment in the software behind their firewalls.  They understand the strategic value of their data and want to leverage their custom environment and keep their data on-premises.

At the same time, they want to take advantage of elasticity available on the compute side of the cloud.  A very typical hybrid cloud model is to keep data on-premises and push their workloads to the public cloud.

Two-tier Fragmentation

Alluxio's technology works at the data layer of the stack, so it focuses on data analytics, SQL queries, machine learning, and AI.

Its clients are companies that digitized earlier than their peers or than other industries, mostly in the telecom, e-commerce, internet services, and financial services spaces.

With all the hybrid and multi-cloud environments, the whole ecosystem is quite fragmented, Li said. Alluxio provides an orchestration layer to seamlessly blend the data no matter where it comes from.

Basically, the data stack has two layers. The persistent storage layer is generally found at the bottom of the stack. The compute or data application layer sits on top and makes use of the data pulled from the storage layer.

There are so many different ways for companies to leverage their data, said Li, and so many innovations over the past two decades. On the compute side there are MapReduce, Apache Spark, SQL query engines, and TensorFlow. Every three to five years there is a new compute framework that brings new ways of leveraging data.

On the storage side, the innovation cycle is five to ten years, bringing new storage systems that are faster, cheaper and easier to use.

All this fragmentation is very inefficient and presents one of the biggest technical challenges today.

Enter Data Orchestration

The Alluxio platform resides between the data application layer and the persistent storage layer. Business teams only need to interact with the Alluxio APIs, which present a global namespace to all data-driven applications, so they can get data on demand.
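To make the idea of a global namespace concrete, here is a minimal sketch of a layer that sits between applications and several storage systems. It is an illustration only and does not use or reproduce Alluxio's actual APIs: the names `GlobalNamespace`, `InMemoryBackend`, `mount`, and `read`, as well as the example paths, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


class StorageBackend(Protocol):
    """Anything that can return bytes for a path (hypothetical interface)."""

    def read(self, path: str) -> bytes: ...


@dataclass
class InMemoryBackend:
    """Hypothetical stand-in for a real store, e.g. on-premises or cloud storage."""

    name: str
    objects: Dict[str, bytes] = field(default_factory=dict)

    def read(self, path: str) -> bytes:
        return self.objects[path]


class GlobalNamespace:
    """Routes one logical path space to whichever backend is mounted there."""

    def __init__(self) -> None:
        self._mounts: Dict[str, StorageBackend] = {}

    def mount(self, prefix: str, backend: StorageBackend) -> None:
        self._mounts[prefix.rstrip("/")] = backend

    def read(self, path: str) -> bytes:
        # Try the longest mount prefix first so nested mounts resolve correctly.
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                return self._mounts[prefix].read(path[len(prefix):])
        raise FileNotFoundError(path)


# Two separate stores, one on-premises and one in a public cloud (toy data).
on_prem = InMemoryBackend("on-prem", {"/sales/2020.csv": b"region,total\n"})
cloud = InMemoryBackend("cloud", {"/models/churn.bin": b"\x00\x01"})

ns = GlobalNamespace()
ns.mount("/onprem", on_prem)
ns.mount("/cloud", cloud)

# The application uses one path space and never needs to know where data lives.
sales = ns.read("/onprem/sales/2020.csv")
model = ns.read("/cloud/models/churn.bin")
```

In this sketch the application code stays the same if a dataset moves from one store to another; only the mount table changes. A real orchestration layer would also have to handle caching, consistency, and access control, which are left out here.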

Li cautioned that even though there are so many new apps being generated, this data revolution is still at a very early phase.

Listen in to hear Li talk about the importance of open source, how Alluxio got started at the UC Berkeley Labs, how the team decides what is added to the code base, and whether a feature goes into the open-source code base or the enterprise edition.

By TC Currie