Alluxio has announced an integration of the RAPIDS Accelerator for Apache Spark 3.0 with the Alluxio Data Orchestration Platform, designed to accelerate data access on NVIDIA accelerated computing clusters that run both analytics and AI pipelines.
According to the company, validation testing of the integration, which covered caching large datasets and making data available for NVIDIA GPU processing, showed a 2x speedup on a data analytics and business intelligence workload. In addition, NVIDIA GPU clusters running Alluxio delivered a 70% better return on investment (ROI) than CPU clusters.
The integration provides data locality for I/O acceleration: Alluxio manages local storage resources on the GPU cluster and provides a high-performance distributed cache that speeds up access to data held on a remote storage cluster.
No code changes are required to run RAPIDS on GPU-enabled clusters with Alluxio handling storage access. This makes adoption straightforward for customers looking to migrate from their existing software stack.
Multiple data access APIs are also supported, so each step of the data pipeline can use the most appropriate processing framework. Because the distributed cache is shared across frameworks, performance stays high even when data moves from one framework to another.
The integration of the RAPIDS Accelerator for Apache Spark 3.0 with the Alluxio Data Orchestration Platform is now available.