Guest: Adit Madan
Company: Alluxio
Show: TFiR: T3M

Storing data in the cloud is cheap. Cloud providers keep storage prices low to encourage enterprises to move all their data to the cloud, where they can use the provider's compute services. However, every time data moves across cloud regions or out of the cloud (when accessed from on-premises data centers or from a different cloud), the provider charges an egress fee based on the amount of traffic that crosses the network.

In this episode of TFiR: T3M, Adit Madan, Director of Products at Alluxio, shares his insights on data egress fees and how companies are dealing with the rising costs.

Current trends in the market:

  • A lot of organizations are kicking off their AI/ML initiatives not just to stay competitive, but for survival.
  • Specialized systems are emerging to meet the high throughput and performance requirements of GPUs.
  • Data consumption is no longer just for analytical purposes, but also for model training in machine learning and deep learning.

Importance of managing egress fees:

  • Egress fees are not a concern for every enterprise; it depends on scale and complexity.
  • If an organization goes through mergers and acquisitions and its data is naturally siloed across different locations, it needs to worry about egress fees, because everyone wants to get insights or drive revenue from the entire collection of data.
  • If an organization is ramping up its AI/ML initiatives, it needs to pay attention to egress fees, because it may want to run compute away from where the data is located.

Typical approaches companies take:

  • Some companies keep all of their data and operations in one place using one cloud vendor.
  • If a company wants to use services from another cloud, for example because GPUs are available there, it will manually copy data between the two environments. Companies prefer copying because direct access incurs egress fees on every read, and in model training the same piece of data may need to be read 100 times. The downside of making copies: it takes a team of 4 or 5 to maintain them, and the process is error-prone.
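
The trade-off between copying and direct access comes down to simple arithmetic. The sketch below is illustrative only: the dataset size and the per-GB rate are hypothetical placeholders, not real cloud prices, and the function name `egress_cost` is made up for this example. Only the 100 reads figure comes from the discussion above.

```python
def egress_cost(dataset_gb: float, transfers: int, rate_per_gb: float) -> float:
    """Fee for moving `dataset_gb` across the network `transfers` times."""
    return dataset_gb * transfers * rate_per_gb


# Hypothetical inputs, not real cloud prices.
DATASET_GB = 1_000   # assumed size of the training data set
RATE_PER_GB = 0.05   # assumed egress rate in dollars per GB
READS = 100          # the "same data used 100 times" case

# Direct access pays the fee on every read across the network.
direct_access = egress_cost(DATASET_GB, READS, RATE_PER_GB)
# A manual copy pays the fee once; later reads are local.
one_time_copy = egress_cost(DATASET_GB, 1, RATE_PER_GB)

print(f"direct access: ${direct_access:,.2f}")
print(f"one-time copy: ${one_time_copy:,.2f}")
```

Whatever rate is assumed, direct access costs as many times more than a single copy as the number of reads. The calculation leaves out the staffing and error costs of maintaining copies, which is the other side of the trade-off.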

Advice for companies looking to reduce egress fees:

  • Eliminate data redundancy in your platform. Avoid running separate specialized systems for specific purposes, where the same piece of data is copied and consumed by different workloads.
  • Aside from finding the right tools, make sure you align and evolve your company culture as well: how you form your organization, how you embrace open source, how open your technology stack is, and how you structure your teams to specialize in different aspects.
  • Be aware of what makes sense for your organization at different times. It may make sense for you to use a completely bundled-up, out-of-the-box service at one point, and then you may want to migrate to something more open over time.

Alluxio’s data orchestration platform:

  • Sits between different kinds of compute frameworks and storage systems. It is a compute-agnostic, storage-agnostic, and cloud-agnostic solution for big data and machine learning applications.
  • Using data lakehouses, it is able to serve a single copy or a single data source to different application types without having to move to a specialized system.

This summary was written by Camille Gregory.
