
Real-Time Stream Processing With Decodable


Guest: Eric Sammer (LinkedIn)
Company: Decodable (Twitter)
Show: TFiR: T3M

Decodable is a stream processing platform that connects to source systems, acquires and processes data for a particular use case, and then writes that result into everything from event streaming systems to warehouses, data lakes, and real-time OLAP database systems.
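Because Decodable is powered by Apache Flink (discussed below), the shape of such a pipeline can be sketched in open-source terms. The following is a minimal, illustrative sketch using PyFlink and Flink SQL; the topic names, schema, and connection options are assumptions made for the example, not Decodable's actual product API.

```python
# Minimal sketch of a streaming pipeline of the shape described above: read events
# from a source system, process them continuously, and write the result to a sink.
# All names, schemas, and connection options here are illustrative assumptions.
# Requires the Flink Kafka SQL connector jar on the classpath.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: a Kafka topic of raw order events (hypothetical schema).
t_env.execute_sql("""
    CREATE TABLE orders_raw (
        order_id STRING,
        amount   DOUBLE,
        ts       TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders',
        'properties.bootstrap.servers' = 'broker:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

# Sink: 'print' stands in for what could be a warehouse, data lake, or OLAP table.
t_env.execute_sql("""
    CREATE TABLE orders_clean (
        order_id STRING,
        amount   DOUBLE,
        ts       TIMESTAMP(3)
    ) WITH (
        'connector' = 'print'
    )
""")

# Continuous transformation: filter out non-positive amounts and write to the sink.
t_env.execute_sql("""
    INSERT INTO orders_clean
    SELECT order_id, amount, ts
    FROM orders_raw
    WHERE amount > 0
""")
```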

In this episode of TFiR: T3M, Decodable CEO Eric Sammer shares his insights on the current data processing trends and how Decodable’s stream processing platform is helping companies enhance their overall data strategy.

Highlights of this video interview:

  • Real-time data has been around for quite some time, but retail and supply chain use cases such as package tracking, delivery tracking, inventory management, and customer engagement have brought Kafka, Apache Flink, and other open-source projects to the fore.
  • Decodable is powered by Apache Flink, a 10-year-old project that came out of a university in Berlin, and Debezium, a change data capture tool for pulling data out of relational and other operational databases for things like replication and, in Decodable’s case, stream processing (see the sketch after this list).
  • Decodable chose Apache Flink because it is the industry standard for real-time stream processing. It has been the foundation of some of the most sophisticated real-time businesses such as Uber, Lyft, and Netflix.
  • The more robust the open-source project is, the bigger the community and ecosystem around it. From the customer’s perspective, this is a good thing because they’re not locked into one vendor. For Decodable, it creates this enormous ecosystem of connectors of adjacent technologies that allow it to better serve the use cases that customers are looking for.
  • Decodable is also active in other open-source communities that are critical to it, such as file formats and other de facto standards in data warehousing and data platforms in general.
  • Decodable handles diverse stream processing, ETL, and ELT use cases in a single platform that sits between all the different systems, acting as a “network” between different applications and hosts, greatly reducing complexity.
  • For business continuity, disaster recovery, and availability, the durability of this data is important, i.e., once it’s captured, will it still be around if the lights go out or if an entire cloud availability zone or region disappears. Decodable captures the data, persists it in a way that is resilient to temporary failures, and then facilitates transferring and fanning out that data from one cloud region to potentially multiple cloud regions.
  • Decodable reliably captures data and then not only processes it but also moves it around between availability zones and regions. It is not, however, a backup or disaster recovery solution; it can be part of a larger strategy for dealing with the loss of data or for centralizing parts of the data for analysis.
  • At least half of the challenge around data infrastructure is a people problem, not a technology problem. There is a big shift towards self-service, i.e., enabling the people who need to do the analysis to be able to acquire the data that they need when they need it, assuming they have the rights to do so, without having to hop between team boundaries, file tickets, and fill out forms.
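As referenced above, here is a minimal sketch of how Flink and Debezium fit together: Debezium (typically run via Kafka Connect) streams a database’s change log into Kafka, and Flink can consume those change events directly with its debezium-json format, so downstream tables stay in sync with the operational database. The topic name, schema, and broker address below are illustrative assumptions, not Decodable’s actual configuration.

```python
# Sketch: consuming Debezium change-data-capture (CDC) events in Flink SQL via PyFlink.
# Assumes Debezium (e.g., via Kafka Connect) is already writing change events from an
# operational database into the hypothetical Kafka topic 'dbserver.inventory.customers',
# and that the Flink Kafka SQL connector jar is on the classpath.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# This table mirrors the upstream database table: inserts, updates, and deletes
# captured by Debezium are interpreted as a changelog stream.
t_env.execute_sql("""
    CREATE TABLE customers (
        id    INT,
        name  STRING,
        email STRING
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'dbserver.inventory.customers',
        'properties.bootstrap.servers' = 'broker:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'debezium-json'
    )
""")

# Any query over this table is continuously updated as the upstream rows change.
t_env.execute_sql("SELECT COUNT(*) AS customer_count FROM customers").print()
```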

Advice for companies looking to improve their data strategy:

  • Start at the end and work backwards. What do you want to get out of this? What is the cost savings or revenue growth opportunity of whatever it is that you’re measuring?
  • It can get overwhelming when you look at the stack of technologies that some people advocate just to get off the ground with the simplest use cases. Rally around a small set of powerful primitives: a data platform that consists of real-time ingestion and processing, some kind of analytical database or system for historical storage and processing of data, and the data quality and governance tooling you need around that.
  • Understand how all those pieces fit together, and how the team is going to work with those tools.
  • Enable self-service. Make sure that people are clear about what the goals are so they can make reasonable trade-offs around all the different attributes (data quality, timeliness, etc.).
  • The cost versus risk profile tradeoff depends on the criticality of the data and the use cases. Have a framework for thinking about these kinds of things.

This summary was written by Camille Gregory.