Guest: Christian Romming
Company: Etleap
Show: The Agentic Enterprise
Topic: Cloud Native
Apache Iceberg has moved from experimental technology to production infrastructure for modern data teams. The open table format promises platform independence, data sovereignty, and compliance flexibility—benefits that have attracted backing from Google, AWS, Microsoft, Databricks, and Snowflake. But adoption has surfaced a critical challenge: Iceberg defines how tables behave, not how to operate the pipelines around them.
That operational gap is forcing teams into a familiar pattern: stitching together separate tools for ingestion, transformation, scheduling, and table maintenance. When something breaks, engineers find themselves debugging across four disconnected systems with no single layer responsible for the complete data flow.
Christian Romming, Founder & CEO of Etleap, has been building data infrastructure long enough to recognize this pattern. As a former CTO at an ad tech company managing massive data volumes, he saw engineering teams consumed by pipeline plumbing instead of analysis. That experience led him to found Etleap with a clear mission: eliminate the engineering work required to build and maintain data pipelines.
Now, as Iceberg adoption accelerates, Etleap has launched what it calls the Iceberg Pipeline Platform—a unified layer that coordinates ingestion, transformation, orchestration, and table operations as a single end-to-end process.
From Weekly Batch to Real-Time Streams
The data infrastructure landscape has transformed dramatically over the past decade. Cloud data warehousing democratized analytics, bringing costs down from seven-figure budgets to $0.25 per hour with Amazon Redshift. Processing cadences compressed from weekly ETL jobs to nightly runs, then hourly updates, and now real-time streams.
“A couple of years ago, we started hearing a lot more about Iceberg,” Romming explains. “Companies want Iceberg for different reasons—platform independence, data sovereignty, compliance and security. Now that the big vendors have thrown their weight behind it, the question is less about whether Iceberg is the right format and more about how to actually use it in production.”
That shift from “should we adopt Iceberg?” to “how do we operate it?” defines the current moment. Teams have validated the format but need operational tooling that matches its capabilities.
The Brittle Coordination Problem
Modern data stacks often separate ingestion and modeling into different tools, add a third platform for data quality, and orchestrate everything with Airflow or similar schedulers. This fragmentation creates what Romming calls “brittle coordination.”
The classic symptom: introducing artificial delays between pipeline stages. Ingestion should complete by 6am, so modeling gets scheduled for 9am to create a safety buffer. It’s a workaround that masks the underlying problem—these systems don’t actually coordinate.
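The difference between the two coordination styles can be sketched in a few lines. This is a minimal illustration with assumed names (nothing here is Etleap's API): in the event-driven version, modeling is triggered by the ingestion-complete event rather than by the clock, so no artificial buffer is needed.

```python
# Minimal sketch (assumed names, not Etleap's API) of chaining modeling
# directly off the ingestion-complete event instead of a fixed schedule.

completed = []

def run_ingestion():
    completed.append("ingestion")

def run_modeling():
    # Event-driven: this runs only once ingestion has actually finished,
    # so there is no need for a 3-hour buffer between a 6am ingestion
    # deadline and a 9am modeling job.
    assert "ingestion" in completed, "upstream not finished"
    completed.append("modeling")

def on_ingestion_complete():
    run_modeling()          # triggered by the event, not by the clock

run_ingestion()
on_ingestion_complete()
print(completed)  # ['ingestion', 'modeling']
```

If ingestion finishes early, modeling starts early; if ingestion is delayed, modeling simply waits, rather than running against stale or incomplete data.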
“By integrating these processes—coordination, orchestration, ingestion, modeling—as one end-to-end process, we can solve a lot of those problems,” Romming says.
Consider data quality testing, which has become standard practice in modern data modeling. dbt Core popularized the pattern of writing SQL-based tests alongside transformation logic—assertions like “this column should contain only non-null values.” But these tests run after data has already been ingested and transformed.
“That’s an example of a rule that should be enforced at ingestion time, not at modeling time,” Romming points out. “By having an end-to-end integrated process, we can stop the pipeline and alert the right people when we see bad data at ingestion time. You don’t have to ingest the bad data first and then solve the problem later.”
This becomes especially important in regulated industries. Financial services companies often work with external data vendors delivering daily Excel reports. If a dollar value falls outside expected ranges, teams need immediate alerts so vendors can fix issues before the data propagates downstream. Discovering problems during modeling is too late.
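The ingestion-time gate Romming describes can be sketched as follows. The column names, the dollar-value range, and the error type are all illustrative assumptions, not Etleap's implementation; the point is that validation happens before the write, so bad data never lands in the table.

```python
# Hypothetical sketch of an ingestion-time quality gate. Column names,
# the range rule, and the error type are illustrative.

class BadDataError(Exception):
    """Raised to halt the pipeline and alert the right people."""

def validate(row):
    # A dbt-style non-null test, enforced at ingestion time instead of
    # after modeling.
    if row.get("amount") is None:
        raise BadDataError("null value in required column 'amount'")
    # A range rule for a vendor-delivered dollar value, as in the
    # financial-services example.
    if not 0 <= row["amount"] <= 1_000_000:
        raise BadDataError(f"amount {row['amount']} outside expected range")

def ingest(rows):
    table = []
    for row in rows:
        validate(row)           # stop before the write, not downstream
        table.append(row)
    return table

print(ingest([{"vendor": "acme", "amount": 125.50}]))
```

A row that fails either rule raises before anything is written, which is the inversion of the dbt pattern: the alert fires while the vendor can still fix the delivery, not after the bad value has propagated through downstream models.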
Table State as a First-Class Citizen
Traditional pipeline orchestration operates on schedules and dependencies. Etleap’s approach centers on table state—specifically, Iceberg’s snapshot mechanism.
Snapshots capture the exact state of a table at a specific moment in time. This enables data lineage that extends beyond “this source feeds this table, which feeds this model” to include the actual data: “the source state was X, the intermediate table state was Y, the model state was Z.”
“You have lineage both in terms of the pipeline itself and in terms of the data,” Romming explains. “For auditing purposes, you can go back and understand what happened, why your model produced a specific result based on the state of the data.”
This table-state-first approach also enables new capabilities. With continuous ingestion to Iceberg, data can lag source systems by seconds rather than hours. That freshness unlocks agentic AI use cases where models need constantly updated information.
“You can have a continuous process from your source—a database or stream—through ingestion into Iceberg and modeling for AI use cases, with data that’s just a few seconds behind the actual source,” Romming notes. “That makes a big difference for AI use cases we encounter today.”
VPC-Native Operations for Security and Compliance
While Iceberg’s open format gets much of the attention, Romming identifies another adoption driver: the ability to run data infrastructure inside an organization’s own VPC.
Enterprise teams frequently use multiple warehouses—Snowflake and Databricks together, for instance. Traditional architectures require separate data copies and duplicate pipelines. With Iceberg, teams can ingest once, model once, and make data accessible to multiple query engines plus services like AWS Bedrock for AI workloads.
“One copy of the data and many use cases is one of the examples there,” Romming says. “Running inside your VPC is also necessary for security and compliance.”
This matters particularly in financial services and healthcare, where regulations and company policies mandate that data never leaves the VPC. Etleap’s architecture keeps data and all orchestration within the customer’s VPC boundaries.
Shifting Right to Data Teams
Looking ahead, Romming sees the platform enabling what he calls “shift right”—putting more capability directly in the hands of data analysts and scientists rather than requiring upstream engineering support.
“Give the data workers—people building data products, doing data science and analytics—more tools so they have lower reliance on upstream engineers and dev teams and can do more themselves,” he explains.
The combination of Iceberg’s adoption as the standard format and AI’s explosive demand for clean, timely data creates both urgency and opportunity. Etleap’s bet is that data teams will choose integrated platforms over stitched-together tool collections when the operational burden becomes clear.
“There’s a lot to do there,” Romming acknowledges. “We’re going to follow our North Star, which is to serve data teams and remove the engineering cycles spent on data plumbing.”