AI agents are everywhere, but most fail to deliver results when deployed on real business data. The problem isn’t the models—GPT-4, Claude, and Gemini are powerful. The issue lies in the infrastructure connecting these models to reliable, up-to-date enterprise information scattered across Salesforce, NetSuite, Stripe, Google Drive, and internal databases.
Legacy APIs weren’t designed for agentic consumption. Production AI systems need discovery, search, real-time freshness, and governance—capabilities that traditional data pipelines simply don’t provide.
The Guest: Michel Tricot, Founder and CEO at Airbyte
Key Takeaways
- Traditional APIs weren’t built for agents: they lack discovery and search, and they operate at human-scale latencies
- Airbyte’s “context store” enables agents to discover, read, and write across data silos autonomously
- Real production AI requires governance, auditability, and synthetic role management at unprecedented scale
- Open source accelerates agentic platform adoption by letting developers build on and extend an unbounded set of connectors
- The emerging AI data layer prioritizes search and indexing over traditional storage architectures
***
In this exclusive interview with Swapnil Bhartiya at TFiR, Michel Tricot, Founder and CEO at Airbyte, discusses why most AI agents fail in production, how open-source data infrastructure enables agentic systems, and what enterprises need to build autonomous, production-ready AI workflows.
Why AI Agents Fail Without Internal Business Data
Most AI agents today rely solely on public web data, which delivers no competitive advantage. Without access to internal systems like CRMs, ERPs, and financial platforms, agents can only hallucinate—inventing answers instead of acting on real information.
Q: What specific problems did you see in real-world AI deployments that convinced you data infrastructure was the missing piece?
Michel Tricot: “A very simple example is just going to any kind of sales team and telling a sales rep to ask questions about a specific customer. What the agent is going to tell you is whatever it can find online. All of that is just public data—anyone has access to the same amount of information, so you don’t really have an edge compared to another company. But if you overlay all the internal data that you have about a customer, it really changes the value because now you can actually inject your own knowledge about a customer.”
Tricot points to a fintech operations startup Airbyte works with that builds agents for other companies. Without access to NetSuite, QuickBooks, and Stripe, those agents are useless—they hallucinate charges and transactions that don’t exist. Only when real financial data flows into the system do agents become operationally valuable.
Q: How does data usage fundamentally change when it’s consumed by AI agents instead of analytics dashboards?
Michel Tricot: “People don’t do analytics because they like doing analytics. They do analytics because they want to understand their business. The person making the decision is an actual person, so the latency is generally going to be pretty high. When it comes to agents, suddenly you have the promise of automation—you can have a CPU or GPU making the action for you, operating at sub-nanosecond latency. This is why companies are so bullish in adopting agentic systems—to streamline, reduce latency, and increase the throughput of any operation they have internally.”
The Context Store: Rethinking Data Access for Agents
Traditional APIs are designed for UI consumption by humans, not for autonomous agents that need to discover information dynamically. Airbyte introduced the “context store” concept—a new primitive that enables agents to search, discover, read, and write across enterprise data silos without being gated by rate limits, pagination, or rigid API schemas.
Q: What is a context store, and how is it different from traditional APIs or data pipelines?
Michel Tricot: “These APIs have not really been designed and built for agentic consumption, where suddenly you have something that operates really fast and needs to discover the world around them. The context store is about giving agents a way to discover information without doing a sequence of if-then-else statements. It allows the agent to just say, ‘Oh, there’s this piece of information on Salesforce. Let me see what’s happening on the Zendesk side.’ If those interactions are gated by API, you’re just back in the old world of rate limits, pagination, access management—you don’t have search, you don’t have the ability to do that discovery.”
The context store operates on what Airbyte calls the “agentic data loop”: first, agents discover and search across systems; then they read relevant information; finally, they write actions back to those systems—and the loop continues iteratively.
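The loop described above can be sketched in a few lines of code. Everything below is an illustrative toy, not Airbyte's actual API: the `ContextStore` class, its methods, and the data shapes are invented to show the discover → read → write cycle.

```python
# Illustrative sketch of the "agentic data loop": discover/search,
# then read, then write, repeated. ContextStore and its methods are
# hypothetical stand-ins, not Airbyte's real interface.

class ContextStore:
    """Unified search/read/write layer over connected sources."""

    def __init__(self, sources):
        self.sources = sources  # e.g. ["salesforce", "zendesk", "stripe"]

    def search(self, query):
        # Discover: find records matching the query across all silos.
        return [{"source": s, "id": f"{s}-1", "query": query}
                for s in self.sources]

    def read(self, ref):
        # Read: fetch the full record behind a search hit.
        return {"ref": ref, "body": f"details from {ref['source']}"}

    def write(self, source, action):
        # Write: push an action back to the originating system.
        return {"source": source, "action": action, "status": "ok"}


def agentic_loop(store, goal, max_steps=3):
    """Discover -> read -> write, repeated up to max_steps times."""
    actions = []
    for _ in range(max_steps):
        hits = store.search(goal)                 # 1. discover/search
        context = [store.read(h) for h in hits]   # 2. read
        actions.append(store.write(               # 3. write back
            hits[0]["source"],
            {"goal": goal, "evidence": len(context)},
        ))
    return actions


store = ContextStore(["salesforce", "zendesk"])
results = agentic_loop(store, "customer ACME renewal status", max_steps=2)
print(len(results))  # one write per loop iteration: 2
```

The point of the sketch is that the agent never hard-codes which system holds the answer; it searches first, then reads whatever the search surfaced.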
Q: Why can’t agents just use MCP (Model Context Protocol) to access data?
Michel Tricot: “MCPs don’t work when they are just a very thin layer in front of an API, because then you’re just back in the old world. The more an agent is able to cross-reference data across silos, the better the write is going to be. The logic of what data you put into a write is never coming from just one singular system—it’s coming from a multitude of places where data and context is actually stored.”
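Tricot's point that a single write draws on multiple silos can be made concrete with a toy example. The source systems, record fields, and threshold below are invented for illustration:

```python
# Toy illustration of cross-referencing silos before a write.
# Source names, fields, and the ticket threshold are invented.

def build_write_payload(crm_record, support_record, billing_record):
    """Merge context from three silos into one enriched CRM update."""
    return {
        "customer_id": crm_record["id"],
        # The "health" field written to the CRM is derived from
        # support data, not from the CRM itself.
        "health": "at_risk" if support_record["open_tickets"] > 5 else "ok",
        # The revenue figure comes from the billing system.
        "arr": billing_record["annual_revenue"],
    }

payload = build_write_payload(
    {"id": "acme-42", "owner": "jdoe"},   # e.g. from a CRM
    {"open_tickets": 7},                  # e.g. from a ticketing system
    {"annual_revenue": 120_000},          # e.g. from billing
)
print(payload["health"])  # "at_risk": support data shaped the CRM write
```

An MCP server wrapping only the CRM's API could never produce this write, because two of the three inputs live elsewhere.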
Governance, Auditability, and Trust in Production AI
If AI agents are making autonomous decisions—approving invoices, updating customer records, or triggering financial transactions—enterprises need the same governance, auditability, and role-based access control they have for human employees. But now those systems must operate at machine speed and scale.
Q: How do you ensure data quality, lineage, and auditability in a way that enterprises can actually rely on?
Michel Tricot: “We need to expand governance to not just humans making actions, but having synthetic roles making these different actions. How do you audit what information was accessed? How do you audit what information was written? You can encode specific rules that your agent is going to rely on—whether they are prompts or deterministic workflows on what the agent can and cannot do. At the end of the day, it comes down to auditing every single action taken by your agent. You potentially need an agent auditing those audit logs to make sure nothing wrong is happening.”
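A minimal version of the pattern Tricot describes, where every action by a synthetic role is checked against deterministic rules and logged win or lose, might look like this. The rule format and record structure are assumptions for the sketch:

```python
# Minimal sketch of per-action audit logging for a synthetic role.
# The rule sets and log schema are hypothetical illustrations.

import datetime

AUDIT_LOG = []

ALLOWED_ACTIONS = {"read", "write"}   # deterministic rule: permitted verbs
BLOCKED_RESOURCES = {"payroll"}       # deterministic rule: off-limits systems

def audited(role, action, resource, detail):
    """Check the rules, then record the attempt whether or not it passed."""
    permitted = action in ALLOWED_ACTIONS and resource not in BLOCKED_RESOURCES
    AUDIT_LOG.append({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "role": role,
        "action": action,
        "resource": resource,
        "detail": detail,
        "permitted": permitted,
    })
    return permitted

audited("invoice-agent", "write", "invoices",
        {"invoice": "INV-1", "status": "approved"})
audited("invoice-agent", "write", "payroll",
        {"note": "should be blocked"})

# A second process (or another agent) can scan the log for denials:
denials = [e for e in AUDIT_LOG if not e["permitted"]]
print(len(denials))  # 1
```

Note that the denied attempt is logged too; an audit trail that only records successes cannot answer "what did the agent try to do?"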
Tricot is candid about operational risk: agents will make mistakes, just like humans. But the advantage is speed—when mistakes are detected, they can be fixed at unprecedented velocity.
Open Source as the Distribution Engine for Agentic Infrastructure
Airbyte built its reputation on open-source data connectors. That foundation is now critical for agentic systems, where every team is being tasked with building agents and needs access to infinitely extensible data sources.
Q: How does Airbyte’s open-source foundation translate into infrastructure for AI agents?
Michel Tricot: “Open source is a very strong distribution mechanism. When it comes to agentic platform shifts, every single team is tasked with ‘How can I adopt an agentic system?’ Most of the time, that task falls on the engineers working on the project. Open source has always been the best way for enabling developers—they are always going to be looking for what already exists, and they fall on an open source project. The other piece is the problem of data connectivity is completely unbounded. The number of places where you might have data is just infinite. Open source is meant for extensibility—giving access and control over where the product is going to connect.”
Airbyte has an internal project called “Hydra”—a fully agentic system that takes all open-source contributions, Zendesk information, and Sentry logs to reshape how connectors are designed for both agent and analytics consumption.
Real-World Use Cases Beyond Chatbots
Enterprises are moving beyond conversational AI into autonomous decisioning, workflow orchestration, and closed-loop systems. Airbyte customers span sales enablement, finance operations, and customer support automation.
Q: Are you seeing customers use this for agentic workflows that go beyond chatbots?
Michel Tricot: “We always think about the modality by which agents access data and the maturity of adoption. Most companies follow a motion similar to self-driving car levels—level one, level two, level three. The agentic data platform is an infrastructure product that solves access, discovery, and writes. How you consume it is different based on your level of maturity. If you want it as part of a chat, you can solve it through an MCP connecting to Anthropic, OpenAI, Perplexity. If you’re looking to build a specific agent that runs as an automated process, you use our open-source SDK or APIs. As you automate more, you build complex workflows chaining different agents, adding predictable and unpredictable tasks with the right gates.”
The Emerging AI Data Layer: Search Over Storage
Vector databases dominated the AI infrastructure conversation in 2023. But Tricot argues that search—not embeddings—is the most critical primitive for agentic systems.
Q: Should organizations change how they store data because of AI?
Michel Tricot: “The thing that is most important today is the ability to search. We had this whole hype around vector databases—at the end of the day, a vector database is an ability to search around unstructured data. An agent needs a very strong ability to search data, to discover what is available to it. That’s where I see a lot of the infrastructure moving toward—it’s not so much about how you store data, it’s how you store it so that it is searchable, so that it is indexed in a way that always provides the most accurate and most valuable information first.”
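The simplest form of "storing data so that it is searchable" is an inverted index, which maps terms to the documents that contain them. The toy below is our illustration of the idea, not a description of Airbyte's design; document IDs and contents are invented:

```python
# Toy inverted index: data stored for search, not just key lookup.
# Our illustration of the idea, with invented documents.

from collections import defaultdict

docs = {
    "sf-001": "ACME renewal opportunity closing next quarter",
    "zd-042": "ACME ticket about failed invoice payment",
    "st-007": "invoice payment retried and succeeded",
}

# Build the index: each token maps to the set of docs containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.lower().split():
        index[token].add(doc_id)

def search(query):
    """Return ids of docs containing every token in the query."""
    hits = set(docs)
    for token in query.lower().split():
        hits &= index.get(token, set())
    return sorted(hits)

print(search("invoice payment"))  # ['st-007', 'zd-042']
```

Real systems layer ranking, embeddings, and freshness on top, but the shift is the same: the write path does extra work at storage time so that discovery at query time is cheap.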
Watch the full TFiR interview with Michel Tricot here