Why AI Agents Fail Without Internal Data and How to Fix It | Michel Tricot, Airbyte | TFiR

AI agents hallucinate when they lack internal data. Michel Tricot, Founder and CEO of Airbyte, explains why data infrastructure is the missing layer in production agent deployments.

By Monika Chauhan June 1, 2026

AI agents that rely solely on public web data cannot differentiate one business from another. Every competitor’s agent has access to the same information, which means agents built without internal data pipelines produce generic outputs, hallucinate facts about customers and transactions, and cannot execute real business processes autonomously.

In this interview on TFiR, Michel Tricot, Founder and CEO at Airbyte, breaks down why data infrastructure is the missing layer in production AI agent deployments, how open source data connectivity enables agentic adoption at scale, and what fundamentally changes when the same data used for analytics must now serve autonomous agents operating at machine speed.

Guest: Michel Tricot, Founder/CEO at Airbyte
Show: TFiR

Here is what every data engineer and AI platform team needs to know.

Technical Deep Dive

Q: How does open source data integration translate into infrastructure for AI agents?

Michel Tricot, Founder and CEO at Airbyte, explains that open source functions as a distribution mechanism that accelerates developer adoption during platform shifts. When engineering teams are tasked with building internal agent systems, they default to open source projects because those tools are already proven and available. Data connectivity is also an unbounded problem, meaning the number of sources an agent may need to connect to is effectively infinite, and open source is built for that kind of extensibility and community-driven expansion.

“Open source is meant for extensibility. It is meant for giving access and control over where the product is going to be connecting.” — Michel Tricot, Founder/CEO, Airbyte

Q: What real-world problem convinces teams that data infrastructure, not models, is the missing piece in AI agent deployments?

Tricot points to sales use cases as the clearest example. An agent asked about a specific customer will return only public data, giving every sales team at every company the same output and no competitive advantage. When internal data, such as how a customer has used the product, which integrations they have built, and how they originally signed up, is injected into the agent’s context, the quality and specificity of the output changes entirely. For finance operations teams, the problem is even more acute: an agent without access to NetSuite, QuickBooks, or Stripe simply cannot complete the task and will hallucinate transaction details instead.

“If there is no data to back it, it is just hallucination. When you start overlaying your internal data, the agent is working out of real data and real information.” — Michel Tricot, Founder/CEO, Airbyte
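
The idea of overlaying internal data on an agent's context can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `CustomerRecord` fields, the in-memory `INTERNAL_RECORDS` store, and the `build_context` helper are hypothetical and are not part of Airbyte or any system described in the interview. The point is only that the prompt should carry real internal records when they exist and state the gap explicitly when they do not, rather than leaving the model to guess.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerRecord:
    """Hypothetical internal record; field names are illustrative only."""
    name: str
    signup_channel: str
    connected_systems: list[str] = field(default_factory=list)


# Stand-in for internal data synced from CRM or product-usage systems.
INTERNAL_RECORDS = {
    "acme": CustomerRecord("Acme", "self-serve", ["Stripe", "NetSuite"]),
}


def build_context(customer_key: str, public_summary: str) -> str:
    """Combine public information with internal data, marking gaps explicitly."""
    lines = [f"Public information: {public_summary}"]
    record = INTERNAL_RECORDS.get(customer_key)
    if record is None:
        # Say so plainly, so the model is instructed not to invent details.
        lines.append("Internal data: none available. Do not guess account details.")
    else:
        systems = ", ".join(record.connected_systems) or "none"
        lines.append(
            f"Internal data: signed up via {record.signup_channel}; "
            f"connected systems: {systems}."
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_context("acme", "Acme is a mid-size manufacturer."))
    print(build_context("unknown-co", "No public profile found."))
```

In a real deployment the dictionary would be replaced by whatever store holds the synced internal data; the explicit "none available" branch is one simple design choice for reducing invented transaction or account details.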

Q: What fundamentally changes when data used for analytics is now serving AI agents making real-time autonomous decisions?

Tricot draws a direct contrast between human-time decision-making in analytics and machine-time decision-making in agentic systems. In analytics, a human reviews the data and makes the decision, so latency is measured in hours or days. With agents, a CPU or GPU takes the action autonomously, at what Tricot calls “sub-nanosecond type of latency”; the figure is best read as shorthand for machine speed rather than a measured number, since a single model call takes far longer than a nanosecond. This shift in operating speed is why companies are investing heavily in agentic systems: the promise is a dramatic increase in operational throughput and a reduction in latency across any business process that can be automated.

“When it comes to agents, you have the promise of automation, meaning a CPU or GPU making the action for you, operating at sub-nanosecond type of latency.” — Michel Tricot, Founder/CEO, Airbyte

Q: Why is data connectivity described as an unbounded problem for AI agent infrastructure?

Tricot describes the data source landscape as effectively infinite: teams know what systems they need today but cannot predict what new tools or platforms will appear tomorrow. Agents require connectivity to thousands of different systems, and that list grows continuously as new SaaS tools, databases, and APIs emerge. Open source addresses this because the community can build and contribute connectors for new systems without waiting on a central vendor, making extensibility a core architectural property rather than a product roadmap dependency.

“The number of places where you might have data sitting somewhere is just infinite. You have what you know today, but you don’t know what’s coming tomorrow.” — Michel Tricot, Founder/CEO, Airbyte
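
The extensibility argument can be sketched as a small connector registry in Python. This is a hypothetical illustration, not Airbyte's connector interface: the `register` decorator, the `read` function, and the `billing_demo` source are invented names. It shows the architectural property being described, which is that a new source is added by registering a function, without changing the core that agents call.

```python
from typing import Callable, Dict, Iterable, Iterator

# A connector is modeled here as a function that returns records (dicts).
Connector = Callable[[], Iterable[dict]]

_REGISTRY: Dict[str, Connector] = {}


def register(name: str) -> Callable[[Connector], Connector]:
    """Decorator that adds a connector to the registry under `name`."""
    def decorator(fn: Connector) -> Connector:
        if name in _REGISTRY:
            raise ValueError(f"connector already registered: {name}")
        _REGISTRY[name] = fn
        return fn
    return decorator


def read(name: str) -> Iterator[dict]:
    """Return an iterator over a source's records; unknown sources raise."""
    if name not in _REGISTRY:
        raise LookupError(f"no connector registered for source: {name}")
    return iter(_REGISTRY[name]())


# A contributed connector: the core code above does not change to support it.
@register("billing_demo")
def billing_demo() -> Iterable[dict]:
    # Hard-coded sample rows standing in for a real API call.
    return [{"invoice_id": "inv-1", "amount_cents": 1200}]


if __name__ == "__main__":
    for row in read("billing_demo"):
        print(row)
```

Raising on an unknown source, instead of returning an empty result, keeps a missing connector visible to the caller so an agent does not proceed as if the data simply did not exist.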

Q: What is the parallel between analytics data infrastructure and AI agent data infrastructure?

Tricot draws a direct analogy: companies adopted analytics not because they enjoyed building pipelines but because they needed visibility into their own business operations. Agents have the same dependency. An agent operating without internal data is as blind as a business running without analytics. The difference is that agents act on that data autonomously and at machine speed, which raises the cost of missing or incorrect data from a bad dashboard to a bad automated decision.

“If you don’t know what your business is doing, you’re blind. It is the same thing for agents.” — Michel Tricot, Founder/CEO, Airbyte

Resources & Documentation

Airbyte, open source data integration platform for connecting internal data sources to analytics and AI agent pipelines
Airbyte on GitHub, source code and community connector catalog for extensible data connectivity

***

Full Raw Transcript

Swapnil Bhartiya: Michel, as we all know, Airbyte built its reputation around open source data integration and connectors. How does that foundation translate into infrastructure for AI agents to solve the problem that we just discussed?

Michel Tricot: The first thing is really open source is a very strong distribution mechanism. And this is something that has really made us successful in the analytics space and even more when it comes to agentic platform shifts. If you think about it, every single team is tasked with how can I adopt an agentic system, how can I build an agent internally? And most of the time that task falls on the engineers working on the project. And open source has always been the best way for enabling developers. They are always going to be looking for what already exists, what is the state of the art. And in general they fall on an open source project, they fall on an open source product. And that is why being open source at our core really helps on adoption and the development of the platform and just enabling all these users that are building this new system to just go as fast as possible and deliver value as fast as possible. I would say the other piece is the problem of data connectivity, which is a completely unbounded problem. The number of places where you might have data available, data sitting somewhere, is just infinite. And you have what you know today, but you don’t know what’s coming tomorrow. And this is a place where open source actually succeeds. It is meant for extensibility, it is meant for giving access and control over where the product is going to be connecting. And this is something that has always served us and our community really well. And it’s going to be the same when it comes to agentic systems. Like all these agents need to connect to thousands and thousands of different places. And they can only succeed if they have the ability to build and connect to all these new systems that are appearing.

Swapnil Bhartiya: Let’s talk about this specific problem that we’re talking about today, which is about, you know, companies have a ton of data, they have a ton of models, but the connection between and also just because you have connected something doesn’t mean AI. I’m a heavy user, I know it hallucinates, it gets lazy. Also it will just read one line of the prompt and it will skip everything else. Can you talk a bit about, as you have seen, a lot of companies are building their AI agents today. What problem did you see in real world deployments, not hypothetically, where we talk about blocks that you connect MCP, everything will work magically. It doesn’t. What problem did you see in real world deployments that convinced you that data infrastructure, not models, was the missing piece there?

Michel Tricot: I mean a very simple example is just going to any kind of sales team and telling a sales rep to just ask a question about a specific customer. Very simple. What the agent is going to tell you is whatever it can find online. Now it’s great, like all these chat systems have access to web search, et cetera, et cetera, but all of that is just public data. Like anyone has access to the same amount of information. So you don’t really have an edge compared to another company trying to sell a product. Now if you overlay on top of that all the internal data that you have about a customer or a prospect, it really changes the value because now you can actually inject your own knowledge about a customer. So, I mean, I can tell you, even in my team we have people that go through self-serve, we have people who come from open source, we have people who come just through a talk-to-sales form. Like all this data is not available to the model. So the model is just going to give you the data that everyone has and it’s not going to make your salesperson more relevant or more lethal than another one at another company. But when you bring all that data together, then suddenly you can say oh, this user has been connecting 50 different systems with open source. Oh, and they’re using all these stacks of applications that they are bringing and using Airbyte for. Oh, and they actually tried to sign up on the self-serve product. And suddenly the conversation and the context that you can bring to the conversation really changes the perception of the person in front of you, it changes the angles that you want to take into the conversation. And this is something that is becoming more and more important. And here we’re just talking about a place where a human is involved in the loop. But imagine that you’re building a whole process for managing and this is also something that our customers have.
Like we’re working with a finance ops startup and well, they’re building agents to enable other companies. But if this agent doesn’t have access to NetSuite, doesn’t have access to QuickBooks, doesn’t have access to Stripe, like the agents basically cannot do anything. It’s just going to give you and that’s why it hallucinates. It’s just going to say oh yes, this person had this charge on their account, but if there is no data to back it, it’s just hallucination. But when you start actually overlaying your data, your internal data on top of it, then suddenly the agent is working out of real data and real information. And that is really where the problem is when you want to productionize these agents: an agent that is just searching the web is kind of useful, but it doesn’t really help your business. And that’s what people want from agents. That’s the promise of AI: it’s going to give you more, it’s going to reduce latency, it’s going to improve your operational throughput, it’s going to make you more relevant in anything that you’re doing. And for that you just need the data that’s been sitting in your system. And we can make a parallel with analytics. Like if you don’t know what your business is doing, you’re blind. And it’s the same thing for agents.

Swapnil Bhartiya: And since you bring in analytics, and if I recall correctly, you folks historically have focused on data movement and integration for analytics. What fundamentally changes when the same data is now being used by AI agents for making decisions or taking actions, sometimes in real time, sometimes autonomously?

Michel Tricot: If we do a little bit of history, you go for analytics. People don’t do analytics because they like doing analytics. They do analytics because they want to understand their business. They need to understand where there are gaps, where they need to double down in investments. And in general, the person who is making the decision is an actual person. And when it’s a person, then the rate at which you can make this decision is human time. So the latency is generally going to be pretty high. When it comes to agents, suddenly you have the promise of automation, meaning that you can have a CPU or GPU making the action for you. So operating at sub-nanosecond type of latency. And this is what agents are looking to solve and why companies are so bullish in adopting these agentic systems, is to give them this ability to just streamline, reduce the latency, increase the throughput of any kind of operation that they have internally and gain massive speed from that.
