Guest: Teo Gonzalez
Company: Airbyte
Show Name: An Eye on AI
Topics: AI Governance, Data Sovereignty
Every organization wants to accelerate its AI initiatives, but most overlook the foundation that determines whether those initiatives will succeed: data quality and configurability. In this clip, Teo Gonzalez, Head of AI Business Development at Airbyte, discusses why “you can’t have good AI without great data” and how the definition of “right data” changes depending on context, industry, and architecture. His insights open the door to a broader discussion about the future of AI data infrastructure — and why organizations must modernize their pipelines before expecting meaningful AI outcomes.
AI systems are only as effective as the data they operate on. But in today’s enterprises, that data is scattered across silos, systems, and clouds. Historically, the focus has been on giving humans unified access to that data for analytics and reporting. Teo explains that AI has fundamentally changed that dynamic. The consumer of data is no longer exclusively human — it is now AI agents, LLMs, and automated workloads that require structured, contextual, and regulation-compliant data.
This shift forces organizations to rethink what “right data” means. There is no single definition and no universal template: the right data depends heavily on business context, regulatory requirements, and industry-specific constraints. A healthcare organization faces limitations around HIPAA-sensitive fields. A financial institution must worry about PII exposure and data residency restrictions. A product company might need lightweight, real-time data availability rather than bulk ingestion.
Teo emphasizes that this is where configurability becomes critical. Airbyte enables organizations to tailor how data is sourced, formatted, and enriched before AI systems access it. “What is right for someone will look different based off of where they are,” he says, noting that Airbyte’s platform supports multiple deployment models, from fully hosted to open source to hybrid via Airbyte Flex. This allows customers to operate within their own cloud environments, meeting sovereignty or compliance needs while still leveraging Airbyte’s data connectivity engine.
But configurability doesn’t stop at deployment choices. The structure, schema, and metadata around the data matter just as much. AI workloads need more than raw content — they need information about where the data came from, how it is formatted, and what context surrounds it. That includes access hierarchies, sensitive attributes, relationships across systems, and transformations that occurred along the way.
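To make that concrete, here is a minimal sketch, in Python, of what such a metadata envelope might look like. The field names are assumptions for illustration, not Airbyte’s actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a metadata envelope an AI workload might receive
# alongside each record. Field names are illustrative, not Airbyte's schema.
@dataclass
class RecordMetadata:
    source_system: str                  # where the data came from, e.g. "salesforce"
    schema_version: str                 # how the payload is formatted
    extracted_at: str                   # ISO-8601 timestamp of extraction
    transformations: list = field(default_factory=list)  # steps applied in flight
    sensitivity: str = "internal"       # e.g. "public", "internal", "pii"
    allowed_roles: set = field(default_factory=set)       # access hierarchy

@dataclass
class Record:
    payload: dict                       # the raw content
    meta: RecordMetadata                # the context surrounding it
```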
This leads to one of the most overlooked challenges in AI development: metadata quality. Teams often focus on acquiring large volumes of data, assuming that more input equals better model performance. But Teo argues that context is equally essential. Without properly organized metadata, AI agents risk misinterpreting or overexposing information. For example, giving an AI agent access to a generic HR folder may unintentionally grant access to sensitive payroll data if the directory structure does not separate them. The agent cannot distinguish between them unless the metadata clearly defines permissions and categorization.
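A deny-by-default permission check makes the point concrete. The metadata shape and role names below are assumptions for illustration, not a real Airbyte or filesystem API:

```python
# Illustrative deny-by-default check: an agent may read a record only if
# the record's metadata explicitly names one of the agent's roles.
def agent_can_read(meta: dict, agent_roles: set) -> bool:
    allowed = set(meta.get("allowed_roles", []))
    # Unlabeled data is treated as sensitive: a file in a generic HR
    # folder is not readable just because nothing marked it as payroll.
    if not allowed:
        return False
    return bool(agent_roles & allowed)

payroll_meta = {"path": "hr/payroll.csv", "sensitivity": "pii",
                "allowed_roles": ["hr_admin"]}
handbook_meta = {"path": "hr/handbook.pdf", "allowed_roles": ["all_staff"]}

assert not agent_can_read(payroll_meta, {"support_bot"})  # blocked
assert agent_can_read(handbook_meta, {"all_staff"})       # allowed
```

Treating unlabeled data as sensitive inverts the failure mode: a missing label blocks access instead of silently granting it.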
This is why organizations rushing into AI often face setbacks. They underestimate what is required to ensure their data pipelines deliver the right inputs in the right shape. They may attempt to onboard AI tools only to realize that their systems lack the governance and context layer required for responsible use. Airbyte helps them navigate this by ensuring that data is not only ingested but shaped, structured, and contextualized in ways that align with the organization’s larger architecture.
This data challenge becomes even more pronounced when working with regulated industries or global enterprises. Data sovereignty laws — such as the GDPR’s restrictions on moving personal data outside approved jurisdictions — require sensitive information to remain within certain regions or clouds. AI systems cannot simply pull data freely across borders. Airbyte enables organizations to keep all data movement within infrastructure they control, giving them confidence that their AI workloads are compliant before they are even deployed.
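Such a rule can be enforced before any bytes move. The sketch below assumes a hypothetical policy table keyed by sensitivity label; the region names are illustrative:

```python
# Hypothetical residency policy: which regions each sensitivity class
# of data may be written to. Region names are illustrative.
RESIDENCY_POLICY = {
    "pii":      {"eu-west-1", "eu-central-1"},  # personal data stays in the EU
    "internal": {"eu-west-1", "us-east-1"},
}

def movement_allowed(sensitivity: str, destination_region: str) -> bool:
    return destination_region in RESIDENCY_POLICY.get(sensitivity, set())

# Reject the cross-border copy before the sync runs, not after.
assert not movement_allowed("pii", "us-east-1")
assert movement_allowed("pii", "eu-central-1")
```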
Zooming out from the clip into the broader interview, Teo also highlights an important shift in the AI landscape: AI-focused startups are beginning to act like data engineering teams. Even small companies building LLM-based products must think about schema evolution, data masking, metadata management, and access control. These used to be responsibilities reserved for enterprise data engineering teams. But AI demands such practices early in the development cycle, because mishandling data can immediately lead to biased outputs, privacy violations, or misaligned model behavior.
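Field-level masking is a good example of a practice that has moved down-market. Here is a minimal sketch, assuming a hard-coded list of sensitive fields that in practice would come from the metadata layer:

```python
import hashlib

# Assumed list of sensitive fields; in practice this would be driven by
# the metadata layer, not a hard-coded set.
SENSITIVE_FIELDS = {"email", "ssn", "salary"}

def mask_record(payload: dict) -> dict:
    # Replace sensitive values with a short one-way hash: records stay
    # joinable on the masked value without exposing the original.
    return {
        key: (hashlib.sha256(str(value).encode()).hexdigest()[:12]
              if key in SENSITIVE_FIELDS else value)
        for key, value in payload.items()
    }

print(mask_record({"email": "jdoe@example.com", "tenure_years": 4}))
# -> {'email': '<12-char hash>', 'tenure_years': 4}
```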
Airbyte’s role in this evolving ecosystem is clear: it acts as the connective tissue between source systems and AI workloads. It removes the overhead of maintaining connectors, which Teo calls one of the biggest time sinks for developers. APIs change, schemas evolve, and source systems update. Without automation, engineering teams lose valuable time correcting integrations rather than building AI-driven experiences. Airbyte offers a unified, automated, and configurable platform that ensures data flows securely, consistently, and contextually — exactly what AI systems need to operate effectively.
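The maintenance burden is easy to picture with a toy schema-drift check. This is purely illustrative, not Airbyte’s actual mechanism: it compares the fields a source returns today against the schema the pipeline expects.

```python
# Toy schema-drift check; EXPECTED_FIELDS stands in for the schema the
# pipeline was built against. Not Airbyte's actual mechanism.
EXPECTED_FIELDS = {"id", "email", "created_at"}

def detect_drift(record: dict):
    actual = set(record)
    return actual - EXPECTED_FIELDS, EXPECTED_FIELDS - actual

added, removed = detect_drift({"id": 1, "email": "a@b.co", "signup_ts": "2025-06-01"})
print("new upstream fields:", added)          # {'signup_ts'}
print("fields the source dropped:", removed)  # {'created_at'}
```

Every such drift event is an integration fix someone must ship before the AI workload downstream sees clean data again, which is exactly the toil automated connector maintenance removes.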
As AI becomes more central to enterprise decision-making, the cost of mismanaging data grows exponentially. Whether the issue is compliance, biased outputs, or inaccuracies driven by missing metadata, organizations need a stronger foundation than traditional analytics pipelines can offer. Teo’s insights make it clear that the future belongs to systems that understand context, not just content. AI systems need these richer data layers to produce reliable outcomes.
The conversation also underscores the role of open source in accelerating AI innovation. Airbyte’s open-source foundation enables faster iteration, broader adoption, and community-driven testing — all critical in a landscape where AI tools change constantly. Open source ensures that improvements to connectivity, metadata handling, and pipeline configurability can reach teams quickly, regardless of their size or industry.
Ultimately, the clip distills a growing truth across the AI ecosystem: organizations must embrace a modern, configurable data infrastructure before they can expect meaningful results from AI. Without the right data — complete with context, metadata, and compliance-ready controls — AI systems cannot perform reliably or responsibly. Airbyte’s mission aligns directly with this shift, ensuring teams can customize their data pipelines to match the needs of their AI workloads.