Guest: Teo Gonzalez
Company: Airbyte
Show Name: An Eye on AI
Topics: AI Governance, Data Sovereignty
AI is only as powerful as the data behind it, yet most organizations are still trying to understand what “right data” even means in an AI-first world. At KubeCon + CloudNativeCon, I sat down with Teo Gonzalez, Head of AI Business Development at Airbyte, to unpack the connection between data architecture, metadata, and emerging AI workloads. His insights reveal a fast-changing reality: AI is reshaping how teams think about data, not the other way around.
The shift to AI has forced every organization to revisit its relationship with data. In most companies, data infrastructure was built for analytics and reporting. AI changes the stakes entirely — instead of insights, teams now need action, speed, and context. Gonzalez sees this challenge up close through Airbyte’s work with enterprises modernizing their pipelines. As he puts it, “You can’t have good AI without great data,” and the definition of great data is expanding quickly.
Airbyte has always focused on ensuring organizations can access all their data across silos, systems, and formats. But with AI models and agents becoming heavy consumers of that data, the company is now expanding its mission. The end consumer is no longer only a human analyst. Increasingly, it’s an AI system that must understand not only the data itself but also its metadata, lineage, access rules, and context. As Gonzalez explains, the nature of the consumer defines the nature of the data that must be delivered.
Understanding what the “right data” means is not a one-size-fits-all problem. The needs of a regulated enterprise look different from a startup building an AI chatbot. Some organizations must adhere to strict data sovereignty requirements. Others prioritize latency or schema consistency. “Configurability is at the heart of meeting customers where they are,” Gonzalez said. Airbyte’s platform supports multiple deployment models — fully hosted, open source, or hybrid through its Flex product — so teams can maintain control of their data based on compliance or architectural needs.
But Gonzalez emphasizes that configuration does not end at deployment. The context around data is equally important. He pointed out a scenario many teams miss: granting an AI agent access to a folder of HR playbooks might inadvertently grant access to payroll data if it sits under the same hierarchy. “It’s not only about the content. It’s the context that’s super important,” he said. Without proper metadata design and access controls, organizations risk exposing sensitive information to systems that are not supposed to consume it. Once an AI model ingests data, you cannot simply “put the cat back in the bag.”
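To make that folder-hierarchy pitfall concrete, here is a minimal Python sketch of a pre-grant check. The paths, sensitivity labels, and helper names are invented for illustration, not Airbyte functionality; the point is to evaluate everything a grant transitively exposes before an agent receives it.

```python
from pathlib import PurePosixPath

# Hypothetical example: paths, labels, and clearance levels are
# invented for illustration and are not Airbyte functionality.
SENSITIVITY = {
    "/hr/playbooks": "internal",
    "/hr/payroll": "restricted",
}
ORDER = ["public", "internal", "restricted"]

def effective_exposure(granted_root: str) -> set[str]:
    """Every sensitivity label an agent can reach under a grant.

    A grant on /hr exposes both children; checking the content of
    /hr/playbooks alone would miss the restricted sibling.
    """
    root = PurePosixPath(granted_root)
    return {
        label
        for path, label in SENSITIVITY.items()
        if PurePosixPath(path).is_relative_to(root)
    }

def safe_to_grant(granted_root: str, clearance: str = "internal") -> bool:
    # Deny the grant if anything reachable exceeds the agent's clearance.
    return all(
        ORDER.index(label) <= ORDER.index(clearance)
        for label in effective_exposure(granted_root)
    )

assert safe_to_grant("/hr/playbooks")  # only internal docs reachable
assert not safe_to_grant("/hr")        # payroll sits under the same root
```

The key design choice is that the check runs over the reachable hierarchy rather than the folder the requester named, which is exactly the gap Gonzalez describes.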
For many companies experimenting with AI, the biggest challenge isn’t model accuracy or GPU availability. It’s the lack of clarity around data governance and pipeline hygiene. Organizations rush to plug AI into everything without establishing what data should and shouldn’t be accessible. Airbyte’s tooling helps solve this problem by standardizing schemas, understanding source formats, and enabling organizations to manipulate data in ways that match their own architecture. It ensures the data is not only available, but available with the correct structure, fields, and metadata so downstream AI workloads behave reliably.
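As a rough illustration of that pipeline hygiene, the sketch below validates structure and required metadata before a record can reach an AI workload. The field names, metadata keys, and classification rule are invented for the example and are not Airbyte’s actual schema:

```python
# Invented field names and rules, for illustration only; this is not
# Airbyte's actual schema or API.
REQUIRED_FIELDS = {"id": str, "body": str, "source": str}
REQUIRED_METADATA = {"owner", "classification", "ingested_at"}

def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is clean."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"wrong type for {name}: {type(record[name]).__name__}")
    missing_meta = REQUIRED_METADATA - set(record.get("metadata", {}))
    if missing_meta:
        problems.append(f"missing metadata: {sorted(missing_meta)}")
    # Pipeline hygiene: restricted records never reach the AI workload.
    if record.get("metadata", {}).get("classification") == "restricted":
        problems.append("restricted data must not enter the AI pipeline")
    return problems
```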
The conversation also surfaced a point often overlooked in AI discussions: developer time. Maintaining connectors, especially in complex systems, is a significant time sink for engineering teams. APIs change constantly, and keeping pipelines stable requires continuous updates. “You do not have the time to be focusing on every way that Zendesk might change their API,” Gonzalez noted. Airbyte removes this burden so teams can focus on building AI-driven applications, not maintaining brittle integrations.
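A generic sketch shows why this maintenance burden is real. Even a connector that pins only the fields the pipeline depends on, and surfaces upstream drift loudly, still needs a human to act every time a vendor API changes shape. The field names here are illustrative, not Zendesk’s real payload or Airbyte’s connector code:

```python
import logging

log = logging.getLogger("connector")

# Generic sketch, not Airbyte connector code. EXPECTED holds the only
# fields the downstream pipeline depends on; the payload shape is
# illustrative.
EXPECTED = {"id", "status", "updated_at"}

def normalize(ticket: dict) -> dict:
    missing = EXPECTED - ticket.keys()
    if missing:
        # Fail loudly instead of emitting silently broken rows when the
        # upstream API drops or renames a field.
        raise ValueError(f"upstream API changed: missing {sorted(missing)}")
    extras = ticket.keys() - EXPECTED
    if extras:
        # Additive changes are tolerated but logged for review.
        log.info("new upstream fields detected: %s", sorted(extras))
    return {key: ticket[key] for key in EXPECTED}
```

Even this defensive version only detects drift; someone still has to ship the fix, which is the burden Gonzalez is describing.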
This developer-productivity angle matters even more as AI workloads grow more sophisticated. It is no longer enough to simply dump all data into a warehouse and hope the model figures it out. AI agents require not only accurate data but contextually relevant data. This may include selective ingestion, contextual enrichment, or schema-aware retrieval — capabilities Gonzalez sees evolving rapidly at Airbyte. Today, the company excels at bulk data movement, but tomorrow it may help teams retrieve targeted subsets of data based on AI-driven queries and use cases. The future of data ingestion, in Gonzalez’s view, will be more granular, intelligent, and demand-driven.
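Here is a speculative sketch of that demand-driven direction, under the assumption that an AI task can declare the columns and rows it needs. The TaskSpec shape is hypothetical, not an Airbyte API:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

@dataclass
class TaskSpec:
    """Hypothetical declaration of what one AI task actually needs."""
    columns: set[str]                    # fields the agent will use
    row_filter: Callable[[dict], bool]   # relevance predicate

def selective_ingest(source: Iterable[dict], spec: TaskSpec) -> Iterator[dict]:
    """Yield only the relevant slice, projected down to spec.columns."""
    for row in source:
        if spec.row_filter(row):
            yield {col: row.get(col) for col in spec.columns}

# Example: an agent answering refund questions needs three fields from
# one quarter of orders, not the whole table.
spec = TaskSpec(
    columns={"order_id", "refund_status", "amount"},
    row_filter=lambda row: row.get("quarter") == "2024-Q4",
)
```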
The intersection of AI and data engineering is also becoming more apparent. Gonzalez has noticed that new AI companies are starting to think like data engineers, even if they don’t come from that background. They now need to care about schema evolution, masking sensitive fields, and tracking metadata changes — concerns traditionally handled by analytics teams. AI startups suddenly find themselves needing enterprise-grade data practices because any misstep could cause incorrect model behaviors, regulatory issues, or brand damage.
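Masking is one of those practices that translates directly into code. The sketch below is an illustrative pass, not an Airbyte feature, and the field names are invented: some values are pseudonymized so they stay joinable, others are redacted outright before records leave the controlled environment.

```python
import hashlib

# Illustrative masking pass, not an Airbyte feature; field names are
# invented for the example.
SENSITIVE_HASH = {"email", "ssn"}    # pseudonymize, keep joinable
SENSITIVE_DROP = {"salary_notes"}    # redact entirely

def mask(record: dict) -> dict:
    out = {}
    for key, value in record.items():
        if key in SENSITIVE_DROP:
            out[key] = "[REDACTED]"
        elif key in SENSITIVE_HASH:
            # A keyed HMAC would be stronger in practice; a truncated
            # hash keeps the sketch short.
            out[key] = hashlib.sha256(str(value).encode()).hexdigest()[:16]
        else:
            out[key] = value
    return out
```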
Open source continues to play a major role as AI evolves. Airbyte was built with an open-source foundation, and Gonzalez believes this is essential in the AI era. Community-driven innovation means companies get faster feedback, broader distribution, and a healthier ecosystem. “The power of a community is unparalleled,” he said. In a field where technologies change week to week, rapid iteration and user feedback become invaluable.
The conversation also touched on data sovereignty — a rising concern as countries introduce their own rules around data access, privacy, and acceptable model behavior. Different regions have different cultural norms, regulatory needs, and expectations from AI systems. Airbyte addresses these concerns by ensuring customers can run the platform within their own controlled environments, whether through Flex or open-source deployments. That allows teams to move data while still respecting the legal and cultural constraints of each region. Airbyte does not manipulate or modify data to remove bias — that work remains with the customer — but it ensures they can do so safely within their own infrastructure.
Overall, Gonzalez’s insights paint a picture of a fast-moving industry where data infrastructure and AI architecture are becoming inseparable. The old world of batch pipelines designed for dashboards is giving way to real-time, context-rich systems designed for agents that take action, answer questions, and automate workflows. Airbyte’s evolution mirrors this shift: from pipeline automation to context-aware data delivery designed for intelligent systems.
AI adoption will continue accelerating, but without strong data foundations, organizations will not see meaningful results. As Gonzalez emphasized throughout the conversation, “You need data at the core of everything.” It is not enough to have large models or fast GPUs. The true differentiator is high-quality data, enriched with the right metadata, governed with the right controls, and delivered through flexible architectures.