What is Omnigents and why did Databricks develop it?

Omnigents is Databricks' initiative for its 'Agent Cloud.' It aims to improve how agents are developed, shared, and managed. It tackles challenges such as switching between models, sharing sessions, and security, with the goal of becoming a universal harness for agents.

How does Databricks ensure security and control within its AI platforms like Omnigents?

Databricks emphasizes stateful, contextual policies that track session state and the actions taken during a session, rather than relying on simple allow/disallow rules. This enables more nuanced security that balances usability against protection from risky operations. The same mechanism supports cost control through spending caps.

What is the significance of open-sourcing Omnigents?

Open-sourcing Omnigents allows for a broader ecosystem where developers can contribute integrations and customize the platform. This fosters network effects, similar to Spark's success, and enables collaborative development of agent harnesses.

What are OLTP and OLAP databases, and what is Databricks' approach to unifying them?

OLTP (Online Transaction Processing) databases handle row-oriented transactions, while OLAP (Online Analytical Processing) databases run analytical queries over large datasets. Databricks' 'L-TAP' approach unifies the storage layer so that analytics can read the same data directly in a column-oriented format. Because there is only one storage layer, the pipelines that synchronize data between separate systems are no longer needed.

How did Databricks manage to develop a new database engine from scratch without falling into the 'second system syndrome'?

Databricks mitigated the 'second system syndrome' by assembling a team of experienced engineers and adopting a 'factory' approach. Rather than relying solely on academic papers, the factory uses machine learning models trained on vast amounts of trace data to predict and select the best algorithms and data structures for each workload.

What are the key differences between Databricks and Snowflake?

The core differences lie in Databricks' commitment to open formats (Parquet, Delta, Iceberg) and its stronger integration with AI and machine learning use cases since its inception. Snowflake historically focused on proprietary formats and optimized for serving smaller, curated datasets quickly for business users.

What is Databricks' strategy regarding developing its own AI models versus using external ones?

Databricks focuses on making models useful, particularly for querying data (as with its Genie agent), rather than solely on training frontier models. It does train specialized models (e.g., for document parsing) and has released open-source models such as DBRX. Still, its platform is built to integrate and customize a wide range of models effectively.

How does Databricks view the role of data and context in the future of AI?

Databricks believes that domain-specific data and context are crucial, calling them the 'new oil.' AI makes this data more valuable by enabling agents to identify issues automatically, surface insights, and build powerful new systems. Databricks itself, for example, uses its own query traces to build optimized new database engines.

What are the main challenges in scaling technology solutions from tech companies to traditional enterprises?

Key differences include governance, security, data privacy, and legacy systems in enterprises. Additionally, tech companies often have a DIY ('build it yourself') culture, whereas enterprises prefer not to be in the business of building infrastructure, seeking reliable managed services.

Key Moments

The Agent Cloud: Databricks’ Bet on the Future of AI — Matei Zaharia and Reynold Xin

Latent Space Podcast

Science & Technology | 71 min video | Jun 24, 2026




TL;DR

Databricks is betting on an "Agent Cloud" that depends on robust data infrastructure. Meanwhile, open-source agent frameworks are forming a new "modern AI stack" that promises rapid innovation but also raises new security challenges.

Key Insights

Databricks' Omnigents aims to unify agent development by providing a common API across different agents and model providers (e.g., Codex, Claude, OpenAI), enabling collaboration and portability.

The "Agent Cloud" concept is underpinned by Databricks' ambition to act as a full data-and-AI operating system, extending from data ingestion to model deployment and governance.

Databricks open-sourced Omnigents to foster a network effect, similar to Spark's success, encouraging community contributions and integrations.

The development of L-TAP (a unified OLTP and OLAP database engine) was driven by the observation that existing database architectures are over a decade old and often rely on inefficient workarounds like Change Data Capture (CDC).

Omnigents introduces "stateful or contextual policies" to address security and cost concerns, moving beyond simple allow/disallow rules to dynamic policy enforcement based on session state.

Databricks' "Dream Engine" project is a ground-up rewrite of their database engine, aiming to leverage machine learning models trained on vast datasets of query traces to optimize performance across diverse workloads.

The rise of the agent cloud necessitates a new operating system for data and AI.

The conversation introduces the concept of an "Agent Cloud," an evolution beyond traditional software paradigms. Matei Zaharia and Reynold Xin of Databricks argue that as AI models and agents gain advanced reasoning capabilities, much of today's software will be rewritten. The core tenet is that powerful AI agents can generate significant value once data is properly organized and accessible. The success of this paradigm therefore hinges on the quality and accessibility of the underlying data, which is the critical foundation for any AI-driven innovation. Databricks aims to supply this foundational layer with its integrated data and AI operating system.

Omnigents: Unifying agent development and deployment.

Databricks presents its Omnigents initiative as a response to the growing complexity of agent development. Engineers increasingly build custom workflows that combine multiple agents and UIs. In doing so, they often struggle to switch models, and they lack essential collaboration features such as session history and search. Omnigents addresses this with a unified platform and a consistent API that abstracts away model differences, letting developers focus on agent logic. This approach also makes agents portable across environments, from local development machines to cloud sandboxes. The idea grew out of internal efforts, such as the "Isaac" wrapper for coding agents, and out of the need for a more robust, collaborative, and secure agent development framework. The team realized that coding agents and custom data science agents face fundamentally the same problems. That insight led to a platform that abstracts the underlying agent harness behind a consistent interface.
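The following minimal Python sketch illustrates the general idea of one consistent interface over interchangeable agent backends. Session history and search live outside any single model, so switching models does not lose context. `AgentHarness`, `EchoHarness`, and `Session` are hypothetical names invented for illustration. This is not the Omnigents API, and the echo backend stands in for a real model adapter.

```python
from dataclasses import dataclass, field
from typing import Protocol


class AgentHarness(Protocol):
    """Hypothetical common interface; NOT the actual Omnigents API."""
    name: str

    def run(self, prompt: str) -> str: ...


@dataclass
class EchoHarness:
    """Hypothetical stand-in for a real backend adapter (e.g. one wrapping a vendor SDK)."""
    name: str
    prefix: str

    def run(self, prompt: str) -> str:
        # A real adapter would translate the prompt into the vendor's request format.
        return f"{self.prefix}: {prompt}"


@dataclass
class Session:
    """Stores history independently of the harness, so it can be shared and searched."""
    harness: AgentHarness
    history: list[tuple[str, str, str]] = field(default_factory=list)

    def ask(self, prompt: str) -> str:
        reply = self.harness.run(prompt)
        # Record which backend answered, alongside the prompt and reply.
        self.history.append((self.harness.name, prompt, reply))
        return reply

    def switch(self, harness: AgentHarness) -> None:
        # Swapping the backend keeps the accumulated history intact.
        self.harness = harness

    def search(self, term: str) -> list[tuple[str, str, str]]:
        # Simple substring search over prompts and replies.
        return [turn for turn in self.history if term in turn[1] or term in turn[2]]
```

In use, a session can start on one backend, switch to another mid-conversation, and still search earlier turns. Keeping history in `Session` rather than in each adapter is the design choice that makes that portability possible in this sketch.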

Open sourcing Omnigents to drive ecosystem growth.

Databricks has chosen to open-source Omnigents, a strategic decision based on fostering a network effect. Similar to their experience with Spark, they believe that an open standard for agent frameworks will benefit from widespread community adoption and contribution. By making Omnigents open source, Databricks aims to provide a foundation for anyone building agents, encouraging customization and integration. This approach allows other teams and companies to build upon the framework, adding connectors, cloud sandboxes, and new agent harnesses. Early adoption has already seen significant community contributions, including support for Kubernetes and various cloud sandboxes, demonstrating the potential for rapid ecosystem expansion.

Stateful policies for enhanced security and cost control in agents.

A key challenge with agents is balancing usability against security and cost. Traditional security models that rely on simple allow/disallow lists for tools or actions are insufficient. Omnigents introduces "stateful or contextual policies" that track the session's history and state. For instance, an agent might be allowed to install new packages from npm. If it installs a newly released, unverified package, however, later actions that require higher privileges might be blocked. Similarly, an agent that reads a large number of confidential documents could be flagged for risky behavior. Because enforcement depends on what the agent has done within the session, this dynamic approach provides a more nuanced and effective security layer. The same stateful approach also enables granular cost control. Users can set spending caps for specific agent tasks and either receive notifications or require approval before those caps are exceeded. This move toward intelligent, context-aware security and spending management is crucial for enterprise adoption of AI agents.
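The following minimal Python sketch shows what a stateful policy could look like under stated assumptions. The decision for each action depends on what has already happened in the session, not just on the action itself. `SessionPolicy`, the action names, and the default thresholds (20 confidential reads, a $5.00 cap) are hypothetical and are not taken from Omnigents.

```python
from dataclasses import dataclass


@dataclass
class SessionPolicy:
    """Hypothetical stateful policy; thresholds and action names are assumptions."""
    verified_packages: set[str]
    confidential_read_limit: int = 20  # assumed threshold, not from the source
    spend_cap_usd: float = 5.00        # assumed per-task spending cap
    tainted: bool = False              # set once an unverified package is installed
    confidential_reads: int = 0
    spent_usd: float = 0.0

    def check(self, action: str, target: str = "", cost_usd: float = 0.0) -> str:
        """Return 'allow', 'deny', or 'needs_approval' for one agent action."""
        # Spending cap: exceeding it requires human approval instead of failing silently.
        if self.spent_usd + cost_usd > self.spend_cap_usd:
            return "needs_approval"
        decision = "allow"
        if action == "npm_install":
            if target not in self.verified_packages:
                # The install itself is allowed, but the session is now marked as risky.
                self.tainted = True
        elif action == "privileged":
            # The same action is allowed or denied depending on earlier session state.
            if self.tainted:
                decision = "deny"
        elif action == "read_confidential":
            self.confidential_reads += 1
            if self.confidential_reads > self.confidential_read_limit:
                decision = "needs_approval"  # flag unusually broad access
        if decision == "allow":
            self.spent_usd += cost_usd  # only charge actions that actually run
        return decision
```

A static allow-list would return the same answer for every `privileged` call. Here, the answer flips to `deny` once an unverified package has entered the session. That state-dependent behavior is the property the section describes.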

L-TAP: Unifying OLTP and OLAP for modern data workloads.

Databricks is also pushing the boundaries of database technology with L-TAP, a new approach to unify Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) workloads. Traditional databases are split into these two categories, with OLTP handling row-level transactions and OLAP dealing with large-scale data analysis. The industry has long sought a single database engine that can efficiently handle both, but past attempts have often led to compromises. L-TAP focuses on unifying the storage layer, proposing that if data written in a row-oriented format (ideal for OLTP) can also be efficiently read in a column-oriented format (ideal for OLAP), then a single storage layer can serve both. This eliminates the need for complex and brittle Change Data Capture (CDC) pipelines, making data immediately available for analytics. The inspiration for L-TAP comes from the observation that existing analytics databases are often a decade old and have evolved through "hacks" to support new use cases. L-TAP aims to provide the benefits of HTAP (Hybrid Transactional/Analytical Processing) by having a single storage layer that offers real-time analytics without performance degradation on transactional workloads.
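The following toy Python sketch illustrates the single-storage-layer idea. Writes land in row-oriented form, and analytical scans read column-oriented data from the same store, with no CDC copy in between. The buffer-then-compact design (recent rows kept as tuples, periodically converted into column segments) is a common pattern swapped in for illustration. The source does not say how L-TAP performs the conversion. `UnifiedTable` and `flush_threshold` are hypothetical, and transactions and durability are omitted.

```python
class UnifiedTable:
    """Toy single storage layer: row-oriented writes, column-oriented reads, no CDC."""

    def __init__(self, columns: list[str], flush_threshold: int = 3) -> None:
        self.columns = list(columns)
        self.flush_threshold = flush_threshold      # hypothetical compaction trigger
        self._row_buffer: list[tuple] = []          # recent writes, row-oriented
        self._column_segments: list[dict[str, list]] = []  # compacted, column-oriented

    def insert(self, row: dict) -> None:
        """OLTP-style write of one row."""
        self._row_buffer.append(tuple(row[c] for c in self.columns))
        if len(self._row_buffer) >= self.flush_threshold:
            self._compact()

    def _compact(self) -> None:
        # Convert buffered rows into one column-oriented segment inside the same store.
        segment = {c: [r[i] for r in self._row_buffer] for i, c in enumerate(self.columns)}
        self._column_segments.append(segment)
        self._row_buffer.clear()

    def scan(self, column: str) -> list:
        """OLAP-style read of one column across compacted and uncompacted data."""
        i = self.columns.index(column)
        values = [v for seg in self._column_segments for v in seg[column]]
        # Rows not yet compacted are included, so analytics see writes immediately.
        values.extend(r[i] for r in self._row_buffer)
        return values
```

For example, after inserting three rows with a `flush_threshold` of 2, one column segment and one buffered row exist. `sum(t.scan("amount"))` still covers all three rows, without a separate pipeline copying data into an analytics system.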

The "Dream Engine": A ground-up redesign of database architecture.

Databricks is undertaking an ambitious project called the "Dream Engine," a complete rewrite of its database engine from scratch. Existing database technologies are aging and often burdened by years of accumulated compromises, so Databricks is taking a fresh approach. The "second system effect," in which an ambitious successor project fails through overreach, is a known risk, but Databricks has assembled a team with extensive experience. The novel part of the approach is a "factory" for database engines. The factory uses machine learning models trained on trillions of data points from past query traces. These models can predict, with high fidelity, how different algorithms and data structures will perform for various query types and data distributions. With those predictions, the factory can select the best algorithms and data structures both when the engine is built and at runtime. The goal is high performance across diverse workloads, from low-latency transactions to petabyte-scale analytics. This data-driven, ML-powered approach is meant to overcome the limits of traditional database design, which often relies on academic papers and manual tuning.
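The following deliberately simple Python sketch shows the "predict, then select" loop described above. A per-algorithm linear regression (runtime = a + b × rows), fitted by ordinary least squares on trace records, stands in for the much richer ML models in the source. The factory picks whichever candidate has the lowest predicted cost. `CostModelFactory`, the algorithm names, and the trace numbers are made up for illustration only.

```python
from collections import defaultdict


def fit_linear(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Ordinary least squares for runtime = a + b * rows; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx if sxx else 0.0
    return mean_y - b * mean_x, b


class CostModelFactory:
    """Hypothetical: learns one cost model per algorithm from (algorithm, rows, ms) traces."""

    def __init__(self, traces: list[tuple[str, float, float]]) -> None:
        grouped: dict[str, list[tuple[float, float]]] = defaultdict(list)
        for algorithm, rows, runtime_ms in traces:
            grouped[algorithm].append((rows, runtime_ms))
        self.models = {alg: fit_linear(pts) for alg, pts in grouped.items()}

    def predict(self, algorithm: str, rows: float) -> float:
        a, b = self.models[algorithm]
        return a + b * rows

    def choose(self, rows: float) -> str:
        # Select the candidate with the lowest predicted runtime for this input size.
        return min(self.models, key=lambda alg: self.predict(alg, rows))


# Made-up traces for illustration: cheap-per-query lookups vs. high-fixed-cost scans.
TRACES = [
    ("index_lookup", 10, 0.2), ("index_lookup", 100, 1.1), ("index_lookup", 1000, 10.1),
    ("full_scan", 10, 5.01), ("full_scan", 100, 5.1), ("full_scan", 1000, 6.0),
]
```

With these synthetic traces, `CostModelFactory(TRACES).choose(50)` favors the lookup. `choose(100_000)` favors the scan, because the fitted models cross over as input size grows. A learned model replaces a hand-written rule for that crossover point.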

Mosaic and Databricks' AI model strategy: Focus on utility over frontier models.

Databricks' acquisition of Mosaic signals a strategic direction in AI, emphasizing the practical application of models rather than solely focusing on training the largest, most advanced "frontier" models. While they have released open-source models like DBRX, their strategy is to make existing and future models more useful. A key application is automating data querying, exemplified by their "Genie" agent, a virtual data scientist. Instead of solely focusing on building the next state-of-the-art LLM, Databricks is concentrating on specialized, cost-effective models for specific tasks, such as document parsing, which can be significantly cheaper and more accurate than general-purpose models. They are also developing specialized sub-agents for coding tasks, leveraging concepts like "advisor models." This approach allows them to build systems that automate complex processes, making AI more accessible and efficient for a wider range of use cases, including interacting with enterprise data.

Data and context as the "new oil" in the AI era.

The discussion reinforces the idea that data, coupled with effective AI technologies, represents the "new oil." As technology advances, the value derived from domain-specific data increases. AI agents can now automate tasks and provide insights that were previously impossible. Databricks' own experience with database query traces and table structures allows them to build new, performant engines confidently. The ease of model customization is expected to grow, enabling businesses to leverage their unique data assets more effectively. This trend suggests a future where proprietary data, combined with increasingly sophisticated AI, provides a significant competitive advantage, moving beyond generic model capabilities to domain-specific intelligence. The core thesis is that once data is in the right place, AI agents can unlock immense value, whether for security, marketing, or general business operations.


Databricks AI Cloud Strategy: Key Takeaways

Practical takeaways from this episode

Do This

Focus on getting data in the right place to leverage powerful AI models.

Embrace open-source platforms and formats to foster network effects.

Build and iterate incrementally, focusing on tight loops with target customers.

Prioritize specialized models for high-volume, niche use cases.

Develop unified platforms for data management, security, and AI governance.

Avoid This

Don't rely solely on traditional software paradigms; adapt to new AI-driven approaches.

Avoid proprietary formats that can lead to vendor lock-in.

Don't try to boil the ocean; start with core functionalities and iterate.

Don't solely focus on training frontier models; prioritize making models useful.

Don't assume a one-size-fits-all approach for different company types (tech vs. enterprise).

Common Questions

Databricks is betting on the 'Agent Cloud,' where AI models become powerful enough to process data and essentially rewrite traditional software paradigms. The key is to ensure data is in the right place, allowing agents to create magic.

Topics

AI & Machine Learning Technology & Innovation Business & Entrepreneurship Cloud Computing Data Governance Developer Tools Database Architecture AI Platforms LLM Customization Agent Cloud Open Source Strategy

Mentioned in this video

Companies

Databricks

A company focused on data analytics, machine learning, and AI infrastructure, discussing their product launches and strategic direction.

Oracle

A multinational technology corporation, mentioned as an example of a legacy system in traditional enterprises and a past instance of vendor lock-in.

SingleStore

A distributed SQL database for real-time analytics, mentioned as a company that believed in a single database handling both transactional and analytical workloads.

Neon

A serverless PostgreSQL company, mentioned for its architecture with separation of compute and storage, and its past work on sandboxing solutions.
