[State of AI Startups] Memory/Learning, RL Envs & DBT-Fivetran — Sarah Catanzaro, Amplify
Key Moments
AI startup funding is booming with large seed rounds, but focus is shifting to memory, personalization, and practical applications over pure hype.
Key Insights
The merger of DBT and Fivetran signifies an acceleration toward IPO rather than the end of the modern data stack.
Data cataloging struggled as a standalone category because existing tools offered sufficient features for human users.
The funding environment in 2025 saw massive seed rounds ($100M+) for AI startups, often with unclear near-term roadmaps.
Memory management and continual learning are crucial for AI application retention and personalization, addressing user churn.
RL environments are potentially a fad, with the real world and existing applications serving as more effective learning grounds.
The most exciting startups solve hard technical problems to deliver significantly improved user experiences or enable new capabilities.
THE EVOLUTION OF DATA STACKS AND AI INTEGRATION
The conversation begins by addressing the perceived 'end' of the modern data stack following the DBT and Fivetran merger. This is reframed not as a decline, but as a strategic move by two healthy, growing companies to accelerate their path to IPO, which now requires reaching larger revenue thresholds. Notably, frontier AI labs are using these data tools, which highlights the symbiotic relationship between data management and AI development. Demand for data analytics tools remains strong, even if the roles of data scientists and analytics engineers have evolved.
CHALLENGES AND OPPORTUNITIES IN DATA CATALOGING
Data cataloging, once predicted to be a significant category, struggled to gain traction as a standalone product. This was largely because existing platforms like Snowflake and DBT incorporated sufficient data cataloging features for human users. The speaker posits a potential missed opportunity: building data catalogs not for humans, but for machines, akin to metadata services for agents or microservices. The category may also have focused too heavily on discoverability rather than on the more critical problem of governance.
INFRASTRUCTURE SCALABILITY AND DATA LOADING EFFICIENCY
The scaling of existing data infrastructure to meet AI use cases has been surprisingly elegant, though the sheer scale of AI companies is immense. Efficient data loading, especially to GPUs, is critical to avoid idle compute costs. Companies are developing specialized file formats, like 'Vortex' from Spiral, to optimize this process. While current transactional databases are often sufficient, the potential explosion in transactions from widespread agent use could necessitate future database paradigm shifts.
THE DYNAMIC AND ANXIOUS FUNDING LANDSCAPE
The 2025 funding environment is characterized by unusually large seed rounds, sometimes exceeding $100 million, often raised on long-term visions without clearly detailed near-term roadmaps. This trend makes investors anxious, because responsible fundraising means defining milestones and the resources required for the next 12-24 months. Founders are encouraged to articulate concrete plans, not just grand ambitions. The pressure to raise at high valuations also affects hiring, as candidates are attracted to 'unicorn' status and substantial equity offers.
THE CRITICAL ROLE OF MEMORY AND CONTINUAL LEARNING
Memory management and continual learning are identified as major themes with significant market potential, particularly for AI applications. These capabilities are crucial for improving user retention and personalization, combating the high churn rates often seen in rapidly growing AI apps. Implementing effective memory goes beyond basic user recall; it involves learning new skills and adapting to the dynamic nature of the world. This presents complex systems and infrastructure challenges, involving stateful inference and dynamic weight updates.
WORLD MODELS AND THE REALITY OF RL ENVIRONMENTS
The concept of 'world models' is met with skepticism, partly due to a lack of clear definition and generalization issues across use cases. While applications in video editing, autonomous driving, and coding are emerging, their efficacy depends heavily on specific definitions. Similarly, Reinforcement Learning (RL) environments are viewed by some as a potential fad. The argument is made that the most effective RL environment is the real world, or simulations based on logs and traces from actual applications, rather than mere app clones.
IDENTIFYING THE MOST EXCITING STARTUP ARCHETYPES
The most compelling startup investments often lie in infrastructure, tools, and platforms that address evolving needs, such as those in continual learning. However, the most exciting companies often emerge when they tackle hard research and engineering problems as a prerequisite to delivering a significantly improved user experience or enabling entirely new capabilities. This principle applies from application-level advancements like Retrieval-Augmented Generation (RAG) to foundational challenges like rule-following for customer support.
Common Questions
Data infrastructure tools like DBT and Fivetran are increasingly crucial for AI, extending beyond traditional analytics to manage and prepare training datasets for LLMs and other AI models.
Mentioned in this video
Mentioned as a potential user of Antithesis's technology.
A company that recently emerged from stealth, raising a $100 million seed round led by Jane Street to build AI testing and deterministic simulation.
A file format developed by Spiral, designed to make data loading efficient, particularly for GPUs.
An AI application company that users might switch to, highlighting the importance of user retention and personalization.
An enterprise AI adoption tool that would benefit from personalization and model learning similar to consumer applications.
A portfolio company that has developed a file format called Vortex to make data loading efficient, specifically for GPUs.
Led a $100 million seed round for AI testing company Antithesis.
A company in the data catalog space that, along with others, has struggled as a standalone category.
A company where DBT was already an important part of their stack for managing training datasets within weeks of its formation.
A company mentioned in the context of the struggles of the data catalog category.
An example of a company that requires significant funding (over $100 million) due to the costs associated with building out a high-throughput wet lab for biology.