What major lessons did the speaker learn about sales and customer understanding at Impira?

The speaker learned that successful sales involve far more than technical prowess, from the difficulty of getting meetings to the mechanics of closing deals. More importantly, he realized that he needed a deep, intuitive understanding of the customer's needs and context, especially when selling to non-technical users. It was a humbling lesson.

Why was Impira acquired by Figma, and what insights did the speaker gain about AI's rapid evolution?

Impira was acquired by Figma after the speaker realized that rapidly advancing text-based language models like BERT and GPT-3 were cannibalizing its computer vision-focused document extraction technology. This pivotal moment underscored how volatile AI is and how important it is to adapt quickly to technological shifts.

What is Brain Trust's core thesis and how has its product evolved since its inception?

Brain Trust's core thesis is that embracing evaluation as a central workflow in AI engineering leads to better AI software. The product started as an evaluation tool, evolved into a debugger, and is now becoming an IDE-like playground, enabling end-to-end development of AI products.

How does Brain Trust simplify the process of running evaluations and integrating tools for AI development?

Brain Trust simplifies evals by promoting a declarative syntax instead of hand-written for-loops, which makes them faster and easier to write. It also lets users define custom tools (like search APIs) in simple TypeScript code and call them via a REST API, tightly integrating development with evaluation.
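To make the contrast between a for-loop and a declarative eval concrete, here is a minimal, self-contained TypeScript sketch. The spec shape (data, task, scorers) loosely mirrors the declarative style described above, but `runEval`, `EvalSpec`, and `exactMatch` are hypothetical names for illustration, not Brain Trust's actual SDK API. A real task would call an LLM asynchronously; a synchronous stand-in keeps the sketch runnable.

```typescript
// A single test case: an input and the output we expect for it.
interface EvalCase {
  input: string;
  expected: string;
}

// A scorer maps (output, expected) to a number in [0, 1].
type Scorer = (output: string, expected: string) => number;

// Hypothetical declarative spec: describe *what* to evaluate, not *how* to loop.
interface EvalSpec {
  data: EvalCase[];
  task: (input: string) => string; // stand-in for an LLM call
  scores: Record<string, Scorer>;
}

interface EvalResult {
  summary: Record<string, number>; // mean score per scorer name
}

// The runner owns the loop once, so eval authors never rewrite it.
function runEval(spec: EvalSpec): EvalResult {
  const totals: Record<string, number> = {};
  for (const name of Object.keys(spec.scores)) totals[name] = 0;

  for (const { input, expected } of spec.data) {
    const output = spec.task(input);
    for (const [name, scorer] of Object.entries(spec.scores)) {
      totals[name] += scorer(output, expected);
    }
  }

  const n = Math.max(spec.data.length, 1); // avoid dividing by zero on empty data
  const summary: Record<string, number> = {};
  for (const [name, total] of Object.entries(totals)) summary[name] = total / n;
  return { summary };
}

// Example scorer: 1 for an exact match, else 0.
const exactMatch: Scorer = (output, expected) => (output === expected ? 1 : 0);

// Usage: the eval itself is just data, a task, and scorers.
const result = runEval({
  data: [
    { input: "Foo", expected: "Hi Foo" },
    { input: "Bar", expected: "Hi Bar" },
  ],
  task: (input) => `Hi ${input}`,
  scores: { exactMatch },
});
console.log(result.summary); // { exactMatch: 1 }
```

The design choice is that adding a new eval means writing data and scorers, not another loop, which is what makes evals cheap enough to write often.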

What strategic bets did Brain Trust make that contributed to its success in a crowded market?

Brain Trust made three key strategic bets: a hybrid on-prem deployment model to meet customers' data residency needs, prioritizing a TypeScript SDK for product builders, and focusing initially on evaluations as the most pressing pain point in LLMOps, trusting that other functionality would follow.

Why does the speaker believe fine-tuning is not a sustainable business outcome for AI companies?

The speaker argues that automatic optimization is the true business outcome, and fine-tuning is merely one of several technical means to achieve it. He contends that being fixated on fine-tuning makes a business vulnerable to rapid technological shifts, as other methods like in-context learning or prompt optimization may prove more effective or cheaper in different contexts.
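The argument that fine-tuning is one interchangeable means toward automatic optimization can be sketched as a selection problem: wrap each technique as a candidate, score all candidates on the same dataset, and keep the winner. This is a minimal illustration under loudly stated assumptions: each technique is reduced to a task function with a made-up relative cost, and the `Candidate` type, `optimize` helper, and candidate names are hypothetical, not anything described in the episode.

```typescript
interface Example {
  input: string;
  expected: string;
}

// One way of improving the system (fine-tuning, few-shot prompting, ...),
// reduced to a task function plus a hypothetical relative cost.
interface Candidate {
  name: string;
  costPerCall: number; // arbitrary units, assumed for illustration
  task: (input: string) => string;
}

// Fraction of examples a candidate answers exactly right.
function accuracy(c: Candidate, data: Example[]): number {
  if (data.length === 0) return 0;
  const correct = data.filter((ex) => c.task(ex.input) === ex.expected).length;
  return correct / data.length;
}

// Highest accuracy wins; ties go to the cheaper candidate.
// The selection logic is indifferent to *which* technique produced a candidate.
function optimize(candidates: Candidate[], data: Example[]): Candidate {
  if (candidates.length === 0) throw new Error("no candidates");
  let best = candidates[0];
  let bestScore = accuracy(best, data);
  for (const c of candidates.slice(1)) {
    const score = accuracy(c, data);
    if (score > bestScore || (score === bestScore && c.costPerCall < best.costPerCall)) {
      best = c;
      bestScore = score;
    }
  }
  return best;
}

// Usage with toy stand-ins for real model variants.
const add = (q: string): string => {
  const [a, b] = q.split("+").map(Number);
  return String(a + b);
};
const data: Example[] = [
  { input: "2+2", expected: "4" },
  { input: "3+3", expected: "6" },
];
const candidates: Candidate[] = [
  { name: "fine-tuned model", costPerCall: 5, task: add },
  { name: "few-shot prompt", costPerCall: 2, task: add },
  { name: "zero-shot prompt", costPerCall: 1, task: () => "4" },
];
console.log(optimize(candidates, data).name); // few-shot prompt
```

Because the fine-tuned and few-shot candidates tie on accuracy, the cheaper one wins, which mirrors the point that the best method depends on context rather than on a fixed commitment to fine-tuning.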

What is the current market share distribution between OpenAI, Anthropic, and other models in production, according to Brain Trust's observations?

Prior to Claude 3, OpenAI held close to 100% market share. After Claude 3, Anthropic's Haiku and Sonnet gained significant traction thanks to their cost-effectiveness, speed, and tool-calling support, with Haiku gaining a notable foothold. While OpenAI still holds an overwhelming majority in production, Anthropic is often used for side projects and prototypes, and every new project now evaluates both.

What is the speaker's perspective on the future of AI and the role of 'agentic frameworks'?

The speaker believes that current agentic frameworks, which involve complex control flow and logic, will increasingly be pushed directly into the LLM itself. He argues that models will get better at reasoning and agentic tasks natively, making elaborate external frameworks less necessary for building durable AI systems.

How do companies currently use Generative AI in production, based on Brain Trust's data?

Approximately 50% of use cases involve 'single prompt manipulations', such as summaries or auto-generated titles for Linear tickets, which add 'delight' to software. About 25% are 'simple agents' (a prompt plus tools, often RAG-based chatbots). The remaining 25% are 'advanced agents' with loops or longer runtimes.
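The 'simple agent' category (a prompt plus tools) boils down to a bounded loop in which the model either calls a tool or answers. The following TypeScript sketch assumes a mocked model and a toy search tool standing in for a real LLM and a RAG retrieval API; `runAgent`, `Model`, `Tool`, and the mock data are hypothetical names for illustration only.

```typescript
// A tool takes a string argument and returns a string result (e.g. search hits).
type Tool = (args: string) => string;

// What the (mocked) model can decide to do on each step.
type ModelStep =
  | { kind: "tool"; tool: string; args: string }
  | { kind: "answer"; text: string };

// The model sees the question plus all tool results gathered so far.
type Model = (question: string, toolResults: string[]) => ModelStep;

// A "simple agent": one prompt, a set of tools, and a bounded loop.
function runAgent(
  model: Model,
  tools: Record<string, Tool>,
  question: string,
  maxSteps = 5,
): string {
  const toolResults: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const decision = model(question, toolResults);
    if (decision.kind === "answer") return decision.text;
    const tool = tools[decision.tool];
    if (!tool) throw new Error(`unknown tool: ${decision.tool}`);
    toolResults.push(tool(decision.args));
  }
  throw new Error("agent did not answer within maxSteps");
}

// Hypothetical stand-ins for a retrieval API and an LLM.
const docs: Record<string, string> = { refunds: "Refunds are processed within 5 days." };
const search: Tool = (query) => docs[query] ?? "no results";
const mockModel: Model = (question, results) =>
  results.length === 0
    ? { kind: "tool", tool: "search", args: question }
    : { kind: "answer", text: results[0] };

console.log(runAgent(mockModel, { search }, "refunds")); // Refunds are processed within 5 days.
```

The `maxSteps` cap is what separates this from the 'advanced agents' category: control flow stays short and predictable rather than looping for a long runtime.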

What is the 'code core versus LLM core' concept in AI engineering?

This concept suggests that while LLMs provide intelligence, the core of an AI system should remain code-centric ('code core') rather than being driven entirely by complex, black-box agents ('LLM core'). The idea is to sprinkle LLM calls into an existing codebase, which keeps the system easier to debug, scale, and iterate on, similar to the 'functional core, imperative shell' pattern in systems engineering.
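A minimal sketch of the 'code core' idea, assuming a ticket-triage feature: deterministic code owns the control flow and the business rules, while the LLM is one injected function whose output is validated, with a deterministic fallback. `triage`, `safeTitle`, `GenerateTitle`, and the priority rule are hypothetical, chosen only to illustrate the pattern; in production the injected function would wrap an actual model API call.

```typescript
interface Ticket {
  description: string;
  customerTier: "free" | "enterprise";
}

// The LLM is just one injected function (hypothetical stand-in for an API call).
type GenerateTitle = (description: string) => string;

// Pure, deterministic rule: easy to unit test and debug.
function priority(t: Ticket): "high" | "normal" {
  return t.customerTier === "enterprise" || /outage|down/i.test(t.description) ? "high" : "normal";
}

// Validate the model's output in code and fall back deterministically if unusable.
function safeTitle(description: string, generate: GenerateTitle, maxLen = 60): string {
  let title = "";
  try {
    title = generate(description).trim();
  } catch {
    // Treat model errors the same as unusable output.
  }
  if (title.length > 0 && title.length <= maxLen) return title;
  return description.slice(0, maxLen).trim();
}

// Code owns the control flow; the LLM is "sprinkled" into one step.
function triage(t: Ticket, generate: GenerateTitle) {
  return { title: safeTitle(t.description, generate), priority: priority(t) };
}

// Usage with a fake model.
const fakeLLM: GenerateTitle = () => "Login page returns 500";
console.log(triage({ description: "Users see a 500 error when logging in", customerTier: "free" }, fakeLLM));
// { title: 'Login page returns 500', priority: 'normal' }
```

If the model misbehaves, only one field degrades and the failure is visible in ordinary code, which is the debuggability argument for keeping the core in code.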

What makes Alana, the speaker's wife, particularly effective in the startup ecosystem?

Alana's effectiveness stems from her genuine care for people, proactively building relationships and offering support before any transactional interest. She also possesses a unique hybrid skill set as a technical product manager, designer, and engineer, building her own software stack to solve inefficiencies.


Production AI Engineering starts with Evals

Latent Space Podcast

Science & Technology | 117 min video

Oct 11, 2024



TL;DR

BrainTrust's co-founder discusses evaluating AI in production, his transition from databases to AI engineering, and the evolution of AI tools.

Key Insights

Production AI engineering must prioritize evaluation (evals) as a core workflow for driving improvements and decision-making.

The evolution of AI, particularly through transformers and LLMs like GPT-3/4, has enabled traditional software engineers to participate more directly in AI development.

BrainTrust aims to empower product builders and software engineers with AI tools, focusing on user experience and simplifying complex AI workflows.

The AI market is rapidly evolving, with a shift from purely technical solutions to solving business problems, exemplified by the move from fine-tuning to broader automatic optimization.

While open-source models have potential, the current production landscape favors reliable, scalable, and easily accessible models, often via APIs, due to operational complexities.

The future of AI development involves integrating intelligence seamlessly into applications, rather than building complex agentic systems, making code simpler and more user-centric.

THE FOUNDATIONAL SHIFT: EVALUATION IN AI PRODUCTION

The core idea behind BrainTrust, as explained by its co-founder, is revolutionizing AI production through a strong emphasis on evaluation. The speaker recounts an experience at Impira where implementing an evaluation system dramatically sped up model improvement. This highlights the bottleneck that arises when discussions about model choices are purely hypothetical or based on limited examples. Evals provide a scientific framework to measure progress, identify regressions, and guide development, fundamentally changing how AI applications are iterated upon and improved.
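Identifying regressions, one of the things evals make possible, can be sketched as a per-example comparison between a baseline experiment and a candidate experiment run on the same dataset. This TypeScript sketch assumes scores in [0, 1] keyed by a shared example id; `compare`, `Scores`, and `Comparison` are hypothetical names, not part of any product API.

```typescript
// Per-example scores from one experiment, keyed by example id.
type Scores = Record<string, number>;

interface Comparison {
  baselineMean: number;
  candidateMean: number;
  improved: string[];
  regressed: string[];
}

function mean(xs: number[]): number {
  return xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Compare only ids present in both runs so the means are apples-to-apples.
function compare(baseline: Scores, candidate: Scores): Comparison {
  const ids = Object.keys(baseline).filter((id) => id in candidate);
  const improved: string[] = [];
  const regressed: string[] = [];
  for (const id of ids) {
    if (candidate[id] > baseline[id]) improved.push(id);
    else if (candidate[id] < baseline[id]) regressed.push(id);
  }
  return {
    baselineMean: mean(ids.map((id) => baseline[id])),
    candidateMean: mean(ids.map((id) => candidate[id])),
    improved,
    regressed,
  };
}

// Usage: identical averages can still hide a regression.
const baseline: Scores = { a: 1, b: 0, c: 1 };
const candidate: Scores = { a: 1, b: 1, c: 0 };
const report = compare(baseline, candidate);
console.log(report.regressed); // [ 'c' ]
```

Both means are 2/3 here, which is the point: an aggregate score alone would suggest nothing changed, while the per-example view shows that example b improved and example c regressed.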

FROM DATABASES TO AI: A CAREER EVOLUTION

The journey from SingleStore, an HTAP database company, through Impira, an AI company focused on unstructured data, to co-founding BrainTrust reflects a deep understanding of complex systems and evolving technological landscapes. Early-career roles at Microsoft and in research provided foundational knowledge but lacked the desired impact and creativity. The speaker's experience at SingleStore revealed the trade-offs between advanced technology and market accessibility, a lesson that informed his subsequent ventures. Impira, though technically innovative, highlighted the difficulty of selling technical solutions without deep customer empathy, especially to line-of-business users, and reinforced the importance of sales and business acumen alongside technical expertise.

THE IMPIRA ACQUISITION AND THE AI PARADIGM SHIFT

The acquisition of Impira by Figma was catalyzed by a rapid technological shift. Impira's initial strength lay in computer vision-based document extraction, requiring extensive data examples. However, the emergence of transformer models like BERT and, critically, GPT-3 and its successors, fundamentally changed the game. The speaker's personal experimentation with these models revealed their power in understanding text and context, quickly cannibalizing previous approaches. This realization led to a strategic pivot to leverage these new models, but ultimately, the founder recognized that the core problem of unstructured data transformation was becoming commoditized, prompting the pursuit of an acquisition to find a new, more impactful direction.

BRAINTRUST: EMPOWERING AI ENGINEERS WITH A DEVELOPER-FIRST APPROACH

BrainTrust emerged from the need for better tools tailored to software engineers entering the AI space. The speaker observed that traditional ML evaluation tools were often inaccessible to software engineers, creating a divide. BrainTrust's platform bridges this gap by offering an end-to-end developer platform that integrates evaluation, data collection, and prompt management. The platform's evolution from an evaluation tool to a debugger and now an IDE-like experience reflects its user-driven development. Key features like the durable, collaborative playground, automatic data ETL via logging, and the ability to define custom tools empower developers to build and iterate on AI products more efficiently.
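The summary does not explain how 'automatic data ETL via logging' works, so the following is only a sketch of one plausible mechanism, assumed for illustration: wrap a task so every production call is logged, then turn logged calls that received user feedback into eval dataset rows. `withLogging`, `toDataset`, `LogRecord`, and `DatasetRow` are hypothetical names, not Brain Trust's logging API.

```typescript
// One logged production call, plus optional user feedback attached later.
interface LogRecord {
  id: string;
  input: string;
  output: string;
  feedback?: "up" | "down";
}

// A dataset row in the shape an eval consumes.
interface DatasetRow {
  input: string;
  expected?: string;
  metadata: { sourceLogId: string; feedback?: "up" | "down" };
}

const logs: LogRecord[] = [];
let logCounter = 0; // module-level so ids stay unique across wrapped tasks

// Wrap any task so every call is logged as a side effect.
function withLogging(task: (input: string) => string): (input: string) => string {
  return (input) => {
    const output = task(input);
    logs.push({ id: `log-${++logCounter}`, input, output });
    return output;
  };
}

// "ETL": keep only logs with feedback. Thumbs-up outputs become expected values;
// thumbs-down rows keep just the input so a human can label them later.
function toDataset(records: LogRecord[]): DatasetRow[] {
  return records
    .filter((r) => r.feedback !== undefined)
    .map((r) => ({
      input: r.input,
      expected: r.feedback === "up" ? r.output : undefined,
      metadata: { sourceLogId: r.id, feedback: r.feedback },
    }));
}

// Usage with a toy "summarizer" that keeps the first sentence.
const summarize = withLogging((text) => text.split(".")[0] + ".");
summarize("Evals matter. They catch regressions.");
summarize("Logs are data. Reuse them.");
logs[0].feedback = "up";
logs[1].feedback = "down";
console.log(toDataset(logs).length); // 2
```

The design choice illustrated is that production traffic and eval data share one shape, so improving the dataset requires no separate export step.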

THE EVOLVING AI MARKET: BEYOND FINE-TUNING AND TOWARDS GENERAL INTELLIGENCE

The discussion touches upon market trends, including the declining use of fine-tuning in production, despite its technical validity. The emphasis is shifting towards automatic optimization as a business goal, achievable through various methods like prompt engineering and in-context learning. The speaker posits that agentic frameworks, while currently popular, might be a temporary workaround for LLMs' current reasoning limitations, predicting that future models will integrate more complex logic directly. This perspective suggests a future where AI capabilities are increasingly embedded within foundational models, simplifying the architecture of AI applications.

THE ROLE OF INFRASTRUCTURE AND MARKET DYNAMICS

BrainTrust's strategic bets, such as its hybrid on-premise model and its prioritization of TypeScript, reflect a pragmatic approach to serving a demanding market. The speaker argues that, despite market skepticism, these choices have enabled deeper customer integration and resonated with product builders. The conversation also delves into the GPU inference market, suggesting that while margins can be high, availability and reliability are the critical differentiators, which favors established players like OpenAI. The hosts and guest discuss the evolving landscape of model providers (OpenAI, Anthropic, Meta) and the complexity of integrating them, noting that ease of use and consistent availability remain paramount for production AI.

THE FUTURE OF AI WORKLOADS: SPRINKLING INTELLIGENCE EVERYWHERE

The prevailing trend observed in production AI workloads is the shift towards 'sprinkling intelligence' throughout applications rather than building monolithic agentic systems. This approach embeds discrete AI calls for tasks like summarization or data generation within existing software. BrainTrust's platform supports this paradigm by making it easy to integrate AI capabilities, from simple prompt manipulations to more complex agents. The focus is on improving user experience and developer productivity by making AI features easy to access and use, moving towards a future where building intelligent software is as straightforward as building traditional software.


Common Questions

Impira was founded on the idea of making unstructured data as easy to use as structured data, leveraging advancements in deep learning models like AlexNet. The speaker aimed to tackle the challenges of data extraction that were previously impossible.

Topics

AI & Machine Learning, Programming & Software, Business & Entrepreneurship, Product Development, Cloud Computing, AI Engineering, LLM Evaluation, Agentic AI, Startup Acquisition, Database Technology, TypeScript Development, AI Market Trends

Mentioned in this video

Companies

Brain Trust

The speaker's current company, an end-to-end developer platform for building AI products, centered on an evaluation-driven workflow. It evolved from an eval tool to a debugger and eventually an IDE-like playground.

Palo Alto Networks

A cybersecurity company where Alana's father served as president, managing billions in revenue, before joining Cloudflare.

Microsoft

The speaker's first internship, working on Bing's distributed compute infrastructure. The experience was impactful but lacked intense creativity and room for interesting work.

Redshift

Amazon's cloud data warehouse, noted for its 'SUPER' type, a feature similar to Snowflake's variant type.

Cloudflare

A web infrastructure and website security company where Alana's father is currently the president.

Hugging Face

A platform for building, training, and deploying machine learning models. The speaker became a top non-employee contributor, working on document QA models.

SingleStore

A leading HTAP (Hybrid Transactional/Analytical Processing) database. The speaker was its first VP of Engineering and discussed its advanced technology alongside its high cost and suitability for only a niche market, contrasting it with Neon's push for wider adoption.

Stitch Fix

An online personal styling service, mentioned as a company Impira tried to close a deal with early on.

Impira

The speaker's first AI company, founded on the idea of making unstructured data as easy to use as structured data by leveraging ML models. There he learned critical business lessons about sales, customer empathy, and market fit.

Adobe

Mentioned in the context of Adobe's planned acquisition of Figma, a factor contributing to Figma's sense of stability.

Humanloop

One of the early movers in the AI tooling space, predating Brain Trust and offering durable playgrounds, prompt saving, and eval features, though without Brain Trust's focus on engineering efficiency and declarative evals.

Cruise

A self-driving car company, where Eden, Brain Trust's Head of Product, was a designer.

Figma

A design tool company that acquired Impira. The speaker was there for eight months and discussed the challenges of integrating AI, especially visual AI, into a high-quality product with an annual release cycle.

Neon

A company founded by Nikita Shamgunov that aims to provide a very inexpensive PostgreSQL offering with a best-in-class free tier, in contrast to SingleStore's high-cost model.

Brex

A customer of Brain Trust, whose engineers expressed a desire for Brain Trust's playground to become their IDE.

Databricks

A data and AI company with a famous hybrid on-prem model. The model is successful but draws mixed opinions, serving as both a cautionary tale and an inspiration for Brain Trust's hybrid approach.

Datadog

A monitoring and analytics platform that Figma used, mentioned as a comparison point for observability solutions.

Google

Mentioned as a public cloud provider alongside Amazon and Azure, suggesting big companies have special relationships for their AI models.

Snowflake

A cloud-based data warehousing company, praised for its best-in-class implementation of semi-structured data with its 'variant type' but criticized for its expensive packaging and high minimum query time.

OpenAI

A leading AI research and deployment company. Brain Trust leveraged their API for LLM judging and tool interactions, and OpenAI's models are heavily adopted by Brain Trust customers due to reliability and availability.

Firebase

A platform for developing mobile and web applications. The speaker likens Brain Trust to Firebase: just as Firebase gives traditional software developers an end-to-end platform, Brain Trust aims to do the same for AI product builders.

Pinecone

A vector database company, mentioned as a guest the hosts would invite onto their podcast to hear an opposing view on vector databases.
