Key Moments
AI Dev 26 x SF | Manos Koukoumidis & Stefan Webb: VibeML: Build your AI model in hours, not months
Enterprises are shifting from renting generic AI to owning specialized models, but building them previously took months. VibeML aims to automate this, enabling engineers to create custom AI in hours.
Key Insights
Enterprises are moving from consuming generic AI APIs (like OpenAI, Anthropic, Google Gemini) to building and owning their specialized AI models, a shift that began gaining traction in late 2025.
Specialized AI models offer dramatically higher quality, lower costs (potentially 10-100x smaller and more efficient), improved privacy, and greater control compared to generic models.
Building custom AI models has historically been prohibitively difficult, requiring months of effort and deep AI expertise for each use case, leading to a desire for a 'model factory'.
VibeML is presented as a 'model factory' that enables engineers to build specialized AI models automatically from a prompt, with the process taking hours instead of months.
A healthcare provider using VibeML technology saw a 20% quality improvement and a 70% cost reduction by building a custom model for extracting information from medical records.
A study utilizing VibeML technology to evaluate Google AI overviews found that only 39% of claims were fully supported by the provided sources, highlighting a significant hallucination rate.
The enterprise AI landscape is rapidly evolving
The predominant trend in enterprise AI is a shift from renting generic intelligence to owning specialized intelligence. After the first wave of generative AI in late 2022, enterprises largely relied on APIs from major providers such as OpenAI, Anthropic, and Google. Starting in late 2025, however, enterprises began building proprietary, specialized models at an accelerating pace. The move is driven by several benefits: higher quality tailored to specific use cases, lower operating cost and latency, stronger privacy and security through control over where data and models are deployed, full control over the AI roadmap, and the ability to build market differentiation and a durable competitive advantage.
Why specialized models outperform generic ones
Generic AI models, built to perform reasonably well across a wide array of tasks, are inherently optimized for nothing specific. This means they are not ideal for niche enterprise use cases. In contrast, a specialized model is fine-tuned for a particular application, leading to superior performance. For instance, Intercom reported building a model better than GPT-3 for code generation that was five times cheaper to operate. Similarly, Kensho announced a coding agent model superior to GPT-3 and ten times cheaper to run. These specialized models can be 10 to 100 times smaller and more efficient than their generic counterparts, resulting in substantial cost savings and performance gains. Furthermore, owning a custom model provides enterprises with control over their data and infrastructure, preventing reliance on third-party roadmaps and terms of service.
The historical challenges of building custom AI
Despite the clear advantages, building specialized AI models has been a significant hurdle for most enterprises. The primary reasons are the immense effort and time required. Organizations have reported that developing a single specialized model for one use case could take months, a process that would need to be repeated for every new model or update. This pace is unsustainable, especially when coupled with the challenge of maintaining and improving models in production. Many companies also admit to lacking sufficient in-house AI expertise or the personnel to dedicate to such resource-intensive projects. This created a need for a solution that could abstract away the complexity and accelerate the development cycle.
VibeML: Automating specialized AI development
VibeML is presented as the solution to these challenges: an automated 'model factory'. It aims to let any engineer, even one without deep AI expertise, build specialized AI models in hours rather than months. The platform guides users through a structured process that starts with defining the task, much like prompting a general-purpose AI, except that the outcome is a custom model rather than a single response. For example, a prompt like 'Build a model to summarize key news items in news articles in bullet point format' initiates the process. VibeML then devises a plan for building the model: define evaluators (the metrics for success), synthesize test data, evaluate baseline models, create training data, fine-tune the model, and re-evaluate. The platform handles much of this complexity so that users can focus on defining their needs.
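The plan described above can be pictured as an ordered list of stages, with each evaluator acting as a scoring function over model outputs. The following is a minimal sketch under stated assumptions, not VibeML's API: the stage names only paraphrase the steps listed in the talk, and the `conciseness` and `format_adherence` functions, the 60-word budget, and the `evaluate` helper are hypothetical. In practice, evaluators of this kind are often model-based judges; simple rule-based checks stand in for them here.

```python
from typing import Callable, Dict, List

# Stage names paraphrase the plan described in the talk.
# They are illustrative labels, not VibeML identifiers.
PLAN: List[str] = [
    "define_evaluators",
    "synthesize_test_data",
    "evaluate_baselines",
    "create_training_data",
    "fine_tune",
    "re_evaluate",
]


def conciseness(summary: str, max_words: int = 60) -> float:
    """Hypothetical evaluator: 1.0 within the word budget, decaying beyond it."""
    words = len(summary.split())
    return 1.0 if words <= max_words else max_words / words


def format_adherence(summary: str) -> float:
    """Hypothetical evaluator: fraction of non-empty lines that are bullets."""
    lines = [ln.strip() for ln in summary.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return sum(ln.startswith("- ") for ln in lines) / len(lines)


# Registry of evaluators; a user could add more (for example, faithfulness).
EVALUATORS: Dict[str, Callable[[str], float]] = {
    "conciseness": conciseness,
    "format_adherence": format_adherence,
}


def evaluate(summary: str) -> Dict[str, float]:
    """Score one model output against every registered evaluator."""
    return {name: fn(summary) for name, fn in EVALUATORS.items()}


if __name__ == "__main__":
    output = "- Council approves budget\n- Transit fares rise in May\nMore details follow."
    print(evaluate(output))
```

Keeping evaluators in a plain registry means the same scoring loop can run on baseline models before fine-tuning and on the tuned model afterwards, which is what makes the before-and-after comparison meaningful.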
The VibeML workflow and agent-driven process
The VibeML platform uses an agentic approach to guide users through model development. After an initial task definition, the agent proposes a plan detailing steps like setting up evaluators, synthesizing data, and fine-tuning. For instance, when building the news summarizer, the agent suggested evaluators like completeness, conciseness, and format adherence, and the user could add more, such as 'faithfulness' to prevent hallucinations. The platform can also synthesize training data if custom data isn't provided. It suggests baseline models and evaluates them, providing scores across the defined metrics. A key feature is the 'review failure modes' tool, which analyzes where the model performed poorly and allows for the generation of targeted training data to improve those specific areas. The process then moves to fine-tuning, offering options like LoRA (Low-Rank Adaptation) for parameter-efficient tuning. Finally, the platform quantifies the improvements post-fine-tuning, allowing for iterative cycles of improvement until desired metrics are met. The ultimate goal is to create a deployable model that offers higher quality and lower cost than generic alternatives.
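To make the parameter-efficiency of LoRA concrete: LoRA freezes a pretrained weight matrix W of shape d x k and trains only a low-rank update B·A, where B has shape d x r and A has shape r x k. The trainable parameter count per matrix therefore falls from d·k to r·(d + k). The sketch below works through that arithmetic; the layer size and rank are illustrative assumptions, not values from the talk, and the function names are hypothetical.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters LoRA adds for one d x k weight matrix at rank r.

    The frozen weight W (d x k) is adapted as W + B @ A, with B of shape
    (d, r) and A of shape (r, k). Only A and B are trained.
    """
    if r <= 0 or r > min(d, k):
        raise ValueError("rank must satisfy 1 <= r <= min(d, k)")
    return r * (d + k)


def full_params(d: int, k: int) -> int:
    """Parameters updated when the whole d x k matrix is fine-tuned."""
    return d * k


if __name__ == "__main__":
    d = k = 1024  # illustrative layer size, not taken from the talk
    r = 8         # illustrative rank
    lora = lora_trainable_params(d, k, r)
    full = full_params(d, k)
    print(f"full: {full}, LoRA: {lora}, ratio: {lora / full:.4f}")
```

For this illustrative layer, the LoRA update trains 16,384 parameters against 1,048,576 for full fine-tuning, about 1.6% of the matrix.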
Real-world impact and enterprise adoption
Companies are already using VibeML technology to achieve measurable results. A leading healthcare provider used it to build a custom agent for extracting information from medical records, achieving a 20% quality improvement and a 70% cost reduction compared to general-purpose AI models, which suffered from high latency, high cost, and low accuracy. In another instance, The New York Times used VibeML's technology in a study evaluating the faithfulness of Google AI Overviews. The resulting published article reported that only 39% of claims made in Gemini 3 overviews were fully supported by their cited sources, a substantial hallucination rate that VibeML's technology helped measure at scale. Other leading enterprises, including Microsoft, HP, and IBM, are also adopting VibeML.
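The 39% figure is a claim-level support rate: the share of individual claims judged fully supported by their cited sources. A minimal sketch of that tally follows, assuming each claim has already been labeled by some judge. The label names and the sample data are hypothetical, since the study's actual taxonomy and data are not given here, and the hard part (judging each claim against its sources) is deliberately left out.

```python
from collections import Counter
from typing import Iterable

# Hypothetical label set; the study's real categories are not stated here.
LABELS = ("fully_supported", "partially_supported", "unsupported")


def support_rate(labels: Iterable[str]) -> float:
    """Fraction of claims labeled fully supported by their cited sources."""
    counts = Counter(labels)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no claims to score")
    return counts["fully_supported"] / total


if __name__ == "__main__":
    # Made-up labels for illustration only; not data from the study.
    sample = ["fully_supported"] * 2 + ["partially_supported"] * 2 + ["unsupported"]
    print(f"{support_rate(sample):.0%}")
```

Note that a rate defined this way counts partially supported claims as not fully supported, so it is stricter than a count of outright fabrications.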
VibeML's impact on differentiation and IP
The ability to rapidly build and iterate on specialized AI models is crucial for enterprise differentiation. By owning proprietary models that are continuously improved in production, companies can build compounding intellectual property (IP) and establish a competitive moat that generic models cannot match. This proactive ownership of intelligence, rather than reactive prompting of rented services, is positioned as the key differentiator for future enterprise success. VibeML supports this by allowing engineers to create models that are not only better and cheaper for specific tasks but also evolve over time, outpacing competitors and generic AI advancements. This control and continuous improvement loop are vital for long-term market leadership.
Accessibility through open-source and enterprise platforms
VibeML offers both an enterprise platform and an open-source library. The open-source library, available on GitHub, has more than 9,000 stars and active community contributions. The recently launched enterprise platform has drawn approximately 2,000 signups. This dual approach lets organizations explore automated AI development with the open-source library and then scale to the enterprise platform. One example is a fine-tuned customer support model with 0.8 billion parameters that outperforms Anthropic's Opus, Sonnet, and Haiku models while running roughly 100 times faster and at correspondingly lower cost.
Comparison of Custom vs. Generic AI Performance
Data extracted from this episode
| Metric | Generic AI (Example) | Custom AI (VibeML Example) |
|---|---|---|
| Quality/Accuracy | Lower (optimized for nothing) | Higher (optimized for an enterprise use case) |
| Cost | Higher | Lower (10x-100x more efficient) |
| Latency | Higher | Lower |
| Development Time | Months per model | Hours (with VibeML) |
| Differentiation/Moat | None (competitors can use same tools) | Compounding IP and competitive advantage |
Healthcare Provider AI Improvement with VibeML
| Metric | Performance Change |
|---|---|
| Quality | +20% |
| Cost | -70% |
AI Model Comparison for Supporting Customer Queries
| Model | Accuracy | Parameters | Speed/Cost |
|---|---|---|---|
| VibeML Fine-tuned Model | Higher than Opus, Sonnet, Haiku | 0.8 Billion | 2 orders of magnitude faster and cheaper |
| Opus (Anthropic) | Baseline for comparison | N/A | N/A |
| Sonnet (Anthropic) | Baseline for comparison | N/A | N/A |
| Haiku (Anthropic) | Baseline for comparison | N/A | N/A |
Model Performance in Detecting Hallucinations
| Model | Performance |
|---|---|
| Custom AI Model (VibeML Technology) | Outperforms GPT-5.2 and Opus for this specific task |
| GPT-5.2 | Outperformed by custom AI |
| Opus | Outperformed by custom AI |
Google AI Overviews Faithfulness Study (Gemini 3)
| Metric | Result |
|---|---|
| Claims fully supported by sources | 39% |
Common Questions
Why are enterprises moving toward owning specialized AI models?
Enterprises are moving toward owning specialized AI models to gain higher quality, lower costs, lower latency, better privacy and security, and full control over their tools, and to build competitive advantages that generic, rented models cannot offer.
Mentioned in this video
Mentioned as a leading enterprise adopting VibeML's technology.
A company providing generic AI APIs that enterprises are moving away from in favor of specialized, owned models.
A provider of generic AI APIs, similar to OpenAI, which enterprises are shifting away from.
A company that built a specialized coding agent model, claiming it was better and 10 times cheaper than cloud models.
A company that developed its own specialized model, stating it was better than GPT-4 code and five times cheaper to operate.
Mentioned as a leading enterprise adopting VibeML's technology.
Mentioned as a leading enterprise adopting VibeML's technology.
The platform where the open-source library for VibeML is hosted, highlighted for its significant traction and community contributions.
A Google AI model that enterprises have been renting, but are now building specialized alternatives to.
A generic AI model mentioned as an example of what enterprises are moving beyond by building their own specialized models.
A 'model factory' platform that enables enterprises to efficiently build and deploy specialized AI models.
Mentioned as a platform for devising plans for complex software, analogous to how VibeML devises plans for AI models.
The version of Google's AI overviews that a study found only 39% of claims were fully supported by sources.
A model that a custom AI model built with VibeML's technology outperformed in detecting hallucinations.