AI Dev 25 x NYC | Eric Sondhi: Small AI: The Next Big Thing
Key Moments
Small AI, running locally on devices, offers an accessible, cost-effective alternative to large, centralized AI models.
Key Insights
A fork in the road for AI development: 'Big AI' (large, centralized, costly models) versus 'Small AI' (local, efficient, open-source models).
Small AI addresses limitations of Big AI, including high costs, latency, vendor lock-in, and lack of transparency.
Open-source innovation and frameworks like PyTorch have fueled the rise of accessible and increasingly powerful small AI models.
Agentic AI, which reasons and acts autonomously, can be more scalable and economically viable with small AI due to reduced reasoning costs and latency.
Developers can leverage domain expertise to fine-tune small AI models for specialized use cases, creating a competitive advantage.
ARM's hardware, including laptops and mobile devices, is already capable of running significant small AI models locally, with future hardware optimizations to further accelerate performance.
The future of AI is 'everywhere,' with small AI becoming a default part of application stacks, enabling ubiquitous AI experiences across devices.
THE DUAL PATHWAYS OF AI DEVELOPMENT
The AI landscape is bifurcating into two distinct paths: 'Big AI' and 'Small AI'. Big AI represents large, centralized, proprietary models developed by a few tech giants, often accessed via cloud services. In contrast, Small AI encompasses open-source, efficient models designed to run directly on user devices, such as laptops and smartphones. This distinction has critical implications for accessibility, cost, and innovation in AI development and deployment.
DRAWBACKS OF CENTRALIZED 'BIG AI'
Large, hosted AI models present several challenges. The most apparent is cost, which can be prohibitive at enterprise scale and hinders startups and rapid development. Beyond cost, round trips to the cloud add latency and require constant connectivity. Furthermore, closed ecosystems create vendor lock-in, limit transparency about data usage, and can compromise quality when a task calls for a specialized model rather than a general-purpose one.
THE EMERGENCE AND ADVANTAGES OF 'SMALL AI'
The growth of open-source frameworks like PyTorch and TensorFlow has democratized AI, enabling the creation of increasingly capable yet smaller models. Small AI offers significant advantages, including local execution without internet dependency, leading to faster response times and democratized experimentation. Its community-driven evolution allows for rapid iteration cycles powered by global collaboration, making it highly adaptable.
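One core reason smaller models can be "increasingly capable" on modest hardware is quantization: storing weights as low-bit integers plus a scale factor instead of 32-bit floats. The sketch below illustrates the idea in plain Python; it is a simplified illustration, not the actual API of PyTorch or any other framework.

```python
# Illustrative int8 quantization: the core trick behind shrinking models
# for on-device use. Weights stored as 8-bit ints plus one scale factor
# take roughly 4x less memory than 32-bit floats.

def quantize(weights, num_bits=8):
    """Map float weights to signed ints with a single scale factor."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [0.12, -0.54, 0.98, -1.27, 0.33]
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)        # small integers in [-127, 127]
print(max_err)  # reconstruction error bounded by about scale/2
```

Real frameworks add per-channel scales and quantization-aware training, but the memory arithmetic is the same, which is why 8-bit (and smaller) variants of open models fit on laptops and phones.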
ENABLING SCALABLE AGENTIC WORKFLOWS
Agentic AI systems, capable of autonomous reasoning and task execution, can benefit immensely from Small AI. Large models struggle with deep reasoning pipelines due to high computational, environmental, and financial costs. Small AI provides an economically viable alternative, drastically reducing the cost of each reasoning step and improving latency by orders of magnitude. This makes complex, multi-step reasoning more scalable and accessible.
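The economics above are easy to see with back-of-envelope arithmetic: an agent pays its per-step cost and latency once per reasoning step, so per-step savings compound across the run. The prices and latencies below are invented for illustration, not figures from the talk.

```python
# Hypothetical comparison: a 20-step agentic workflow multiplies
# per-step cost and latency by the number of steps, so per-step
# savings compound. All numbers are illustrative.

STEPS = 20  # reasoning/tool-use steps in one agent run

def run_cost(cost_per_step_usd, latency_per_step_s, steps=STEPS):
    return cost_per_step_usd * steps, latency_per_step_s * steps

big_cost, big_latency = run_cost(0.05, 2.0)        # hosted large model
small_cost, small_latency = run_cost(0.0005, 0.2)  # local small model

print(f"Big AI:   ${big_cost:.2f}, {big_latency:.0f}s per run")
print(f"Small AI: ${small_cost:.2f}, {small_latency:.0f}s per run")
```

Under these assumed numbers the large model costs 100x more per run and takes 10x longer, and the gap widens as workflows chain more steps.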
REDUCING IMPLEMENTATION COSTS AND FAILURE RATES
The high implementation costs associated with Big AI contribute to a significant failure rate in AI projects, with studies indicating that up to 90% fail to produce meaningful revenue or business impact. Small AI enables a 'fail fast' philosophy, allowing developers to prototype, iterate, and adapt more quickly and cost-effectively. This approach helps identify what isn't working early on, preventing the sinking of excessive resources and leading to solutions that deliver practical, real-world value.
DEVELOPER SKILLS AND VERTICAL SPECIALIZATION
The shift toward Small AI fosters new developer skills, moving beyond prompt engineering to specialized fine-tuning. Professionals can leverage their domain expertise, even without deep development backgrounds, to customize models for specific use cases. This vertical specialization creates a competitive advantage, allowing AI solutions to be deployed directly where users are: on edge devices, in IoT, or on mobile phones.
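Fine-tuning, at its core, means nudging a pretrained model's weights with gradient descent on a small domain dataset. The toy below shows that idea in miniature with a one-parameter linear model and invented data; a real LLM fine-tune uses the same loop over millions of parameters.

```python
# Fine-tuning in miniature (toy example, not a real LLM workflow):
# start from "pretrained" weights and nudge them with a few epochs of
# gradient descent on a small, domain-specific dataset.

# Hypothetical domain data: (input, target) pairs for y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 1.5          # "pretrained" weight: close, but not specialized
lr = 0.02        # learning rate

for epoch in range(200):
    for x, y in data:
        pred = w * x
        grad = 2 * (pred - y) * x   # d/dw of squared error
        w -= lr * grad

print(round(w, 3))  # converges to roughly 2.0 after fine-tuning
```

The point of the sketch: the starting weight already encodes general knowledge, and only a small amount of domain data is needed to specialize it, which is exactly what makes vertical fine-tuning of small models practical for individual developers.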
HARDWARE READINESS AND FUTURE ACCELERATION
Current ARM-based hardware, including laptops and edge devices, already possesses the capability to run sophisticated small AI models locally. Future ARM architectures, featuring scalable matrix extensions and NPUs, will further accelerate AI performance, making local execution even faster and more efficient. This ongoing hardware evolution promises to make Small AI adoption more practical, performant, and widely available across a diverse range of devices.
THE 'AI EVERYWHERE' PARADIGM
The convergence of capable local hardware and accessible Small AI models is paving the way for an 'AI Everywhere' paradigm. This envisions applications with cascading AI models running at various levels of proximity—from large cloud-based models to those on laptops, mobile devices, and even tiny IoT devices. Continuity of experience will be maintained as these models and developer workflows evolve, powered by increasingly performant hardware.
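One hypothetical way to realize such a cascade is a routing policy that tries the smallest, closest model first and escalates only when a task exceeds its capability. The tier names and complexity thresholds below are invented for illustration; they are not a scheme described in the talk.

```python
# Hypothetical routing policy for an "AI everywhere" cascade: prefer
# the nearest, cheapest tier; escalate only when needed. Tier names
# and thresholds are illustrative.

TIERS = [
    ("microcontroller", 1),   # keyword spotting, simple classification
    ("phone/laptop", 5),      # chat, summarization, RAG over local docs
    ("cloud", 10),            # deep multi-step reasoning
]

def route(task_complexity):
    """Return the first tier whose capability covers the task."""
    for name, max_complexity in TIERS:
        if task_complexity <= max_complexity:
            return name
    return "cloud"  # fall back to the largest model

print(route(1))   # microcontroller
print(route(4))   # phone/laptop
print(route(9))   # cloud
```

Keeping most requests on-device preserves latency and privacy, while the cloud tier remains available for the minority of tasks that genuinely need it.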
CALL TO ACTION: EMBRACE SMALL AI AND SPECIALIZATION
Developers are encouraged to start small, iterate quickly, and explore the capabilities of Small AI by integrating and fine-tuning models for specific value. Building with Small AI, rather than solely relying on giant cloud providers, offers a path to create unique competitive edges through specialization. The future of AI is not just about size, but about being smarter, faster, and closer to the user—in their pocket and on their devices. ARM offers resources and an ecosystem to support this journey.
MINIMUM HARDWARE REQUIREMENTS FOR USEFUL AI
Even microcontrollers can now run pared-down TinyML models for tasks like storytelling or basic facial recognition. These small, dedicated models can add interactivity to toys or board games. Further up the spectrum, embedded systems can perform object detection and user authentication. This demonstrates a broad range of applicability for AI, from the smallest devices to more complex edge computing scenarios, with continuous improvements in model efficiency and hardware capabilities.
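Microcontrollers often lack floating-point hardware, so TinyML inference typically runs in integer arithmetic: multiply int8 weights and inputs, accumulate in a wider integer, then rescale with a bit shift instead of a float multiply. The single "neuron" below illustrates the pattern with invented values.

```python
# TinyML-style inference sketch: quantized models on microcontrollers
# commonly use int8 weights, a wide integer accumulator, and a
# fixed-point shift instead of floating-point math. Values invented.

WEIGHTS = [23, -87, 45, 12]   # int8 weights
BIAS = 10
SHIFT = 7                     # rescale by 2^-7 via a bit shift

def neuron(inputs):
    acc = BIAS
    for x, w in zip(inputs, WEIGHTS):
        acc += x * w          # int8 x int8, accumulated in a wider int
    acc >>= SHIFT             # fixed-point rescale
    return max(acc, 0)        # ReLU activation

print(neuron([10, 3, 7, 100]))  # prints 11
```

Stacking many such integer dot products is all a quantized keyword spotter or gesture classifier amounts to, which is why it fits in a few kilobytes of RAM.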
THE ROLE OF CPU AND ACCELERATED COMPUTE
While CPUs have limitations for massive foundation models, their capabilities are often underestimated for running a vast range of smaller AI models. Awareness of these models and their efficient execution on CPUs, especially when complemented by accelerated compute platforms like GPUs or NPUs, is key. Running complementary parts of the inference pipeline, such as RAG (Retrieval-Augmented Generation), on the CPU can significantly enhance performance, making hybrid approaches increasingly viable.
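The retrieval half of RAG is a good example of CPU-friendly work: it is plain vector math over document embeddings, with no neural network in the hot loop. The sketch below uses tiny invented 3-d "embeddings"; a real system would produce them with a small embedding model.

```python
import math

# The retrieval step of RAG: embed documents, embed the query, rank by
# cosine similarity. This is simple vector arithmetic that runs well on
# a CPU. The 3-d vectors here are invented for illustration.

docs = {
    "pytorch_notes":  [0.9, 0.1, 0.0],
    "arm_hardware":   [0.1, 0.8, 0.3],
    "cooking_recipe": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, k=1):
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

print(retrieve([0.2, 0.9, 0.2]))  # ['arm_hardware']
```

Offloading retrieval, ranking, and pre/post-processing to the CPU leaves the GPU or NPU free for the generation step, which is the hybrid pattern the section describes.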
ARM'S EVOLVING HARDWARE AND FUTURE PROSPECTS
ARM designs GPUs for power efficiency, particularly in mobile, incorporating neural technology for graphics enhancements and agentic use cases in gaming. While specifics of the future roadmap for large-scale AI compute remain undisclosed, ARM is committed to meeting the relentless demand for compute alongside GPUs. This includes developing specialized compute solutions that complement existing GPU deployments, ensuring alignment with industry needs for both general and specific AI processing tasks.
Common Questions
What is Small AI?
Small AI refers to local, efficient, open-source models that run directly on user devices, offering advantages like lower cost, reduced latency, and increased privacy compared to large, centralized cloud models.
Mentioned in this video
●A specific hardware device mentioned as an example of readily available hardware capable of running sophisticated AI stacks, including LLMs.
●An AI model validated to run on the Raspberry Pi 5, showcasing the capability of smaller AI models on edge devices.
●An NVIDIA GPU that the speaker plans to complement with ARM cores in their new DGX station for running nano models.
●A line of processors developed by Apple, noted for integrating GPUs, which prompts a question about ARM's potential move towards similar integrated graphics for AI.
●A partner of ARM that developed the Stable Audio model, used as an example of small AI running locally on consumer devices.
●A partner of ARM that provides dev boards with ARM's U85 NPU, contributing to the wave of AI-ready hardware.
●A high-performance computing station that the speaker acquired, planning to use its 20 ARM cores for running nano models alongside its GPU.
●An AI framework that was released with ARM support on day one, enabling it to leverage ARM's NPU for audio and signal processing.
●An AI model validated on Raspberry Pi 5, alongside Gemma 3, highlighting the feasibility of running LLMs on local hardware.
●An AI model mentioned alongside Gemma 3 as an example of significant models accessible via ARM's app for Android and Chromebooks.
●ARM's developer website, offering resources for those interested in building with small AI and connecting with the ARM ecosystem.
●Platform where ARM released a blog and app allowing users to download and run open-source small AI models on Android phones or Chromebooks.