AI Dev 25 x NYC | Eric Sondhi: Small AI: The Next Big Thing
Key Moments
Small AI, running locally on devices, offers an accessible, cost-effective alternative to large, centralized AI models.
Key Insights
A fork in the road for AI development: 'Big AI' (large, centralized, costly models) versus 'Small AI' (local, efficient, open-source models).
Small AI addresses limitations of Big AI, including high costs, latency, vendor lock-in, and lack of transparency.
Open-source innovation and frameworks like PyTorch have fueled the rise of accessible and increasingly powerful small AI models.
Agentic AI, which reasons and acts autonomously, can be more scalable and economically viable with small AI due to reduced reasoning costs and latency.
Developers can leverage domain expertise to fine-tune small AI models for specialized use cases, creating a competitive advantage.
ARM's hardware, including laptops and mobile devices, is already capable of running significant small AI models locally, with future hardware optimizations to further accelerate performance.
The future of AI is 'everywhere,' with small AI becoming a default part of application stacks, enabling ubiquitous AI experiences across devices.
THE DUAL PATHWAYS OF AI DEVELOPMENT
The AI landscape is bifurcating into two distinct paths: 'Big AI' and 'Small AI'. Big AI represents large, centralized, proprietary models developed by a few tech giants, often accessed via cloud services. In contrast, Small AI encompasses open-source, efficient models designed to run directly on user devices, such as laptops and smartphones. This distinction has critical implications for accessibility, cost, and innovation in AI development and deployment.
DRAWBACKS OF CENTRALIZED 'BIG AI'
Large, hosted AI models present several challenges. The most apparent is cost, which can be prohibitive at enterprise scale and hinders startups and rapid development. Beyond cost, round trips to the cloud add latency and require constant connectivity. Furthermore, closed ecosystems create vendor lock-in, limit transparency about data usage, and can compromise quality when a task calls for a specialized model rather than a general-purpose one.
THE EMERGENCE AND ADVANTAGES OF 'SMALL AI'
The growth of open-source frameworks like PyTorch and TensorFlow has democratized AI, enabling the creation of increasingly capable yet smaller models. Small AI offers significant advantages, including local execution without internet dependency, leading to faster response times and democratized experimentation. Its community-driven evolution allows for rapid iteration cycles powered by global collaboration, making it highly adaptable.
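One core reason smaller models can be "increasingly capable" on modest hardware is quantization: storing weights as low-bit integers plus a scale factor instead of 32-bit floats. The sketch below illustrates the idea in plain Python; it is a simplified illustration, not the actual API of PyTorch or any other framework.

```python
# Illustrative int8 quantization: the core trick behind shrinking models
# for on-device use. Weights stored as 8-bit ints plus one scale factor
# take roughly 4x less memory than 32-bit floats.

def quantize(weights, num_bits=8):
    """Map float weights to signed ints with a single scale factor."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [0.12, -0.54, 0.98, -1.27, 0.33]
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)        # small integers in [-127, 127]
print(max_err)  # reconstruction error bounded by about scale/2
```

Real frameworks add per-channel scales and quantization-aware training, but the memory arithmetic is the same, which is why 8-bit (and smaller) variants of open models fit on laptops and phones.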
ENABLING SCALABLE AGENTIC WORKFLOWS
Agentic AI systems, capable of autonomous reasoning and task execution, can benefit immensely from Small AI. Large models struggle with deep reasoning pipelines due to high computational, environmental, and financial costs. Small AI provides an economically viable alternative, drastically reducing the cost of each reasoning step and improving latency by orders of magnitude. This makes complex, multi-step reasoning more scalable and accessible.
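The economics above are easy to see with back-of-envelope arithmetic: an agent pays its per-step cost and latency once per reasoning step, so per-step savings compound across the run. The prices and latencies below are invented for illustration, not figures from the talk.

```python
# Hypothetical comparison: a 20-step agentic workflow multiplies
# per-step cost and latency by the number of steps, so per-step
# savings compound. All numbers are illustrative.

STEPS = 20  # reasoning/tool-use steps in one agent run

def run_cost(cost_per_step_usd, latency_per_step_s, steps=STEPS):
    return cost_per_step_usd * steps, latency_per_step_s * steps

big_cost, big_latency = run_cost(0.05, 2.0)        # hosted large model
small_cost, small_latency = run_cost(0.0005, 0.2)  # local small model

print(f"Big AI:   ${big_cost:.2f}, {big_latency:.0f}s per run")
print(f"Small AI: ${small_cost:.2f}, {small_latency:.0f}s per run")
```

Under these assumed numbers the large model costs 100x more per run and takes 10x longer, and the gap widens as workflows chain more steps.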
REDUCING IMPLEMENTATION COSTS AND FAILURE RATES
The high implementation costs associated with Big AI contribute to a significant failure rate in AI projects, with studies indicating that up to 90% fail to produce meaningful revenue or business impact. Small AI enables a 'fail fast' philosophy, allowing developers to prototype, iterate, and adapt more quickly and cost-effectively. This approach helps identify what isn't working early on, preventing the sinking of excessive resources and leading to solutions that deliver practical, real-world value.
DEVELOPER SKILLS AND VERTICAL SPECIALIZATION
The shift toward Small AI fosters new developer skills, moving beyond prompt engineering to specialized fine-tuning. Professionals can leverage their domain expertise, even without deep development backgrounds, to customize models for specific use cases. This vertical specialization creates a competitive advantage, allowing AI solutions to be deployed directly where users are: on edge devices, in IoT, or on mobile phones.
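Fine-tuning, at its core, means nudging a pretrained model's weights with gradient descent on a small domain dataset. The toy below shows that idea in miniature with a one-parameter linear model and invented data; a real LLM fine-tune uses the same loop over millions of parameters.

```python
# Fine-tuning in miniature (toy example, not a real LLM workflow):
# start from "pretrained" weights and nudge them with a few epochs of
# gradient descent on a small, domain-specific dataset.

# Hypothetical domain data: (input, target) pairs for y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 1.5          # "pretrained" weight: close, but not specialized
lr = 0.02        # learning rate

for epoch in range(200):
    for x, y in data:
        pred = w * x
        grad = 2 * (pred - y) * x   # d/dw of squared error
        w -= lr * grad

print(round(w, 3))  # converges to roughly 2.0 after fine-tuning
```

The point of the sketch: the starting weight already encodes general knowledge, and only a small amount of domain data is needed to specialize it, which is exactly what makes vertical fine-tuning of small models practical for individual developers.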
HARDWARE READINESS AND FUTURE ACCELERATION
Current ARM-based hardware, including laptops and edge devices, already possesses the capability to run sophisticated small AI models locally. Future ARM architectures, featuring scalable matrix extensions and NPUs, will further accelerate AI performance, making local execution even faster and more efficient. This ongoing hardware evolution promises to make Small AI adoption more practical, performant, and widely available across a diverse range of devices.
THE 'AI EVERYWHERE' PARADIGM
The convergence of capable local hardware and accessible Small AI models is paving the way for an 'AI Everywhere' paradigm. This envisions applications with cascading AI models running at various levels of proximity—from large cloud-based models to those on laptops, mobile devices, and even tiny IoT devices. Continuity of experience will be maintained as these models and developer workflows evolve, powered by increasingly performant hardware.
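One hypothetical way to realize such a cascade is a routing policy that tries the smallest, closest model first and escalates only when a task exceeds its capability. The tier names and complexity thresholds below are invented for illustration; they are not a scheme described in the talk.

```python
# Hypothetical routing policy for an "AI everywhere" cascade: prefer
# the nearest, cheapest tier; escalate only when needed. Tier names
# and thresholds are illustrative.

TIERS = [
    ("microcontroller", 1),   # keyword spotting, simple classification
    ("phone/laptop", 5),      # chat, summarization, RAG over local docs
    ("cloud", 10),            # deep multi-step reasoning
]

def route(task_complexity):
    """Return the first tier whose capability covers the task."""
    for name, max_complexity in TIERS:
        if task_complexity <= max_complexity:
            return name
    return "cloud"  # fall back to the largest model

print(route(1))   # microcontroller
print(route(4))   # phone/laptop
print(route(9))   # cloud
```

Keeping most requests on-device preserves latency and privacy, while the cloud tier remains available for the minority of tasks that genuinely need it.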
CALL TO ACTION: EMBRACE SMALL AI AND SPECIALIZATION
Developers are encouraged to start small, iterate quickly, and explore the capabilities of Small AI by integrating and fine-tuning models for specific value. Building with Small AI, rather than solely relying on giant cloud providers, offers a path to create unique competitive edges through specialization. The future of AI is not just about size, but about being smarter, faster, and closer to the user—in their pocket and on their devices. ARM offers resources and an ecosystem to support this journey.
MINIMUM HARDWARE REQUIREMENTS FOR USEFUL AI
Even microcontrollers can now run pared-down TinyML models for tasks like storytelling or basic facial recognition. These small, dedicated models can add interactivity to toys or board games. Further up the spectrum, embedded systems can perform object detection and user authentication. This demonstrates a broad range of applicability for AI, from the smallest devices to more complex edge computing scenarios, with continuous improvements in model efficiency and hardware capabilities.
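Microcontrollers often lack floating-point hardware, so TinyML inference typically runs in integer arithmetic: multiply int8 weights and inputs, accumulate in a wider integer, then rescale with a bit shift instead of a float multiply. The single "neuron" below illustrates the pattern with invented values.

```python
# TinyML-style inference sketch: quantized models on microcontrollers
# commonly use int8 weights, a wide integer accumulator, and a
# fixed-point shift instead of floating-point math. Values invented.

WEIGHTS = [23, -87, 45, 12]   # int8 weights
BIAS = 10
SHIFT = 7                     # rescale by 2^-7 via a bit shift

def neuron(inputs):
    acc = BIAS
    for x, w in zip(inputs, WEIGHTS):
        acc += x * w          # int8 x int8, accumulated in a wider int
    acc >>= SHIFT             # fixed-point rescale
    return max(acc, 0)        # ReLU activation

print(neuron([10, 3, 7, 100]))  # prints 11
```

Stacking many such integer dot products is all a quantized keyword spotter or gesture classifier amounts to, which is why it fits in a few kilobytes of RAM.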
THE ROLE OF CPU AND ACCELERATED COMPUTE
While CPUs have limitations for massive foundation models, their capabilities are often underestimated for running a vast range of smaller AI models. Awareness of these models and their efficient execution on CPUs, especially when complemented by accelerated compute platforms like GPUs or NPUs, is key. Running complementary parts of the inference pipeline, such as RAG (Retrieval-Augmented Generation), on the CPU can significantly enhance performance, making hybrid approaches increasingly viable.
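The retrieval half of RAG is a good example of CPU-friendly work: it is plain vector math over document embeddings, with no neural network in the hot loop. The sketch below uses tiny invented 3-d "embeddings"; a real system would produce them with a small embedding model.

```python
import math

# The retrieval step of RAG: embed documents, embed the query, rank by
# cosine similarity. This is simple vector arithmetic that runs well on
# a CPU. The 3-d vectors here are invented for illustration.

docs = {
    "pytorch_notes":  [0.9, 0.1, 0.0],
    "arm_hardware":   [0.1, 0.8, 0.3],
    "cooking_recipe": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, k=1):
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

print(retrieve([0.2, 0.9, 0.2]))  # ['arm_hardware']
```

Offloading retrieval, ranking, and pre/post-processing to the CPU leaves the GPU or NPU free for the generation step, which is the hybrid pattern the section describes.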
ARM'S EVOLVING HARDWARE AND FUTURE PROSPECTS
ARM designs GPUs for power efficiency, particularly in mobile, incorporating neural technology for graphics enhancements and agentic use cases in gaming. While specifics of the future roadmap for large-scale AI compute remain undisclosed, ARM is committed to meeting the relentless demand for compute alongside GPUs. This includes developing specialized compute solutions that complement existing GPU deployments, ensuring alignment with industry needs for both general and specific AI processing tasks.
Common Questions
What is Small AI?
Small AI refers to local, efficient, open-source models that run directly on user devices, offering advantages like lower cost, reduced latency, and increased privacy compared to large, centralized cloud models.
Mentioned in this video
●A specific hardware device mentioned as an example of readily available hardware capable of running sophisticated AI stacks, including LLMs.
●An AI model validated to run on the Raspberry Pi 5, showcasing the capability of smaller AI models on edge devices.
●An NVIDIA GPU that the speaker plans to complement with ARM cores in their new DGX station for running nano models.
●A line of processors developed by Apple, noted for integrating GPUs, which prompts a question about ARM's potential move towards similar integrated graphics for AI.
●A partner of ARM that developed the Stable Audio model, used as an example of small AI running locally on consumer devices.
●A partner of ARM that provides dev boards with ARM's U85 NPU, contributing to the wave of AI-ready hardware.
●A high-performance computing station that the speaker acquired, planning to use its 20 ARM cores for running nano models alongside its GPU.
●An AI framework that was released with ARM support on day one, enabling it to leverage ARM's NPU for audio and signal processing.
●An AI model validated on Raspberry Pi 5, alongside Gemma 3, highlighting the feasibility of running LLMs on local hardware.
●An AI model mentioned alongside Gemma 3 as an example of significant models accessible via ARM's app for Android and Chromebooks.
●ARM's developer website, offering resources for those interested in building with small AI and connecting with the ARM ecosystem.
●Platform where ARM released a blog and app allowing users to download and run open-source small AI models on Android phones or Chromebooks.