Key Moments

⚡️Accelerators @ 3x NVIDIA H200 perf, Made in the USA - Thomas Sohmers + Mitesh Agrawal, Positron AI

Latent Space Podcast
Science & Technology | 3 min read | 48 min video | Aug 18, 2025
TL;DR

Positron AI builds efficient AI inference accelerators that prioritize memory bandwidth, outperforming NVIDIA on key performance-per-dollar and performance-per-watt metrics.

Key Insights

1. The AI inference bottleneck is primarily memory bandwidth, not compute, particularly for transformer models.
2. Positron AI's architecture achieves significantly higher memory bandwidth utilization (93% vs. ~29% on NVIDIA's H100).
3. The company's current FPGA-based accelerators deliver 70% higher performance than NVIDIA's H100 at a lower power draw and price point.
4. Positron AI prioritizes seamless integration with existing NVIDIA CUDA workflows, accepting raw binary weights without recompilation.
5. A future ASIC generation promises even greater memory capacity and performance than the current FPGA solutions.
6. The company emphasizes capital efficiency, aiming to sell systems and demonstrate strong Return on Invested Capital (ROIC) rather than operate cloud services.

FOUNDING STORY AND EXPERTISE

Thomas Sohmers and Mitesh Agrawal, co-founders of Positron AI, bring extensive semiconductor and AI infrastructure experience. Sohmers, a hardware veteran of Rex Computing and Lambda Labs, identifies memory bandwidth as the critical bottleneck in AI inference. Agrawal, an early employee and former COO of Lambda Labs and now Positron's CEO, brings operational and growth expertise. Their shared vision stems from recognizing the limitations of existing hardware for the emerging demands of transformer models.

THE MEMORY BANDWIDTH BOTTLENECK

The core thesis driving Positron AI is that modern AI workloads, especially transformer inference, are overwhelmingly memory-bound rather than compute-bound. Unlike older CNN models, which were compute-intensive, autoregressive transformer decoding reduces to matrix-vector multiplications that must stream the full weight set from memory for every generated token. Traditional hardware architectures, optimized for peak FLOPS, are mismatched to this workload: memory bandwidth and capacity, not compute, become the primary constraints on performance.
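A roofline-style back-of-envelope check illustrates the thesis. The hardware figures below are approximate, publicly cited H100-class numbers (not from the episode), and the function is an illustrative sketch, not Positron's methodology:

```python
# Back-of-envelope check that transformer decode is memory-bound, not
# compute-bound. Hardware figures are approximate and illustrative.

def arithmetic_intensity_matvec(d_model: int, bytes_per_weight: int = 2) -> float:
    """FLOPs per byte moved for a d_model x d_model matrix-vector multiply.

    A matvec does ~2*d^2 FLOPs (one multiply + one add per weight) while
    streaming ~d^2 weights from memory, so the intensity is
    2 / bytes_per_weight regardless of model width.
    """
    flops = 2 * d_model * d_model
    bytes_moved = d_model * d_model * bytes_per_weight
    return flops / bytes_moved

# An H100-class GPU offers on the order of ~1000 TFLOPS (FP16) against
# ~3.35 TB/s of HBM bandwidth: a machine balance of ~300 FLOPs per byte.
machine_balance = 1000e12 / 3.35e12

intensity = arithmetic_intensity_matvec(4096)  # 1.0 FLOP/byte at FP16
print(f"matvec intensity: {intensity:.1f} FLOPs/byte")
print(f"machine balance:  {machine_balance:.0f} FLOPs/byte")
print(f"memory-bound: {intensity < machine_balance}")
```

At roughly 1 FLOP per byte against a machine balance of hundreds, decode leaves the compute units starved: the memory system, not the FLOPS rating, sets the ceiling.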

POSITRON AI'S ARCHITECTURAL ADVANTAGE

Positron AI's architecture is designed to maximize memory bandwidth utilization, achieving up to 93% of theoretical bandwidth, compared to roughly 29% on NVIDIA's H100. This is enabled by specialized compute elements sized for a 1:1 ratio of FLOPS to memory operations. By targeting the actual bottleneck, their current FPGA-based accelerators deliver superior performance per watt and per dollar, even against high-end NVIDIA GPUs.
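Why utilization matters so much for decode can be sketched with simple arithmetic. The utilization figures are the ones cited in the episode; the model size and the assumption of equal peak bandwidth on both devices are stand-ins for illustration:

```python
# Illustrative sketch: for memory-bound decode, per-stream tokens/sec is
# roughly (achieved bandwidth) / (bytes streamed per token), since each
# generated token must read the full weight set. Utilization figures are
# from the episode; peak bandwidth and model size are assumptions.

def decode_tokens_per_sec(peak_bw_bytes: float, utilization: float,
                          model_bytes: float) -> float:
    """Upper-bound decode rate when each token streams all weights once."""
    return (peak_bw_bytes * utilization) / model_bytes

MODEL_BYTES = 70e9 * 2       # e.g. a 70B-parameter model at FP16 (assumed)
PEAK_BW = 3.35e12            # equal peak bandwidth assumed on both devices

h100_like = decode_tokens_per_sec(PEAK_BW, 0.29, MODEL_BYTES)
positron_like = decode_tokens_per_sec(PEAK_BW, 0.93, MODEL_BYTES)

print(f"at 29% utilization: {h100_like:.1f} tok/s per stream")
print(f"at 93% utilization: {positron_like:.1f} tok/s per stream")
print(f"speedup at equal peak bandwidth: {positron_like / h100_like:.2f}x")
```

At equal peak bandwidth, the speedup is just the ratio of utilizations (0.93 / 0.29 ≈ 3.2x), which shows how utilization alone, before any raw-bandwidth advantage, can dominate delivered decode throughput.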

SEAMLESS ECOSYSTEM INTEGRATION

A key differentiator for Positron AI is its commitment to simplifying adoption. Instead of requiring users to recompile models or change their existing workflows, their hardware directly ingests raw binary weights produced by NVIDIA's CUDA ecosystem. This 'zero-step' integration allows models trained on NVIDIA GPUs to run on Positron AI's hardware with minimal effort, enabling customers to easily benefit from performance and efficiency gains without disrupting their development pipelines.

ROADMAP TO DEDICATED SILICON

While currently shipping FPGA-based solutions for rapid market entry, Positron AI is developing next-generation custom silicon (ASIC). This dedicated hardware will further enhance performance and efficiency by addressing the inherent limitations of FPGAs, offering significantly more memory capacity and surpassing current industry benchmarks. The company aims for these ASICs to be tape-out ready by late 2026, promising a substantial leap in inference capabilities.

BUSINESS STRATEGY AND CAPITAL EFFICIENCY

Positron AI's business model is centered on selling hardware systems and demonstrating strong Return on Invested Capital (ROIC) for its customers, eschewing the cloud-service model adopted by some competitors. They emphasize capital efficiency in their funding, having raised $75 million across seed and Series A rounds to focus on product development and customer acquisition. This approach aims to deliver tangible value and economic viability, positioning Positron AI as a strong contender in the AI hardware market.

PERFORMANCE METRICS AND FUTURE APPLICATIONS

The company highlights its advantage in token generation (inference decode) over prefill. Their solutions are not only effective for text generation but also crucial for emerging modalities like video and complex reasoning, where generation stages often dominate. By enabling massive memory capacity and bandwidth, Positron AI aims to support advancements in multimodal AI and allow for more efficient, cost-effective inference at scale, ultimately expanding the possibilities for AI deployment.
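The prefill/decode distinction above can be made concrete with arithmetic intensity: prefill batches all prompt tokens through each weight matrix at once, while decode processes one token at a time. The shapes below are illustrative assumptions, not figures from the episode:

```python
# Why decode, not prefill, is the memory-bound stage: prefill amortizes
# each weight read across the whole prompt, while decode re-streams the
# weights for every generated token. Shapes are illustrative.

def intensity(n_tokens: int, d: int, bytes_per_weight: int = 2) -> float:
    """FLOPs per byte for an (n_tokens x d) @ (d x d) matmul, with the
    weight matrix streamed from memory once."""
    flops = 2 * n_tokens * d * d
    bytes_moved = d * d * bytes_per_weight
    return flops / bytes_moved

prefill = intensity(n_tokens=2048, d=4096)  # whole prompt in one matmul
decode = intensity(n_tokens=1, d=4096)      # one token per step
print(f"prefill intensity: {prefill:.0f} FLOPs/byte")  # 2048
print(f"decode intensity:  {decode:.0f} FLOPs/byte")   # 1
```

Prefill's intensity scales with prompt length and easily saturates the compute units, whereas decode sits at ~1 FLOP/byte. This is why workloads dominated by generation, such as long reasoning chains or video, reward bandwidth-optimized hardware.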

Common Questions

What is Positron AI building, and why?

Positron AI is developing specialized hardware accelerators focused on improving performance per dollar and performance per watt for AI inference workloads, particularly transformers. Their mission is to address the memory bandwidth bottleneck that limits current AI systems.
