⚡️Accelerators @ 3x NVIDIA H200 perf, Made in the USA - Thomas Sohmers + Mitesh Agrawal, Positron AI
Key Moments
Positron AI builds efficient AI inference accelerators focused on memory bandwidth, outperforming NVIDIA on key performance-per-dollar and performance-per-watt metrics.
Key Insights
AI inference bottleneck is primarily memory bandwidth, not compute, particularly for transformer models.
Positron AI's architecture achieves significantly higher memory bandwidth utilization (93% vs. ~29% on NVIDIA H100).
The company's current FPGA-based accelerators offer 70% higher performance than NVIDIA's H100 at a lower power and price point.
Positron AI prioritizes seamless integration into existing NVIDIA CUDA ecosystems, accepting raw binary weights without recompilation.
Future ASIC generation promises even greater memory capacity and performance improvements over current FPGA solutions.
The company emphasizes capital efficiency, aiming to sell systems and demonstrate strong Return on Invested Capital (ROIC) rather than operate cloud services.
FOUNDING STORY AND EXPERTISE
Thomas Sohmers and Mitesh Agrawal, co-founders of Positron AI, bring extensive semiconductor and AI infrastructure experience. Sohmers, a hardware veteran of Rex Computing and Lambda Labs, identified memory bandwidth as the critical bottleneck in AI inference. Agrawal, an early employee and former COO of Lambda Labs and now Positron's CEO, brings the operational and growth expertise behind the company's go-to-market. Their shared thesis stems from recognizing the limitations of existing hardware for the demands of transformer models.
THE MEMORY BANDWIDTH BOTTLENECK
The core thesis driving Positron AI is that modern AI workloads, especially transformer inference, are overwhelmingly memory-bound rather than compute-bound. Unlike older CNN models, which were compute-intensive, autoregressive transformer decoding reduces largely to matrix-vector multiplications: every weight must be streamed from memory for each generated token while contributing only a couple of floating-point operations, so arithmetic intensity is very low. Traditional architectures optimized for peak FLOPS sit idle under these conditions, leaving memory bandwidth and capacity as the primary constraints on performance.
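A rough back-of-the-envelope illustration of this point (not from the episode; the hidden size and FP16 weight assumption are illustrative): in batch-1 decode, each weight is read once per token and used for only two floating-point operations, so arithmetic intensity sits near 1 FLOP per byte, far below what a modern GPU needs to be compute-bound.

```python
# Arithmetic intensity: why batch-1 transformer decode is memory-bound.
# All model sizes here are assumptions chosen for illustration.

def matvec_intensity(d_in: int, d_out: int, bytes_per_weight: int = 2) -> float:
    """FLOPs per byte moved for one matrix-vector product (batch size 1).

    A d_out x d_in weight matrix is read once per token; each weight
    contributes one multiply and one add (2 FLOPs).
    """
    flops = 2 * d_in * d_out
    bytes_moved = d_in * d_out * bytes_per_weight  # weight traffic dominates
    return flops / bytes_moved

def matmul_intensity(d_in: int, d_out: int, batch: int, bytes_per_weight: int = 2) -> float:
    """FLOPs per byte when the same weights are reused across a batch (prefill-like)."""
    flops = 2 * d_in * d_out * batch
    bytes_moved = d_in * d_out * bytes_per_weight
    return flops / bytes_moved

if __name__ == "__main__":
    d = 8192  # hidden size of a hypothetical large transformer layer
    print(f"decode  (batch 1):    {matvec_intensity(d, d):.1f} FLOPs/byte")
    print(f"prefill (batch 2048): {matmul_intensity(d, d, 2048):.1f} FLOPs/byte")
    # A GPU needs on the order of hundreds of FLOPs per byte of HBM traffic
    # to keep its tensor cores busy, so ~1 FLOP/byte decode is bandwidth-bound.
```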
POSITRON AI'S ARCHITECTURAL ADVANTAGE
Positron AI's architecture is designed to maximize memory bandwidth utilization, achieving up to 93% of theoretical bandwidth versus roughly 29% on NVIDIA's H100. This is enabled by compute elements sized for a 1:1 ratio of FLOPs to memory operations, matching the datapath to the low arithmetic intensity of inference. By focusing on this constraint, their current FPGA-based accelerators deliver superior performance per watt and per dollar even against high-end NVIDIA GPUs.
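To see why utilization matters, here is a minimal roofline-style sketch: batch-1 decode throughput is bounded by effective bandwidth divided by the bytes of weights read per token. Only the 29% and 93% utilization figures come from the discussion; the model size, peak-bandwidth number, and FP16 assumption are illustrative.

```python
# Roofline-style estimate of decode throughput from memory bandwidth.

def tokens_per_second(peak_bw_gbs: float, utilization: float,
                      params_billions: float, bytes_per_param: int = 2) -> float:
    """Upper bound on batch-1 decode rate when every weight byte is read once per token."""
    bytes_per_token = params_billions * 1e9 * bytes_per_param
    effective_bw = peak_bw_gbs * 1e9 * utilization
    return effective_bw / bytes_per_token

if __name__ == "__main__":
    model_b = 70        # hypothetical 70B-parameter model, FP16 weights
    hbm_peak = 3350     # H100 SXM HBM3 peak, roughly 3.35 TB/s
    print(f"~29% utilization: {tokens_per_second(hbm_peak, 0.29, model_b):.1f} tok/s")
    print(f"~93% utilization: {tokens_per_second(hbm_peak, 0.93, model_b):.1f} tok/s")
```

With these assumed numbers the same memory system yields roughly 7 versus 22 tokens per second, which is the practical meaning of the utilization gap.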
SEAMLESS ECOSYSTEM INTEGRATION
A key differentiator for Positron AI is its commitment to simplifying adoption. Instead of requiring users to recompile models or change existing workflows, their hardware directly ingests the raw binary weights produced by model training in NVIDIA's CUDA ecosystem. This 'zero-step' integration lets models trained on NVIDIA GPUs run on Positron AI's hardware with minimal effort, so customers gain the performance and efficiency benefits without disrupting their development pipelines.
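As an illustration of what such integration could look like from the user's side (the accelerator call below is hypothetical and not Positron's actual SDK), weights serialized from a standard GPU training pipeline, such as a safetensors checkpoint, are read as-is and handed to the device with no conversion or recompilation pass.

```python
# Illustrative only: "zero-step" ingestion of raw checkpoint weights.
# `device.load` is a hypothetical accelerator call, not a real API; the point
# is that the checkpoint bytes from a GPU workflow are consumed unchanged.

from safetensors import safe_open  # common serialization format for GPU-trained models

def load_raw_weights(path: str) -> dict:
    """Read tensors exactly as they were saved by the NVIDIA/PyTorch workflow."""
    weights = {}
    with safe_open(path, framework="numpy") as f:
        for name in f.keys():
            weights[name] = f.get_tensor(name)  # raw binary weights, no conversion
    return weights

# weights = load_raw_weights("checkpoint/model.safetensors")  # path is illustrative
# device.load(weights)  # hypothetical accelerator call; no graph recompilation step
```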
ROADMAP TO DEDICATED SILICON
While currently shipping FPGA-based solutions for rapid market entry, Positron AI is developing next-generation custom silicon (ASIC). This dedicated hardware will further enhance performance and efficiency by addressing the inherent limitations of FPGAs, offering significantly more memory capacity and surpassing current industry benchmarks. The company aims for these ASICs to be tape-out ready by late 2026, promising a substantial leap in inference capabilities.
BUSINESS STRATEGY AND CAPITAL EFFICIENCY
Positron AI's business model is centered on selling hardware systems and demonstrating strong Return on Invested Capital (ROIC) for its customers, eschewing the cloud-service model adopted by some competitors. They emphasize capital efficiency in their funding, having raised $75 million across seed and Series A rounds to focus on product development and customer acquisition. This approach aims to deliver tangible value and economic viability, positioning Positron AI as a strong contender in the AI hardware market.
PERFORMANCE METRICS AND FUTURE APPLICATIONS
The company's performance advantage is concentrated in token generation (inference decode) rather than prompt prefill. That makes their solutions relevant not only for text generation but also for emerging modalities such as video and long-form reasoning, where the generation stage dominates end-to-end latency and cost. By providing large memory capacity and high bandwidth, Positron AI aims to support multimodal AI and enable more efficient, cost-effective inference at scale.
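A minimal sketch of that prefill/decode split, with assumed model and hardware numbers (only the decode-versus-prefill framing comes from the episode): prefill cost scales with compute throughput, while each generated token re-reads the full weight set, so long outputs are dominated by the bandwidth-bound decode phase.

```python
# Rough split of request time into prefill (compute-bound) and decode
# (bandwidth-bound). All model and hardware numbers are assumptions.

def prefill_seconds(prompt_tokens: int, params_b: float, peak_tflops: float,
                    flops_utilization: float = 0.4) -> float:
    """~2 FLOPs per parameter per prompt token, run at a fraction of peak compute."""
    flops = 2 * params_b * 1e9 * prompt_tokens
    return flops / (peak_tflops * 1e12 * flops_utilization)

def decode_seconds(output_tokens: int, params_b: float, bw_tbs: float,
                   bw_utilization: float, bytes_per_param: int = 2) -> float:
    """Each output token re-reads the full weight set from memory."""
    bytes_per_token = params_b * 1e9 * bytes_per_param
    return output_tokens * bytes_per_token / (bw_tbs * 1e12 * bw_utilization)

if __name__ == "__main__":
    params, prompt, output = 70, 2000, 2000  # hypothetical 70B model and request shape
    p = prefill_seconds(prompt, params, peak_tflops=990)          # H100-class FP16 peak
    d = decode_seconds(output, params, bw_tbs=3.35, bw_utilization=0.3)
    print(f"prefill ~ {p:.1f}s, decode ~ {d:.1f}s")  # decode dominates for long outputs
```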
Common Questions
What is Positron AI building?
Positron AI is developing specialized hardware accelerators focused on improving performance per dollar and performance per watt for AI inference workloads, particularly transformers. Their mission is to address the memory bandwidth bottleneck that limits current AI systems.
Topics
Mentioned in this video
Application-Specific Integrated Circuits are the target for Positron's next generation of silicon, promising greater efficiency and capacity than FPGAs.
TF32 is a 19-bit number format used by NVIDIA. Positron AI uses a similar format in their FPGA, offering better precision than BF16.
Mentioned as an example of an image generation system using a pure autoregressive transformer.
Field-Programmable Gate Arrays are discussed as a prototyping and simulation tool. Positron leverages them for their current product but plans to move to dedicated silicon for the next generation.
BF16 is a 16-bit floating-point format mentioned in comparison to NVIDIA's TF32.
A process node for semiconductor manufacturing relevant to Positron AI's hiring needs for silicon engineers.
Specialized chips for cryptocurrency mining that Thomas Sohmers worked on before Lambda Labs.
Positron AI is a company focused on developing specialized hardware accelerators for AI inference, aiming for higher performance per dollar and watt.
Investor in Positron AI's Series A funding round.
Etched is another AI chip startup that has faced public criticism from George Hotz; Positron acknowledges the difficulty of new silicon but disagrees with Etched's hardening approach.
Another Google image generation model that has transitioned to an autoregressive transformer architecture.
George Hotz is mentioned for his public criticism of Etched regarding their approach to AI chip design.
A customer utilizing Positron AI's hardware for raw performance in their inference-as-a-service offerings.
A venture capital firm that participated in Positron AI's Series A funding round.
Rex Computing was Thomas Sohmers' first semiconductor startup, focused on DSP for mobile base station workloads.