
CI/CD Breaks at AI Speed: Tangle, Graphite Stacks, Pro-Model PR Review — Mikhail Parakhin, Shopify

Latent Space Podcast
Science & Technology · 5 min read · 75 min video
Apr 22, 2026
TL;DR

Shopify's CTO reveals that while AI can't explain its reasoning, it's now essential for daily work, and the challenge is adapting CI/CD to code generation speeds that outpace human review.

Key Insights

1. Nearly 100% of Shopify employees use AI tools daily, with a significant inflection point occurring around December 2025 when model capabilities improved.

2. While AI generates fewer bugs per line of code than humans on average, the sheer volume of AI-generated code necessitates more rigorous automated and human PR reviews to prevent bugs from reaching production.

3. Shopify uses its internal Tangle system, a third-generation data processing and ML experiment platform, for efficient collaboration, sharing, and running experiments, with Tangent orchestrating automated research loops.

4. The Sim Gym system simulates customer behavior using historical data and LLMs to predict the impact of website changes, achieving a 0.7 correlation with add-to-cart events in its development phase.

5. Liquid neural networks are being adopted by Shopify for low-latency, long-context applications like search, offering significant efficiency gains over traditional transformer architectures for these specific use cases.

6. Shopify is actively hiring ML engineers, data scientists, and distributed database engineers, seeing potential in reimagining distributed databases with LLMs.

AI adoption and the December 2025 inflection point

Mikhail Parakhin, CTO of Shopify, discusses the company's aggressive AI adoption, noting that nearly 100% of employees now interact daily with at least one AI tool. He highlights a significant inflection point around December 2025, when AI models became capable enough to drive widespread adoption and exponential growth in usage. While many tools are public, Shopify also leverages internal tools like Tangle for ML experimentation and Tangent for automated research. The company provides unlimited token budgets, discourages models below a certain quality threshold (e.g., 4.6), and encourages more advanced models like GPT-4.5 or Opus 4.6, which support longer context windows.

Rethinking developer productivity and code quality

Parakhin addresses the debate around token consumption versus AI utility, arguing that focusing solely on token count is an anti-pattern. Effective AI usage involves fewer, well-coordinated agents with robust critique loops, even if this increases latency. While AI-generated code may have fewer bugs per line, the sheer volume produced demands extremely strong automated and human-led PR reviews to prevent a surge of bugs into production. The key metric, he suggests, is the ratio of budget spent on generation versus on expensive models for tasks like PR review. This marks a fundamental shift: AI can write more code, but the bottleneck moves to validation and quality assurance, straining CI/CD pipelines and potentially requiring new metaphors beyond traditional PRs for processing code.
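The generate-critique pattern described above can be sketched in a few lines. The `generate` and `critique` functions below are stubs standing in for calls to a cheaper generation model and a stronger review model; none of this is Shopify's actual tooling:

```python
# Minimal sketch of a generate-critique loop. `generate` and `critique`
# are placeholders for real model calls (assumed names, stubbed behavior).

def generate(task, feedback=None):
    # Placeholder for a call to a code-generation model.
    draft = f"code for {task!r}"
    if feedback:
        draft += " (revised)"
    return draft

def critique(draft):
    # Placeholder for a call to a stronger review model.
    # Returns (approved, feedback).
    approved = "(revised)" in draft
    return approved, None if approved else "add error handling"

def critique_loop(task, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        approved, feedback = critique(draft)
        if approved:
            return draft
    raise RuntimeError("no approved draft within budget")

print(critique_loop("parse config"))
```

Each extra critique round adds latency, which is exactly the trade-off Parakhin accepts in exchange for quality.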

The CI/CD bottleneck and evolving development workflows

The current CI/CD infrastructure, designed for human speeds, is struggling to keep up with the volume of AI-generated code. Shopify is actively exploring alternatives to standard pull requests, including stacked PRs managed with tools like Graphite. Parakhin emphasizes that interaction with the code repository itself has become the highest-priority bottleneck. He speculates that traditional paradigms like Git and PRs may need to evolve drastically, or even be replaced, in an agentic world. Merge conflicts act as a global mutex: manageable at human speeds, but critical once code is generated at machine speed. He even tentatively suggests that microservices might see a resurgence, since smaller, independent deployments are easier to manage in this context.
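A back-of-envelope sketch of the global-mutex effect: if every merge must serially rebase and re-test against main, total throughput is capped by merge time no matter how many agents write code. The numbers below are illustrative, not Shopify's:

```python
# If merges to main are serialized (the "global mutex"), throughput is
# capped at 1/merge_time regardless of how fast agents open PRs.

def merge_queue_backlog(prs_per_hour, merge_minutes, hours):
    capacity = 60 / merge_minutes            # serialized merges per hour
    backlog = max(0.0, (prs_per_hour - capacity) * hours)
    return capacity, backlog

cap, backlog = merge_queue_backlog(prs_per_hour=40, merge_minutes=6, hours=8)
print(cap)      # ceiling on merges per hour
print(backlog)  # PRs still queued after one workday
```

At 40 agent-authored PRs per hour and a 6-minute serialized merge, the queue grows by 30 PRs every hour; this is the arithmetic behind moving to stacked PRs or smaller, independently deployable services.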

Tangle and Tangent: Revolutionizing ML experimentation

Shopify's internal platform, Tangle, is described as a third-generation system for data processing, with a skew towards ML experiments. It aims to solve the digital archaeology problem of tracking data transformations and experiments, enabling easier iteration, sharing, and collaboration. Unlike tools like Airflow, which focus on scheduled production runs, Tangle is optimized for groups of people running experiments cheaply, sharing results, and easily cloning and modifying pipelines. A key feature is its content-hashing system, which avoids re-running tasks if the output hasn't changed, ensuring efficiency and reproducibility. Tangent, built on top of Tangle, is an automated research loop that can run multiple experiments, modify parameters, and optimize towards a specific goal, democratizing advanced ML capabilities beyond specialized engineers. This approach has shown dramatic improvements in areas like search latency and UX templating.
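Tangle's internals aren't public, but the content-hashing idea described above can be sketched as a cache keyed on a task's code and inputs, so an unchanged task is never re-run:

```python
# Sketch of content-hash caching in the spirit of Tangle (internal system,
# details assumed): a task re-runs only if its (code, inputs) hash is new.
import hashlib
import json

_cache = {}

def cached_task(fn):
    def wrapper(*args):
        key = hashlib.sha256(
            fn.__code__.co_code + json.dumps(args, sort_keys=True).encode()
        ).hexdigest()
        if key not in _cache:
            _cache[key] = fn(*args)
        return _cache[key]
    return wrapper

runs = []

@cached_task
def normalize(rows):
    runs.append(1)                       # counts real executions
    return [r.strip().lower() for r in rows]

print(normalize(["A ", "b"]))            # executes the task
print(normalize(["A ", "b"]))            # cache hit: no re-run
print(len(runs))                         # executed exactly once
```

The same keying gives reproducibility for free: two people cloning a pipeline with identical code and inputs resolve to identical cached outputs.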

Sim Gym: Simulating customer behavior for e-commerce optimization

Sim Gym is a Shopify product that simulates customer behavior to predict the outcomes of website changes. This is particularly valuable for smaller merchants who lack the historical data and traffic volume for traditional A/B testing. The system leverages decades of Shopify's historical sales data, applying denoising and calibrated filtering to extract signals. It models customer journeys and interactions within a simulated browser environment. Initially focused on predicting conversions, Sim Gym now also provides direct recommendations for website modifications to increase conversions. The challenge has been achieving a high correlation (e.g., 0.7 with add-to-cart events) with real-world outcomes, requiring significant investment in infrastructure, LLMs, and simulated browser environments. This capability is seen as a significant moat for Shopify, delivering an evolving, optimized platform to merchants.
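The validation step described above boils down to correlating simulated signals with observed ones. A minimal sketch using the Pearson coefficient, with toy numbers rather than Shopify data:

```python
# Correlating simulated vs. observed add-to-cart rates (toy data).

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

simulated = [0.12, 0.08, 0.15, 0.11, 0.09]   # predicted add-to-cart rates
observed  = [0.10, 0.07, 0.16, 0.12, 0.11]   # measured add-to-cart rates

print(round(pearson(simulated, observed), 2))
```

A coefficient near the 0.7 mentioned in the episode means the simulator ranks website changes mostly the way real customers do, which is what makes its recommendations trustworthy for merchants without A/B-testing traffic.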

Liquid neural networks driving efficiency and new capabilities

Shopify is incorporating Liquid neural networks, a non-transformer architecture inspired by state space models, for specific applications where high efficiency and low latency are critical. While more complex to code than traditional state space models, Liquid offers sub-quadratic scaling with context length and is more compact. Shopify uses it for low-latency search applications (achieving sub-30ms end-to-end with small models) and for high-bandwidth, offline tasks like product categorization and attribute normalization across billions of products. Liquid excels as a target for distillation, transforming large models into smaller, task-specific ones. Parakhin notes that while Liquid may not compete with top-tier LLMs like GPT-5.4 for general reasoning, it is highly competitive for specific use cases, especially when distilling large models, and has been steadily increasing its share of Shopify's workloads over other models like Qwen.
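Distillation of the kind described above typically minimizes the KL divergence between the teacher's and the student's temperature-softened output distributions. A toy sketch with made-up logits, standing in for a large teacher and a compact Liquid-style student:

```python
# Standard distillation objective sketch: student matches the teacher's
# softened distribution. Logits here are invented, not from real models.
import math

def softmax(logits, T=1.0):
    exps = [math.exp(l / T) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher_logits = [2.0, 0.5, -1.0]   # large general model (assumed)
student_logits = [1.8, 0.7, -0.9]   # compact task-specific model (assumed)

T = 2.0  # higher temperature exposes the teacher's "dark knowledge"
loss = (T ** 2) * kl_divergence(softmax(teacher_logits, T),
                                softmax(student_logits, T))
print(loss)
```

The loss is zero only when the student reproduces the teacher's full distribution, which is why a small distilled model can match a large one on a narrow task like categorization while staying cheap enough for sub-30ms serving.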

Emergent Personalities and Future AI Directions

Reflecting on early AI chatbot experiences, Parakhin shared anecdotes about Sydney, the initial Bing chatbot. He noted that Sydney's personality wasn't purely emergent but the result of deliberate personality shaping, drawing on his Yandex experience with the Alice digital assistant; the aim was a polite yet slightly edgy persona that draws users in. He also touched on the broader impact of AI, including the potential for LLMs to reimagine how distributed databases are managed. The convergence of tools like Tangle, Tangent, and Sim Gym is creating synergistic effects, enabling capabilities that were previously unthinkable because they require thousands of human-like intelligences rather than just humans.

AI Development and Deployment Best Practices at Shopify

Practical takeaways from this episode

Do This

Utilize critique loops with high-quality models for better code quality, even if latency increases.
Focus on efficient token consumption by using fewer, well-communicating agents rather than many parallel, isolated ones.
Implement rigorous PR reviews, both manual and automated, to manage the increased code volume from AI.
Leverage internal tools like Tangle for efficient ML experimentation and development-to-production workflows.
Consider auto-research approaches like Tangent for optimizing various aspects of development and operations.
Use customer simulation tools like Sim Gym, especially if you have historical data, to optimize e-commerce strategies.
Explore Liquid models for applications requiring low latency or long context windows.
Focus on platform-level AI solutions that create network effects and continuous improvement.
Hire ML engineers, data scientists, and distributed database experts.

Avoid This

Avoid running many agents in parallel without communication; uncoordinated parallelism often burns tokens without improving results.
Do not rely solely on static metrics like lines of code for engineer performance evaluation.
Do not underestimate the need for robust PR review processes due to increased code generation.
Do not dismiss AutoML; LLMs have made it significantly more viable and effective.
Do not expect auto-research to solve completely out-of-distribution problems without human insight.
Do not underestimate the validation required for AI-driven e-commerce optimizations; correlate with real-world results.
Do not assume standard LLM serving infrastructure is optimal for all novel architectures like Liquid.

Common Questions

How does Shopify manage token budgets and model choice?

Shopify funds unlimited tokens for employees but encourages using high-quality models (e.g., GPT-4.0+) and discourages less capable ones. The company also focuses on efficient agent communication and critique loops rather than raw token consumption.

Topics

Mentioned in this video

Software & Apps
Edge

Part of the Microsoft business unit led by Mikhail Parakhin.

Airflow

A data pipeline tool that Tangle improves upon, particularly for experimentation and collaboration.

Tangle

An internal AI tool at Shopify for data processing and ML experimentation, referred to as the third generation of such systems.

Bing

Part of the Microsoft business unit led by Mikhail Parakhin, also mentioned in the context of the 'Sydney' AI.

Claude Code

A tool mentioned in the context of PR review, noted as not sufficient for 'pro-level' model needs.

Stacks

A tool used by Shopify for PRs and managing code changes.

Yandex

Mikhail Parakhin's former employer, involved in the development of the 'Nirvana' system and previously 'Alice'.

Megatron

A model developed in collaboration between Microsoft and NVIDIA, used in an early implementation of Sydney.

Graphite

A tool used by Shopify for PRs and managing code changes.

Windows

Part of the Microsoft business unit led by Mikhail Parakhin.

Nirvana

A system at Yandex that represented a second take on the approach Tangle is based on.

XGBoost

A gradient-boosted tree library that Tangle's deeper-level analysis can utilize and combine.

UCP

A Shopify platform enabling structured discussions and search, with a recent catalog release for product lookups.

GitHub Copilot

An IDE-based AI tool discussed in contrast to CLI-based tools, showing slower growth compared to internal Shopify tools.

Qwen

An alternative model to Liquid, discussed for its reasoning capabilities and used as a benchmark for comparison.

Cursor

An IDE-based AI tool discussed in contrast to CLI-based tools, showing slower growth compared to internal Shopify tools.

GPT-4o

Mentioned as a model that Shopify encourages employees to use, with a minimum threshold of 4.6.

Ether

An early system that pioneered the approach Tangle is based on.

GPT-4.5 Extra High

A model some Shopify employees use, discussed in terms of pros and cons for context window.

PyTorch

A machine learning framework that Tangle's deeper-level analysis can utilize and combine.

Gemini

Mentioned in the context of 'Deep Think' for PR reviews and as a high-quality model.

Alice

A digital assistant developed by Yandex, from which lessons on personality shaping were learned for Sydney.

OpenClaw

Mentioned as an example of an AI with an explicitly prompted, fun personality.

Golden Gate Claude

Another example of an AI personality entering public consciousness.

Jupyter notebooks

A tool for data scientists that Tangle aims to improve upon by offering a more integrated and shareable development environment.
