
CI/CD Breaks at AI Speed: Tangle, Graphite Stacks, Pro-Model PR Review — Mikhail Parakhin, Shopify

Latent Space Podcast
Science & Technology · 5 min read · 75 min video
Apr 22, 2026
TL;DR

Shopify's CTO reveals that while AI can't explain its reasoning, it's now essential for daily work, and the challenge is adapting CI/CD to code generation speeds that outpace human review.

Key Insights

1. Nearly 100% of Shopify employees use AI tools daily, with a significant inflection point occurring around December 2025 when model capabilities improved.

2. While AI generates fewer bugs per line of code than humans on average, the sheer volume of AI-generated code necessitates more rigorous automated and human PR reviews to prevent bugs from reaching production.

3. Shopify uses its internal Tangle system, a third-generation data processing and ML experiment platform, for efficient collaboration, sharing, and running experiments, with Tangent orchestrating automated research loops.

4. The Sim Gym system simulates customer behavior using historical data and LLMs to predict the impact of website changes, achieving a 0.7 correlation with add-to-cart events in its development phase.

5. Liquid neural networks are being adopted by Shopify for low-latency, long-context applications like search, offering significant efficiency gains over traditional transformer architectures for these specific use cases.

6. Shopify is actively hiring ML engineers, data scientists, and distributed database engineers, seeing potential in reimagining distributed databases with LLMs.

AI adoption and the December 2025 inflection point

Mikhail Parakhin, CTO of Shopify, discusses the company's aggressive AI adoption, noting that nearly 100% of employees now interact daily with at least one AI tool. He highlights a significant inflection point around December 2025, when AI models became capable enough to drive widespread adoption and exponential growth in usage. While many tools are public, Shopify also leverages internal tools like Tangle for ML experimentation and Tangent for automated research. The company provides unlimited token budgets, discourages models below a certain quality threshold (e.g., 4.6), and encourages more advanced models like GPT-4.5 or Opus 4.6, which support longer context windows.

Rethinking developer productivity and code quality

Parakhin addresses the debate around token consumption versus AI utility, arguing that focusing solely on token count is an anti-pattern. Effective AI usage involves fewer, well-coordinated agents with robust critique loops, even if this increases latency. While AI-generated code may have fewer bugs per line, the sheer volume produced demands extremely strong automated and human-led PR reviews to prevent a surge of bugs into production. The key metric, he suggests, is the ratio of budget spent on generation versus on expensive models for tasks like PR review. This marks a fundamental shift: AI can write more code, but the bottleneck moves to validation and quality assurance, straining CI/CD pipelines and potentially requiring new metaphors beyond traditional PRs for processing code.
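The generate-critique pattern described above can be sketched in a few lines. The `generate` and `critique` functions below are stubs standing in for calls to a cheaper generation model and a stronger review model; none of this is Shopify's actual tooling:

```python
# Minimal sketch of a generate-critique loop. `generate` and `critique`
# are placeholders for real model calls (assumed names, stubbed behavior).

def generate(task, feedback=None):
    # Placeholder for a call to a code-generation model.
    draft = f"code for {task!r}"
    if feedback:
        draft += " (revised)"
    return draft

def critique(draft):
    # Placeholder for a call to a stronger review model.
    # Returns (approved, feedback).
    approved = "(revised)" in draft
    return approved, None if approved else "add error handling"

def critique_loop(task, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        approved, feedback = critique(draft)
        if approved:
            return draft
    raise RuntimeError("no approved draft within budget")

print(critique_loop("parse config"))
```

Each extra critique round adds latency, which is exactly the trade-off Parakhin accepts in exchange for quality.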

The CI/CD bottleneck and evolving development workflows

The current CI/CD infrastructure, designed for human speeds, is struggling to keep up with the volume of AI-generated code. Shopify is actively exploring alternatives to standard pull requests, including stacked PRs managed with tools like Graphite. Parakhin emphasizes that interaction with the code repository itself has become the highest-priority bottleneck. He speculates that traditional paradigms like Git and PRs may need to evolve drastically, or even be replaced, in an agentic world. Merge conflicts act as a global mutex: manageable at human speeds, but critical once code is generated at machine speed. He even tentatively suggests that microservices might see a resurgence, since smaller, independent deployments are easier to manage in this context.
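A back-of-envelope sketch of the global-mutex effect: if every merge must serially rebase and re-test against main, total throughput is capped by merge time no matter how many agents write code. The numbers below are illustrative, not Shopify's:

```python
# If merges to main are serialized (the "global mutex"), throughput is
# capped at 1/merge_time regardless of how fast agents open PRs.

def merge_queue_backlog(prs_per_hour, merge_minutes, hours):
    capacity = 60 / merge_minutes            # serialized merges per hour
    backlog = max(0.0, (prs_per_hour - capacity) * hours)
    return capacity, backlog

cap, backlog = merge_queue_backlog(prs_per_hour=40, merge_minutes=6, hours=8)
print(cap)      # ceiling on merges per hour
print(backlog)  # PRs still queued after one workday
```

At 40 agent-authored PRs per hour and a 6-minute serialized merge, the queue grows by 30 PRs every hour; this is the arithmetic behind moving to stacked PRs or smaller, independently deployable services.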

Tangle and Tangent: Revolutionizing ML experimentation

Shopify's internal platform, Tangle, is described as a third-generation system for data processing, with a skew towards ML experiments. It aims to solve the digital archaeology problem of tracking data transformations and experiments, enabling easier iteration, sharing, and collaboration. Unlike tools like Airflow, which focus on scheduled production runs, Tangle is optimized for groups of people running experiments cheaply, sharing results, and easily cloning and modifying pipelines. A key feature is its content-hashing system, which avoids re-running tasks if the output hasn't changed, ensuring efficiency and reproducibility. Tangent, built on top of Tangle, is an automated research loop that can run multiple experiments, modify parameters, and optimize towards a specific goal, democratizing advanced ML capabilities beyond specialized engineers. This approach has shown dramatic improvements in areas like search latency and UX templating.
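Tangle's internals aren't public, but the content-hashing idea described above can be sketched as a cache keyed on a task's code and inputs, so an unchanged task is never re-run:

```python
# Sketch of content-hash caching in the spirit of Tangle (internal system,
# details assumed): a task re-runs only if its (code, inputs) hash is new.
import hashlib
import json

_cache = {}

def cached_task(fn):
    def wrapper(*args):
        key = hashlib.sha256(
            fn.__code__.co_code + json.dumps(args, sort_keys=True).encode()
        ).hexdigest()
        if key not in _cache:
            _cache[key] = fn(*args)
        return _cache[key]
    return wrapper

runs = []

@cached_task
def normalize(rows):
    runs.append(1)                       # counts real executions
    return [r.strip().lower() for r in rows]

print(normalize(["A ", "b"]))            # executes the task
print(normalize(["A ", "b"]))            # cache hit: no re-run
print(len(runs))                         # executed exactly once
```

The same keying gives reproducibility for free: two people cloning a pipeline with identical code and inputs resolve to identical cached outputs.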

Sim Gym: Simulating customer behavior for e-commerce optimization

Sim Gym is a Shopify product that simulates customer behavior to predict the outcomes of website changes. This is particularly valuable for smaller merchants who lack the historical data and traffic volume for traditional A/B testing. The system leverages decades of Shopify's historical sales data, applying denoising and calibrated filtering to extract signals. It models customer journeys and interactions within a simulated browser environment. Initially focused on predicting conversions, Sim Gym now also provides direct recommendations for website modifications to increase conversions. The challenge has been achieving a high correlation (e.g., 0.7 with add-to-cart events) with real-world outcomes, requiring significant investment in infrastructure, LLMs, and simulated browser environments. This capability is seen as a significant moat for Shopify, delivering an evolving, optimized platform to merchants.
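The validation step described above boils down to correlating simulated signals with observed ones. A minimal sketch using the Pearson coefficient, with toy numbers rather than Shopify data:

```python
# Correlating simulated vs. observed add-to-cart rates (toy data).

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

simulated = [0.12, 0.08, 0.15, 0.11, 0.09]   # predicted add-to-cart rates
observed  = [0.10, 0.07, 0.16, 0.12, 0.11]   # measured add-to-cart rates

print(round(pearson(simulated, observed), 2))
```

A coefficient near the 0.7 mentioned in the episode means the simulator ranks website changes mostly the way real customers do, which is what makes its recommendations trustworthy for merchants without A/B-testing traffic.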

Liquid neural networks driving efficiency and new capabilities

Shopify is incorporating Liquid neural networks, a non-transformer architecture inspired by state space models, for specific applications where high efficiency and low latency are critical. While more complex to code than traditional state space models, Liquid offers sub-quadratic scaling with context length and is more compact. Shopify uses it for low-latency search applications (achieving sub-30ms end-to-end with small models) and for high-bandwidth, offline tasks like product categorization and attribute normalization across billions of products. Liquid excels as a target for distillation, transforming large models into smaller, task-specific ones. Parakhin notes that while Liquid may not compete with top-tier LLMs like GPT-5.4 for general reasoning, it is highly competitive for specific use cases, especially when distilling large models, and has been steadily increasing its share of Shopify's workloads over other models like Qwen.
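Distillation of the kind described above typically minimizes the KL divergence between the teacher's and the student's temperature-softened output distributions. A toy sketch with made-up logits, standing in for a large teacher and a compact Liquid-style student:

```python
# Standard distillation objective sketch: student matches the teacher's
# softened distribution. Logits here are invented, not from real models.
import math

def softmax(logits, T=1.0):
    exps = [math.exp(l / T) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher_logits = [2.0, 0.5, -1.0]   # large general model (assumed)
student_logits = [1.8, 0.7, -0.9]   # compact task-specific model (assumed)

T = 2.0  # higher temperature exposes the teacher's "dark knowledge"
loss = (T ** 2) * kl_divergence(softmax(teacher_logits, T),
                                softmax(student_logits, T))
print(loss)
```

The loss is zero only when the student reproduces the teacher's full distribution, which is why a small distilled model can match a large one on a narrow task like categorization while staying cheap enough for sub-30ms serving.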

Emergent Personalities and Future AI Directions

Reflecting on early AI chatbot experiences, Parakhin shared anecdotes about Sydney, the initial Bing chatbot. He noted that Sydney's personality wasn't purely emergent but the result of deliberate personality shaping, drawing on his Yandex experience with the Alice digital assistant; the aim was a polite yet slightly edgy persona that draws users in. He also touched on the broader impact of AI, including the potential for LLMs to reimagine how distributed databases are managed. The convergence of tools like Tangle, Tangent, and Sim Gym is creating synergistic effects, enabling capabilities that were previously unthinkable because they require thousands of human-like intelligences rather than just humans.

AI Development and Deployment Best Practices at Shopify

Practical takeaways from this episode

Do This

Utilize critique loops with high-quality models for better code quality, even if latency increases.
Focus on efficient token consumption by using fewer, well-communicating agents rather than many parallel, isolated ones.
Implement rigorous PR reviews, both manual and automated, to manage the increased code volume from AI.
Leverage internal tools like Tangle for efficient ML experimentation and development-to-production workflows.
Consider auto-research approaches like Tangent for optimizing various aspects of development and operations.
Use customer simulation tools like Sim Gym, especially if you have historical data, to optimize e-commerce strategies.
Explore Liquid models for applications requiring low latency or long context windows.
Focus on platform-level AI solutions that create network effects and continuous improvement.
Hire ML engineers, data scientists, and distributed database experts.

Avoid This

Avoid running many agents in parallel without communication; uncoordinated parallelism often burns tokens without improving results.
Do not rely solely on static metrics like lines of code for engineer performance evaluation.
Do not underestimate the need for robust PR review processes due to increased code generation.
Do not dismiss AutoML; LLMs have made it significantly more viable and effective.
Do not expect auto-research to solve completely out-of-distribution problems without human insight.
Do not underestimate the validation required for AI-driven e-commerce optimizations; correlate with real-world results.
Do not assume standard LLM serving infrastructure is optimal for all novel architectures like Liquid.

Common Questions

How does Shopify manage token budgets and model choice?

Shopify funds unlimited tokens for employees but encourages using high-quality models (e.g., GPT-4.0+) and discourages less capable ones. The company also focuses on efficient agent communication and critique loops rather than raw token consumption.

Topics

Mentioned in this video

Software & Apps
Edge

Part of the Microsoft business unit led by Mikhail Parakhin.

Airflow

A data pipeline tool that Tangle improves upon, particularly for experimentation and collaboration.

Tangle

An internal AI tool at Shopify for data processing and ML experimentation, referred to as the third generation of such systems.

Bing

Part of the Microsoft business unit led by Mikhail Parakhin, also mentioned in the context of the 'Sydney' AI.

Claude Code

A tool mentioned in the context of PR review, noted as not sufficient for 'pro-level' model needs.

Stacks

A tool used by Shopify for PRs and managing code changes.

Yandex

Mikhail Parakhin's former employer, involved in the development of the 'Nirvana' system and previously 'Alice'.

Megatron

A model developed in collaboration between Microsoft and NVIDIA, used in an early implementation of Sydney.

Graphite

A tool used by Shopify for PRs and managing code changes.

Windows

Part of the Microsoft business unit led by Mikhail Parakhin.

Nirvana

A system at Yandex that represented a second take on the approach Tangle is based on.

XGBoost

A gradient-boosted tree library that Tangle's deeper-level analysis can utilize and combine.

UCP

A Shopify platform enabling structured discussions and search, with a recent catalog release for product lookups.

GitHub Copilot

An IDE-based AI tool discussed in contrast to CLI-based tools, showing slower growth compared to internal Shopify tools.

Qwen

An alternative model to Liquid, discussed for its reasoning capabilities and used as a benchmark for comparison.

Cursor

An IDE-based AI tool discussed in contrast to CLI-based tools, showing slower growth compared to internal Shopify tools.

GPT-4o

Mentioned as a model that Shopify encourages employees to use, with a minimum threshold of 4.6.

Ether

An early system that pioneered the approach Tangle is based on.

GPT-4.5 Extra High

A model some Shopify employees use, discussed in terms of pros and cons for context window.

PyTorch

A machine learning framework that Tangle's deeper-level analysis can utilize and combine.

Gemini

Mentioned in the context of 'Deep Think' for PR reviews and as a high-quality model.

Alice

A digital assistant developed by Yandex, from which lessons on personality shaping were learned for Sydney.

OpenClaw

Mentioned as an example of an AI with an explicitly prompted, fun personality.

Golden Gate Claude

Another example of an AI personality entering public consciousness.

Jupyter notebooks

A tool for data scientists that Tangle aims to improve upon by offering a more integrated and shareable development environment.
