CI/CD Breaks at AI Speed: Tangle, Graphite Stacks, Pro-Model PR Review — Mikhail Parakhin, Shopify
Key Moments
Shopify's CTO explains that AI has become essential for daily work even though its reasoning remains opaque, and that the central challenge is adapting CI/CD to code-generation speeds that outpace human review.
Key Insights
Nearly 100% of Shopify employees use AI tools daily, with a significant inflection point occurring around December 2025 when model capabilities improved.
While AI generates fewer bugs per line of code than humans on average, the sheer volume of AI-generated code necessitates more rigorous automated and human PR reviews to prevent bugs from reaching production.
Shopify uses its internal Tangle system, a third-generation data processing and ML experiment platform, for efficient collaboration, sharing, and running experiments, with Tangent orchestrating automated research loops.
The Sim Gym system simulates customer behavior using historical data and LLMs to predict the impact of website changes, achieving a 0.7 correlation with add-to-cart events in its development phase.
Liquid neural networks are being adopted by Shopify for low-latency, long-context applications like search, offering significant efficiency gains over traditional transformer architectures for these specific use cases.
Shopify is actively hiring ML engineers, data scientists, and distributed database engineers, seeing potential in reimagining distributed databases with LLMs.
AI adoption and the December 2025 inflection point
Mikhail Parakhin, CTO of Shopify, discusses the company's aggressive AI adoption, noting that nearly 100% of employees now interact daily with at least one AI tool. He highlights a significant inflection point around December 2025, when AI models became sufficiently capable to drive widespread adoption and exponential growth in usage. While many tools are public, Shopify also leverages internal tools like Tangle for ML experimentation and Tangent for automated research. The company provides unlimited token budgets, discouraging the use of models below a certain quality threshold (e.g., 4.6) and encouraging more advanced models such as GPT-4.5 or Opus 4.6, which also support longer context windows.
Rethinking developer productivity and code quality
Parakhin addresses the debate around token consumption versus AI utility, suggesting that focusing solely on token count is an anti-pattern. Effective AI usage involves fewer, well-coordinated agents with robust critique loops, even if this increases latency. He argues that while AI-generated code might have fewer bugs per line, the sheer volume produced necessitates extremely strong, automated, and human-led PR reviews to prevent a surge of bugs into production. The key metric, he suggests, is the ratio of budget spent on generation versus expensive models for tasks like PR reviews. This highlights a fundamental shift: AI can write more code, but the bottleneck moves to validation and quality assurance, impacting CI/CD pipelines and potentially requiring new metaphors beyond traditional PRs for code processing.
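The generation-versus-review spend ratio Parakhin proposes can be sketched as a simple metric. This is an illustrative sketch only: the role names, model names, and dollar figures below are hypothetical, not Shopify's actual accounting.

```python
# Hedged sketch of the generation-vs-review spend ratio described in the
# episode. All role names, model names, and prices are hypothetical.

def spend_ratio(token_usage: dict) -> float:
    """Return review spend divided by generation spend.

    token_usage maps a role ("generation" or "review") to a dict of
    {model_name: dollars_spent}.
    """
    generation = sum(token_usage.get("generation", {}).values())
    review = sum(token_usage.get("review", {}).values())
    if generation == 0:
        raise ValueError("no generation spend recorded")
    return review / generation

usage = {
    "generation": {"fast-model": 120.0, "mid-model": 80.0},
    "review": {"frontier-model": 150.0},  # expensive model gating PRs
}
# A ratio near or above 1.0 means validation costs as much as
# generation -- the bottleneck shift the episode describes.
ratio = spend_ratio(usage)
```

Tracking this ratio over time would surface the shift Parakhin describes: as generation gets cheaper, the review share of the budget grows.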
The CI/CD bottleneck and evolving development workflows
The current CI/CD infrastructure, designed for human speeds, is struggling to keep up with AI-generated code volume. Shopify is actively exploring solutions beyond standard pull requests, such as stacked PRs managed with tools like Graphite. Parakhin emphasizes that the main issue is the interaction with the code repository itself, which has become the highest-priority bottleneck. He speculates that traditional paradigms like Git and PRs might need to evolve drastically, or even be replaced, in this new agentic world. The concept of merge conflicts acting as a global mutex, manageable at human speeds, becomes a critical issue when code is generated machine-fast. He even tentatively suggests that microservices might see a resurgence as a way to enable more independent, smaller deployments that are easier to manage in this context.
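The "global mutex" framing can be made concrete with back-of-envelope arithmetic: if every merge must pass CI alone at the head of a serialized queue, throughput is capped at one merge per CI run regardless of how fast PRs arrive. The numbers below are purely illustrative, not figures from the episode.

```python
# Hedged sketch (hypothetical numbers): a serialized merge queue behaves
# like a global mutex. Throughput is capped at 60 / ci_minutes merges per
# hour, no matter how fast agents open PRs.

def merge_queue_backlog(prs_per_hour: float, ci_minutes: float, hours: float) -> float:
    """PRs left waiting after `hours` of steady arrivals."""
    capacity_per_hour = 60.0 / ci_minutes      # serialized merges per hour
    backlog_rate = prs_per_hour - capacity_per_hour
    return max(0.0, backlog_rate * hours)

# 5 human-speed PRs/hour against 10-minute CI: the queue stays empty.
human_backlog = merge_queue_backlog(prs_per_hour=5, ci_minutes=10, hours=8)
# 60 agent-speed PRs/hour against the same CI: the backlog grows without bound.
agent_backlog = merge_queue_backlog(prs_per_hour=60, ci_minutes=10, hours=8)
```

This is the arithmetic behind the microservices suggestion: splitting the repository multiplies the number of independent queues, raising aggregate merge capacity.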
Tangle and Tangent: Revolutionizing ML experimentation
Shopify's internal platform, Tangle, is described as a third-generation system for data processing, with a skew towards ML experiments. It aims to solve the digital archaeology problem of tracking data transformations and experiments, enabling easier iteration, sharing, and collaboration. Unlike tools like Airflow, which focus on scheduled production runs, Tangle is optimized for groups of people running experiments cheaply, sharing results, and easily cloning and modifying pipelines. A key feature is its content-hashing system, which avoids re-running tasks if the output hasn't changed, ensuring efficiency and reproducibility. Tangent, built on top of Tangle, is an automated research loop that can run multiple experiments, modify parameters, and optimize towards a specific goal, democratizing advanced ML capabilities beyond specialized engineers. This approach has shown dramatic improvements in areas like search latency and UX templating.
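The content-hashing idea attributed to Tangle can be sketched as a content-addressed cache: a task is re-run only when the hash of its identity and inputs changes. The function names and cache structure here are illustrative, not Shopify's actual API, and a real system would also hash the task's code, not just its inputs.

```python
# Hedged sketch of content-hash caching in the spirit of Tangle: a task
# re-runs only if its (name, inputs) hash is new. Illustrative only; a
# production system would hash the task's code and upstream outputs too.
import hashlib
import json

_cache: dict = {}

def run_cached(task_name: str, fn, inputs: dict):
    """Run `fn(inputs)` only if this exact (task, inputs) pair is new."""
    key_material = json.dumps({"task": task_name, "inputs": inputs}, sort_keys=True)
    key = hashlib.sha256(key_material.encode()).hexdigest()
    if key not in _cache:                      # cache miss: actually compute
        _cache[key] = fn(inputs)
    return _cache[key]                         # cache hit: reuse prior result

calls = []
def expensive_transform(inputs):
    calls.append(1)                            # count real executions
    return sum(inputs["values"])

a = run_cached("sum", expensive_transform, {"values": [1, 2, 3]})
b = run_cached("sum", expensive_transform, {"values": [1, 2, 3]})  # reused
```

This is what makes cheap cloning of pipelines possible: a cloned pipeline with unchanged inputs hits the cache everywhere and costs nothing to "re-run".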
Sim Gym: Simulating customer behavior for e-commerce optimization
Sim Gym is a Shopify product that simulates customer behavior to predict the outcomes of website changes. This is particularly valuable for smaller merchants who lack the historical data and traffic volume for traditional A/B testing. The system leverages decades of Shopify's historical sales data, applying denoising and calibrated filtering to extract signals. It models customer journeys and interactions within a simulated browser environment. Initially focused on predicting conversions, Sim Gym now also provides direct recommendations for website modifications to increase conversions. The challenge has been achieving a high correlation (e.g., 0.7 with add-to-cart events) with real-world outcomes, requiring significant investment in infrastructure, LLMs, and simulated browser environments. This capability is seen as a significant moat for Shopify, delivering an evolving, optimized platform to merchants.
Liquid neural networks driving efficiency and new capabilities
Shopify is incorporating Liquid neural networks, a non-transformer architecture inspired by state space models, for specific applications where high efficiency and low latency are critical. While more complex to code than traditional state space models, Liquid offers sub-quadratic scaling with context length and is more compact. Shopify uses it for low-latency search applications (achieving sub-30ms end-to-end with small models) and for high-bandwidth, offline tasks like product categorization and attribute normalization across billions of products. Liquid excels as a target for distillation, transforming large models into smaller, task-specific ones. Parakhin notes that while Liquid may not compete with top-tier LLMs like GPT-5.4 for general reasoning, it's highly competitive for specific use cases, especially when distilling large models, and has been steadily increasing its share of Shopify's workloads over other models like Qwen.
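The sub-quadratic claim can be illustrated with back-of-envelope cost arithmetic: attention-style pairwise interactions grow quadratically with context, while a recurrent state update grows linearly. The constants below are illustrative, not measurements of Liquid or any real model.

```python
# Hedged back-of-envelope sketch of why sub-quadratic context scaling
# matters for the low-latency search case described here. Costs are in
# abstract units, not measurements of Liquid or any real model.

def attention_cost(n_tokens: int) -> int:
    return n_tokens * n_tokens          # pairwise token interactions

def linear_scan_cost(n_tokens: int) -> int:
    return n_tokens                     # one state update per token

# Growing context 8x grows quadratic cost 64x but linear cost only 8x.
ratio_quadratic = attention_cost(8192) // attention_cost(1024)
ratio_linear = linear_scan_cost(8192) // linear_scan_cost(1024)
```

At fixed latency budgets (like the sub-30ms search target mentioned above), that gap is what makes long-context, low-latency serving feasible with a compact model.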
Emergent Personalities and Future AI Directions
Reflecting on early AI chatbot experiences, Parakhin shared anecdotes about Sydney, the initial Bing chatbot. He highlighted that Sydney's personality wasn't purely emergent but the result of deliberate personality shaping, drawing on his Yandex experience with the Alice digital assistant. The aim was to create a polite yet slightly edgy persona to draw users in. He also touched on the broader impact of AI, including the potential for LLMs to reimagine how distributed databases are managed. The convergence of tools like Tangle, Tangent, and Sim Gym is creating synergistic effects, enabling capabilities that were previously unthinkable because they would have required thousands of human-like intelligences rather than just human effort.
Mentioned in This Episode
● Software & Apps
● Companies
● Concepts
● People Referenced
AI Development and Deployment Best Practices at Shopify
Practical takeaways from this episode
Common Questions
Shopify funds unlimited tokens for employees but encourages using high-quality models (e.g., GPT-4.0+) and discourages using less capable models. They also focus on efficient agent communication and critique loops rather than just raw token consumption.
Mentioned in the context of engineer token budgets and his public statements on AI engineer utilization.
Mentioned for his popularization of 'auto-research' and his tweets on organizing AI agents.
The first Sydney developer, now working at Shopify on Sidekick, Pulse, and other initiatives.
An individual speaking at a conference in London, likely involved with Liquid AI.
The company where Mikhail Parakhin is CTO, discussed as a leader in AI adoption and internal tooling.
Mentioned in the context of potential changes to the CI/CD paradigm and the concept of a PR bottleneck.
Mentioned for its LLM-based RecSys work, as an example of a company using transformers for recommendation systems.
Mikhail Parakhin's previous employer where he served as CEO of a business unit including Windows, Edge, and Bing.
Mentioned for its 'Chaos Monkey' concept in the context of building robust, decentralized systems.
Mentioned in relation to the models that power some AI tools and the development of Sydney.
Collaborator with Microsoft on Megatron and a partner in optimizing Liquid AI models.
A distributed database company that Shopify is working with to reimagine distributed databases using LLMs.
The company behind Liquid neural networks, discussed for their efficiency and productization efforts.
Mentioned for a paper on monorepos deploying into microservices and for Gemini's AI capabilities.
Mentioned as a company with significant compute resources, contrasting with Liquid AI's current scale.
Part of the Microsoft business unit led by Mikhail Parakhin.
A data pipeline tool that Tangle improves upon, particularly for experimentation and collaboration.
An internal AI tool at Shopify for data processing and ML experimentation, referred to as the third generation of such systems.
Part of the Microsoft business unit led by Mikhail Parakhin, also mentioned in the context of the 'Sydney' AI.
A tool mentioned in the context of PR review, noted as not sufficient for 'pro-level' model needs.
A tool used by Shopify for PRs and managing code changes.
Mikhail Parakhin's former employer, involved in the development of the 'Nirvana' system and previously 'Alice'.
A model developed in collaboration between Microsoft and NVIDIA, used in an early implementation of Sydney.
A tool used by Shopify for PRs and managing code changes.
Part of the Microsoft business unit led by Mikhail Parakhin.
A system at Yandex that represented a second take on the approach Tangle is based on.
A machine learning algorithm that Tangle's deeper level analysis can utilize and combine.
A Shopify platform enabling structured discussions and search, with a recent catalog release for product lookups.
An IDE-based AI tool discussed in contrast to CLI-based tools, showing slower growth compared to internal Shopify tools.
An alternative model to Liquid, discussed for reasoning capabilities and being a benchmark for comparison.
An IDE-based AI tool discussed in contrast to CLI-based tools, showing slower growth compared to internal Shopify tools.
Mentioned as a model that Shopify encourages employees to use, with a minimum threshold of 4.6.
An early system that pioneered the approach Tangle is based on.
A model some Shopify employees use, discussed in terms of pros and cons for context window.
A machine learning module that Tangle's deeper level analysis can utilize and combine.
Mentioned in the context of 'Deep Think' for PR reviews and as a high-quality model.
A digital assistant developed by Yandex, from which lessons on personality shaping were learned for Sydney.
Mentioned as an example of explicitly prompting a fun AI personality.
Another example of emergent AI personality into public consciousness.
A tool for data scientists that Tangle aims to improve upon by offering a more integrated and shareable development environment.
A statistical concept resurrected by Shopify for aggregating and clustering data, particularly for analyzing buyer behavior across categories.
Mentioned as a hybrid architecture combining SSMs and transformers, within which Liquid models are described as performing well.