Key Moments
The AI-First Graphics Editor - with Suhail Doshi of Playground AI
Playground AI founder Suhail Doshi discusses the evolution of AI graphics editors, training models from scratch, and the future of creative tools.
Key Insights
The evolution of AI from early ML attempts at Mixpanel to the current generative AI boom.
Mighty's ambition to create a new computer by streaming browsers, and the lessons learned about compute shifting.
Playground AI's focus on graphics editing as an underserved area in AI, inspired by the limitations of text-based playgrounds.
The development of Playground V2, a custom-trained model, to push the boundaries of image quality and utility beyond existing open-source models.
The importance of community, user feedback, and open-source contributions in advancing AI graphics.
Challenges and future directions in AI, including text synthesis, multimodal models, and the development of more intuitive user interfaces.
FROM ANALYTICS TO AI: THE MIXPANEL AND MIGHTY JOURNEY
Suhail Doshi shares his early experiences with machine learning at Mixpanel, where attempts to predict user behavior such as churn and conversion with traditional ML produced few notable results. That prompted him to study AI more deeply. He then discusses Mighty, an ambitious project to create a new type of computer by streaming browsers from data centers, built on the realization that the browser was becoming an operating system. Although Mighty did not succeed as a business, it reinforced his belief in shifting compute power for applications.
THE PIVOT TO PLAYGROUND: IDENTIFYING A GAP IN AI GRAPHICS
The inspiration for Playground AI emerged from Doshi's exploration of AI courses around the time DALL-E 2 and Stable Diffusion gained traction. He observed that user interfaces for image generation lagged well behind the playgrounds available for text models. While OpenAI had a functional GPT-3 playground, image generation tools were largely command-line programs or basic Hugging Face interfaces. This led to the idea of creating a powerful, visual editor for graphics, an area that felt underserved and ripe for innovation.
TRAINING PLAYGROUND V2: PUSHING THE FRONTIERS OF IMAGE GENERATION
Playground V2 represents a significant step: the model was trained entirely from scratch, not just fine-tuned. This decision was driven by the perceived underinvestment in foundational models for pixels and images, which Doshi likens to the early GPT-2 era, when utility was unclear. He contrasts this with the rapid iteration seen in the Stable Diffusion community. Playground AI aims to raise image quality and utility beyond current open-source models, addressing limitations such as inconsistent anatomy and offering greater creative control.
COMMUNITY, BENCHMARKS, AND THE QUEST FOR AESTHETICS
Doshi emphasizes the crucial role of the open-source community, likening it to the early days of personal computing. Playground AI aims to contribute back by releasing its models and weights. The team developed the MJHQ-30K benchmark, which compares against Midjourney, to drive progress in aesthetic evaluation. This focus on user-driven quality, rather than only on academic metrics like FID, is central to their philosophy. User feedback and community engagement are key to identifying areas for improvement, such as better lighting, composition, and subject coherence.
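For readers unfamiliar with FID (Fréchet Inception Distance), the metric mentioned above, the following is a minimal sketch of how the distance itself is computed. It is not Playground AI's evaluation code. FID compares the mean and covariance of feature vectors extracted from real and generated images; in practice those features come from an Inception-v3 network, but here random arrays stand in for them, and the helper names `feature_stats` and `frechet_distance` are hypothetical.

```python
import numpy as np

def feature_stats(features):
    """Mean vector and covariance matrix of an (n_samples, n_dims) feature array."""
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False)
    return mu, sigma

def frechet_distance(mu_r, sigma_r, mu_g, sigma_g):
    """Frechet distance between Gaussians N(mu_r, sigma_r) and N(mu_g, sigma_g):
    ||mu_r - mu_g||^2 + Tr(sigma_r + sigma_g - 2 * (sigma_r sigma_g)^(1/2)).
    """
    diff = mu_r - mu_g
    # Tr((sigma_r sigma_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of sigma_r @ sigma_g, which are real and non-negative when
    # both covariances are positive semi-definite. Clipping guards against
    # tiny negative values caused by floating-point error.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * trace_sqrt)

# Stand-in "features" (assumption: real use would extract Inception-v3 activations).
rng = np.random.default_rng(0)
real_features = rng.normal(size=(500, 8))
generated_features = rng.normal(loc=0.5, size=(500, 8))

score = frechet_distance(*feature_stats(real_features), *feature_stats(generated_features))
print(f"Frechet distance on stand-in features: {score:.3f}")
```

A lower score means the two feature distributions are statistically closer. It says nothing directly about lighting, composition, or whether people prefer the images, which is the gap a preference-oriented benchmark is meant to address.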
THE EVOLVING UX OF AI GRAPHICS EDITORS
Playground AI is reimagining the graphics editor, moving beyond the simple text-prompt-to-image paradigm. Features like seed selection, multiple image outputs, canvas editing, and outpainting aim to provide a more intuitive and powerful creative workflow, drawing parallels to familiar software like Photoshop but adapted for AI. The goal is to make complex graphics accessible without requiring professional skills, addressing user desires like easily altering facial expressions in photos, a task still challenging for current models.
FUTURE DIRECTIONS AND MODALITIES IN AI
Looking ahead, Doshi foresees continued progress in areas like text synthesis, multimodal models, and the integration of new techniques like consistency models. He highlights the challenges in balancing broad utility with specialized features and the importance of tools that empower both hobbyists and professional users. Doshi also speculates on entirely new modalities beyond vision, language, and audio, suggesting that a true understanding of physics or complex world models could unlock novel AI applications and companies.
Common Questions
Mighty was Suhail Doshi's previous venture, which aimed to create a new type of computer by streaming browsers. After that concept proved unfeasible because of JavaScript's single-threaded nature, and as AI models capable of massively parallel computation emerged, Doshi pivoted to image generation with Playground.
Mentioned in this video
A company involved in model training optimization, mentioned as an example of infrastructure in the AI space.
An open-source machine learning framework widely used for training AI models.
A statistical method used by Mixpanel's ML team that did not yield groundbreaking results.
A method for fine-tuning AI models on specific subjects or styles, often used with LoRAs.
A distilled, faster but lower-quality version of Stable Diffusion.
A popular repository for LoRAs (Low-Rank Adaptations) used in fine-tuning AI models.
A platform for exploring and generating AI art, mentioned for its image-heavy landing page.
An open-source system for automating deployment, scaling, and management of containerized applications.
An AI model that has shown recent progress in generating text within images.
A conversational AI model from OpenAI, which emerged after the initial GPT-3 playground interface.
A motion graphics and visual effects application that includes viewport rendering.
A 3D modeling and animation software that features viewport rendering, similar to Playground's preview rendering.
An organization offering AI courses that Suhail Doshi took multiple times, including in preparation for starting Playground AI.
A generative AI model that had a significant impact on Suhail Doshi, contrasting with his initial reaction to DALL-E 2.
An early language model from OpenAI, which had a playground interface and was considered for address bar prediction by Mighty.
A significant milestone in Stable Diffusion, released in July, mentioned as a predecessor to Playground V2.
An AI image generation tool that Playground AI benchmarks against and considers a leader in aesthetic quality.
A workload manager used for job scheduling in high-performance computing environments.
A generative AI model released in April by OpenAI, which caused a viral moment in generative AI, particularly with the 'avocado chair' image.
A recently released AI model by Google, mentioned in the context of benchmarks and performance approximations.
A generative AI model whose UI development Suhail Doshi inquired about with OpenAI.
A technique that enables precise control over image generation, widely used by the Stable Diffusion community and compatible with Playground V2.
A well-known graphics editor used as a reference point for reimagining the capabilities of AI-first graphical tools.
An artist whose name is often used in prompts to achieve a specific style, highlighting the need for more nuanced customization options.
Host of the Lex Fridman Podcast, mentioned as having previously hosted Sharif of Lexica, which has a similar landing page style.
Co-creator of DALL-E 2, with whom Suhail Doshi discussed the possibility of OpenAI building a UI for image generation.
Mentioned for his work using logo masks to create text-like effects in images.
Met by Suhail Doshi during a DALL-E 2 hackathon at OpenAI.
Mentioned for his analogies regarding AI models and compute, which Suhail Doshi resonates with.
Mentioned in the context of potential risks from deepfakes in future elections.
Mentioned as an example of someone who pursued curiosity and made groundbreaking discoveries, inspiring a startup approach.
Mentioned for its M1 chip in the context of CPU advancements.
The company where the Emu Edit research paper originated.
One of the major CPU manufacturers mentioned in the context of hardware limitations.
An analytics company co-founded by Suhail Doshi, which explored early machine learning for user churn and conversion prediction.
A previous venture by Suhail Doshi aimed at creating a new kind of computer by streaming browsers from data centers, which ultimately did not work.
A company in the AI/ML tooling space that Suhail Doshi considered competing with but decided against.
One of the major CPU manufacturers mentioned in the context of hardware limitations.
Suhail Doshi presented Mighty at a YC demo day.
The organization behind DALL-E 2 and GPT-3, which provided early access and had a playground interface for GPT-3.
A platform and community for machine learning models, where Stable Diffusion was initially distributed via code.
An analytics company mentioned in comparison to Mixpanel.
An infrastructure company providing API access to AI models.
A company where a friend of Suhail Doshi works, mentioned for building custom scheduling tools for AI infrastructure.
A genre and aesthetic style that can have various subgenres.
A subfield of machine learning that began gaining excitement around 2015-2016, which Mixpanel's team investigated.
The observation that the number of transistors in a dense integrated circuit doubles about every two years. That pace has been slowing.
A theorem stating that a neural network with a single hidden layer can approximate any continuous function to arbitrary accuracy, given enough hidden units. Applied here to the potential of parallel computation in AI models.