What inspired Suhail Doshi to focus on graphics AI?

The inspiration came from observing the rise of generative AI like DALL-E 2 and Stable Diffusion, noting a gap in sophisticated user interfaces for image generation compared to text-based models. Doshi also had a conversation with OpenAI about their interest in an image UI, which they declined to pursue, signaling an opportunity.

Why did Playground AI train its own model (Playground V2) instead of fine-tuning existing ones?

Playground AI felt that progress in image foundation models was underinvested compared to language models. They believed they could drive progress by training from scratch and also aimed to contribute back to the open-source community by releasing their model and weights.

What are the key areas for future research in image generation?

Desired research includes character consistency across frames for video, multitask editing, reasoning about image content, better text synthesis within images, and improvements in areas like face generation and subtle editing of features like smiles.

How does Playground AI approach benchmarking its models?

Playground AI uses a custom benchmark (MJHQ-30K) comparing against leaders like Midjourney, focusing on aesthetic quality and challenging tasks where models typically struggle. They prioritize metrics that align with real-world user happiness over purely academic metrics like FID.

How are styles generated and managed in Playground AI?

Styles emerge organically from community use and experimentation. While well-known styles like photorealism are recognized, many new styles are named by users. Playground AI aims to support this emergent creativity and hopes for more intuitive ways to define styles beyond simple naming.

What is Playground AI's philosophy on user experience for graphics tools?

Playground aims to reimagine graphics editors beyond simple prompt boxes, drawing inspiration from tools like Photoshop. They focus on providing a visual and intuitive interface with deep customization options for power users, moving away from the 'terminal window' feel of early AI tools.

How does Playground AI handle safety and ethical considerations, especially regarding NSFW content?

Playground AI filters NSFW content from its training data to reduce the burden on safety filters. They acknowledge the complexity of defining tasteful art versus harmful content and emphasize the potential risks of powerful graphics models, especially in areas like generating realistic and consistent imagery.

What are the challenges of running AI infrastructure at Playground?

While inference is manageable, scaling training clusters is significantly harder due to the complexity of distributed systems and early-stage tooling. The cost of GPUs also necessitates running their own infrastructure to keep services affordable.

What is the most interesting unsolved question in AI, according to Suhail Doshi?

Doshi is curious about modalities beyond vision, language, and audio. He believes there might be many more modalities that are difficult for humans to perceive currently, and exploring these could lead to entirely new types of AI models and companies, potentially including physics-based foundation models.

Key Moments

The AI-First Graphics Editor - with Suhail Doshi of Playground AI

Latent Space Podcast

Science & Technology | 3 min read | 70 min video

Jan 2, 2024 | 592 views

suhail doshi playground ai stablediffusion dall-e ai art

TL;DR

Playground AI founder Suhail Doshi discusses the evolution of AI graphics editors, training models from scratch, and the future of creative tools.

Key Insights

The evolution of AI from early ML attempts at Mixpanel to the current generative AI boom.

Mighty's ambition to create a new computer by streaming browsers, and the lessons learned about compute shifting.

Playground AI's focus on graphics editing as an underserved area in AI, inspired by the limitations of text-based playgrounds.

The development of Playground V2, a custom-trained model, to push the boundaries of image quality and utility beyond existing open-source models.

The importance of community, user feedback, and open-source contributions in advancing AI graphics.

Challenges and future directions in AI, including text synthesis, multimodal models, and the development of more intuitive user interfaces.

FROM ANALYTICS TO AI: THE MIXPANEL AND MIGHTY JOURNEY

Suhail Doshi shares his early experiences with machine learning at Mixpanel, where attempts to predict user behavior such as churn and conversion with traditional ML produced few meaningful results. This prompted him to study AI in more depth. He then discusses Mighty, an ambitious project that aimed to create a new type of computer by streaming browsers from data centers, built on the realization that the browser was becoming an operating system. While Mighty didn't succeed as a business, it reinforced his belief in shifting compute power for applications.

THE PIVOT TO PLAYGROUND: IDENTIFYING A GAP IN AI GRAPHICS

The inspiration for Playground AI emerged from Doshi's exploration of AI courses around the time DALL-E 2 and Stable Diffusion gained traction. He observed a significant gap in user interfaces for image generation compared to the more developed playgrounds for text models. While OpenAI had a functional GPT-3 playground, image generation tools were largely command-line or basic Hugging Face interfaces. This led to the idea of creating a powerful, visual editor for graphics, which felt like an underserved area ripe for innovation.

TRAINING PLAYGROUND V2: PUSHING THE FRONTIERS OF IMAGE GENERATION

Playground V2 represents a significant step, with the model trained entirely from scratch, not just fine-tuned. This decision was driven by the perceived underinvestment in foundational models for pixels and images, which Doshi likens to the early GPT-2 era where utility was unclear. He contrasts this with the rapid iteration seen in the Stable Diffusion community. Playground AI aims to elevate image quality and utility beyond current open-source models, addressing limitations like inconsistent anatomy and offering greater creative control.

COMMUNITY, BENCHMARKS, AND THE QUEST FOR AESTHETICS

Doshi emphasizes the crucial role of the open-source community, likening it to the early days of personal computing. Playground AI aims to contribute back by releasing its models and weights. They developed the MJHQ-30K benchmark, comparing against Midjourney, to drive progress in aesthetic evaluation. This focus on user-driven quality, rather than just academic metrics like FID, is central to their philosophy. User feedback and community engagement are key to identifying areas for improvement, such as better lighting, composition, and subject coherence.

THE EVOLVING UX OF AI GRAPHICS EDITORS

Playground AI is reimagining the graphics editor, moving beyond the simple text-prompt-to-image paradigm. Features like seed selection, multiple image outputs, canvas editing, and outpainting aim to provide a more intuitive and powerful creative workflow, drawing parallels to familiar software like Photoshop but adapted for AI. The goal is to make complex graphics accessible without requiring professional skills, addressing user desires like easily altering facial expressions in photos, a task still challenging for current models.

FUTURE DIRECTIONS AND MODALITIES IN AI

Looking ahead, Doshi foresees continued progress in areas like text synthesis, multimodal models, and the integration of new techniques like consistency models. He highlights the challenges in balancing broad utility with specialized features and the importance of tools that empower both hobbyists and professional users. Doshi also speculates on entirely new modalities beyond vision, language, and audio, suggesting that a true understanding of physics or complex world models could unlock novel AI applications and companies.

Common Questions

Mighty was Suhail Doshi's previous venture, which aimed to create a new type of computer by streaming browsers. The concept proved unfeasible, in part because of JavaScript's single-threaded nature. That, together with the rise of AI models capable of massive parallel computation, led Doshi to pivot toward image generation with Playground.

Topics

AI Ethics, AI & Machine Learning, Technology & Innovation, Generative AI, Image Generation, Computer Graphics, AI Models, User Interface Design

Mentioned in this video

Software & Apps

Mosaic

A company involved in model training optimization, mentioned as an example of infrastructure in the AI space.

PyTorch

An open-source machine learning framework widely used for training AI models.

Logistic Regression

A statistical method used by Mixpanel's ML team that did not yield groundbreaking results.

DreamBooth

A method for fine-tuning AI models on specific subjects or styles, often used with LoRAs.

Stable Diffusion Turbo

A distilled, faster but lower-quality version of Stable Diffusion.

Civitai

A popular repository for LoRAs (Low-Rank Adaptations) used in fine-tuning AI models.

Lexica

A platform for exploring and generating AI art, mentioned for its image-heavy landing page.

Kubernetes

An open-source system for automating deployment, scaling, and management of containerized applications.

Ideogram

An AI model that has shown recent progress in generating text within images.

ChatGPT

A conversational AI model from OpenAI, which emerged after the initial GPT-3 playground interface.

After Effects

A motion graphics and visual effects application that includes viewport rendering.

Cinema 4D

A 3D modeling and animation software that features viewport rendering, similar to Playground's preview rendering.

Fast.ai

An organization offering AI courses that Suhail Doshi took multiple times, including in preparation for starting Playground AI.

Stable Diffusion

A generative AI model that had a significant impact on Suhail Doshi, contrasting with his initial reaction to DALL-E 2.

GPT-3

An early language model from OpenAI, which had a playground interface and was considered for address bar prediction by Mighty.

Stable Diffusion XL

A major Stable Diffusion release from July 2023, mentioned as a predecessor to Playground V2.

Midjourney

An AI image generation tool that Playground AI benchmarks against and considers a leader in aesthetic quality.

Slurm

A workload manager used for job scheduling in high-performance computing environments.

DALL-E 2

A generative AI model released in April 2022 by OpenAI, which caused a viral moment in generative AI, particularly with the 'avocado chair' image.

Gemini

A recently released AI model by Google, mentioned in the context of benchmarks and performance approximations.

DALL-E

A generative AI model whose UI development Suhail Doshi inquired about with OpenAI.

ControlNet

A technique that enables precise control over image generation, widely used by the Stable Diffusion community and compatible with Playground V2.

Photoshop

A well-known graphics editor used as a reference point for reimagining the capabilities of AI-first graphical tools.

People

Greg Rutkowski

An artist whose name is often used in prompts to achieve a specific style, highlighting the need for more nuanced customization options.

Lex Fridman

Host of the Lex Fridman Podcast, mentioned as having Sharif Shameem of Lexica on previously, with a similar landing page style.

Emad Mostaque

Founder of Stability AI, the company behind Stable Diffusion.

Max Woolf

Mentioned for his work using logo masks to create text-like effects in images.

Sam Altman

Met by Suhail Doshi during a DALL-E 2 hackathon at OpenAI.

Andrej Karpathy

Mentioned for his analogies regarding AI models and compute, which Suhail Doshi resonates with.

Joe Biden

Mentioned in the context of potential risks from deepfakes in future elections.

Albert Einstein

Mentioned as an example of someone who pursued curiosity and made groundbreaking discoveries, inspiring a startup approach.

Studies & Research

Andrej Karpathy's Classes

Highly recommended learning resource for AI, preferred over books by Suhail Doshi.

Emu Edit

A research paper from Meta on multitask image editing, mentioned as a relevant but not yet productized concept.

Companies

Apple

Mentioned for its M1 chip in the context of CPU advancements.
