The AI-First Graphics Editor - with Suhail Doshi of Playground AI

Latent Space Podcast
Science & Technology | 3 min read | 70 min video
Jan 2, 2024 | 592 views
TL;DR

Playground AI founder Suhail Doshi discusses the evolution of AI graphics editors, training models from scratch, and the future of creative tools.

Key Insights

1. The evolution of AI from early ML attempts at Mixpanel to the current generative AI boom.

2. Mighty's ambition to create a new computer by streaming browsers, and the lessons learned about compute shifting.

3. Playground AI's focus on graphics editing as an underserved area in AI, inspired by the limitations of text-based playgrounds.

4. The development of Playground V2, a custom-trained model, to push the boundaries of image quality and utility beyond existing open-source models.

5. The importance of community, user feedback, and open-source contributions in advancing AI graphics.

6. Challenges and future directions in AI, including text synthesis, multimodal models, and the development of more intuitive user interfaces.

FROM ANALYTICS TO AI: THE MIXPANEL AND MIGHTY JOURNEY

Suhail Doshi shares his early experiences with machine learning at Mixpanel, where attempts to predict user behavior such as churn and conversion with traditional ML produced few notable results. That experience pushed him to study AI more seriously. He then discusses Mighty, an ambitious project to create a new type of computer by streaming browsers from data centers, built on the realization that the browser was becoming an operating system. While Mighty didn't succeed as a business, it reinforced his belief in shifting compute away from the user's device for demanding applications.

THE PIVOT TO PLAYGROUND: IDENTIFYING A GAP IN AI GRAPHICS

The inspiration for Playground AI emerged while Doshi was working through AI courses, around the time DALL-E 2 and Stable Diffusion gained traction. He observed that user interfaces for image generation lagged well behind the playgrounds available for text models. While OpenAI had a functional GPT-3 playground, image generation tools were largely command-line programs or basic Hugging Face interfaces. This led to the idea of building a powerful visual editor for graphics, an area that felt underserved and ripe for innovation.

TRAINING PLAYGROUND V2: PUSHING THE FRONTIERS OF IMAGE GENERATION

Playground V2 represents a significant step: the model was trained entirely from scratch rather than fine-tuned from an existing one. This decision was driven by the perceived underinvestment in foundational models for pixels and images, which Doshi likens to the early GPT-2 era, when the utility of language models was still unclear. He contrasts this with the rapid iteration seen in the Stable Diffusion community. Playground AI aims to raise image quality and utility beyond current open-source models, addressing limitations like inconsistent anatomy and offering greater creative control.

COMMUNITY, BENCHMARKS, AND THE QUEST FOR AESTHETICS

Doshi emphasizes the crucial role of the open-source community, likening it to the early days of personal computing. Playground AI aims to contribute back by releasing its models and weights. The team developed the MJ30K benchmark, which compares against Midjourney, to drive progress in aesthetic evaluation. This focus on user-driven quality, rather than only academic metrics like Fréchet Inception Distance (FID), is central to their philosophy. User feedback and community engagement are key to identifying areas for improvement, such as better lighting, composition, and subject coherence.
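For readers unfamiliar with FID, the following is a minimal sketch of the idea behind it, not Playground's evaluation code. FID fits a Gaussian to image features from real and generated images and measures the Fréchet distance between the two Gaussians. To stay dependency-free, the sketch covers only the one-dimensional special case, where the distance reduces to the squared difference of means plus the squared difference of standard deviations. The function name and the toy feature values are hypothetical.

```python
import statistics


def frechet_distance_1d(real, generated):
    """Fréchet distance between 1-D Gaussians fitted to two samples.

    One-dimensional special case only: real FID applies the same idea to
    high-dimensional image features with full covariance matrices.
    """
    mu_r = statistics.fmean(real)
    mu_g = statistics.fmean(generated)
    sd_r = statistics.stdev(real)
    sd_g = statistics.stdev(generated)
    # (mean gap)^2 + (spread gap)^2; zero when the fitted Gaussians match.
    return (mu_r - mu_g) ** 2 + (sd_r - sd_g) ** 2


# Toy feature values (made up for illustration).
real_features = [1.0, 2.0, 3.0]
shifted_features = [2.0, 3.0, 4.0]

identical_score = frechet_distance_1d(real_features, real_features)
shifted_score = frechet_distance_1d(real_features, shifted_features)
```

A lower score means the two feature distributions are closer. A score like this captures only statistical similarity, which is one reason a preference-based comparison such as the one described above can tell a different story.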

THE EVOLVING UX OF AI GRAPHICS EDITORS

Playground AI is reimagining the graphics editor, moving beyond the simple text-prompt-to-image paradigm. Features like seed selection, multiple image outputs, canvas editing, and outpainting aim to provide a more intuitive and powerful creative workflow, drawing parallels to familiar software like Photoshop but adapted for AI. The goal is to make complex graphics accessible without requiring professional skills, addressing user desires like easily altering facial expressions in photos, a task still challenging for current models.
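As a rough illustration of why seed selection matters in such an editor, the sketch below assumes the common setup in which a seed determines the random starting noise for a generation. It does not reflect Playground's actual implementation. The function `initial_noise` is hypothetical, and it uses Python's standard random module in place of a real model's noise sampler.

```python
import random


def initial_noise(seed, size=4):
    """Return a reproducible list of Gaussian noise values for a given seed.

    Hypothetical stand-in for the starting noise of an image generator.
    """
    rng = random.Random(seed)  # private generator, unaffected by global state
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


# The same seed reproduces the same starting point, so a result can be revisited.
same_a = initial_noise(42)
same_b = initial_noise(42)

# A different seed gives a different starting point, hence a different variation.
other = initial_noise(43)
```

Under this assumption, exposing the seed lets a user reproduce an image they liked, and requesting multiple outputs amounts to trying several seeds with the same prompt.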

FUTURE DIRECTIONS AND MODALITIES IN AI

Looking ahead, Doshi foresees continued progress in areas like text synthesis, multimodal models, and the integration of new techniques like consistency models. He highlights the challenges in balancing broad utility with specialized features and the importance of tools that empower both hobbyists and professional users. Doshi also speculates on entirely new modalities beyond vision, language, and audio, suggesting that a true understanding of physics or complex world models could unlock novel AI applications and companies.

Common Questions

Mighty was Suhail Doshi's previous venture, which aimed to create a new type of computer by streaming browsers. The concept proved unfeasible because of JavaScript's single-threaded nature. With AI models capable of massively parallel computation on the rise at the same time, Doshi pivoted toward image generation with Playground.

Mentioned in this video

Software & Apps
Mosaic

A company involved in model training optimization, mentioned as an example of infrastructure in the AI space.

PyTorch

An open-source machine learning framework widely used for training AI models.

Logistic Regression

A statistical method used by Mixpanel's ML team that did not yield groundbreaking results.

DreamBooth

A method for fine-tuning AI models on specific subjects or styles, often used with LoRAs.

Stable Diffusion Turbo

A distilled, faster but lower-quality version of Stable Diffusion.

Civitai

A popular repository for LoRAs (Low-Rank Adaptations) used in fine-tuning AI models.

Lexica

A platform for exploring and generating AI art, mentioned for its image-heavy landing page.

Kubernetes

An open-source system for automating deployment, scaling, and management of containerized applications.

Ideogram

An AI model that has shown recent progress in generating text within images.

ChatGPT

A conversational AI model from OpenAI, which emerged after the initial GPT-3 playground interface.

After Effects

A motion graphics and visual effects application that includes viewport rendering.

Cinema 4D

A 3D modeling and animation software that features viewport rendering, similar to Playground's preview rendering.

Fast.ai

An organization offering AI courses that Suhail Doshi took multiple times, including in preparation for starting Playground AI.

Stable Diffusion

A generative AI model that had a significant impact on Suhail Doshi, contrasting with his initial reaction to DALL-E 2.

GPT-3

An early language model from OpenAI, which had a playground interface and was considered for address bar prediction by Mighty.

Stable Diffusion XL

A significant milestone for Stable Diffusion, released in July 2023 and mentioned as a predecessor to Playground V2.

Midjourney

An AI image generation tool that Playground AI benchmarks against and considers a leader in aesthetic quality.

Slurm

A workload manager used for job scheduling in high-performance computing environments.

DALL-E 2

A generative AI model released by OpenAI in April 2022, which caused a viral moment in generative AI, particularly with the 'avocado chair' image.

Gemini

A recently released AI model by Google, mentioned in the context of benchmarks and performance approximations.

DALL-E

A generative AI model whose UI development Suhail Doshi inquired about with OpenAI.

ControlNet

A technique that enables precise control over image generation, widely used by the Stable Diffusion community and compatible with Playground V2.

Photoshop

A well-known graphics editor used as a reference point for reimagining the capabilities of AI-first graphical tools.
