Building an open AI company - with Ce and Vipul of Together AI

Latent Space Podcast
Science & Technology · 4 min read · 76 min video
Feb 8, 2024 · 1,640 views
TL;DR

Together AI builds open, decentralized AI systems, focusing on efficient training, inference, and novel architectures like State Space Models.

Key Insights

1. Together AI's core mission is to foster open and independent AI systems, providing a platform for open-source and user-owned models.

2. The company is building a decentralized cloud infrastructure, combining data centers globally rather than relying on hyperscalers.

3. Together AI prioritizes data quality, exemplified by its RedPajama datasets, which offer flexible filtering and quality signals for users.

4. State Space Models (SSMs) are a significant research focus, promising more efficient training and inference, not just for long contexts but also for general improvements.

5. Optimizing inference is a multi-dimensional challenge, requiring a joint approach combining algorithms, model architecture, systems, and hardware.

6. The GPU market remains tight, underscoring the value of efficient compute utilization and Together AI's decentralized infrastructure approach.

THE ORIGINS AND MISSION OF TOGETHER AI

Together AI was founded in June 2022 with a fundamental mission to build open and independent AI systems. The company views AI as one of the most consequential technologies of our time and aims to create a platform for open-source and user-owned models, differentiating itself from the proprietary platforms of large frontier model labs. This ethos of openness and decentralization is deeply embedded in their platform strategy and business model. They are actively building their own AI supercomputers through a disaggregated and decentralized network of data centers, choosing not to rely on traditional hyperscalers.

FROM APPLE'S ECOSYSTEM TO OPEN SOURCE ETHOS

Drawing from experience at Apple, where technology is often hidden behind polished products to create seamless user experiences, Together AI aims to bring a similar focus on developer experience to their open platform. The co-founders emphasize applying complex technology to simplify everyday tasks for developers. Past work with deep learning systems, including an open-domain Q&A system, highlighted the power and potential of these technologies, especially as models scale. The advent of large-scale models, driven by algorithms that improve with scale, marked a new era of computing, reinforcing the company's direction.

THE REDPAJAMA DATASET INITIATIVE

Recognizing the critical role of data in AI development, Together AI launched the RedPajama dataset. This initiative builds upon previous community efforts like C4 and LLaMA's data recipes, aiming to provide a high-quality, reproducible dataset for open model pre-training. RedPajama V1 was a best-effort reproduction of the LLaMA dataset, and V2 expanded significantly to 30 trillion tokens. V2 incorporated lessons learned from V1, focusing on modularity and data quality signals, offering 40 pre-computed quality signals to allow users to tailor datasets to specific applications, moving beyond a one-size-fits-all approach.
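The quality-signal idea can be sketched in a few lines: each document carries pre-computed signals, and users apply their own thresholds rather than accepting a fixed, one-size-fits-all filter. The signal names (`word_count`, `duplicate_ngram_ratio`) and thresholds below are illustrative stand-ins, not RedPajama V2's actual schema.

```python
# Sketch: filtering documents by pre-computed quality signals, in the spirit
# of RedPajama V2. Signal names and thresholds are illustrative, not the
# dataset's real schema.

def filter_by_signals(documents, min_words=50, max_dup_ratio=0.3):
    """Keep documents whose quality signals pass simple, user-chosen thresholds."""
    kept = []
    for doc in documents:
        signals = doc["quality_signals"]  # pre-computed once per document
        if signals["word_count"] < min_words:
            continue  # too short to be useful training text
        if signals["duplicate_ngram_ratio"] > max_dup_ratio:
            continue  # likely boilerplate or near-duplicate content
        kept.append(doc)
    return kept

corpus = [
    {"text": "A long, substantive article ...",
     "quality_signals": {"word_count": 1200, "duplicate_ngram_ratio": 0.05}},
    {"text": "Buy now! Buy now! Buy now!",
     "quality_signals": {"word_count": 6, "duplicate_ngram_ratio": 0.8}},
]
print(len(filter_by_signals(corpus)))  # → 1: the spammy snippet is dropped
```

Because the signals ship with the data rather than being baked into a single filtered release, a user targeting code generation and a user targeting prose can cut the same corpus differently.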

ADVANCING MODEL ARCHITECTURES: STATE SPACE MODELS

Together AI is heavily investing in research, with a significant portion of its team dedicated to exploring novel model architectures, particularly State Space Models (SSMs). This research is driven by the limitations of current Transformer architectures in inference speed and cost, especially for long sequences. SSMs offer a path toward more efficient training and inference, reducing computational complexity and enabling longer context windows. Hybrid architectures such as StripedHyena combine SSM-style layers with attention, exploring how different components can divide the work of processing information across a context, promising advances well beyond long-context capabilities alone.
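The efficiency argument can be illustrated with the basic recurrence SSMs are built on: a fixed-size hidden state updated once per token, so a length-L sequence costs O(L), versus attention's O(L²) pairwise token comparisons. The scalar parameters below are toy values, not any real model's; actual SSMs use learned, structured state matrices.

```python
# Sketch: the core recurrence of a (single-channel, discretized) state space
# model: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t. The state stays a fixed size,
# so the whole sequence is processed in one O(L) pass. Toy parameters only.

def ssm_scan(xs, a=0.5, b=1.0, c=0.5):
    h = 0.0                 # hidden state: constant size regardless of L
    ys = []
    for x in xs:            # one pass over the sequence: O(L)
        h = a * h + b * x   # state update mixes past context with new input
        ys.append(c * h)    # readout from the compressed state
    return ys

print(ssm_scan([1.0, 0.0, 0.0]))  # → [0.5, 0.25, 0.125]
```

The output shows the state carrying an exponentially decaying memory of the first token; at inference time this means each new token costs the same fixed amount of work, with no growing attention cache.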

OPTIMIZING AI INFERENCE AND INFRASTRUCTURE

Optimizing AI inference is a key focus, viewed as a multi-dimensional problem requiring advancements in algorithms, model architectures, systems, and hardware. Together AI employs a combination of approximately 50 'tricks and techniques' to enhance inference performance. Their decentralized infrastructure, utilizing thousands of GPUs (primarily H100s and A100s), aims for optimal compute utilization. The company is building a serverless platform to make AI development more accessible, allowing users to train, fine-tune, and run models without substantial upfront commitments, significantly lowering the barrier to entry for AI-driven applications.
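One representative trick from this family is KV caching, sketched here with toy scalar "attention" (real systems cache per-layer key/value tensors): keys and values for past tokens are stored so each decode step only scores the new token against the cache, instead of re-encoding the entire prefix.

```python
# Sketch: KV caching during autoregressive decoding. The cache grows by one
# entry per step, so each new token does O(current length) scoring work
# instead of recomputing the whole prefix from scratch. Toy scalar version.
import math

def attend(query, keys, values):
    """Softmax-weighted average of values, scored by query * key."""
    scores = [query * k for k in keys]
    m = max(scores)                           # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return sum(e / total * v for e, v in zip(exps, values))

def decode_with_cache(tokens):
    k_cache, v_cache, outputs = [], [], []
    for t in tokens:
        k_cache.append(t)   # append-only: O(1) cache update per step
        v_cache.append(t)
        outputs.append(attend(t, k_cache, v_cache))
    return outputs

print(decode_with_cache([1.0, 2.0, 3.0]))
```

This is only one of the many techniques the section alludes to; others operate at different layers of the stack (quantization in the model, batching and scheduling in the serving system, kernel fusion on the hardware), which is why the gains compound when they are designed jointly.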

THE CHALLENGES OF BENCHMARKING AND DATA ACCESSIBILITY

The discussion highlighted the complexities and potential pitfalls of AI benchmarking, emphasizing the need for standardized, independent, and transparent methodologies to avoid over-optimization and misaligned incentives. Together AI actively provides feedback to ensure benchmarks truly reflect technical progress. Furthermore, data accessibility remains a crucial issue, as the demand for diverse, high-quality data beyond publicly available internet data grows. The company believes a marketplace for data utilization and a balanced approach to data openness is necessary for the overall advancement of open AI.

THE FUTURE OF EMBEDDINGS AND TRAINING

Embeddings are considered a fundamental building block for many AI applications, including retrieval-augmented generation (RAG) and foundational model training. Together AI sees significant room for improvement in embedding quality and speed, enabling more accurate semantic understanding and closing the loop in iterative model development. On the training front, the company observes a continuous cycle of model refinement rather than one-off training runs, indicating sustained demand for robust training infrastructure and expertise, which Together AI aims to provide through its platform.
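The retrieval step of RAG that embeddings enable can be sketched as nearest-neighbor search over document vectors. The tiny hand-made embeddings below stand in for a real embedding model's output; a production system would call an embedding model and use an approximate nearest-neighbor index at scale.

```python
# Sketch: the retrieval half of RAG. Documents are pre-embedded; a query
# embedding is compared against them by cosine similarity and the closest
# documents are returned as context. Embeddings here are illustrative 2-D toys.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, corpus, top_k=1):
    """Return the texts of the top_k documents most similar to the query."""
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, d["embedding"]),
                    reverse=True)
    return [d["text"] for d in ranked[:top_k]]

docs = [
    {"text": "GPUs accelerate training", "embedding": [0.9, 0.1]},
    {"text": "Recipes for sourdough",    "embedding": [0.1, 0.9]},
]
print(retrieve([0.8, 0.2], docs))  # → ['GPUs accelerate training']
```

Better embedding quality sharpens this ranking directly, which is why the section treats embeddings as a building block whose improvement pays off across many downstream applications.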

UNSOLVED QUESTIONS AND TOGETHER AI'S VISION

Looking ahead, Together AI is focused on developing a comprehensive framework for understanding the impact of advanced AI systems and is continuously hiring across all areas, from research to systems engineering. They believe in the power of compounding innovation across the entire AI stack. The co-founders are committed to their vision of open and decentralized AI, seeing their work as a dream job that aligns with their passion for advancing the field collectively, emphasizing that true innovation comes from combining expertise across diverse layers of the technology stack.

Common Questions

What is Together AI's mission?

Together AI's mission is to build open and independent AI systems. The company aims to provide a platform for open-source models and user-owned AI, contrasting with closed platforms from large labs.

Topics

Mentioned in this video

Software & Apps
LLaMA

A foundational model that generated excitement in the AI community, influencing the development of the RedPajama dataset.

C4

A large dataset from Google, mentioned as an inspiration for the RedPajama dataset.

TGI

Text Generation Inference, a toolkit for deploying large language models, mentioned in the context of machine learning systems expertise.

flash attention

An optimization technique for attention mechanisms in transformers, open-sourced and contributing to better AI models.

Dolma

An open pre-training dataset and toolkit from AI2 that provides a flexible format for quality signals, similar to Together AI's approach with RedPajama V2.

BGE

An embedding model from BAAI that previously topped the MTEB leaderboard, mentioned in the context of the rise of Chinese models in this domain.

Terraform

An infrastructure as code tool, mentioned as an example of devops expertise sought by Together AI.

vLLM

A high-throughput inference and serving engine for large language models, mentioned alongside TGI in the context of machine learning systems expertise.

Vipul's Razor

A collaborative spam-filtering network created by co-founder Vipul, mentioned as an early open-source project.

AWS

Amazon Web Services, mentioned for its revenue and as a benchmark for the scale of AI hyperscaler buildouts.

Kubernetes

A container orchestration system, mentioned as an example of the systems expertise sought by Together AI.

CUDA

NVIDIA's parallel computing platform and API, mentioned as an area of expertise sought by Together AI.

Mamba

A state space model architecture that changed the perception of sub-quadratic architectures, highlighting efficiency beyond just long context.

StripedHyena

A hybrid architecture model developed by Together AI, combining state space model layers with attention to achieve high quality.
