Key Moments

The 10,000x Yolo Researcher Metagame — with Yi Tay of Reka

Latent Space Podcast
Science & Technology · 4 min read · 139 min video
Jul 5, 2024 · 3,463 views
TL;DR

Yi Tay of Reka discusses LLM research, Reka's models, Google Brain experience, and AI trends.

Key Insights

1. Reka Core's success demonstrates that smaller, well-funded labs can compete with giants in LLM development.

2. AI research has shifted from task-specific fine-tuning to large, general-purpose foundation models.

3. Yi Tay's career path highlights prioritizing impactful and promising research over rigidly chasing trends.

4. Compute reliability and efficient orchestration are significant operational challenges in LLM training.

5. Mixture-of-Experts (MoE) architectures are a promising direction for balancing performance and computational cost.

6. The debate between open-source and closed-source LLMs is complex, with incentives playing a major role.

JOURNEY FROM GOOGLE BRAIN TO REKA

Yi Tay's career began at Google Brain, where he co-led the PaLM 2 modeling workstream and invented UL2, while also contributing significantly to projects such as Flan and the Bard core team. His move to Reka in March 2023, followed by a $58 million Series A in June 2023, marked a strategic bet on building universal, multimodal, and multilingual intelligence agents. Reka's rapid model releases, including Flash, Core, and Edge, reflect ambitious goals around self-improving AI and model efficiency, underscoring a sharp focus on impactful research.

EVOLUTION OF LLM RESEARCH PARADIGMS

Tay observes a significant shift in machine learning research, moving from task-specific fine-tuning of models like T5 and BERT to the current paradigm of large, general-purpose foundation models. He notes that while the underlying principles of Transformer architecture and research haven't fundamentally changed, the scale of compute and data has dramatically increased. This evolution, accelerated by events like the ChatGPT launch, has redefined AI research goals towards universal intelligence rather than domain-specific optimizations.

STRATEGIC CAREER GROWTH AND RESEARCH PHILOSOPHY

Reflecting on his career, Tay emphasizes a philosophy of optimizing for impact and promising research rather than proactively chasing trends. His involvement in PaLM 2's development stemmed organically from the success of his personal project, UL2. This approach, combined with strong collaborations and an open-minded adaptability to the rapidly shifting field, allowed him to navigate complex research landscapes and contribute to significant breakthroughs.

CHALLENGES IN LLM INFRASTRUCTURE AND TRAINING

Building state-of-the-art LLMs at Reka highlighted significant operational challenges, particularly around compute reliability. Tay describes the frustrating experience of GPU delays and unreliable hardware, which significantly impacted training runs. The decision to use GPUs over TPUs came down to familiarity and existing infrastructure. He stressed that compute providers must offer better risk-sharing models: brittle infrastructure can devastate startups by wasting precious training time and resources, and the constant anxiety it creates takes a real toll on work-life balance.
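The standard defense against unreliable hardware is frequent, atomic checkpointing so a failed node costs at most the steps since the last save. The sketch below is a generic pattern (not Reka's actual stack): the `train` loop, checkpoint interval, and `pickle`-based state are all illustrative assumptions.

```python
import os
import pickle
import tempfile

def save_checkpoint(state, path):
    # Write to a temp file, then atomically rename, so a crash mid-write
    # never leaves a truncated or corrupt checkpoint behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)

def train(total_steps, ckpt_path, ckpt_every=100):
    state = {"step": 0}
    if os.path.exists(ckpt_path):
        with open(ckpt_path, "rb") as f:
            state = pickle.load(f)  # resume where the failed run left off
    for step in range(state["step"], total_steps):
        # ... one optimizer step would go here ...
        state = {"step": step + 1}
        if state["step"] % ckpt_every == 0:
            save_checkpoint(state, ckpt_path)
    return state
```

A run killed at step 250 resumes from the step-200 checkpoint rather than step 0; in real training stacks the same idea applies with sharded model/optimizer state instead of a single pickle.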

ARCHITECTURAL INNOVATIONS AND MODEL DESIGN

Reka's models, such as Reka Core and Flash, incorporate advanced architectural choices, including aspects of the "Noam" architecture (named for Noam Shazeer's design choices) characterized by gated linear units (GLU variants like SwiGLU), grouped-query attention (GQA), and RoPE embeddings. Tay finds GQA a no-brainer for its inference benefits and appreciates RoPE for its extrapolation properties. He also discusses the nuanced benefits of encoder-decoder architectures, noting their intrinsic sparsity that allows for greater parameter efficiency compared to decoder-only models at the same compute budget, especially when dealing with multimodal inputs.
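To make the SwiGLU piece concrete, here is a minimal pure-Python sketch of a SwiGLU feed-forward block: the gate projection passes through SiLU and multiplies the "up" projection elementwise before the down projection. The toy matrices and the naive `matmul` helper are illustrative, not any model's actual weights.

```python
import math

def silu(z):
    # SiLU / swish activation: z * sigmoid(z)
    return z / (1.0 + math.exp(-z))

def matmul(a, b):
    # naive row-by-column matrix multiply, fine for a sketch
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU: down( silu(x @ w_gate) * (x @ w_up) )
    gate = [[silu(v) for v in row] for row in matmul(x, w_gate)]
    up = matmul(x, w_up)
    hidden = [[g * u for g, u in zip(gr, ur)] for gr, ur in zip(gate, up)]
    return matmul(hidden, w_down)

# Tiny worked example: identity weights make the effect of the gate visible.
eye = [[1.0, 0.0], [0.0, 1.0]]
out = swiglu_ffn([[1.0, 0.0]], eye, eye, eye)
```

With identity weights the output is just `silu(x) * x` per coordinate, which shows why the gate acts as a learned, input-dependent filter on the hidden units.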

THE MULTIMODAL FRONTIER AND EVALUATION CHALLENGES

The AI field is increasingly trending towards early fusion in multimodal models, integrating different modalities from the outset for deeper understanding, a direction Reka and players like OpenAI (GPT-4o) are pursuing. Tay believes that while late fusion will persist due to practical constraints, early fusion represents the more robust long-term approach. He also notes the critical need for better evaluation benchmarks, especially for long-context models and multimodal capabilities, to guide progress effectively and avoid the contamination plaguing current benchmarks.
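The early-versus-late distinction comes down to where the modalities meet. A minimal sketch, with hypothetical dimensions and a toy projection: image-patch features are projected into the model's embedding space and concatenated with text tokens into one sequence, so a single Transformer attends across both from the first layer.

```python
def project(patches, w):
    # map raw patch features into the model's embedding dimension
    return [[sum(p[k] * w[k][j] for k in range(len(w)))
             for j in range(len(w[0]))] for p in patches]

def early_fusion(patch_tokens, text_tokens):
    # one interleaved sequence; the model attends over both modalities
    # from layer one (early fusion), instead of merging two separately
    # encoded streams near the output (late fusion)
    return patch_tokens + text_tokens

# Hypothetical sizes: 3 patches with 2 raw features, embedding dim 4
w_proj = [[0.1, 0.2, 0.3, 0.4],
          [0.5, 0.6, 0.7, 0.8]]
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_tokens = [[0.0] * 4, [1.0] * 4]  # 2 text-token embeddings

sequence = early_fusion(project(patches, w_proj), text_tokens)
```

The fused sequence here has 5 positions of dimension 4; a late-fusion design would instead run separate image and text encoders and combine their pooled outputs afterward.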

SCALING LAWS, EFFICIENCY, AND FUTURE TRAJECTORIES

Tay views the Chinchilla scaling laws as a guideline that is often misread as a hard limit, noting that models like LLaMA 3 are trained far beyond the Chinchilla-optimal token budget. He advocates for a holistic view of efficiency, considering not just active parameters or theoretical FLOPs but also practical throughput, inference speed, and serving costs. He is bullish on Mixture-of-Experts (MoE) architectures for their favorable compute-to-parameter ratio, believing they are a key enabler for continued scaling, though the nuances of their impact on capabilities beyond benchmark performance remain an active research question.
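The "guideline, not limit" point is easy to check with back-of-the-envelope numbers. The sketch below uses the commonly cited Chinchilla rule of thumb (~20 training tokens per parameter) and the standard dense-Transformer compute approximation C ≈ 6ND; the LLaMA 3 figure of roughly 15T training tokens is Meta's reported number, not from this episode.

```python
def chinchilla_optimal_tokens(n_params, tokens_per_param=20):
    # Chinchilla rule of thumb: roughly 20 training tokens per parameter
    return n_params * tokens_per_param

def training_flops(n_params, n_tokens):
    # standard approximation for dense Transformer training compute
    return 6 * n_params * n_tokens

n = 8e9                                  # an 8B-parameter model
optimal = chinchilla_optimal_tokens(n)   # ~160B tokens
actual = 15e12                           # LLaMA 3 8B reportedly trained on ~15T tokens
overtrain = actual / optimal             # ~94x past the Chinchilla-optimal point
```

Training ~94x past the compute-optimal token count is "wasteful" by the Chinchilla objective (minimizing training loss per training FLOP) but sensible once inference cost matters, since a smaller, heavily trained model is cheaper to serve.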

THE OPEN VERSUS CLOSED-SOURCE DEBATE

Tay draws a distinction between open-weight models released by large labs (like Meta's LLaMA) and grassroots, bottom-up open-source efforts. While not inherently against open source, he observes that many community-driven initiatives often rely on rebranding existing models or quick wins through fine-tuning, which may not lead to fundamental advancements. He suggests that a lack of substantial reward signals for purely derivative open-source work could limit its long-term impact compared to the sustained, resource-intensive development seen in closed-source models.

Common Questions

How did Yi Tay's research focus shift from task-specific fine-tuning to foundation models?

Yi Tay's transition happened organically as the field evolved. He optimized for impactful and promising areas, collaborating with influential people like Jason Wei. The widespread adoption of GPT-3 and ChatGPT significantly shifted the research meta from task-specific fine-tuning to universal foundation models.

Topics

Mentioned in this video

Software & Apps
PaLM 2

Yi Tay was the architecture co-lead on PaLM 2 at Google Brain, a significant company-wide effort in large language model development.

Flan

Yi Tay was a core contributor to Flan at Google Brain, a project mainly led by Hieu Pham and Sharan Narang focusing on instruction tuning.

BERT

Mentioned as an example of models that researchers were fine-tuning in late 2019, before the focus shifted entirely to foundational models.

Reka Flash, Reka Core, Reka Edge

Models released by Reka AI, which achieved state-of-the-art results even against larger models from bigger labs, showcasing the team's ability to achieve high performance with efficient cycles.

PaLM

Yi Tay contributed to PaLM 1 and was a co-lead for the modeling workstream of PaLM 2.

Mamba

Cited in the conversation as an example of a Transformer alternative that shows good performance at small scales but whose behavior at larger scales remains unknown.

UL2

Yi Tay is the inventor of UL2, which he started as a personal project during a break and became the largest encoder-decoder model released by Google at the time.

ChatGPT

The launch of ChatGPT in November 2022 drastically changed the AI research landscape, making previous task-specific work largely obsolete and accelerating the focus on general-purpose models.

Kubernetes

Used by Reka AI for some orchestration tasks, but Yi Tay notes that generalized orchestration tools for ML experimentation are still lacking in open source.

GPT-3

GPT-3's release marked the emergence of few-shot and in-context learning, changing the paradigm of large language model research.

Phi-1, Phi-2, Phi-3

A series of small language models from Microsoft Research that aim to achieve strong performance with significantly less data, though Yi Tay questions whether they can truly 'cheat' scaling laws.

GPT-4

A model whose architecture is speculated to have pluggable experts, particularly for vision, suggesting a modular approach to multimodal capabilities.

arXiv

A pre-print server where Yi Tay used to frequently browse new papers at 9:30 AM Singapore time to stay updated on research.

T5

An early general-purpose model from Google Brain, mentioned by Yi Tay as an example of the shift towards universal foundation models, even before the general public caught up.

HumanEval

A coding benchmark noted as being saturated and contaminated, similar to GSM8K, making it less effective for truly evaluating new model capabilities.
