Supervise the Process of AI Research — with Jungwon Byun and Andreas Stuhlmüller of Elicit

Latent Space Podcast
Science & Technology | 4 min read | 66 min video
Apr 11, 2024 | 1,248 views
TL;DR

Elicit applies AI to research workflows, with a focus on systematic literature reviews, and is generalizing its reasoning tools through a new notebook interface.

Key Insights

1. Elicit evolved from a nonprofit research lab (Ought) into a public benefit corporation (PBC) focused on building practical AI tools for reasoning and decision-making.

2. The product automates complex research workflows, particularly systematic literature reviews, making researchers more productive.

3. Elicit's approach emphasizes supervising the AI's process, not just its outcomes, by breaking tasks down and training AI on each step.

4. 'Notebooks', inspired by computational notebooks, aim to generalize Elicit's capabilities across diverse research workflows.

5. Managing AI model cost and performance is a key focus, handled with a hybrid of closed-source and open-source models.

6. Future development includes improving AI reasoning, improving evaluation interfaces, and potentially making models more autonomous.

JOURNEY FROM NONPROFIT RESEARCH TO PRODUCT-FOCUSED PBC

Elicit's journey began with Andreas Stuhlmüller's early interest in AI and his work on programming languages for AI. Ought, a nonprofit research lab, was founded first to explore reasoning in machines. Jungwon Byun joined later, bringing a product-oriented vision inspired in part by mental health applications of AI. This led to the pivot to Elicit, a public benefit corporation that builds tools to help humanity direct AI's optimization potential toward impactful goals such as sound reasoning and decision-making.

AUTOMATING THE LITERATURE REVIEW WORKFLOW

A primary focus for Elicit is automating the laborious process of systematic literature review, currently the most rigorous human method for summarizing scientific research. A review typically involves a large team working for over a year to filter, extract, and synthesize information from thousands of papers. Elicit aims to speed this up by using AI to discover, process, and analyze documents, turning researchers into 'data scientists of text' and making robust summaries of scientific knowledge more accessible.

EMPHASIZING PROCESS SUPERVISION AND TASK DECOMPOSITION

Elicit's core philosophy centers on supervising the AI's process rather than just its outcomes. This means breaking complex expert tasks down into granular steps and training AI to perform each step robustly. The approach, pioneered through experiments simulating human evaluation of AI, makes troubleshooting and debugging much easier, because a failure can be traced to a specific step. AI systems are not simply handed data and trained blindly; they are guided through a structured, understandable process.
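The idea of supervising each step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `Step`, `Trace`, and `run_pipeline` are hypothetical, and the toy lambdas stand in for model calls. It is not Elicit's implementation. The point is only that every intermediate result is recorded, so a bad final answer can be traced back to the step that produced it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One named, individually inspectable unit of work (hypothetical)."""
    name: str
    run: Callable[[object], object]


@dataclass
class Trace:
    """Records (step name, input, output) for every step that ran."""
    records: List[Tuple[str, object, object]] = field(default_factory=list)


def run_pipeline(steps, data, trace):
    """Run steps in order, logging each intermediate result to the trace."""
    for step in steps:
        result = step.run(data)
        trace.records.append((step.name, data, result))
        data = result
    return data


# Toy data and toy steps; in a real system each step would be a model call.
papers = [
    {"title": "Creatine and memory", "abstract": "RCT with 120 adults. Memory improved."},
    {"title": "Sleep and mood", "abstract": "Survey of 50 students."},
]

steps = [
    Step("filter_relevant", lambda ps: [p for p in ps if "creatine" in p["title"].lower()]),
    Step("extract_first_sentence", lambda ps: [p["abstract"].split(". ")[0] for p in ps]),
    Step("synthesize", lambda claims: "; ".join(claims)),
]

trace = Trace()
summary = run_pipeline(steps, papers, trace)

# Inspect the process, not just the outcome.
for name, _inp, out in trace.records:
    print(f"{name}: {out}")
```

Because each step has its own input and output in the trace, each one can be evaluated or retrained separately, which is the debugging benefit the section describes.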

THE EVOLUTION OF ELICIT'S PRODUCT, INCLUDING NOTEBOOKS

Elicit has evolved significantly, starting as a forecasting assistant and then becoming a general research assistant. The introduction of 'notebooks,' inspired by computational notebooks like Jupyter, represents a major step towards a more generalized and composable research platform. These notebooks allow users to iteratively analyze evidence, decompose PDFs into claims and insights, and remix information, offering a flexible environment for complex reasoning and analysis beyond simple chat interactions.

MANAGING MODEL COSTS AND PERFORMANCE: A HYBRID STRATEGY

Elicit employs a hybrid strategy for model selection, using both closed-source and open-source models, with its model spend split roughly between the two. Closed-source models are used for tasks that require higher intelligence, where open-source alternatives are not yet sufficient. Costs are managed through a credit system and tiered pricing based on task complexity and accuracy. Performance is monitored on an ongoing basis through internal benchmarks that reflect the distribution of user queries, with a focus on robustness, latency, and hallucination rates.

FUTURE OF RESEARCH AND AI-ASSISTED DISCOVERY

The vision for Elicit extends beyond summarizing existing knowledge to driving new discoveries. As AI models improve, Elicit aims to empower them to conduct research more systematically and autonomously. This requires enhancing world models within AIs to understand underlying structures of different domains and make novel connections. The emphasis on transparency, process explicitness, and ease of evaluation is crucial, ensuring users can audit and trust AI even as its capabilities surpass human researchers.

THE ROLE OF LONGER CONTEXT WINDOWS AND RAG

Longer context windows are highly relevant for Elicit, particularly for advanced ranking and re-ranking of search results. Retrieval-Augmented Generation (RAG) remains essential for handling vast datasets, but longer context windows enable more powerful list-wise re-ranking, in which the model sees many candidate results at once and can analyze them together. This lets Elicit offer higher-quality results to power users willing to spend more compute, although debugging can be harder than with a step-by-step RAG pipeline.
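A toy sketch of the two stages may help. The first stage scores each document independently (point-wise retrieval); the second looks at the whole candidate list at once. The list-wise stage here is a simple greedy diversity-aware ordering, similar in spirit to maximal marginal relevance, standing in for what a long-context model might do. The function names, the word-overlap scoring, and the penalty weight of 2.0 are all assumptions for illustration, not Elicit's method.

```python
def tokens(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def retrieve(query, docs, k):
    """Stage 1: score each document on its own by word overlap; keep the top k."""
    q = tokens(query)
    return sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)[:k]


def jaccard(a, b):
    """Set similarity in [0, 1]."""
    return len(a & b) / len(a | b)


def listwise_rerank(query, candidates, penalty=2.0):
    """Stage 2: order candidates while looking at the whole list.

    Each pick trades relevance against redundancy with results already
    chosen, which a point-wise scorer cannot do.
    """
    q = tokens(query)
    remaining = list(candidates)
    ordered = []
    while remaining:
        def score(doc):
            relevance = len(q & tokens(doc))
            redundancy = max(
                (jaccard(tokens(doc), tokens(seen)) for seen in ordered),
                default=0.0,
            )
            return relevance - penalty * redundancy

        best = max(remaining, key=score)
        ordered.append(best)
        remaining.remove(best)
    return ordered


docs = [
    "creatine improves memory in adults",
    "creatine improves memory in older adults",
    "creatine and sleep quality",
    "weather patterns in spring",
]
query = "creatine memory"

candidates = retrieve(query, docs, k=3)
reranked = listwise_rerank(query, candidates)
print(reranked)
```

In this example the near-duplicate second document is pushed below the less relevant but more novel third one, a judgment that depends on seeing the other results.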

ACCURACY AND GROUNDING: MANAGING MODEL UNCERTAINTY

Ensuring user trust hinges on accurate and grounded AI outputs. Elicit addresses this by directly citing supporting text from papers for every generation and flagging model uncertainty. When models express low confidence, users are prompted to check those specific sections. This explicit grounding and uncertainty flagging, sometimes achieved by using different models for response generation and uncertainty estimation, helps users validate AI outputs effectively, even as models become more capable.
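The grounding and flagging behavior described above can be sketched as a small check. This is a hypothetical illustration: the `Answer` fields, the `review_flags` function, and the 0.7 threshold are assumptions, and the confidence value is simply passed in, as if it came from a separate uncertainty-estimation model.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    """A generated answer with its cited supporting quote (hypothetical shape)."""
    text: str
    quote: str
    confidence: float  # assumed to come from a separate estimator, in [0, 1]


def review_flags(answer, paper_text, threshold=0.7):
    """Return reasons a user should double-check this answer, if any."""
    flags = []
    # Grounding: the cited quote must appear verbatim in the source paper.
    if answer.quote not in paper_text:
        flags.append("quote_not_found")
    # Uncertainty: low-confidence answers are surfaced for manual review.
    if answer.confidence < threshold:
        flags.append("low_confidence")
    return flags


paper_text = "We enrolled 120 adults. Memory scores improved after eight weeks."

grounded = Answer("The study had 120 participants.", "We enrolled 120 adults.", 0.93)
shaky = Answer("The study lasted one year.", "The trial ran for one year.", 0.41)

print(review_flags(grounded, paper_text))
print(review_flags(shaky, paper_text))
```

An empty list means the answer is both supported by a verbatim quote and above the confidence threshold; anything else tells the user which part of the paper to check.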

THE VALUE OF CUSTOMIZATION AND EXPLORING NEW USE CASES

A key strength of Elicit is its powerful 'add column' feature, allowing users to extract custom data points from papers with tailored instructions. This flexibility enables workflows beyond simple fact extraction, such as classifying methodologies or interpreting complex diagnostic test results. This adaptability has led to surprising use cases, like aiding physicians in interpreting genomic tests, demonstrating Elicit's potential to democratize access to complex scientific information.
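Conceptually, adding a column means applying one extraction instruction to every paper in a table. The sketch below shows that shape with plain functions in place of model calls. The `add_column` helper, the two extractors, and the sample rows are hypothetical and only illustrate the idea; in the product the extractor would be a natural-language instruction handled by a model.

```python
import re


def add_column(rows, name, extractor):
    """Return new rows with an extra column computed from each abstract."""
    return [{**row, name: extractor(row["abstract"])} for row in rows]


def sample_size(abstract):
    """Fact extraction: find a pattern such as 'n = 120', else None."""
    match = re.search(r"\bn\s*=\s*(\d+)", abstract, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


def study_type(abstract):
    """Classification: a crude keyword rule standing in for a model."""
    return "RCT" if "randomized" in abstract.lower() else "other"


rows = [
    {"title": "Paper A", "abstract": "A randomized trial (n = 120) of creatine."},
    {"title": "Paper B", "abstract": "An observational cohort study."},
]

table = add_column(rows, "sample_size", sample_size)
table = add_column(table, "study_type", study_type)

for row in table:
    print(row["title"], row["sample_size"], row["study_type"])
```

Each column is independent of the others, so a user can keep adding tailored questions without rerunning earlier ones.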

THE DIFFERENCE BETWEEN NOTEBOOKS AND CHATBOTS FOR RESEARCH

Chat interfaces suit quick, iterative interactions; Elicit's notebooks are designed to define and execute processes. Users can prototype an analysis on a small dataset and then scale it to a much larger one. The notebook interface lets users build a structured workflow, much like a data analysis pipeline, and then reuse that process for repeated or broader application. This supports a more programmatic and scalable approach to research than conversational AI alone.
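The prototype-then-scale pattern can be sketched as a workflow that is defined once and then applied to inputs of any size. The names `make_workflow`, `keep_with_outcome`, and `to_claims`, along with the synthetic corpus, are assumptions for illustration and do not reflect how Elicit's notebooks are built.

```python
def make_workflow(*steps):
    """Bundle an ordered list of steps into one reusable process."""
    def run(items):
        for step in steps:
            items = step(items)
        return items
    return run


def keep_with_outcome(papers):
    """Drop papers that report no outcome."""
    return [p for p in papers if p.get("outcome")]


def to_claims(papers):
    """Turn each remaining paper into a one-line claim."""
    return [f'{p["title"]}: {p["outcome"]}' for p in papers]


# Define the process once.
workflow = make_workflow(keep_with_outcome, to_claims)

# Synthetic corpus: every even-numbered paper reports an outcome.
corpus = [
    {"title": f"Paper {i}", "outcome": "improved" if i % 2 == 0 else ""}
    for i in range(1000)
]

# Prototype on a small sample, inspect the result, then scale up unchanged.
sample_claims = workflow(corpus[:4])
all_claims = workflow(corpus)

print(sample_claims)
print(len(all_claims))
```

The same `workflow` object runs on four papers or a thousand, which is the difference from a chat session where each analysis has to be re-requested.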

Common Questions

What is Elicit?

Elicit is an AI research assistant designed to help researchers by automating literature review, document analysis, and concept discovery. It aims to make understanding existing knowledge more efficient and transparent.

Topics

Mentioned in this video

Software & Apps
GPT-2

An early generative language model used by Elicit in its initial stages of research and product development.

GPT-3

A significant language model release that prompted Elicit to shift focus towards building a more general research assistant.

QBasic

A programming language mentioned as an early influence on Andreas Stuhlmüller's interest in coding and AI, encountered in library books.

Airflow

An open-source platform to programmatically author, schedule, and monitor workflows, explored by Elicit.

GPT-4

The fourth generation of OpenAI's GPT models, which enabled new features for Elicit, particularly in processing tabular data.

GPT-3.5

A version of OpenAI's GPT models, mentioned as potentially being used for uncertainty estimates within Elicit's system.

T5

A text-to-text transfer transformer model, mentioned as one of the models Elicit used in its early development and continues to use.

Elicit

An AI research assistant platform focused on literature summarization, document analysis, and enabling researchers to understand known information and discover insights.

Deepnote

A collaborative data science platform with computational notebooks, mentioned as an inspiration for Elicit's notebook features.

Adept

A company working on AI tools, mentioned in the context of multimodality, where CEO David Luan provided insights.

BERT

A transformer-based language model from the early days of modern NLP, mentioned in comparison to models like T5.

Llama

A large language model developed by Meta, mentioned as potentially being used for certain tasks or uncertainty estimates within Elicit.

Prefect

A modern workflow orchestration system, looked into by Elicit for its task management capabilities.

Colab

Google's cloud-based Jupyter notebook environment, cited as an example of computational notebooks that inspired Elicit's feature development.

Claude

A large language model from Anthropic, noted for its strong performance and balanced cost-accuracy trade-off, particularly Claude Haiku.

Dagster

A workflow orchestration framework, explored by Elicit for managing complex tasks.

Claude Haiku

A specific version of Anthropic's Claude model, highlighted for its cost-effectiveness and performance balance for summarization tasks.
