How did Elicit evolve from a research lab to a product company?

Elicit began as a nonprofit research lab called Ought, which focused on AI for reasoning. As the technology matured, particularly with models like GPT-3, the team shifted to building a product that directly addresses researchers' needs, especially literature review.

What was the initial motivation behind Elicit?

The founders were motivated by the potential of AI to optimize complex reasoning and decision-making processes. They wanted to ensure that powerful AI technologies could be directed towards impactful ends, like scientific discovery, and be trustworthy and aligned with human values.

How does Elicit differ from a standard chatbot interface?

Unlike chatbots designed for quick back-and-forth, Elicit's notebook interface allows users to define and execute multi-step processes. This enables users to prototype analysis on small datasets and then scale it to larger ones, making research workflows more robust and reproducible.

What is the role of 'Constitutional AI' in Elicit?

Elicit implemented Constitutional AI, a training framework developed by Anthropic, to produce AI summaries that are more faithful to the source text. Models are trained against a defined 'constitution' of desired summary attributes, which reduces hallucinations and improves accuracy.

How does Elicit manage the cost of using large language models?

Elicit uses a credit-based pricing system that lets users choose their desired level of accuracy and cost. It also optimizes model usage, employing open-source models for simpler tasks and more powerful, costlier models for complex reasoning and high-accuracy requirements.

What are the key features of Elicit's 'notebook' interface?

Elicit's notebooks are inspired by computational notebooks like Jupyter. They enable researchers to break down complex tasks, apply language model operations, define and extend workflows, and prototype analyses that can later be scaled up, moving beyond simple chat interactions.

How does Elicit handle potential 'needle in a haystack' problems with long context models?

Elicit uses a staged retrieval pipeline: semantic search comes first, followed by listwise re-ranking with larger models to identify the most relevant information. Long context windows are valuable at the re-ranking stage because the model can compare multiple items side by side, though RAG is still necessary at scale.

What is the difference between a notebook and an AI agent from Elicit's perspective?

In Elicit's notebooks, the human is initially the agent, defining action steps. The progression involves using language models to predict these actions, eventually leading to autonomous agents executing tasks that were previously human-driven.

How does Elicit ensure the reliability and trustworthiness of its AI outputs?

Elicit emphasizes transparency and evaluation. For every output, it provides citations to supporting text in the source papers and flags model uncertainty. This allows users to easily check and validate the AI's findings.

What are some underrated features of Elicit?

The ability to add custom columns to extract specific data from papers, even classifying methodologies or creating custom fields based on user instructions, is an underrated feature that unlocks many unique workflows.

How can Elicit help researchers drive more discoveries?

By making research more systematic and unbounded, Elicit can help researchers identify potential interventions, assess evidence, and estimate experimental ROI more efficiently. This systematic approach aims to accelerate scientific breakthroughs.

Supervise the Process of AI Research — with Jungwon Byun and Andreas Stuhlmüller of Elicit

Latent Space Podcast

Science & Technology | 4 min read | 66 min video

Apr 11, 2024|1,267 views|29|1

TL;DR

Elicit uses AI to enhance research, focusing on systematic literature reviews and advanced reasoning tools with a new notebook interface.

Key Insights

Elicit evolved from a research lab (Ought) to a public benefit corporation (PBC) focused on building practical AI tools for reasoning and decision-making.

The product automates complex research workflows, particularly systematic literature reviews, enhancing productivity for researchers.

Elicit's approach emphasizes supervising the AI process, not just outcomes, by breaking down tasks and training AI on each step.

The introduction of 'notebooks' aims to generalize Elicit's capabilities across diverse research workflows, inspired by computational notebooks.

Managing AI model costs and performance is a key focus, with a hybrid approach using both closed and open-source models.

Future development includes enhancing AI's reasoning capabilities, improving evaluation interfaces, and potentially making models more autonomous.

JOURNEY FROM NONPROFIT RESEARCH TO PRODUCT-FOCUSED PBC

Elicit's journey began with Andreas Stuhlmüller's early interest in AI and his development of programming languages for AI. Ought, a nonprofit research lab, was founded first to explore reasoning in machines. Jungwon Byun joined later, bringing a product-oriented vision inspired in particular by mental health applications of AI. This led to the pivot towards Elicit, a public benefit corporation that aims to build tools that help humanity direct AI's optimization potential towards impactful goals like sound reasoning and decision-making.

AUTOMATING THE LITERATURE REVIEW WORKFLOW

A primary focus for Elicit is automating systematic literature reviews, the most rigorous method humans currently use to summarize scientific research. Such a review typically involves a large team working for over a year to filter, extract, and synthesize information from thousands of papers. Elicit aims to speed this up by using AI to discover, process, and analyze documents, turning researchers into 'data scientists of text' and making robust summaries of scientific knowledge more accessible.

EMPHASIZING PROCESS SUPERVISION AND TASK DECOMPOSITION

Elicit's core philosophy centers on supervising the AI process rather than just its outcomes. This involves meticulously breaking down complex expert tasks into granular steps and training AI to perform each step robustly. This approach, pioneered through experiments simulating human evaluation of AI, makes troubleshooting and debugging much easier. It ensures that AI systems are not simply thrown data and trained blindly but are guided through a structured, understandable process.
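
One way to picture process supervision is a pipeline in which every step carries its own check, so a failure is pinned to a specific step rather than discovered only in the final answer. The sketch below is illustrative only: the step names, the checks, and `run_supervised` are hypothetical and are not Elicit's implementation.

```python
from typing import Any, Callable

# Each step is (name, function, check). All names here are hypothetical.
Step = tuple[str, Callable[[Any], Any], Callable[[Any], bool]]


def run_supervised(steps: list[Step], value: Any) -> tuple[Any, list[tuple[str, bool]]]:
    """Run steps in order and check every intermediate result.

    Stops at the first failed check so the trace shows where things broke.
    """
    trace = []
    for name, fn, check in steps:
        value = fn(value)
        ok = check(value)
        trace.append((name, ok))
        if not ok:
            break
    return value, trace


# A toy decomposition: split text into sentences, keep claim-like
# sentences, then count them. Real steps would be model calls.
STEPS: list[Step] = [
    ("split_sentences",
     lambda text: [s.strip() for s in text.split(".") if s.strip()],
     lambda out: len(out) > 0),
    ("keep_claims",
     lambda sents: [s for s in sents if "increase" in s],
     lambda out: len(out) > 0),
    ("count_claims", len, lambda n: n > 0),
]

if __name__ == "__main__":
    print(run_supervised(STEPS, "Dose increased recall. Sample was small."))
    print(run_supervised(STEPS, "Sample was small."))
```

In the second call the trace ends at `keep_claims`, which is the kind of localized signal that makes a decomposed process easier to debug than an end-to-end one.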

THE EVOLUTION OF ELICIT'S PRODUCT, INCLUDING NOTEBOOKS

Elicit has evolved significantly, starting as a forecasting assistant and then becoming a general research assistant. The introduction of 'notebooks,' inspired by computational notebooks like Jupyter, represents a major step towards a more generalized and composable research platform. These notebooks allow users to iteratively analyze evidence, decompose PDFs into claims and insights, and remix information, offering a flexible environment for complex reasoning and analysis beyond simple chat interactions.

MANAGING MODEL COSTS AND PERFORMANCE: A HYBRID STRATEGY

Elicit employs a hybrid strategy for model selection, using both closed-source and open-source models, with its operational budget split roughly between the two. Closed-source models are used for tasks requiring higher intelligence where open-source alternatives are not yet sufficient. Costs are managed through a credit system and tiered pricing based on complexity and accuracy. Ongoing performance is monitored through internal benchmarks that reflect the distribution of user queries, focusing on robustness, latency, and hallucination rates.
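
The routing idea can be sketched as picking the cheapest model tier that is capable enough for a task, then pricing a job in credits. Everything in this example is an assumption made for illustration: the tier names, the capability scores, and the credit costs are invented and do not reflect Elicit's actual models or pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str              # hypothetical tier label
    credits_per_call: int  # hypothetical cost
    capability: int        # higher means it can handle harder tasks


# Invented tiers, for illustration only.
TIERS = [
    ModelTier("open-source-small", credits_per_call=1, capability=1),
    ModelTier("closed-source-large", credits_per_call=10, capability=3),
]


def pick_tier(required_capability: int, tiers: list[ModelTier] = TIERS) -> ModelTier:
    """Return the cheapest tier whose capability meets the requirement."""
    eligible = [t for t in tiers if t.capability >= required_capability]
    if not eligible:
        raise ValueError("no tier is capable enough for this task")
    return min(eligible, key=lambda t: t.credits_per_call)


def estimate_credits(tasks: list[tuple[int, int]]) -> int:
    """Sum credits for (required_capability, number_of_calls) pairs."""
    return sum(pick_tier(cap).credits_per_call * calls for cap, calls in tasks)


if __name__ == "__main__":
    # 100 simple extractions plus 5 hard reasoning calls.
    print(estimate_credits([(1, 100), (3, 5)]))
```

Under these made-up numbers, simple tasks go to the cheap tier and only the hard calls pay the higher rate, which is the trade-off the credit system exposes to users.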

FUTURE OF RESEARCH AND AI-ASSISTED DISCOVERY

The vision for Elicit extends beyond summarizing existing knowledge to driving new discoveries. As AI models improve, Elicit aims to empower them to conduct research more systematically and autonomously. This requires enhancing world models within AIs to understand underlying structures of different domains and make novel connections. The emphasis on transparency, process explicitness, and ease of evaluation is crucial, ensuring users can audit and trust AI even as its capabilities surpass human researchers.

THE ROLE OF LONGER CONTEXT WINDOWS AND RAG

Longer context windows in AI models are highly relevant for Elicit, particularly for advanced ranking and re-ranking of search results. While Retrieval-Augmented Generation (RAG) remains essential for handling vast datasets, longer context windows enable more powerful list-wise re-ranking and analysis of diverse results. This allows Elicit to offer higher-quality results for power users willing to invest more compute, though debugging becomes potentially more complex compared to the step-by-step RAG pipeline.
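
A staged pipeline of this kind can be sketched in two functions: a cheap first stage that scores each document independently, and a listwise stage that looks at the whole shortlist at once. This is a toy under stated assumptions: bag-of-words cosine similarity stands in for semantic embeddings, and a simple maximal-marginal-relevance heuristic stands in for the long-context model call. Neither is Elicit's actual method, and all function names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a semantic embedding: a bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def first_stage(query: str, docs: list[str], k: int) -> list[str]:
    """Stage 1: score every document independently and keep the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def listwise_rerank(query: str, candidates: list[str],
                    redundancy_weight: float = 0.5) -> list[str]:
    """Stage 2: order the shortlist while comparing candidates to each other.

    Stand-in for a model that sees all candidates in one context window.
    Each pick trades relevance against similarity to items already chosen.
    """
    q = embed(query)
    remaining = list(candidates)
    ordered: list[str] = []
    while remaining:
        def score(doc: str) -> float:
            relevance = cosine(q, embed(doc))
            overlap = max((cosine(embed(doc), embed(p)) for p in ordered), default=0.0)
            return relevance - redundancy_weight * overlap
        best = max(remaining, key=score)
        ordered.append(best)
        remaining.remove(best)
    return ordered


DOCS = [
    "creatine improves cognition in adults",
    "creatine improves cognition in adults trial",
    "creatine dosage for muscle growth",
    "sleep and cognition in older adults",
    "history of the bicycle",
]

if __name__ == "__main__":
    shortlist = first_stage("creatine cognition", DOCS, k=3)
    print(listwise_rerank("creatine cognition", shortlist))
```

The point of the second stage is that it can make comparative judgments, here demoting a near-duplicate, which a per-document score cannot do.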

ACCURACY AND GROUNDING: MANAGING MODEL UNCERTAINTY

Ensuring user trust hinges on accurate and grounded AI outputs. Elicit addresses this by directly citing supporting text from papers for every generation and flagging model uncertainty. When models express low confidence, users are prompted to check those specific sections. This explicit grounding and uncertainty flagging, sometimes achieved by using different models for response generation and uncertainty estimation, helps users validate AI outputs effectively, even as models become more capable.
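
The grounding and flagging behavior described above can be sketched as a small data structure plus one check. The `Extraction` fields, the `needs_review` helper, and the 0.7 threshold are assumptions chosen for illustration, not details from the episode.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    answer: str        # what the model reported
    quote: str         # passage cited as support
    confidence: float  # 0.0 to 1.0, possibly from a separate estimator model


def needs_review(item: Extraction, source_text: str, threshold: float = 0.7) -> bool:
    """Flag an item if its quote is not found verbatim in the source
    or if the estimated confidence falls below the threshold."""
    grounded = item.quote in source_text
    return (not grounded) or item.confidence < threshold


SOURCE = "The trial enrolled 120 adults. Recall improved by 12 percent."

ITEMS = [
    Extraction("120 participants", "The trial enrolled 120 adults", 0.95),
    Extraction("12 percent gain", "Recall improved by 12 percent", 0.40),
    Extraction("double blind", "The study was double blind", 0.90),
]

if __name__ == "__main__":
    for item in ITEMS:
        print(item.answer, "->", "check" if needs_review(item, SOURCE) else "ok")
```

The second item is flagged for low confidence and the third because its cited quote does not appear in the source, so the reader knows exactly which outputs to verify.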

THE VALUE OF CUSTOMIZATION AND EXPLORING NEW USE CASES

A key strength of Elicit is its powerful 'add column' feature, allowing users to extract custom data points from papers with tailored instructions. This flexibility enables workflows beyond simple fact extraction, such as classifying methodologies or interpreting complex diagnostic test results. This adaptability has led to surprising use cases, like aiding physicians in interpreting genomic tests, demonstrating Elicit's potential to democratize access to complex scientific information.
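
Conceptually, a custom column is a name plus a natural-language instruction applied to every paper in a table. The sketch below assumes a hypothetical `add_column` helper and uses `toy_model`, a keyword matcher standing in for a real language model call, so it only illustrates the shape of the workflow.

```python
from typing import Callable

# A "model" here is any callable taking (instruction, text) and returning a string.
Model = Callable[[str, str], str]


def add_column(rows: list[dict], name: str, instruction: str, model: Model) -> list[dict]:
    """Return new rows with an extra field produced by applying the
    instruction to each row's abstract."""
    return [{**row, name: model(instruction, row["abstract"])} for row in rows]


def toy_model(instruction: str, text: str) -> str:
    """Hypothetical stand-in for a language model call.

    Reads candidate labels after 'Options:' in the instruction and returns
    the first one that appears in the text, or 'unclear' if none does.
    """
    _, _, options = instruction.partition("Options:")
    labels = [o.strip().lower() for o in options.split(",") if o.strip()]
    lowered = text.lower()
    for label in labels:
        if label in lowered:
            return label
    return "unclear"


PAPERS = [
    {"title": "A", "abstract": "A randomized trial of creatine."},
    {"title": "B", "abstract": "An observational cohort study."},
    {"title": "C", "abstract": "A case report."},
]

if __name__ == "__main__":
    instruction = "Classify the study design. Options: randomized, observational, review"
    for row in add_column(PAPERS, "design", instruction, toy_model):
        print(row["title"], row["design"])
```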

THE DIFFERENCE BETWEEN NOTEBOOKS AND CHATBOTS FOR RESEARCH

Unlike chat interfaces, which are suited for quick iterative interactions, Elicit's notebooks are designed to define and execute processes. This allows users to prototype analyses on small datasets and then scale them to much larger ones. The notebook interface enables users to build a structured workflow—like a data analysis pipeline—and then easily transfer that process for repeated or broader application, supporting a more programmatic and scalable approach to research than simple conversational AI.
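
The prototype-then-scale pattern can be sketched as a process defined once as a list of steps, tried on a small sample, and then rerun unchanged on the full dataset. The step functions, field names, and data below are hypothetical and only illustrate the idea.

```python
from typing import Callable

Row = dict
Step = Callable[[list[Row]], list[Row]]


def run_process(steps: list[Step], rows: list[Row]) -> list[Row]:
    """Apply each step in order; the same process works on any input size."""
    for step in steps:
        rows = step(rows)
    return rows


def keep_rcts(rows: list[Row]) -> list[Row]:
    return [r for r in rows if r["design"] == "RCT"]


def label_effect(rows: list[Row]) -> list[Row]:
    return [{**r, "effect": "positive" if r["delta"] > 0 else "none"} for r in rows]


# The process is defined once, independently of the data it runs on.
PROCESS: list[Step] = [keep_rcts, label_effect]

FULL = [
    {"id": 1, "design": "RCT", "delta": 0.4},
    {"id": 2, "design": "cohort", "delta": 0.1},
    {"id": 3, "design": "RCT", "delta": -0.2},
    {"id": 4, "design": "RCT", "delta": 0.3},
]

if __name__ == "__main__":
    print(run_process(PROCESS, FULL[:2]))  # prototype on a small sample
    print(run_process(PROCESS, FULL))      # then scale to everything
```

Because the process is a value rather than a conversation, it can be inspected, edited, and reapplied, which is the contrast with chat that the section draws.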

Common Questions

Elicit is an AI research assistant designed to help researchers by automating literature review, document analysis, and concept discovery. It aims to make understanding existing knowledge more efficient and transparent.

Topics

AI & Machine Learning, Technology & Innovation, Science & Mathematics, Large Language Models, Research Workflows, Natural Language Processing, Scientific Discovery, Data Extraction, AI Research Tools, Literature Review Automation, AI Ethics and Alignment

Mentioned in this video

Software & Apps

GPT-2

An early generative language model used by Elicit in its initial stages of research and product development.

GPT-3

A significant language model release that prompted Elicit to shift focus towards building a more general research assistant.

QBasic

A programming language mentioned as an early influence on Andreas Stuhlmüller's interest in coding and AI, encountered in library books.

Airflow

An open-source platform to programmatically author, schedule, and monitor workflows, explored by Elicit.

GPT-4

The fourth generation of OpenAI's GPT models, which enabled new features for Elicit, particularly in processing tabular data.

GPT-3.5

A version of OpenAI's GPT models, mentioned as potentially being used for uncertainty estimates within Elicit's system.

T5

A text-to-text transfer transformer model, mentioned as one of the models Elicit used in its early development and continues to use.

Elicit

An AI research assistant platform focused on literature summarization, document analysis, and enabling researchers to understand known information and discover insights.

Deepnote

A collaborative data science platform with computational notebooks, mentioned as an inspiration for Elicit's notebook features.

Adept

A company working on AI tools, mentioned in the context of multimodality, where CEO David Luan provided insights.

BERT

A transformer-based language model, mentioned in comparison to models like T5, used in early stages of NLP.

Llama

A large language model developed by Meta, mentioned as potentially being used for certain tasks or uncertainty estimates within Elicit.

Prefect

A modern workflow orchestration system, looked into by Elicit for its task management capabilities.

Colab

Google's cloud-based Jupyter notebook environment, cited as an example of computational notebooks that inspired Elicit's feature development.

Claude

A large language model from Anthropic, noted for its strong performance and balanced cost-accuracy trade-off, particularly Claude Haiku.

Dagster

An orchestration framework, explored by Elicit for managing complex tasks.

Claude Haiku

A specific version of Anthropic's Claude model, highlighted for its cost-effectiveness and performance balance for summarization tasks.

People

Andreas Stuhlmüller

Co-founder of Elicit, with a background in AI research, programming languages for AI, and a PhD from MIT. He was instrumental in the early research that led to Elicit.

Jungwon Byun

Co-founder and COO of Elicit, with a background in fintech and a long-standing interest in AI applications, particularly in mental health and scalable reasoning tools.

Mike Conover

Previous guest on the podcast and founder of Brightwave, an AI research assistant for financial research, used as a point of comparison for Elicit's domain approach.

David Luan

CEO of Adept, who provided insight that multimodality for knowledge work is more about screenshots and PDFs than natural world images.

Maggie Appleton

A friend of the podcast host and an employee at Elicit, mentioned in the context of the company's growth and hiring of expert individuals.

Organizations

Ought

A former nonprofit research lab co-founded by Andreas Stuhlmüller, focusing on AI for reasoning and decision-making, which evolved into Elicit.

MIT

Massachusetts Institute of Technology, where Andreas Stuhlmüller pursued his PhD focusing on programming languages for AI.

Concepts

Constitutional AI

A framework developed by Anthropic for training AI models, which Elicit implemented to create a better summarizer faithful to the source text.

Companies

Brightwave

A company founded by Mike Conover, offering an AI research assistant specifically for financial research, discussed in the context of domain-specific AI tools.

Temporal

An open-source orchestration framework, explored by Elicit for managing complex workflows.

Anthropic

AI company that published the 'Constitutional AI' paper, which Elicit quickly implemented and integrated into its product.
