Key Moments

Heroes of NLP: Oren Etzioni

DeepLearning.AI
Science & Technology · 6 min read · 35 min video
Oct 13, 2020 | 3,206 views
TL;DR

Oren Etzioni discusses NLP's evolution, Semantic Scholar, ethical AI, and advice for aspiring professionals.

Key Insights

1. Deep learning's unexpected commercial success stemmed from fundamental AI research questions.

2. Open Information Extraction aimed to map vast web data into comprehensive knowledge bases.

3. Semantic Scholar uses AI to help researchers navigate the overwhelming volume of scientific publications.

4. The COVID-19 pandemic highlighted the importance of AI in rapidly processing and disseminating critical research.

5. Successful AI startups require meticulous planning around data acquisition and labeling strategies.

6. Self-labeling data in sequential processes, like language modeling, is crucial for training large models.

7. The growth of AI models is likely to continue, balanced by a focus on efficiency and 'Green AI'.

8. Career paths in AI/NLP can be optimized for compensation and excitement (industry) or freedom and fundamental questions (academia).

9. Regulation should focus on specific AI applications rather than basic research, emphasizing auditing over explanations.

10. Aspiring NLP professionals should build strong fundamentals, leverage online resources, and gain practical experience.

ORIGINS AND EVOLUTION OF AI INTEREST

Oren Etzioni's fascination with Artificial Intelligence began in high school after reading Douglas Hofstadter's 'Gödel, Escher, Bach.' This sparked a profound interest in the fundamental questions surrounding intelligence and the creation of intelligent machines. His early exploration involved learning Lisp, an ancient programming language, which he found 'endlessly fun.' This foundational interest continued into college, where he focused on computer science as the pathway to AI. He notes how the pursuit of these fundamental intellectual questions has unexpectedly led to powerful technologies and commercial success, particularly with the rise of deep learning.

PIONEERING OPEN INFORMATION EXTRACTION

Etzioni was a pioneer in 'open information extraction' (Open IE) from the web. The core idea was to map unstructured sentences from the web into structured database tuples, moving beyond narrowly defined event extraction like M&A or terrorist events. His motto, 'no sentence left behind,' aimed to extract information from any sentence to build a comprehensive knowledge base. This required generalizing techniques beyond traditional supervised learning by developing unsupervised methods that could learn from the vast, diverse language used online. He observed linguistic invariants and regularities in how relationships are expressed, providing strong signals for learning algorithms.
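The sentence-to-tuple mapping can be illustrated with a deliberately naive sketch. The function, relation list, and regex below are hypothetical and hard-coded for illustration; Etzioni's actual Open IE systems, such as TextRunner and ReVerb, learn their extractors from data rather than matching fixed patterns:

```python
import re

# Hypothetical relation phrases; real Open IE systems learn these from text.
RELATIONS = r"acquired|founded|is located in|works at"

PATTERN = re.compile(rf"^(?P<arg1>.+?)\s+(?P<rel>{RELATIONS})\s+(?P<arg2>.+?)\.?$")

def extract_triple(sentence):
    """Map a sentence to a structured (arg1, relation, arg2) tuple, or None."""
    m = PATTERN.match(sentence)
    if m:
        return (m.group("arg1"), m.group("rel"), m.group("arg2"))
    return None

print(extract_triple("Microsoft acquired Farecast."))
# ('Microsoft', 'acquired', 'Farecast')
```

The 'no sentence left behind' ambition is exactly what this toy version lacks: it only fires on a handful of relation phrases, whereas Open IE generalizes to relations it has never seen spelled out in advance.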

SEMANTIC SCHOLAR: NAVIGATING SCIENTIFIC LITERATURE

As CEO of the Allen Institute for AI (AI2), Etzioni leads initiatives focused on 'AI for the common good.' One significant project is Semantic Scholar, designed to combat the 'Moore's Law of scientific publication'—the rapid exponential growth of research papers. Semantic Scholar uses AI to help scientists and the public access relevant papers more efficiently. It offers features like 'extreme summaries' (TLDRs) for papers and uses computer vision to extract figures and tables, enabling users to quickly assess a paper's relevance. This saves researchers valuable time amid an increasing deluge of information.

AI'S ROLE IN THE COVID-19 PANDEMIC

Semantic Scholar played a crucial role during the COVID-19 pandemic. In early 2020, the White House reached out to AI2 because of its capabilities in processing large collections of papers. AI2, in collaboration with organizations like the Chan Zuckerberg Initiative and Microsoft, rapidly assembled and continues to maintain a machine-readable corpus of over 200,000 COVID-19-related papers. This open dataset, known as CORD-19, enabled AI systems to accelerate research and answer critical questions about the virus far more quickly than traditional methods allowed.

DATA AND LABELING IN AI STARTUPS

Etzioni emphasizes the critical importance of data for AI-based startups. He highlights the 'dirty little secret' that success often hinges not just on big data, but on plentiful labels. Entrepreneurs must carefully consider where their data will come from and how it will be labeled. He uses his experience with Farecast, a successful airfare prediction company, as an example. Farecast generated 'a trillion labeled data points' by leveraging the sequential nature of temporal data; as flight prices changed, predictions were automatically validated over time, creating a self-labeling dataset without manual intervention.
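Temporal self-labeling of this kind can be sketched in a few lines. The price history and seven-day horizon below are invented for illustration; this shows the general pattern, not Farecast's actual pipeline:

```python
from datetime import date, timedelta

# Hypothetical fare history for one route: (observation_date, fare).
history = [
    (date(2020, 1, 1), 320),
    (date(2020, 1, 3), 310),
    (date(2020, 1, 6), 290),
    (date(2020, 1, 10), 335),
    (date(2020, 1, 15), 330),
]

def self_label(history, horizon_days=7):
    """Label each observation with whether the fare drops within the horizon.

    The label comes "for free" by looking ahead in the same time series,
    so no manual annotation is needed.
    """
    examples = []
    for i, (day, fare) in enumerate(history):
        window = [f for d, f in history[i + 1:]
                  if d - day <= timedelta(days=horizon_days)]
        if window:  # only observations with a lookahead window get a label
            examples.append((day, fare, min(window) < fare))
    return examples

for day, fare, dropped in self_label(history):
    print(day, fare, "drop ahead" if dropped else "no drop")
```

Each observation is both a feature vector and, once time passes, the label for an earlier prediction — which is how a modest price feed compounds into an enormous labeled dataset.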

SELF-LABELING DATA AND THE RISE OF LARGE MODELS

The concept of self-labeling data extends to modern NLP, particularly with large language models like BERT and GPT-3. These models effectively 'label themselves' by predicting held-out words (masked words within a sentence in BERT's case, the next word in GPT-3's). The inherent sequential nature of language allows for a form of self-supervision: if a model predicts a word, it can compare its prediction to the actual word in the corpus. This capability has fueled the dramatic success and growth of these models, enabling them to learn from vast amounts of text with less reliance on explicit human labeling, a significant breakthrough for the field.
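A minimal sketch of this self-supervision loop follows. The toy corpus and one-word masking scheme are simplifications assumed for illustration; BERT-style training actually masks a fraction of subword tokens across large corpora:

```python
import random

corpus = "the model predicts the masked word from its context".split()

def make_training_example(tokens, rng):
    """Hide one token; the hidden token itself becomes the training label."""
    i = rng.randrange(len(tokens))
    inputs = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
    return inputs, tokens[i]

rng = random.Random(0)  # seeded for reproducibility
inputs, label = make_training_example(corpus, rng)
print(" ".join(inputs), "-> label:", label)
```

No annotator ever touches this data: the corpus supplies both the input and the answer, which is why such models can scale to essentially all available text.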

THE FUTURE TREND OF MODEL AND DATA SCALING

While acknowledging his past predictions on model size plateauing were incorrect, Etzioni believes the trend toward larger models and datasets will continue due to the persistent hunger for performance. However, he anticipates a dual trajectory: continued scaling alongside a growing focus on optimization. This includes developing more data-efficient strategies and computational efficiencies. He draws an analogy to chess, which evolved from requiring supercomputers to running on laptops via better algorithms, alongside simultaneous scaling up in complexity to games like Go, suggesting both brute-force scaling and refinement will characterize AI's future.

EMERGING FOCUS ON 'GREEN AI' AND ACCESSIBILITY

Recognizing the significant computational cost and barrier to entry for massive models, AI2 is exploring 'Green AI.' This approach emphasizes efficiency and accessibility, aiming to achieve state-of-the-art results with fewer resources. The goal is to enable researchers to work effectively with smaller budgets and datasets, fostering broader participation. Concepts like 'NLP in a Box' explore delivering powerful NLP capabilities on devices like laptops or phones, addressing privacy concerns and the intermittent-connectivity challenges of edge computing.

ACADEMIA VS. INDUSTRY CAREER PATHS

Etzioni likens choosing between academia and industry in AI to optimizing for different goals. The private sector, particularly startups, often appeals to those optimizing for compensation and adrenaline-fueled excitement, similar to a car race or poker game. Conversely, academia offers maximum freedom to pursue fundamental intellectual questions deeply and on one's own terms, without external pressures. He has experienced both, valuing the deep, uninterrupted intellectual exploration in academia and the exhilarating challenge of building and succeeding with a team in the commercial sector.

REGULATING AI: APPLICATIONS OVER RESEARCH

Discussing AI regulation, Etzioni strongly advocates for regulating specific applications rather than basic underlying research. He warns against legislating values into technology. Bias in NLP, for instance, is a serious issue, but the focus should be on how that bias manifests in applications like resume scanning software, which can be legally challenged and audited. Regulating the applications ensures accountability for their problematic impacts, while allowing fundamental research to flourish and ensuring that technological advancements are not stifled by overly broad regulations.

AUDITING AND TRANSPARENCY IN AI

Etzioni highlights the critical difference between demanding explanations from AI models and enabling auditing. Deep learning models, with their vast numbers of parameters, may struggle to provide truly understandable explanations. He suggests that mandating a 'right to audit' is more practical and robust, giving regulatory agencies or third parties access to probe model behavior for bias or fairness. A marketplace of auditors, involving stakeholders such as journalists and nonprofits, can provide checks and balances, fostering greater transparency and accountability than potentially inscrutable or misleading explanations.
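A black-box audit of the kind described above needs only query access to a model, not a view of its internals. The sketch below is hypothetical: the `model`, `applicants`, and `group` field are invented to illustrate a disparate-selection-rate check on a resume screener:

```python
def audit_selection_rates(model, applicants, group_key):
    """Compare a black-box model's positive-decision rate per group."""
    counts = {}
    for person in applicants:
        g = person[group_key]
        n, k = counts.get(g, (0, 0))
        counts[g] = (n + 1, k + (1 if model(person) else 0))
    return {g: k / n for g, (n, k) in counts.items()}

# Toy screening model that keys on years of experience only.
model = lambda p: p["years_experience"] >= 5
applicants = [
    {"group": "A", "years_experience": 6},
    {"group": "A", "years_experience": 7},
    {"group": "B", "years_experience": 3},
    {"group": "B", "years_experience": 8},
]
print(audit_selection_rates(model, applicants, "group"))
# {'A': 1.0, 'B': 0.5}
```

A real audit would use many more queries and a statistical test, but the point stands: an auditor can surface a disparity like this without any explanation of how the model works inside.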

ADVICE FOR ASPIRING NLP PROFESSIONALS

For those looking to enter or grow in NLP, Etzioni stresses the importance of mastering fundamental skills in statistics, computer science, and machine learning. He recommends leveraging cost-efficient and accessible online courses, including DeepLearning.AI's NLP specialization. Crucially, he emphasizes that theoretical knowledge must be complemented by hands-on practice. Taking on real problems with actual datasets is essential to truly understand concepts, troubleshoot challenges, and potentially uncover new ideas or inventions, solidifying learning through direct experience.

Building a Career in NLP: Advice from Oren Etzioni

Practical takeaways from this episode

Do This

Master the fundamentals: statistics, computer science, and machine learning.
Utilize accessible online courses for cost-efficient learning.
Gain practical experience by taking on real problems with datasets.
Focus on efficiency and cost in AI development (Green AI).
When choosing between academia and industry, optimize for what matters most: freedom for fundamental research or compensation/adrenaline for startups.
Regulate AI applications, not basic research.
Advocate for auditing mechanisms over potentially misleading explanations for AI models.

Avoid This

Don't solely rely on 'flavor of the month' trends in AI; build a strong foundation.
Don't underestimate the importance of data sources and labeling for AI startups.
Avoid rushing to build massive models without considering efficiency and accessibility.
Do not try to legislate values into technology; regulate specific applications where bias is problematic.
Don't expect regulators to provide perfect, understandable explanations from complex deep learning models; focus on audit rights instead.

Common Questions

How did Oren Etzioni first become interested in AI?

Oren Etzioni became fascinated with AI in high school after reading Douglas Hofstadter's 'Gödel, Escher, Bach', which raised fundamental questions about intelligence. He then began studying Lisp before college and pursued computer science at Harvard.
