Heroes of NLP: Kathleen McKeown
Key Moments
NLP pioneer Kathleen McKeown discusses her journey from literature to AI, innovations in text summarization, and the future of the field.
Key Insights
Started in NLP with a background in comparative literature, bridging language and math interests.
Early career in NLP involved self-directed study and overcoming imposter syndrome.
Advocates for pursuing novel research problems that address real-world needs, rather than just optimizing benchmarks.
Current research focuses on abstractive summarization, particularly of novels, and analyzing language from diverse communities.
Highlights the importance of interdisciplinary work, drawing perspectives from fields like linguistics, philosophy, and social work.
Sees bias in data and understanding paralinguistic meaning (emotion, intention) as crucial future directions for NLP.
AN UNUSUAL PATH INTO NLP
Kathleen McKeown's path into Natural Language Processing (NLP) was unconventional. She majored in comparative literature while keeping a strong interest in mathematics. After graduation, while working as a programmer, she discovered computational linguistics. The field offered a way to combine her passion for language with her quantitative skills, and she went on to graduate study in NLP. Her early exploration was largely solitary: she taught herself by working through the library and tracing references from one paper to the next, in a field that was still young.
OVERCOMING SELF-DOUBT AND FINDING COMMUNITY
Entering graduate school in a new field was daunting, and McKeown experienced significant imposter syndrome, doubting her preparedness. She emphasizes that this feeling is common and can be overcome with time and continued engagement. For aspiring NLP researchers facing similar isolation, she suggests actively reaching out to peers and mentors. The online environment today offers numerous avenues for connection, such as reading groups, online courses, and video discussions, which can mitigate feelings of loneliness and provide valuable insights.
PIONEERING TEXT SUMMARIZATION RESEARCH
Summarization has been a central theme in McKeown's research, and her work covers several genres, including novels and personal narratives. Summarizing chapters of novels is especially hard. The input texts are long, and a good summary must paraphrase heavily, for example rendering 19th-century prose in modern English. This kind of abstractive summarization, where the output substantially rephrases the input, departs from most current news summarization research. Because of the datasets available, news work tends to favor more extractive methods that reuse sentences from the source.
THE VALUE OF INTERDISCIPLINARY APPROACHES
McKeown strongly advocates for interdisciplinary research and describes it as the most enjoyable and perspective-enriching part of her work. Collaborating with experts in fields such as journalism, social work, and linguistics brings fresh viewpoints and pushes researchers beyond narrow technical concerns. These collaborations have led her to novel problems, setting her work apart from the NLP community's more common focus on standardized benchmarks and connecting it to a wider range of real-world applications.
SELECTING MEANINGFUL RESEARCH PROBLEMS
McKeown advises researchers to choose tasks that truly matter and have tangible impact, rather than focusing only on improving performance on existing benchmarks such as those for news summarization. She notes that some popular research directions are driven by data availability and competitive leaderboards but may not address pressing human needs. Unexplored problems can be harder to evaluate and to get accepted. Even so, pursuing them makes groundbreaking contributions and genuinely useful applications possible.
ADDRESSING BIAS AND DIVERSE LANGUAGE USE
McKeown's group is currently analyzing language from diverse communities, such as the Black community. One aim is to understand how emotions are expressed in African-American Vernacular English (AAVE) and how that expression differs from Standard American English. The work seeks to produce less biased algorithms and to shed light on the impact of trauma. The researchers examine language used in response to major societal events such as Black Lives Matter and COVID-19. Through this, they hope to build bridges between communities and contribute to a more nuanced understanding of human experience.
EMERGING DIRECTIONS AND TECHNOLOGICAL ADVANCEMENTS
McKeown identifies several exciting emerging areas in NLP. These include highly abstractive summarization with significant paraphrasing, analyzing language from diverse communities, tackling data bias, and understanding paralinguistic meaning, such as emotion and intention. She also highlights the importance of event understanding and tracking. Reflecting on past technological milestones, she recalls early work on word choice in text generation and the Newsblaster project for event-based summarization, noting that control and coherence remain critical challenges even with modern deep learning methods.
EVOLUTION OF THE NLP FIELD OVER DECADES
The field of NLP has evolved significantly since McKeown began her career. Initially marked by strong interdisciplinary collaboration, drawing heavily from linguistics, philosophy, and psychology, the field has seen a shift towards deep learning and neural networks. While these advancements have brought dramatic improvements in accuracy, McKeown hopes to see a resurgence of interdisciplinary work and a greater focus on analyzing data and output rather than just numerical metrics. She believes there are still many exciting and impactful research directions to explore.
Common Questions
Kathleen McKeown discovered computational linguistics while working as a programmer after college. Finding programming boring, she explored this interdisciplinary field, which combined her interests in language and math, and pursued it for graduate studies.
Mentioned in this video
A major storm that impacted New York, during which Kathleen McKeown's students developed systems for automatically generating disaster updates.
Newsblaster: A system developed by Kathleen McKeown's group that identified events in the news and produced summaries, serving as a testbed for abstractive summarization research.
Computational linguistics: A field that combines linguistics with computer science, which Kathleen McKeown discovered and decided to pursue for graduate studies.
Natural Language Processing (NLP): The field of study focused on the interaction between computers and human language, and a primary area of research for Kathleen McKeown.
African-American Vernacular English (AAVE): A dialect of English spoken primarily by Black people in the United States. Kathleen McKeown's team is working to understand how emotion is expressed in it, with implications for NLP.
Kathleen McKeown: Professor of Computer Science at Columbia University, founding director of the Institute for Data Sciences and Engineering, Amazon Scholar, known for work on text summarization and NLP.
An early paper by Kathleen McKeown on text generation, focusing on how to control word choice based on various linguistic constraints.