Stanford CS547 HCI Seminar | Winter 2026 | Does GenAI Work in Education?
Key Moments
GenAI can aid education when guided by knowledge engineering and cognitively aligned interfaces.
Key Insights
Knowledge engineering is essential for AI-powered education: detailing cognitive requirements enables higher cognitive fidelity, better feedback, and targeted practice.
Cognitively aligned tutors with high-fidelity feedback outperform generic guidance: AI-mediated feedback in economics writing raised revised draft quality (effect size ~0.5) and correlated with TA adoption of AI suggestions.
AI judges can match expert rubrics with high accuracy and improve consistency, but learning gains depend on dosage and instructional design.
Note Copilot's three AI variants reveal that intermediate, well-scaffolded AI output best supports encoding; fully automated notes and minimal summaries hinder deeper learning unless students are allowed to revise with their notes.
A two-loop model for teacher-AI collaboration is proposed: inner loop (tool support via knowledge engineering and alignment) and outer loop (AI as teaching partners to reallocate instructional resources and social capital).
Generalizability hinges on process over content: the design principles (knowledge engineering, cognitive fidelity, and interface alignment) are transferable, even as domain-specific results vary and heterogeneity among learners arises.
INTRODUCTION: CONTEXTS, CHALLENGES, AND A NEW QUESTION
The talk opens by framing a broad landscape: learning occurs across colleges, K-12, homes, and workplaces, yet systemic barriers persist in giving learners the support they need at scale. The speaker cites real-world pressures—large gateway courses with thousands of students, substantial demand for training across jobs, and declines in middle school reading and math—to motivate the central aim: how can technologies, especially GenAI, support human skill acquisition without simply adding noise or superficial aid? The discussion surveys mixed evidence: students widely use GenAI for problem solving and output creation, markets churn with AI-powered tutoring tools, and results differ across studies. Some randomized trials show gains from carefully designed AI tutoring (Harvard physics, Stanford K-12 math), while other work finds potential drawbacks (loss of learning when AI access is removed, reduced brain activity with AI essay helpers, and quality issues in AI-generated hints). The speaker then reframes the overarching question: given that AI tools are already in wide use, how can GenAI work in education? The talk answers through two main stories, knowledge engineering and cognitively aligned interfaces, and asks what they imply for a scalable, responsible educational future.
STORY 1: KNOWLEDGE ENGINEERING AND COGNITIVE FIDELITY
Knowledge engineering here means explicitly mapping the cognitive requirements of a task to enable high cognitive fidelity in AI tutors. The talk traces this idea back to cognitive tutoring and Bloom's two-sigma challenge: one-on-one tutoring yields achievement about two standard deviations above typical classroom instruction, yet scaling such tutoring is infeasible. Cognitive tutors addressed this through a cognitive model based on production rules, enabling an AI agent to mimic expert thinking, provide feedback, and offer targeted practice. A key concept is cognitive fidelity: feedback that reflects how experts decompose a task and anticipate common misconceptions. Fraction addition illustrates the difference: a lower-fidelity rubric might simply tell a student to find a common denominator, while a higher-fidelity model traces the expert's step-by-step reasoning and diagnoses where the student's path diverges from it. The section also previews an application in which knowledge engineering guides the design of AI feedback for writing in an introductory economics course, where the high-fidelity rubric follows a multi-step reasoning path (identify decision makers, compare outcomes, explain the efficient result), highlighting how rich rubrics and iterative testing with instructors improve system performance.
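A production-rule cognitive model of the kind described above can be sketched as a small set of condition-action rules: each rule matches a pattern in the student's answer and fires a fidelity-matched hint. The sketch below is illustrative only; the specific rules, misconception, and hint text are assumptions, not the tutor from the talk.

```python
from fractions import Fraction

def fraction_feedback(a_num, a_den, b_num, b_den, s_num, s_den):
    """Check a student's answer to a_num/a_den + b_num/b_den.

    Rules fire in order: correct answer, known misconception, fallback.
    Rules and hint wording are hypothetical illustrations.
    """
    correct = Fraction(a_num, a_den) + Fraction(b_num, b_den)
    answer = Fraction(s_num, s_den)
    if answer == correct:
        return "Correct: you converted to a common denominator and added."
    # High-fidelity rule: the classic misconception of adding
    # numerators and denominators separately.
    if s_num == a_num + b_num and s_den == a_den + b_den:
        return ("It looks like you added numerators and denominators "
                "separately. What denominator do both fractions share?")
    # Fallback: the low-fidelity hint a stock rubric would give.
    return "Check your work: start by finding a common denominator."

print(fraction_feedback(1, 2, 1, 3, 2, 5))  # fires the misconception rule
print(fraction_feedback(1, 2, 1, 3, 5, 6))  # correct answer
```

The point of the sketch is the contrast: the misconception rule names the student's actual reasoning error, while the fallback merely restates the procedure.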
STORY 1: FEEDBACK WRITER—DESIGN, TRIALS, AND RESULTS
The Econ 101 feedback writer project is described in depth. The course enrolls about 360 students per semester, with multiple writing assignments involving market failure analysis and solution proposals. A 360-student randomized trial compares an AI-supported TA workflow (the feedback writer) against a baseline TA workflow with stock rubrics and human feedback. Rubrics were iteratively refined with instructors to maximize cognitive fidelity, and feedback messages were designed as hints rather than direct answers to preserve learning opportunities. The study measures revised draft quality and post-revision learning through both rubric-based AI judgments and a post-test. Results show that AI-mediated feedback produced significantly higher-quality revised drafts (effect size ~0.5, moving a student from the 50th to roughly the 70th percentile relative to human-only feedback). The AI judge's accuracy against expert instructors was around 85–87% across several models (GPT-4, GPT-5, Gemini). Importantly, TAs who adopted more of the AI's suggestions produced feedback associated with higher revision quality, underscoring how AI can bolster expert practice. The researchers also discuss limitations: the post-test did not show a significant gain, which they attribute to dosage (students get limited opportunities to practice the skill) and the need for longer or more varied interventions. A follow-up study is outlined, exploring an AI-only first-draft feedback loop within 24 hours, plus a short TA office hour to maintain human engagement and relationships.
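The percentile claim follows from the standard-normal CDF: a Cohen's d of 0.5 moves the median treated student to Φ(0.5), roughly the 69th percentile of the control distribution (hence "roughly 70th"). A quick check using only the standard library:

```python
import math

def effect_size_to_percentile(d):
    """Percentile of the control distribution reached by the median
    treated student, assuming normally distributed outcomes."""
    # Standard-normal CDF expressed via the error function.
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2)))

print(round(effect_size_to_percentile(0.5) * 100))  # 69
```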
STORY 2: NOTE COPILOT—ALIGNING NOTE-TAKING WITH ENCODING
The Note Copilot study compares three variants of AI note-taking support: fully automated AI notes, intermediate scaffolded AI output, and minimal AI summaries. The intermediate, well-scaffolded variant best supported encoding, while raw automation and minimal summaries hindered deeper learning unless students were allowed to revise with their notes. The section also highlights that higher engagement and richer interactions with the AI (e.g., more edits, dragging AI-generated content into personal notes) correspond with better learning signals. The authors argue that the Note Copilot design embodies an encoding-centric approach: rather than handing learners polished notes, the system reinforces the learner's role in transforming, structuring, and integrating the material into their own mental models. The findings advocate for a balanced triad: AI outputs support, but do not replace, the learner's own processing. The authors discuss limitations and future directions, including how encoding strategies interact with different domains and whether adaptive variants can tailor AI support to individual note-taking styles. Together, the note-taking study reinforces the central claim: cognitively aligned interfaces, when thoughtfully integrated with knowledge engineering, can shape not just what students learn but how they learn it.
IMPLICATIONS: TWO LOOPS FOR TEACHER-AI COLLABORATION AND FUTURE PATHWAYS
Building on the two case studies, the talk articulates a broader framework for AI in education: (1) inner loop—tool support for teachers through knowledge engineering and cognitively aligned interfaces, and (2) outer loop—viewing AI as a teaching partner that can redistribute instructional resources and social capital to deepen student engagement. The inner loop emphasizes extracting expert mental models in ways that are transferable to scalable formats such as AI tutoring, automated feedback, and intelligent practice. The outer loop envisions teachers spending more time on relationship-building, mentorship, and higher-level guidance that AI cannot replace, thereby alleviating cognitive load on instructors while enhancing student motivation and alignment with real-world problems. The speaker also discusses ongoing challenges: how to efficiently extract expertise (through visualization, example selection, and simulated student solutions), how to represent knowledge, and how to test the generalizability of these methods across domains. Finally, the talk raises the AI assistance dilemma—guardrails are needed to prevent overreliance and to preserve deep cognitive engagement—alongside strategies to support self-regulation and social learning, such as fostering collaborative contexts and motivational support through human interactions. The closing message is cautiously optimistic: with disciplined knowledge engineering and thoughtfully designed interfaces, GenAI can scale personalized support and foster meaningful teacher-student relationships.
Selected Quantitative Findings
Data extracted from this episode
| Metric | AI Condition | Baseline Condition | Notes |
|---|---|---|---|
| Revised draft quality | AI-mediated feedback | Human-only feedback | Significant improvement; effect size ~0.5 (roughly from 50th to 70th percentile) |
| AI judge accuracy | AI judge ~85% accuracy (GPT-4, GPT-5, Gemini 3 Pro) | Expert instructor | AI matches expert-level judgments with ~85% accuracy |
| Post-test results | AI-first drafts (optional post-test) | Baseline | No significant difference; trend favoring AI not statistically significant |
| TA adoption of AI suggestions | Higher AI-suggest usage | Lower AI-suggest usage | Positive correlation with revision quality |
Common Questions
What is Bloom's two sigma challenge?
Bloom's two sigma challenge refers to the finding that one-on-one tutoring can yield achievement about two standard deviations above conventional classroom instruction, but delivering one-on-one tutoring to every student is impractical. This motivates AI-based personalized tutoring as a scalable alternative.
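Read through the same normal-distribution lens as the feedback-writer effect size, two sigma means the average tutored student would outperform about Φ(2) ≈ 98% of conventionally taught students:

```python
import math

# Standard-normal CDF at d = 2: the fraction of classroom students the
# average tutored student outperforms under Bloom's two-sigma result.
phi_2 = 0.5 * (1.0 + math.erf(2 / math.sqrt(2)))
print(f"{phi_2:.3f}")  # 0.977
```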
Mentioned in this video
Randomized trial showing a carefully designed AI tutor leads to better learning than in-class active learning in a college physics course
PhD student collaborator on the Feedback Writer project
Collaborator on the Note Copilot research
Study showing AI access without guardrails can improve performance in high school math, but it reduces learning when AI access is taken away
Note-taking system with three variants: Automated AI, Intermediate AI, Minimal AI
Study showing that giving human tutors real-time AI suggestions can improve students' learning in K-12 math
System for TAs to provide feedback on students' essays, guided by rubrics and AI suggestions
Collaborator on the Note Copilot research
AI-based evaluator used to assess essay quality; achieves about 85% accuracy vs. expert instructor