Stanford CS547 HCI Seminar | Winter 2026 | Does GenAI Work in Education?
Key Moments
GenAI can aid education when guided by knowledge engineering and cognitively aligned interfaces.
Key Insights
Knowledge engineering is essential for AI-powered education: detailing cognitive requirements enables higher cognitive fidelity, better feedback, and targeted practice.
Cognitively aligned tutors with high-fidelity feedback outperform generic guidance: AI-mediated feedback in economics writing raised revised draft quality (effect size ~0.5) and correlated with TA adoption of AI suggestions.
AI judges can match expert rubrics with high accuracy and improve consistency, but learning gains depend on dosage and instructional design.
Note Copilot's three AI variants reveal that intermediate, well-scaffolded AI output best supports encoding; fully automated notes and minimal summaries hinder deeper learning unless students are allowed to revise with their notes.
A two-loop model for teacher-AI collaboration is proposed: inner loop (tool support via knowledge engineering and alignment) and outer loop (AI as teaching partners to reallocate instructional resources and social capital).
Generalizability hinges on process over content: the design principles (knowledge engineering, cognitive fidelity, and interface alignment) are transferable, even as domain-specific results vary and heterogeneity among learners arises.
INTRODUCTION: CONTEXTS, CHALLENGES, AND A NEW QUESTION
The talk opens by framing a broad landscape: learning occurs across colleges, K-12, homes, and workplaces, yet systemic barriers persist in giving learners the support they need at scale. The speaker cites real-world pressures—large gateway courses with thousands of students, substantial demand for training across jobs, and declines in middle school reading and math—to motivate the central aim: how can technologies, especially GenAI, support human skill acquisition without simply adding noise or superficial aid? The discussion surveys mixed evidence: students widely use GenAI for problem solving and output creation, markets churn with AI-powered tutoring tools, and results differ across studies. Some randomized trials show gains from carefully designed AI tutoring (Harvard physics, Stanford K-12 math), while other work finds potential drawbacks (loss of learning when AI access is removed, reduced brain activity with AI essay helpers, and quality issues in AI-generated hints). The speaker then reframes the overarching question: given that AI tools are already in wide use, how can GenAI work in education? The talk answers through two main stories, knowledge engineering and cognitively aligned interfaces, and asks what they imply for a scalable, responsible educational future.
STORY 1: KNOWLEDGE ENGINEERING AND COGNITIVE FIDELITY
Knowledge engineering here means explicitly mapping the cognitive requirements of a task to enable high cognitive fidelity in AI tutors. The talk traces this idea back to cognitive tutoring and Bloom's two-sigma challenge: one-on-one tutoring yields achievement about two standard deviations above typical classroom instruction, yet scaling such tutoring is infeasible. Cognitive tutors addressed this through a cognitive model based on production rules, enabling an AI agent to mimic expert thinking, provide feedback, and offer targeted practice. A key concept is cognitive fidelity: feedback that reflects how experts decompose a task and anticipate common misconceptions. Fraction addition illustrates the difference: a lower-fidelity rubric might simply tell a student to find a common denominator, while a higher-fidelity model traces the expert's step-by-step reasoning and diagnoses where the student's path diverges from it. The section also previews an application in which knowledge engineering guides the design of AI feedback for writing in an introductory economics course, where the high-fidelity rubric follows a multi-step reasoning path (identify decision makers, compare outcomes, explain the efficient result), highlighting how rich rubrics and iterative testing with instructors improve system performance.
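A production-rule cognitive model of the kind described above can be sketched as a small set of condition-action rules: each rule matches a pattern in the student's answer and fires a fidelity-matched hint. The sketch below is illustrative only; the specific rules, misconception, and hint text are assumptions, not the tutor from the talk.

```python
from fractions import Fraction

def fraction_feedback(a_num, a_den, b_num, b_den, s_num, s_den):
    """Check a student's answer to a_num/a_den + b_num/b_den.

    Rules fire in order: correct answer, known misconception, fallback.
    Rules and hint wording are hypothetical illustrations.
    """
    correct = Fraction(a_num, a_den) + Fraction(b_num, b_den)
    answer = Fraction(s_num, s_den)
    if answer == correct:
        return "Correct: you converted to a common denominator and added."
    # High-fidelity rule: the classic misconception of adding
    # numerators and denominators separately.
    if s_num == a_num + b_num and s_den == a_den + b_den:
        return ("It looks like you added numerators and denominators "
                "separately. What denominator do both fractions share?")
    # Fallback: the low-fidelity hint a stock rubric would give.
    return "Check your work: start by finding a common denominator."

print(fraction_feedback(1, 2, 1, 3, 2, 5))  # fires the misconception rule
print(fraction_feedback(1, 2, 1, 3, 5, 6))  # correct answer
```

The point of the sketch is the contrast: the misconception rule names the student's actual reasoning error, while the fallback merely restates the procedure.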
STORY 1: FEEDBACK WRITER—DESIGN, TRIALS, AND RESULTS
The Econ 101 feedback writer project is described in depth. The course enrolls about 360 students per semester, with multiple writing assignments involving market failure analysis and solution proposals. A 360-student randomized trial compares an AI-supported TA workflow (the feedback writer) against a baseline TA workflow with stock rubrics and human feedback. Rubrics were iteratively refined with instructors to maximize cognitive fidelity, and feedback messages were designed as hints rather than direct answers to preserve learning opportunities. The study measures revised draft quality and post-revision learning through both rubric-based AI judgments and a post-test. Results show that AI-mediated feedback produced significantly higher-quality revised drafts (effect size ~0.5, moving a student from the 50th to roughly the 70th percentile relative to human-only feedback). The AI judge's accuracy against expert instructors was around 85–87% across several models (GPT-4, GPT-5, Gemini). Importantly, TAs who adopted more of the AI's suggestions produced feedback associated with higher revision quality, underscoring how AI can bolster expert practice. The researchers also discuss limitations: the post-test did not show a significant gain, which they attribute to dosage (students get limited opportunities to practice the skill) and the need for longer or more varied interventions. A follow-up study is outlined, exploring an AI-only first-draft feedback loop within 24 hours, plus a short TA office hour to maintain human engagement and relationships.
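The percentile claim follows from the standard-normal CDF: a Cohen's d of 0.5 moves the median treated student to Φ(0.5), roughly the 69th percentile of the control distribution (hence "roughly 70th"). A quick check using only the standard library:

```python
import math

def effect_size_to_percentile(d):
    """Percentile of the control distribution reached by the median
    treated student, assuming normally distributed outcomes."""
    # Standard-normal CDF expressed via the error function.
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2)))

print(round(effect_size_to_percentile(0.5) * 100))  # 69
```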
STORY 2: NOTE COPILOT—ALIGNING NOTE-TAKING WITH ENCODING
The Note Copilot study compares three variants of AI note-taking support: fully automated AI notes, intermediate scaffolded AI output, and minimal AI summaries. The intermediate, well-scaffolded variant best supported encoding, while raw automation and minimal summaries hindered deeper learning unless students were allowed to revise with their notes. The section also highlights that higher engagement and richer interactions with the AI (e.g., more edits, dragging AI-generated content into personal notes) correspond with better learning signals. The authors argue that the Note Copilot design embodies an encoding-centric approach: rather than handing learners polished notes, the system reinforces the learner's role in transforming, structuring, and integrating the material into their own mental models. The findings advocate for a balanced triad: AI outputs support, but do not replace, the learner's own processing. The authors discuss limitations and future directions, including how encoding strategies interact with different domains and whether adaptive variants can tailor AI support to individual note-taking styles. Together, the note-taking study reinforces the central claim: cognitively aligned interfaces, when thoughtfully integrated with knowledge engineering, can shape not just what students learn but how they learn it.
IMPLICATIONS: TWO LOOPS FOR TEACHER-AI COLLABORATION AND FUTURE PATHWAYS
Building on the two case studies, the talk articulates a broader framework for AI in education: (1) inner loop—tool support for teachers through knowledge engineering and cognitively aligned interfaces, and (2) outer loop—viewing AI as a teaching partner that can redistribute instructional resources and social capital to deepen student engagement. The inner loop emphasizes extracting expert mental models in ways that are transferable to scalable formats such as AI tutoring, automated feedback, and intelligent practice. The outer loop envisions teachers spending more time on relationship-building, mentorship, and higher-level guidance that AI cannot replace, thereby alleviating cognitive load on instructors while enhancing student motivation and alignment with real-world problems. The speaker also discusses ongoing challenges: how to efficiently extract expertise (through visualization, example selection, and simulated student solutions), how to represent knowledge, and how to test the generalizability of these methods across domains. Finally, the talk raises the AI assistance dilemma—guardrails are needed to prevent overreliance and to preserve deep cognitive engagement—alongside strategies to support self-regulation and social learning, such as fostering collaborative contexts and motivational support through human interactions. The closing message is cautiously optimistic: with disciplined knowledge engineering and thoughtfully designed interfaces, GenAI can scale personalized support and foster meaningful teacher-student relationships.
Selected Quantitative Findings
Data extracted from this episode
| Metric | AI Condition | Baseline Condition | Notes |
|---|---|---|---|
| Revised draft quality | AI-mediated feedback | Human-only feedback | Significant improvement; effect size ~0.5 (roughly from 50th to 70th percentile) |
| AI judge accuracy | AI judge ~85% accuracy (GPT-4, GPT-5, Gemini 3 Pro) | Expert instructor | AI matches expert-level judgments with ~85% accuracy |
| Post-test results | AI-first drafts (optional post-test) | Baseline | No significant difference; trend favoring AI not statistically significant |
| TA adoption of AI suggestions | Higher AI-suggest usage | Lower AI-suggest usage | Positive correlation with revision quality |
Common Questions
What is Bloom's two sigma challenge?
Bloom's two sigma challenge refers to the finding that one-on-one tutoring can yield achievement about two standard deviations above conventional classroom instruction, but delivering one-on-one tutoring to every student is impractical. This motivates AI-based personalized tutoring as a scalable alternative.
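Read through the same normal-distribution lens as the feedback-writer effect size, two sigma means the average tutored student would outperform about Φ(2) ≈ 98% of conventionally taught students:

```python
import math

# Standard-normal CDF at d = 2: the fraction of classroom students the
# average tutored student outperforms under Bloom's two-sigma result.
phi_2 = 0.5 * (1.0 + math.erf(2 / math.sqrt(2)))
print(f"{phi_2:.3f}")  # 0.977
```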
Mentioned in this video
Randomized trial showing a carefully designed AI tutor leads to better learning than in-class active learning in a college physics course
PhD student collaborator on the Feedback Writer project
Collaborator on the Note Copilot research
Study showing AI access without guardrails can improve performance in high school math, but it reduces learning when AI access is taken away
Note-taking system with three variants: Automated AI, Intermediate AI, Minimal AI
Study showing that giving human tutors real-time AI suggestions can improve students' learning in K-12 math
System for TAs to provide feedback on students' essays, guided by rubrics and AI suggestions
Collaborator on the Note Copilot research
AI-based evaluator used to assess essay quality; achieves about 85% accuracy vs. expert instructor