Generating in-context, personalized feedback for intelligent tutors with large language models
International Journal of Artificial Intelligence in Education
2025
Abstract
This study explores how large language models (LLMs), specifically GPT-4, can be used to generate personalized feedback within an Intelligent Tutoring System (ITS). The research focuses on evaluating the model's ability to (1) diagnose student errors, (2) generate personalized corrective feedback, and (3) assess the accuracy of diagnoses and the helpfulness of the feedback. We analyze student errors from the Apprentice Tutor College Algebra ITS and prompt GPT-4 to give targeted feedback on those errors. The findings suggest that while the model can effectively diagnose a range of student errors, the effectiveness of its feedback varies with the complexity of the problem and the type of error. Although GPT-4 generates relevant, specific feedback a majority of the time, 35% of the hints were too general, were incorrect, or gave away the correct answer. The study also explores methods for using an LLM to automatically evaluate the validity of generated feedback, and finds that only 35% of feedback passes automated helpfulness evaluations.
BibTeX
@article{reddig-ijaied-2025,
title = {Generating in-context, personalized feedback for intelligent tutors with large language models},
author = {Reddig, Jennifer and Arora, Arav and MacLellan, Christopher J.},
journal = {International Journal of Artificial Intelligence in Education},
volume = {35},
pages = {3459--3500},
year = {2025},
doi = {10.1007/s40593-025-00505-6},
}
