Can Large Language Models accelerate the correction of student code?
Coco, Andreas
Supervisor(s): Geurts, Pierre
Defense date: 8-Sep-2025/9-Sep-2025 • Permanent URL: http://hdl.handle.net/2268.2/24781
Details
| Title: | Can Large Language Models accelerate the correction of student code? |
| Author: | Coco, Andreas |
| Defense date: | 8-Sep-2025/9-Sep-2025 |
| Supervisor(s): | Geurts, Pierre |
| Jury member(s): | Louppe, Gilles; Donnet, Benoît; Debruyne, Christophe |
| Language: | English |
| Discipline(s): | Engineering, computing & technology > Computer science |
| Institution(s): | Université de Liège, Liège, Belgium |
| Degree: | Master in data science, professional focus |
| Faculty: | Master theses of the Faculty of Applied Sciences |
Abstract
This thesis assesses to what extent large language models (LLMs) can accelerate the correction of student code in an introductory C programming course. The motivation is practical: autograders are helpful grading tools, but they miss many dimensions of code quality such as clarity, efficiency, and style, so human review remains heavy and slow. LLMs, which can read code in context and provide natural-language feedback, may fill part of this gap. The principal objective is to determine how they can be leveraged to accelerate code correction.
We conduct three sets of experiments on real coursework from the "Additional Information Theory" course at the University of Liège. First, we run preliminary code-generation tests to determine whether state-of-the-art LLMs can solve the course tasks. Second, we evaluate automated grading with Qwen2.5-Coder-7B on two datasets, consisting of student submissions for a homework assignment and a project, respectively. We compare model-predicted grades and feedback to human grades. Third, we study error detection and code correction on the same homework by fine-tuning Qwen2.5-Coder-7B with LoRA on prompt-response pairs, in the spirit of the sketch below.
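The following is an illustrative sketch of this kind of LoRA fine-tuning setup, assuming the Hugging Face transformers/peft/datasets stack; the data file name and hyperparameters are placeholders, not the thesis' actual configuration.

```python
# Minimal LoRA fine-tuning sketch on prompt-response pairs (illustrative only).
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "Qwen/Qwen2.5-Coder-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Attach low-rank adapters to the attention projections; only these weights are trained.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Each JSONL record holds a prompt (assignment statement + buggy student code)
# and a response (detected errors and a corrected version). Hypothetical file name.
dataset = load_dataset("json", data_files="prompt_response_pairs.jsonl", split="train")

def tokenize(example):
    text = example["prompt"] + example["response"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="qwen-lora-correction",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=8,
                           num_train_epochs=3,
                           learning_rate=2e-4,
                           bf16=True,
                           logging_steps=10),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```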
With respect to grading, the model's numeric grade predictions are not reliable: on both tasks, the mean errors often match or exceed those of a constant baseline. However, when the task is reframed as a simpler classification problem, where we ask the LLM whether each submission is fully correct, Qwen's performance is above chance. The best setting uses a criteria-based prompt written in French; it consistently outperforms the baseline but remains insufficient for autonomous grading. A sketch of this classification reframing is given below.
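The sketch below illustrates the binary reframing, assuming the same Qwen2.5-Coder-7B-Instruct checkpoint; the criteria list and English prompt wording are illustrative placeholders, not the exact French prompt used in the thesis.

```python
# Illustrative sketch: ask the model for a YES/NO verdict instead of a numeric grade.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16,
                                             device_map="auto")

def is_fully_correct(statement: str, submission: str) -> bool:
    """Classify a student submission as fully correct or not against a list of criteria."""
    prompt = (
        "You are grading a student's C submission.\n"
        f"Assignment:\n{statement}\n\nSubmission:\n```c\n{submission}\n```\n\n"
        "Criteria: compiles, handles all required cases, frees allocated memory.\n"
        "Is the submission fully correct? Answer with exactly YES or NO."
    )
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                           return_tensors="pt").to(model.device)
    output = model.generate(inputs, max_new_tokens=5, do_sample=False)
    # Decode only the newly generated tokens and map them to a boolean verdict.
    answer = tokenizer.decode(output[0, inputs.shape[1]:], skip_special_tokens=True)
    return answer.strip().upper().startswith("YES")
```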
In error detection and correction, our initial fine-tuning with Qwen-generated data slightly improved correction rates. However, it often produced full code rewrites rather than genuine code corrections. A second fine-tuning attempt used more diverse, high-quality training data generated by OpenAI models, which encouraged targeted edits. However, this reduced correction performance on student submissions. These results indicate that improving a model's error detection and repair abilities is difficult with such limited datasets.
Overall, we find that LLMs are not yet powerful enough to replace human graders, either for grading or for error detection and correction. Their most promising use today is as a support tool alongside autograders and human review. Still, our findings are limited in scope: we only used C tasks from a single course and minimal prompting. We recommend exploring more powerful models and considering fine-tuning on Python tasks with a larger, more comprehensive training set.
The University of Liège does not guarantee the scientific quality of these student works nor the accuracy of all the information they contain.
