Can Large Language Models accelerate the correction of student code?
Coco, Andreas
Promotor(s): Geurts, Pierre
Date of defense: 8-Sep-2025/9-Sep-2025 • Permalink: http://hdl.handle.net/2268.2/24781
Details
| Title: | Can Large Language Models accelerate the correction of student code? |
| Author: | Coco, Andreas |
| Date of defense: | 8-Sep-2025/9-Sep-2025 |
| Advisor(s): | Geurts, Pierre |
| Committee's member(s): | Louppe, Gilles; Donnet, Benoît; Debruyne, Christophe |
| Language: | English |
| Discipline(s): | Engineering, computing & technology > Computer science |
| Institution(s): | Université de Liège, Liège, Belgium |
| Degree: | Master en science des données, à finalité spécialisée (Master in Data Science, specialised focus) |
| Faculty: | Master thesis of the Faculté des Sciences appliquées |
Abstract
[en] This thesis assesses to what extent large language models (LLMs) can accelerate the correction of student code in an introductory C programming course. The motivation is practical: autograders are helpful grading tools, but they miss many dimensions of code quality such as clarity, efficiency, and style, so human review remains heavy and slow. LLMs, which can read code in context and provide natural-language feedback, may fill part of this gap. The principal objective is to determine how they can be leveraged to accelerate code correction.
We conduct three sets of experiments on real coursework from the "Additional Information Theory" course at the University of Liège. First, we run preliminary code-generation tests to determine whether state-of-the-art LLMs can solve the course tasks. Second, we evaluate automated grading with Qwen2.5-Coder-7B on two datasets, consisting of student submissions for a homework assignment and for a project respectively, and we compare model-predicted grades and feedback to human grades. Third, we study error detection and code correction on the same homework by fine-tuning Qwen2.5-Coder-7B with LoRA on prompt-response pairs.
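To make the third experiment concrete, the sketch below shows one way such a LoRA fine-tune of Qwen2.5-Coder-7B on prompt-response pairs could be set up with the Hugging Face transformers and peft libraries. The data file, prompt format, and hyperparameters are illustrative assumptions, not the configuration used in the thesis.

    # Hypothetical LoRA fine-tuning sketch; file name, prompt format and
    # hyperparameters are placeholders, not the thesis' actual setup.
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_name = "Qwen/Qwen2.5-Coder-7B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    # Attach low-rank adapters to the attention projections only.
    lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
                      task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)

    # Each JSON record pairs an error-detection/correction prompt with the
    # expected response (e.g. the located bug and the repaired code).
    pairs = load_dataset("json", data_files="prompt_response_pairs.jsonl")["train"]

    def tokenize(example):
        text = example["prompt"] + example["response"] + tokenizer.eos_token
        return tokenizer(text, truncation=True, max_length=2048)

    pairs = pairs.map(tokenize, remove_columns=pairs.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="qwen-lora-correction",
                               per_device_train_batch_size=1,
                               gradient_accumulation_steps=8,
                               num_train_epochs=3,
                               learning_rate=2e-4,
                               logging_steps=10),
        train_dataset=pairs,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()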
With respect to grading, the model's numeric predictions are not reliable: on both tasks, the mean errors often match or exceed those of a constant baseline. However, when the task is reframed as a simpler classification problem, asking the LLM whether each submission is fully correct, Qwen's performance is above chance. The best setting uses a criteria-based prompt written in French and consistently outperforms the baseline, but it remains insufficient for autonomous grading.
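As an illustration of this pass/fail reframing, the sketch below queries the model with a simple English yes/no criteria prompt and parses its answer. The prompt wording and helper names are hypothetical assumptions; the thesis' actual French criteria-based prompt is not reproduced here.

    # Hypothetical pass/fail classification sketch; the prompt wording is an
    # assumption, not the French criteria-based prompt used in the thesis.
    import re
    from transformers import pipeline

    generator = pipeline("text-generation",
                         model="Qwen/Qwen2.5-Coder-7B-Instruct",
                         device_map="auto")

    CRITERIA_PROMPT = (
        "You are grading a student's C submission for an introductory course.\n"
        "Criteria: it compiles, produces correct output, is reasonably\n"
        "efficient, and is clearly written.\n"
        "Answer YES if the submission is fully correct, otherwise answer NO.\n\n"
        "Submission:\n{code}\n\nAnswer:"
    )

    def predict_fully_correct(code: str) -> bool:
        # Greedy decoding; only the few generated tokens are returned.
        answer = generator(CRITERIA_PROMPT.format(code=code),
                           max_new_tokens=5, do_sample=False,
                           return_full_text=False)[0]["generated_text"]
        return re.search(r"\bYES\b", answer, re.IGNORECASE) is not None

    def accuracy(predictions, labels):
        # The model is only useful if this beats a constant baseline that
        # always predicts the majority class of the labels.
        return sum(p == y for p, y in zip(predictions, labels)) / len(labels)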
In error detection and correction, our initial fine-tuning with Qwen-generated data slightly improved correction rates, but it often produced full code rewrites rather than targeted corrections. A second fine-tuning attempt used more diverse, higher-quality training data generated by OpenAI models, which encouraged targeted edits, yet it reduced correction performance on student submissions. These results indicate that improving a model's error detection and repair abilities is difficult with such limited datasets.
Overall, we find that LLMs are not yet powerful enough to replace human graders, whether for grading or for error detection and correction. Their most promising use today is as a support tool alongside autograders and human review. Still, our findings are limited in scope: we only used C tasks from a single course and minimal prompting. We recommend exploring more powerful models and considering fine-tuning on Python tasks with a larger, more comprehensive training set.
Master Thesis Online: s2302246Coco2025.pdf