Master thesis: Text Autocomplete System with Language Models for Amanote
Fery, Loïs
Supervisor(s): Ittoo, Ashwin
Defense date: 26-Jan-2024 • Permanent URL: http://hdl.handle.net/2268.2/19566
Details
Title: | Master thesis: Text Autocomplete System with Language Models for Amanote |
Author: | Fery, Loïs |
Defense date: | 26-Jan-2024 |
Supervisor(s): | Ittoo, Ashwin |
Jury member(s): | Louppe, Gilles; Debruyne, Christophe |
Language: | English |
Number of pages: | 75 |
Discipline(s): | Engineering, computing & technology > Computer science |
Institution(s): | Université de Liège, Liège, Belgium |
Degree: | Master's degree in computer science and engineering, with a specialised focus in "intelligent systems" |
Faculty: | Master's theses of the Faculty of Applied Sciences |
Abstract
This thesis explores the design of an autocomplete system based on language modeling, intended for integration into Amanote, a note-taking application for slides and syllabuses whose primary audience is students. The system is designed to generate real-time suggestions that assist students in taking notes by reducing repetitive typing. The thesis also discusses the possibility of deploying the system locally on the user's PC, eliminating the need for a server.
We discuss each stage of the system design: corpus gathering, candidate model selection, model training and fine-tuning, model evaluation, suggestion generation, and deployment. In particular, we discuss the gathering and analysis of two datasets used to train and evaluate our system: one composed of Amanote user notes and the other composed of articles from several academic disciplines. We also conduct several experiments on the candidate models to identify the most suitable ones for deployment in the application.
The results show that student notes tend to be less formal than classical texts and that a large portion of them contain many abbreviations and spelling mistakes. Moreover, our experiments suggest that large-scale pre-training is effective for the autocompletion task in the context of note-taking. Another noteworthy finding is that character-level tokenization may be effective for this task. Overall, we find the results promising and are confident that our system could be useful to Amanote users. Our findings also indicate that a local deployment of the system may be achievable, although it raises some challenges.
In essence, this thesis contributes to the advancement of autocomplete systems and to the broader goal of enhancing accessibility to neural language models by focusing on their local deployment, thereby reducing reliance on external servers.
The University of Liège does not guarantee the scientific quality of these student works or the accuracy of all the information they contain.