Master's Thesis : Development of server side document processing and OCR services
Maréchal, Grégory
Promotor(s) : Leduc, Guy
Date of defense : 7-Sep-2020/9-Sep-2020 • Permalink : http://hdl.handle.net/2268.2/10882
Details
Title : | Master's Thesis : Development of server side document processing and OCR services |
Translated title : | [fr] Développement de services de traitement de documents et de services de reconnaissance optique des caractères côté serveur |
Author : | Maréchal, Grégory |
Date of defense : | 7-Sep-2020/9-Sep-2020 |
Advisor(s) : | Leduc, Guy |
Committee's member(s) : | Boigelot, Bernard; Donnet, Benoît; Hannay, Sébastien |
Language : | English |
Number of pages : | 55 (65 with appendices) |
Keywords : | [en] Android mobile; Spring Java server; deep learning; classification; online training; image processing |
Discipline(s) : | Engineering, computing & technology > Civil engineering |
Name of the research project : | Self training classification of medical documents for a distributed mobile application |
Target public : | Professionals in the field; Students |
Complementary URL : | https://www.andaman7.com/fr |
Institution(s) : | Université de Liège, Liège, Belgique |
Degree: | Master : ingénieur civil en informatique, à finalité spécialisée en "management" |
Faculty: | Master thesis of the Faculté des Sciences appliquées |
Abstract
[en] Andaman7 is the name of both a company and a mobile app whose goal is to empower patients (in the medical sense of the term) by giving them easier access to, and more control over, their medical data. However, the processes currently in place to import this data into the application are lengthy, tedious, or both. In this project, we begin exploring whether machine learning algorithms can be used to automate as much of the data import process as possible.
To do so, we will implement what we call the dataflow: a complete data processing scheme, including front-end and back-end services, that lets the user send data for automated metadata extraction and also review samples on which the machine learning algorithm is not confident. This review step will allow Andaman7 to rely on online training to compensate for the lack of data.
The dataflow will then be completed with an actual machine learning algorithm that classifies the submitted samples. Finally, the conclusion will briefly discuss what could be done to extract more metadata from the samples than just their class.
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.