Term extraction from domain specific texts
Poumay, Judicaël
Promotor(s) : Ittoo, Ashwin
Date of defense : 9-Sep-2019/10-Sep-2019 • Permalink : http://hdl.handle.net/2268.2/7487
Details
Title : | Term extraction from domain specific texts |
Author : | Poumay, Judicaël |
Date of defense : | 9-Sep-2019/10-Sep-2019 |
Advisor(s) : | Ittoo, Ashwin |
Committee's member(s) : | Jamar, Julie; Gribomont, Pascal |
Language : | English |
Number of pages : | 36 |
Keywords : | term extraction; terminology extraction; financial text; information extraction; abbreviation extraction; long term; complex terminology; multi word term; termhood; unithood; unsupervised |
Discipline(s) : | Engineering, computing & technology > Computer science |
Target public : | Researchers; Professionals of domain; Student |
Institution(s) : | Université de Liège, Liège, Belgique |
Degree: | Master en science des données, à finalité spécialisée |
Faculty: | Master thesis of the Faculté des Sciences appliquées |
Abstract
In this thesis, we developed a novel unsupervised algorithm for terminology extraction (TE).
TE consists of detecting and ranking candidate terms in a given document, where a term is a sequence of words that refers to a particular concept in a given domain.
This thesis also makes two ancillary contributions. The first is a new relevancy measure for term ranking, which combines a termhood measure, a unithood measure, and a noise measure to provide a reliable score. The second is an abbreviation extractor, which discovers and extracts the expanded form of abbreviated terms using a simple heuristic.
Many algorithms already exist for extracting terms, but they have limitations. Primarily, we found that no current method could reliably extract long and complex terminology. The algorithm we propose was therefore designed to handle this task.
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.