HEC-Ecole de gestion de l'Université de Liège
MASTER THESIS

Predicting review helpfulness : the case of class imbalance

Delic, Leïla ULiège
Promotor(s) : Ittoo, Ashwin ULiège
Date of defense : 21-Jun-2019/25-Jun-2019 • Permalink : http://hdl.handle.net/2268.2/6399
Details
Committee's member(s) : Heuchenne, Cédric ULiège; Hoffait, Anne-Sophie ULiège
Language : English
Number of pages : 67
Keywords : Machine learning; Review helpfulness; Text classification; Class imbalance; Prediction; Online customer review
Discipline(s) : Business & economic sciences > Management information systems
Institution(s) : Université de Liège, Liège, Belgium
Degree : Master in Business Engineering, with a professional focus in Supply Chain Management and Business Analytics
Faculty : Master thesis of the HEC-Ecole de gestion de l'Université de Liège

Abstract

Online reviews are becoming increasingly abundant, which can make them overwhelming for users. To mitigate this information overload, online retailers often display reviews ranked by their helpfulness to other users. In recent years, research has sought efficient ways to predict review helpfulness automatically. This paper offers insight into two questions: which algorithm is most appropriate for predicting review helpfulness in the specific context of class imbalance and high overlap of class features, and which pre-processing techniques can improve classifier performance in that context. To do so, it considers three classification algorithms: random forest, multinomial naive Bayes, and a linear support vector machine trained with stochastic gradient descent.
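Of the three classifiers named above, multinomial naive Bayes is the simplest to show in a few lines. The following is a minimal from-scratch sketch, not the implementation used in the thesis (which this page does not describe). It assumes reviews are already tokenised into word lists and labelled `"helpful"` or `"unhelpful"`, and it uses raw token counts with additive (Laplace) smoothing, `alpha=1.0`; the function names `train_multinomial_nb` and `predict_nb` and the toy reviews are hypothetical.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model on tokenised documents.

    docs: list of token lists; labels: one class label per document;
    alpha: additive (Laplace) smoothing applied to every vocabulary token.
    """
    classes = sorted(set(labels))
    vocab = {tok for doc in docs for tok in doc}
    log_priors, log_likelihoods = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        # Class prior: share of training documents carrying this label.
        log_priors[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(tok for d in class_docs for tok in d)
        denom = sum(counts.values()) + alpha * len(vocab)
        # Smoothed per-class token probabilities, stored as logs to avoid underflow.
        log_likelihoods[c] = {t: math.log((counts[t] + alpha) / denom) for t in vocab}
    return classes, log_priors, log_likelihoods


def predict_nb(model, doc):
    """Return the class with the highest posterior log-score for one tokenised document."""
    classes, log_priors, log_likelihoods = model
    scores = {
        # Tokens never seen during training are skipped.
        c: log_priors[c] + sum(log_likelihoods[c][t] for t in doc if t in log_likelihoods[c])
        for c in classes
    }
    return max(scores, key=scores.get)


# Hypothetical, deliberately imbalanced toy data: 2 helpful vs 4 unhelpful reviews.
reviews = [
    ["detailed", "battery", "lasts", "two", "days"],
    ["compared", "battery", "with", "older", "model"],
    ["great"],
    ["great", "love", "it"],
    ["love", "it"],
    ["ok"],
]
labels = ["helpful", "helpful", "unhelpful", "unhelpful", "unhelpful", "unhelpful"]

model = train_multinomial_nb(reviews, labels)
print(predict_nb(model, ["battery", "detailed"]))  # helpful
print(predict_nb(model, ["great", "love"]))  # unhelpful
```

The log-prior term gives the majority class a head start on every prediction, so when the two classes share most of their vocabulary (the "high overlap of class features" mentioned above) the minority class has little evidence with which to overcome it.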
The paper shows that: (1) none of the considered algorithms exhibits satisfactory performance when facing imbalanced datasets and similar class features; (2) linguistic pre-processing techniques yield marginal or no improvement; (3) frequency-based pre-processing yields moderate improvement; (4) re-sampling techniques are highly effective, especially the Synthetic Minority Over-sampling TEchnique (SMOTE); and (5) overall, random forest combined with SMOTE shows the best performance in terms of precision, recall and F1-score.
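SMOTE creates synthetic minority-class examples instead of duplicating existing ones: it picks a minority sample, chooses one of its k nearest minority-class neighbours, and places a new point at a random position on the line segment between the two. The sketch below illustrates that interpolation step in plain Python. It assumes the reviews have already been turned into numeric feature vectors (for example term-frequency vectors) and uses Euclidean distance; the function name `smote`, its parameters and the toy data are hypothetical. It is not the implementation used in the thesis; libraries such as imbalanced-learn provide a maintained `SMOTE` class.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic minority-class points by SMOTE-style interpolation.

    minority: list of numeric feature vectors (lists of floats) from the minority class.
    Each new point lies on the segment between a minority sample and one of its
    k nearest minority-class neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority neighbours (by index) of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # random base sample
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]  # random one of its k neighbours
        gap = rng.random()  # position along the segment x -> nn, in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# Hypothetical 2-D feature vectors: 4 minority ("helpful") vs 12 majority samples.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
majority = [[5.0 + i, 5.0] for i in range(12)]

# Over-sample the minority class until both classes have the same size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(balanced_minority), len(majority))  # 12 12
```

Only the training split should be re-sampled this way; evaluating on synthetic points would inflate the precision, recall and F1-score being measured.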


File(s)

masterThesis_LeilaDelic.pdf
Size: 742.97 kB
Format: Adobe PDF

Author

  • Delic, Leïla ULiège Université de Liège > Master in Business Engineering, with a professional focus

Promotor(s)

  • Ittoo, Ashwin ULiège

Committee's member(s)

  • Heuchenne, Cédric ULiège Université de Liège - ULiège > HEC Liège : UER > Statistique appliquée à la gestion et à l'économie
  • Hoffait, Anne-Sophie ULiège Université de Liège - ULiège > HEC Liège : UER > Statistique appliquée à la gestion et à l'économie

Total number of views : 145
Total number of downloads : 839

All documents available on MatheO are protected by copyright and subject to the usual rules for fair use.
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.