Predicting review helpfulness : the case of class imbalance

Delic, Leïla

Date of defense : 21-Jun-2019/25-Jun-2019 • Permalink : `http://hdl.handle.net/2268.2/6399`

Details

Title :	Predicting review helpfulness : the case of class imbalance
Author :	Delic, Leïla
Date of defense :	21-Jun-2019/25-Jun-2019
Advisor(s) :	Ittoo, Ashwin
Committee's member(s) :	Heuchenne, Cédric ; Hoffait, Anne-Sophie
Language :	English
Number of pages :	67
Keywords :	[en] Machine learning ; Review helpfulness ; Text classification ; Class imbalance ; Prediction ; Online customer review
Discipline(s) :	Business & economic sciences > Management information systems
Institution(s) :	Université de Liège, Liège, Belgique
Degree:	Master en ingénieur de gestion, à finalité spécialisée en Supply Chain Management and Business Analytics
Faculty:	Master thesis of the HEC-Ecole de gestion de l'Université de Liège

Abstract

[en] Online reviews are increasingly abundant, which can make them overwhelming for users. To mitigate this information overload, online retailers often display reviews according to their helpfulness to other users. In recent years, research has sought efficient ways to predict review helpfulness automatically. This paper offers insight into both the most appropriate algorithm for predicting review helpfulness in the specific context of class imbalance and high overlap of class features, and the pre-processing techniques that can improve classifier performance in that context. To do so, it considers three classification algorithms: random forest, multinomial naive Bayes, and a linear support vector machine trained with stochastic gradient descent.
It shows that: (1) none of the considered algorithms exhibits satisfactory performance when facing imbalanced datasets and similar class features; (2) linguistic pre-processing techniques yield marginal or no improvement; (3) frequency-based pre-processing yields moderate improvement; (4) re-sampling techniques are highly effective, especially the Synthetic Minority Over-sampling TEchnique (SMOTE); (5) overall, random forest combined with SMOTE shows the best performance in terms of precision, recall and F1-score.
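
To make the re-sampling idea concrete, the following is a minimal sketch of the interpolation step behind SMOTE, written in plain Python. It is a simplified illustration and not the implementation used in the thesis: the function name `smote`, its parameters, and the toy two-dimensional feature vectors are assumptions invented for this example. Each synthetic point is placed at a random position on the segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Hypothetical helper for illustration only; a simplified variant that
    picks the base sample at random for every synthetic point.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (the base itself excluded)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy imbalanced data (invented): 3 minority vectors against 8 majority ones.
helpful = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
n_majority = 8
new_points = smote(helpful, n_new=n_majority - len(helpful), k=2)
balanced_minority = helpful + new_points  # now as many samples as the majority class
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by the minority class. Re-sampling of this kind is normally applied to the training split only, so that evaluation still reflects the original class distribution.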

File(s)

Document(s)

masterThesis_LeilaDelic.pdf
Size: 742.97 kB
Format: Adobe PDF

All documents available on MatheO are protected by copyright and subject to the usual rules for fair use.
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.
