One-Shot Learning for Face Recognition
Brieven, Géraldine
Promotor(s): Louppe, Gilles
Date of defense : 26-Jun-2019/27-Jun-2019 • Permalink : http://hdl.handle.net/2268.2/6795
Details
Committee's member(s) : | Geurts, Pierre; Van Lishout, François |
Language : | English |
Number of pages : | 58 |
Keywords : | [en] deep learning; one-shot learning; face recognition |
Discipline(s) : | Engineering, computing & technology > Computer science |
Research unit : | University of Liège |
Name of the research project : | Research on Deep Learning |
Target public : | Researchers; Students; General public |
Institution(s) : | Université de Liège, Liège, Belgique |
Degree: | Master en ingénieur civil en informatique, à finalité spécialisée en "intelligent systems" |
Faculty: | Master thesis of the Faculté des Sciences appliquées |
Abstract
[en] This master thesis has been written in partial fulfilment of the requirements for the Master's degree in Civil Engineering in Computer Science. It addresses a research topic in deep learning, focusing in particular on one-shot learning applied to face recognition. In this work, face recognition is cast as the task of identifying the person shown in a given picture. This classification problem fits the one-shot learning setting well, since the training set contains only a few instances of each face: the model is expected to quickly integrate new people's faces from little data, which is precisely the challenge of one-shot learning. Four main phases are defined to perform the face recognition task.
First, databases containing face images have to be acquired. Here, Labeled Faces in the Wild (LFW), Celebrities in Frontal-Profile in the Wild (CFP), the cropped FaceScrub set and part of CASIA-WebFace are exploited, along with a few smaller extra databases of ordinary (i.e. not famous) people, in order to come closer to real conditions.
From those databases, a training set of fewer than 10,000 face pictures is built, a rather limited quantity compared with standard face datasets, which sometimes reach hundreds of millions of images.
Once the data has been collected, the face images are processed: the face is detected on each picture, aligned, and finally cropped. Note that face detection relies on an external, already trained model.
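As a concrete illustration of the cropping step, the sketch below cuts a face patch out of an image, given the bounding box returned by an external detector. The function name, the `margin` parameter and the `(x, y, w, h)` box convention are illustrative assumptions; the thesis itself delegates detection to a pretrained model.

```python
import numpy as np

def crop_face(image, bbox, margin=0.2):
    """Crop a face patch from an image given a detector bounding box.

    bbox is (x, y, w, h) in pixels; the margin enlarges the box so the
    crop keeps some context around the face, clamped to the image edges.
    """
    x, y, w, h = bbox
    mx, my = int(w * margin), int(h * margin)
    x0, y0 = max(x - mx, 0), max(y - my, 0)
    x1 = min(x + w + mx, image.shape[1])
    y1 = min(y + h + my, image.shape[0])
    return image[y0:y1, x0:x1]

# Example: a dummy 100x100 grayscale image with a detected box at (30, 30, 40, 40)
img = np.zeros((100, 100), dtype=np.uint8)
patch = crop_face(img, (30, 30, 40, 40))
print(patch.shape)  # (56, 56): the 40x40 box plus an 8-pixel margin on each side
```

In a real pipeline the patch would then be resized to the fixed input resolution the network expects.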
Next, the focus is on efficiently extracting features from the face patches so that in-class instances (pictures representing the same person) lie closer together, and out-of-class instances (pictures representing different people) lie further apart, in some embedding space. To that end, a Siamese network is trained on pairs of faces to perform well on the verification task.
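A minimal sketch of the kind of pairwise objective a Siamese network is typically trained with is the contrastive loss: it pulls embeddings of the same identity together and pushes different identities at least a margin apart. The abstract does not restate the exact loss used in the thesis, so the function name and margin value below are illustrative assumptions.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_person, margin=1.0):
    """Contrastive loss on one pair of embeddings.

    In-class pairs are penalised for any distance between them;
    out-of-class pairs are penalised only when closer than `margin`.
    """
    d = np.linalg.norm(emb_a - emb_b)
    if same_person:
        return 0.5 * d ** 2                    # pull same identities together
    return 0.5 * max(margin - d, 0.0) ** 2     # push different identities apart

# A perfectly matched in-class pair costs nothing...
print(contrastive_loss(np.array([0.0, 0.0]), np.array([0.0, 0.0]), True))   # 0.0
# ...and an out-of-class pair already further apart than the margin costs nothing.
print(contrastive_loss(np.array([0.0, 0.0]), np.array([2.0, 0.0]), False))  # 0.0
```

During training, the same network (shared weights) embeds both faces of each pair, and this per-pair loss is averaged over a batch.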
Once a good Siamese network has been trained, it supports the last step, classification: assigning an identity to an input face image (the probe) by computing similarities between this input picture and the identified face images contained in a gallery. The final predicted identity is the one whose gallery images the probe was closest to during the comparison process.
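The comparison step above amounts to nearest-neighbour search in the embedding space. The sketch below assumes Euclidean distance over precomputed embeddings; the function and variable names are hypothetical.

```python
import numpy as np

def identify(probe_emb, gallery_embs, gallery_ids):
    """Nearest-neighbour identification: compare the probe embedding to
    every gallery embedding and return the identity of the closest one."""
    dists = np.linalg.norm(gallery_embs - probe_emb, axis=1)
    return gallery_ids[int(np.argmin(dists))]

# Each identity may be represented by several gallery instances.
gallery = np.array([[0.0, 0.0], [5.0, 5.0], [5.1, 5.0]])
ids = ["alice", "bob", "bob"]
print(identify(np.array([0.2, 0.1]), gallery, ids))  # prints "alice"
```

With 8 instances per identity, as in the thesis's gallery, one would typically aggregate the per-instance distances (e.g. take the minimum per identity) before ranking.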
Besides those four main steps, two extra steps are defined to support learning from little data. First, the Siamese network can be pretrained as the encoder of an autoencoder trained to reproduce input faces. Second, a data augmentation process is defined, employing a StyleGAN to derive synthetic face instances.
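The autoencoder pretraining idea can be illustrated with a toy linear autoencoder in NumPy: train encoder and decoder to reconstruct the input, then reuse the encoder weights to initialise the feature extractor. This is purely illustrative; the thesis's autoencoder operates on face images, and every size and learning rate below is made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for flattened face images: 64 samples of dimension 16.
X = rng.normal(size=(64, 16))

# Linear autoencoder: W_e encodes a face into an 8-d code, W_d decodes it back.
W_e = rng.normal(scale=0.1, size=(16, 8))
W_d = rng.normal(scale=0.1, size=(8, 16))

def recon_loss(X, W_e, W_d):
    return float(np.mean((X @ W_e @ W_d - X) ** 2))

loss_before = recon_loss(X, W_e, W_d)
lr = 0.01
for _ in range(200):
    Z = X @ W_e                              # encode
    err = Z @ W_d - X                        # reconstruction error
    g_d = Z.T @ err / len(X)                 # gradient w.r.t. the decoder
    g_e = X.T @ (err @ W_d.T) / len(X)       # gradient w.r.t. the encoder
    W_d -= lr * g_d
    W_e -= lr * g_e

loss_after = recon_loss(X, W_e, W_d)
assert loss_after < loss_before  # pretraining reduced the reconstruction error
# W_e would then initialise the Siamese network's feature extractor.
```

The point of the pretraining is that reconstructing faces forces the encoder to capture face structure before any identity labels are used.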
Finally, regarding the performance of the face recognition task, a top-10 accuracy of 84% is obtained when identifying a probe against a gallery of 200 people (each identity being represented by 8 instances). Besides this, the Siamese network reaches an F1-score of 87% on the verification task. The autoencoder and the synthetic data improve performance only when very little data is initially available (typically fewer than 500 face pictures).
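The reported top-10 accuracy can be computed as sketched below, assuming one distance per (probe, gallery identity) pair, e.g. the minimum over that identity's 8 instances. Function and variable names are illustrative.

```python
import numpy as np

def top_k_accuracy(dist_matrix, true_ids, k=10):
    """dist_matrix[i, j]: distance from probe i to gallery identity j.
    A probe counts as correct if its true identity ranks among the
    k closest gallery identities."""
    ranked = np.argsort(dist_matrix, axis=1)[:, :k]
    hits = [true_ids[i] in ranked[i] for i in range(len(true_ids))]
    return float(np.mean(hits))

# 3 probes against 5 gallery identities, evaluated with k = 2
d = np.array([[0.1, 0.9, 0.8, 0.7, 0.6],   # identity 0 closest
              [0.9, 0.8, 0.1, 0.2, 0.7],   # identities 2 and 3 closest
              [0.5, 0.4, 0.3, 0.2, 0.1]])  # identities 4 and 3 closest
print(top_k_accuracy(d, true_ids=[0, 3, 0], k=2))  # 2 of 3 probes hit
```

In the thesis's setting, k = 10 and the gallery holds 200 identities.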
File(s)
Document(s)
Description: Report (Full Content)
Size: 17.57 MB
Format: Adobe PDF
Description: Summary
Size: 437.64 kB
Format: Adobe PDF
Cite this master thesis
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.