Drone control through a vocal interface
Bolland, Julien
Supervisor(s): Redouté, Jean-Michel
Defense date: 24-jui-2021/25-jui-2021 • Permanent URL: http://hdl.handle.net/2268.2/11563
Details
Title: Drone control through a vocal interface
Author: Bolland, Julien
Defense date: 24-jui-2021/25-jui-2021
Supervisor(s): Redouté, Jean-Michel
Jury member(s): Louppe, Gilles; Embrechts, Jean-Jacques; Greffe, Christophe
Language: English
Number of pages: 71
Keywords: [en] keyword spotting; drone; deep learning; voice control; CNN; ResNet; attention; RNN; drone control
Discipline(s): Engineering, computer science & technology > Computer science
Funding body: GeneriX
Target audience: Researchers; Professionals in the field; Students
Institution(s): Université de Liège, Liège, Belgium
Degree: Master : ingénieur civil en informatique, à finalité spécialisée en "management"
Faculty: Theses of the Faculty of Applied Sciences
Abstract
[en] This work implements a voice interface used to control a DJI Tello drone. A signal-processing stage analyses the voice, and a deep-learning stage allows the interface to "understand" the commands sent to the drone through keyword-spotting techniques. To extract information from the pilot's voice, spectrograms and MFCCs are used as features. Deep-learning models (convolutional neural networks (CNN), attention-based recurrent neural networks (Att-RNN) and residual networks (ResNet)) are trained on these features to classify a dataset of words. A regular language is also defined to allow codified communication between the pilot and the drone.
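Because the command language is regular, an utterance's well-formedness can be checked with a regular expression. The sketch below is hypothetical: the grammar, keywords, and units are illustrative assumptions, not the thesis's actual command set.

```python
import re

# Hypothetical regular command language (the thesis's actual grammar is
# not reproduced here): an action keyword, optionally followed by a
# direction and a distance.
COMMAND = re.compile(
    r"(take off|land|stop)"
    r"|(go|turn) (up|down|left|right|forward|back)( \d+)?"
)

def is_valid_command(utterance: str) -> bool:
    """Return True when the spotted keywords form a well-formed command."""
    return COMMAND.fullmatch(utterance.strip().lower()) is not None
```

Restricting the pilot to such a language lets the keyword spotter choose among a small closed vocabulary instead of performing open-ended speech recognition.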
The dataset used in this work is the concatenation of an open-source dataset and a self-made one, whose data was gathered from volunteers on the web. In terms of prediction accuracy, ResNet and Att-RNN give the best results, respectively 95% and 97% in a noise-free environment, and tools are provided to explain why a model predicts one command rather than another when using the spectrogram feature.
In real-life experiments, a ResNet model trained on a quiet dataset and used with the MFCC feature gives the best results in both quiet and noisy environments. Furthermore, the drone reacts almost immediately once the pilot has spoken the entire command, with an expected delay of less than 0.3 seconds. The final interface is a graphical user interface running in a web browser.
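The spectrogram feature mentioned in the abstract can be sketched with NumPy alone: a Hann-windowed short-time Fourier transform whose magnitude is expressed in decibels. The frame length and hop size below are illustrative assumptions, not the thesis's settings, and MFCC extraction would additionally apply a mel filterbank and a DCT to this representation.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    # Hann-windowed STFT magnitude in decibels: the kind of
    # time-frequency feature fed to keyword-spotting models.
    window = np.hanning(frame_len)
    frames = np.stack([signal[i:i + frame_len] * window
                       for i in range(0, len(signal) - frame_len + 1, hop)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(magnitude + 1e-10)  # small offset avoids log(0)
```

As a sanity check, a pure 440 Hz tone sampled at 8 kHz concentrates its energy near frequency bin 440 × frame_len / 8000 ≈ 14.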
File(s)
Document(s)
Description: -
Size: 3.13 MB
Format: Adobe PDF
Cite this thesis
The University of Liège does not guarantee the scientific quality of these student works or the accuracy of all the information they contain.