Using Deep Learning for Malware generation to bypass Malware detection mechanisms

Schleich, Corinne

Date of defense : 6-Sep-2021/7-Sep-2021 • Permalink : `http://hdl.handle.net/2268.2/13112`

Details

Title :	Using Deep Learning for Malware generation to bypass Malware detection mechanisms
Author :	Schleich, Corinne
Date of defense :	6-Sep-2021/7-Sep-2021
Advisor(s) :	Louppe, Gilles ; Donnet, Benoît
Committee's member(s) :	Mathy, Laurent ; Boigelot, Bernard
Language :	English
Discipline(s) :	Engineering, computing & technology > Computer science
Commentary :	Code can be found at: https://gitlab.com/CorinneSch/using-deep-learning-for-malware-generation-to-bypass-malware-detection-mechanisms
Institution(s) :	Université de Liège, Liège, Belgium
Degree:	Master's degree in computer science and engineering, professional focus in "intelligent systems"
Faculty:	Master thesis of the Faculté des Sciences appliquées

Abstract

[en] Machine learning (ML) has become increasingly accessible in recent years, and documentation about ML techniques and their implementation is widely available on the internet. As a result, these techniques are increasingly used with malicious intent. Creating malware variants that evade security measures is tedious and difficult. The idea of this thesis is therefore to use deep learning techniques to automatically generate malware variants that can bypass classical detection techniques such as signature checking. The deep learning models used are encoder-decoder models, which have proved themselves as state-of-the-art models for text generation. Two main types of models have been tested in the scope of this thesis: variational auto-encoders (VAEs) and sequence-to-sequence models with attention. For each, different architectures have been implemented and tested. Convolutional and recurrent neural network (RNN) VAEs are tested with different parameters and on different datasets. Different sequence-to-sequence architectures with a Bahdanau attention mechanism are also tested. The models are trained on a benign dataset: since the goal is to add modifications to an initial code, it appears inappropriate at this stage to apply the approach specifically to malware files. This also makes the dataset easier to construct, since finding benign C files is simpler than finding enough malware files. A training dataset is also created, made up of variants of C code lines. Using modified lines instead of fully modified codes has the advantage that the resulting dataset is composed of smaller sequences, which should make learning easier for the deep learning models. To this end, an initial file is chosen and parsed into lines of code. A genetic algorithm is then applied to each line to add random modifications and thereby create variants of that line. Those variants are then collected to form the dataset.
Genetic algorithms (GAs) have already been used in other works and have shown good results in generating code variants; hence, GAs were chosen to generate the variant datasets. Tests have been performed on the outputs of the GA. Files are reconstructed from the modified lines and tested to verify that the modifications added to the initial lines are also effective at the binary level. To this end, the reconstructed files are compiled and the output of the compiler is used to measure the similarity between the initial code and the reconstructed code. The goal is a low similarity rate, so that the variants differ as much as possible from their initial code. This thesis is a first step in the area of malware variant code generation using deep learning methods. The tested deep learning models appear to have great difficulty learning to generate such variant sequences and therefore do not lead to revolutionary predictions. However, the results make it possible to draw important conclusions to direct future research in the field.
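
The similarity measurement mentioned in the abstract can be illustrated with a minimal sketch. It assumes a byte-level comparison using Python's `difflib.SequenceMatcher`; the abstract does not state which metric the thesis uses, so this is a stand-in, and the function name `similarity_rate` and the sample byte strings are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_rate(original: bytes, variant: bytes) -> float:
    """Return a similarity ratio in [0.0, 1.0] between two byte sequences.

    Assumption: difflib's ratio is used here as a stand-in; the abstract
    does not specify the metric applied to the compiler output.
    """
    # autojunk=False turns off difflib's heuristic that ignores very
    # frequent elements in long sequences, so every byte is considered.
    matcher = SequenceMatcher(None, original, variant, autojunk=False)
    return matcher.ratio()


if __name__ == "__main__":
    # Hypothetical placeholders for the two compiler outputs being compared.
    initial_output = b"ABCDEFGH"
    variant_output = b"ABXDEFGY"
    rate = similarity_rate(initial_output, variant_output)
    print(f"similarity rate: {rate:.2f}")
```

A value near 1.0 means the two outputs are almost identical, and a value near 0.0 means they share very little content.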

File(s)

Document(s)

Master_thesis.pdf
Size: 5.68 MB
Format: Adobe PDF

Summary.pdf
Size: 94.63 kB
Format: Adobe PDF

Changelog.pdf
Size: 105.64 kB
Format: Adobe PDF

All documents available on MatheO are protected by copyright and subject to the usual rules for fair use.
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.
