Deep Learning-Based Intrusion Detection System: Embracing Long Short-Term Memory (LSTM) and Roughly Balanced Bagging Synergies
DOI: https://doi.org/10.4114/intartif.vol28iss76pp40-65
Keywords: Intrusion detection, Deep learning, Class imbalance, Roughly balanced bagging, Feature selection
Abstract
This study introduces a novel approach to the class imbalance problem in network traffic datasets within a deep learning framework. We propose combining roughly balanced bagging (RBB) with a long short-term memory (LSTM) architecture, using information gain (IG) to select the most informative features from a class-imbalanced intrusion detection system (IDS) dataset. The approach first selects features via IG, then applies RBB to create balanced subsets of the data, and finally trains multiple LSTM models on these subsets to form an ensemble for classifying imbalanced network traffic. Experiments are conducted on the CIC-IDS 2017 dataset, using subsets of features grouped into quartiles according to their information gain. Within each quartile, the minority class is upsampled with the synthetic minority oversampling technique (SMOTE), and 10 roughly balanced bags are then created from the upsampled data for classification by 10 LSTM models. This process is repeated for the first, second, and third quartiles, enabling a comprehensive analysis of feature importance and model performance across the different feature subsets. Additionally, the dataset's 15 class labels are grouped into 7 classes according to their characteristics to support multiclass classification.

For binary classification using the first-quartile features (19 features), our methodology achieved an accuracy of 91.04%, a precision of 91.04%, a recall of 96.73%, an AUC of 96.73%, and an F1 score of 91.04%. Multiclass performance is measured by three metrics: recall, precision, and F1 score. Class 2 achieves the highest recall (98.00%) and an F1 score of 92.00%, and class 3 achieves the highest precision (97.00%).
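The data-preparation stages summarized in the abstract (IG ranking, quartile grouping, roughly balanced bagging, ensemble combination) can be sketched in NumPy. This is a minimal, hypothetical illustration, not the authors' code. It assumes binary labels, equal-width discretization into 10 bins for computing information gain on continuous features, a descending IG ranking split into four near-equal groups with np.array_split, and the original formulation of roughly balanced bagging: the minority class is bootstrapped at its original size, and the majority-class sample size is drawn from a negative binomial distribution NB(n_min, 0.5). All helper names are invented for illustration. SMOTE and the LSTM models are omitted (SMOTE is available, for example, as imblearn.over_sampling.SMOTE), and averaging per-model probabilities (soft voting) is an assumed combination rule, since the abstract does not say how the ensemble outputs are merged.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(feature, labels, n_bins=10):
    """IG = H(Y) - H(Y | X), after discretizing X into equal-width bins (assumed)."""
    edges = np.histogram_bin_edges(feature, bins=n_bins)
    binned = np.digitize(feature, edges[1:-1])  # bin index 0 .. n_bins-1
    cond = 0.0
    for b in np.unique(binned):
        mask = binned == b
        cond += mask.mean() * entropy(labels[mask])  # weighted conditional entropy
    return entropy(labels) - cond


def ig_quartiles(X, y, n_bins=10):
    """Rank features by IG (descending) and split the ranking into 4 groups."""
    gains = np.array([information_gain(X[:, j], y, n_bins) for j in range(X.shape[1])])
    order = np.argsort(-gains, kind="stable")
    return np.array_split(order, 4), gains


def roughly_balanced_bags(y, n_bags=10, minority=1, seed=0):
    """Row-index sets for roughly balanced bagging (binary case, hypothetical helper).

    Each bag bootstraps the minority class at its original size n_min and
    draws the majority-class sample size from NB(n_min, 0.5), whose mean is
    n_min, so the bags are balanced only on average.
    """
    rng = np.random.default_rng(seed)
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    n_min = len(min_idx)
    bags = []
    for _ in range(n_bags):
        n_maj = int(rng.negative_binomial(n_min, 0.5))
        bag = np.concatenate([
            rng.choice(min_idx, size=n_min, replace=True),
            rng.choice(maj_idx, size=n_maj, replace=True),
        ])
        bags.append(rng.permutation(bag))
    return bags


def soft_vote(prob_list):
    """Average per-model positive-class probabilities and threshold at 0.5 (assumed rule)."""
    return (np.mean(prob_list, axis=0) >= 0.5).astype(int)
```

In this sketch, each index array returned by roughly_balanced_bags would select the training rows for one of the 10 LSTM models. Drawing the majority-class count at random, rather than fixing it to n_min, is what makes the bags only roughly balanced.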
License
Copyright (c) 2025 Iberamia & The Authors

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Open Access publishing.
Inteligencia Artificial (Ed. IBERAMIA)
ISSN: 1988-3064 (online).