Deep Learning-Based Intrusion Detection System: Embracing Long Short-Term Memory (LSTM) and Roughly Balanced Bagging Synergies

Authors

  • Onuorah Martins Onyekwelu, University of Johannesburg, South Africa
  • Sun Yanxia, University of Johannesburg, South Africa
  • Daniel Mashao, University of Johannesburg, South Africa

DOI:

https://doi.org/10.4114/intartif.vol28iss76pp40-65

Keywords:

Intrusion detection, Deep learning, Class imbalance, Roughly balanced bagging, Feature selection

Abstract

This study introduces a novel approach to class imbalance in network traffic datasets within a deep learning framework. We propose combining roughly balanced bagging (RBB) with a long short-term memory (LSTM) architecture, using information gain (IG) to select features from a class-imbalanced intrusion detection system (IDS) dataset. The approach begins with feature selection via information gain, applies RBB to create balanced subsets of the data, and then trains multiple LSTM models on these subsets to form an ensemble for improved classification of imbalanced network traffic data. Specifically, experiments are conducted on the CIC-IDS 2017 dataset using subsets of features grouped into quartiles by information gain. The minority class within each quartile is upsampled via the synthetic minority oversampling technique (SMOTE). Then, 10 roughly balanced bags are created from the upsampled data for classification by 10 LSTM models. This process is repeated across the first, second, and third quartiles, enabling a comprehensive analysis of feature importance and model performance across the different subsets. Additionally, the dataset's 15 class labels are grouped into 7 classes on the basis of their characteristics to support multiclass classification. Our methodology achieved an accuracy of 91.04%, precision of 91.04%, recall of 96.73%, AUC of 96.73%, and F1 score of 91.04% on binary classification using the 19 first-quartile features. Multiclass performance is measured by three metrics: recall, precision, and F1 score. Class 2 has the highest recall (98.00%) and an F1 score of 92.00%, while class 3 has the highest precision (97.00%).
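
The bagging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes the commonly described form of roughly balanced bagging, in which each bag keeps as many minority samples as the dataset contains (drawn with replacement) and the number of majority samples is drawn from a negative binomial distribution with success probability 0.5, so bags are balanced on average rather than exactly. The function names, the toy labels, and the choice to sample the majority class with replacement are all assumptions made for illustration.

```python
import random
from collections import Counter


def negative_binomial(n, p, rng):
    """Count the failures seen before the n-th success (success probability p)."""
    failures = 0
    successes = 0
    while successes < n:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


def roughly_balanced_bags(labels, minority_label, n_bags=10, seed=0):
    """Return n_bags lists of sample indices (hypothetical helper).

    Each bag holds len(minority) minority indices drawn with replacement,
    plus a negative-binomially distributed number of majority indices.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority or not majority:
        raise ValueError("both classes must be present")

    bags = []
    for _ in range(n_bags):
        # Majority count varies per bag; its expected value equals len(minority).
        n_major = negative_binomial(len(minority), 0.5, rng)
        bag = [rng.choice(minority) for _ in minority]
        bag += [rng.choice(majority) for _ in range(n_major)]
        rng.shuffle(bag)
        bags.append(bag)
    return bags


# Toy imbalanced labels (assumed data): 900 majority (0), 100 minority (1).
labels = [0] * 900 + [1] * 100
bags = roughly_balanced_bags(labels, minority_label=1, n_bags=10)
for bag in bags[:3]:
    print(Counter(labels[i] for i in bag))
```

In an ensemble of the kind the abstract describes, each list of indices would select the training rows for one LSTM model, and the models' predictions would then be combined, for example by averaging or voting. How the study combines the 10 models is not stated in the abstract.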

Author Biographies

Sun Yanxia, University of Johannesburg, South Africa

Department of Electrical and Electronic Engineering Science

Full Professor

Daniel Mashao, University of Johannesburg, South Africa

Faculty of Engineering and Built Environment

Full Professor

Published

2025-06-17

How to Cite

Martins Onyekwelu, O., Yanxia, S., & Mashao, D. (2025). Deep Learning-Based Intrusion Detection System: Embracing Long Short-Term Memory (LSTM) and Roughly Balanced Bagging Synergies. Inteligencia Artificial, 28(76), 40–65. https://doi.org/10.4114/intartif.vol28iss76pp40-65