Distributed two phase intrusion detection system using machine learning techniques and underlying big data storage and processing architecture- HDFS

Authors

  • Abhijit Dnyaneshwar Jadhav Department of Computer Engineering, PCCOE&R, Ravet
  • Vidyullatha Pellakuri Associate Professor, Department of Computer Science & Engineering, Koneru Lakshamaiah Education Foundation, Guntur, A.P., India
  • Ahire Prashant G. Symbiosis Institute of Technology, Pune, India
  • Archana Chaugule PCCOE&R, Pune, India
  • Harish U. Tiwari PCCOE&R, Pune, India

DOI:

https://doi.org/10.4114/intartif.vol28iss76pp124-148

Keywords:

Accuracy, HDFS, Intrusion Detection System, Machine Learning, Two Phase Intrusion Detection

Abstract

It is crucial for organizations to secure their data in the internet era, and Intrusion Detection Systems (IDS) are a common means of providing this security. Researchers have implemented various IDS models with a range of tools and methods. However, several performance concerns that are crucial from a security standpoint remain unresolved: time efficiency (timeliness), accuracy, and fault tolerance. The proposed intrusion detection model has two phases of detection, each using a different set of machine learning algorithms. Phase I employs Support Vector Machine (SVM) and k-nearest neighbor (kNN), whereas Phase II uses Decision Tree and Naïve Bayes. This two-phase detection reduces false positives and false negatives. To offset the execution time of these four techniques, a big data environment, the Hadoop Distributed File System (HDFS), is used as the underlying storage and processing architecture. With this arrangement, the model achieves an overall accuracy of 97.29% across known and unknown attacks: 99.49% for known attacks and 96.28% for unknown attacks. Time efficiency is also measured for training and testing; training on 10,000 records took 0.7 seconds, which is very efficient compared to existing systems. Detailed performance results are discussed in the results section. In addition, because of HDFS, the system is a distributed and fault-tolerant intrusion detection system.
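
The abstract does not state how the two phases are combined, so the following Python sketch is only one plausible reading, not the authors' method. It assumes that a record is escalated to Phase II only when the Phase I models disagree, and that Phase II flags an intrusion if either of its models does. The classifier functions, thresholds, and sample records are hypothetical placeholders standing in for trained SVM, kNN, Decision Tree, and Naïve Bayes models.

```python
from typing import Callable, List, Sequence, Tuple

# A classifier is any callable mapping a feature vector to a label:
# 1 = intrusion, 0 = normal traffic.
Classifier = Callable[[Sequence[float]], int]


def two_phase_predict(
    record: Sequence[float],
    phase1: Sequence[Classifier],
    phase2: Sequence[Classifier],
) -> int:
    """Classify one record with an ASSUMED two-phase rule.

    Assumption (not stated in the abstract): if all Phase I models agree,
    their shared label is final; otherwise the record is escalated and
    Phase II flags an intrusion if any of its models does.
    """
    votes1 = [clf(record) for clf in phase1]
    if len(set(votes1)) == 1:  # Phase I models are unanimous
        return votes1[0]
    votes2 = [clf(record) for clf in phase2]
    return 1 if any(votes2) else 0


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of records whose predicted label matches the true label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical stand-ins for trained models; thresholds are illustrative only.
def svm_like(x: Sequence[float]) -> int:
    return int(x[0] > 0.5)


def knn_like(x: Sequence[float]) -> int:
    return int(x[1] > 0.5)


def tree_like(x: Sequence[float]) -> int:
    return int(x[0] + x[1] > 1.2)


def nb_like(x: Sequence[float]) -> int:
    return int(x[0] * x[1] > 0.3)


PHASE1: List[Classifier] = [svm_like, knn_like]
PHASE2: List[Classifier] = [tree_like, nb_like]

if __name__ == "__main__":
    # Made-up records and labels, used only to exercise the functions.
    records = [[0.9, 0.8], [0.1, 0.2], [0.9, 0.4], [0.6, 0.1], [0.3, 0.95]]
    labels = [1, 0, 1, 0, 0]
    preds = [two_phase_predict(r, PHASE1, PHASE2) for r in records]
    print("predictions:", preds)
    print("(tp, tn, fp, fn):", confusion_counts(labels, preds))
    print("accuracy:", accuracy(labels, preds))
```

Under this assumed rule, Phase II runs only on records where Phase I is uncertain, which keeps the cost of the second pair of models low. The false positive and false negative counts returned by `confusion_counts` are the quantities the two-phase design aims to reduce.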

Author Biography

Vidyullatha Pellakuri, Associate Professor, Department of Computer Science & Engineering, Koneru Lakshamaiah Education Foundation, Guntur, A.P., India

She holds a doctorate in CSE and works as an Associate Professor at KLU, Guntur, A.P., India. She has 100+ Scopus/SCI publications.

Published

2025-07-10

How to Cite

Jadhav, A. D., Pellakuri, V., Prashant G., A., Chaugule, A., & U. Tiwari, H. (2025). Distributed two phase intrusion detection system using machine learning techniques and underlying big data storage and processing architecture- HDFS. Inteligencia Artificial, 28(76), 124–148. https://doi.org/10.4114/intartif.vol28iss76pp124-148