The Effect of the Dataset Size on the Accuracy of Software Defect Prediction Models: An Empirical Study

Mashaan A. Alshammari; Mohammad Alshayeb

doi:10.4114/intartif.vol24iss68pp72-88

Authors

Mashaan A. Alshammari, University of Ha'il, Ha'il, Saudi Arabia
Mohammad Alshayeb, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia

DOI:

https://doi.org/10.4114/intartif.vol24iss68pp72-88

Keywords:

Software Defect Prediction, Support Vector Machine, Feature Selection

Abstract

The ongoing development of computer systems requires massive software projects. Running the components of these large projects for testing purposes can be costly; parameter estimation can be used instead. Software defect prediction models are crucial for software quality assurance. This study investigates the impact of dataset size and feature selection algorithms on software defect prediction models. We build the models with two approaches: a statistical approach and a machine learning approach based on support vector machines (SVMs). The models were built on four datasets of different sizes, and four feature selection algorithms were applied. We found that applying the SVM defect prediction model to datasets with a reduced set of metrics as features may improve prediction accuracy. It also directs the testing effort toward maintaining the most influential set of metrics. The running time of the SVM defect prediction model was not consistent with dataset size, so having fewer metrics does not guarantee a shorter execution time. The experiments show that dataset size has a direct influence on the SVM defect prediction model; however, reduced datasets performed the same as, or slightly worse than, the original datasets.
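
The kind of comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the authors' setup: the data is synthetic (`make_dataset` is a hypothetical stand-in for a defect dataset of code metrics), the feature selection step is a simple correlation filter (the abstract does not name the four algorithms used), and the classifier is a linear SVM trained with Pegasos-style stochastic subgradient descent (the abstract does not state the kernel or tooling). The sketch trains on datasets of different sizes, with and without feature selection, and reports accuracy and training time.

```python
import random
import time


def make_dataset(n_rows, n_informative=3, n_noise=7, seed=0):
    """Synthetic stand-in for a defect dataset: metric rows, labels in {-1, +1}."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_rows):
        label = 1 if rng.random() < 0.5 else -1
        informative = [rng.gauss(1.0 * label, 1.0) for _ in range(n_informative)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y


def select_top_k(X, y, k):
    """Filter-style selection: keep the k features with the largest |correlation| to the label."""
    n = len(X)
    mean_y = sum(y) / n
    var_y = sum((b - mean_y) ** 2 for b in y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_x = sum(col) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
        var_x = sum((a - mean_x) ** 2 for a in col)
        scores.append(abs(cov) / ((var_x * var_y) ** 0.5 + 1e-12))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style subgradient descent on the regularized hinge loss.

    No bias term: the synthetic classes are symmetric about the origin.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink (regularization)
            if margin < 1:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def accuracy(w, X, y):
    """Fraction of rows whose predicted sign matches the label."""
    hits = 0
    for row, label in zip(X, y):
        pred = 1 if sum(wj * xj for wj, xj in zip(w, row)) >= 0 else -1
        hits += pred == label
    return hits / len(X)


def run_experiment(n_rows, k=None, seed=0):
    """Train on 70% of the rows, test on the rest; k=None keeps all features."""
    X, y = make_dataset(n_rows, seed=seed)
    split = int(0.7 * n_rows)
    X_train, y_train = X[:split], y[:split]
    X_test, y_test = X[split:], y[split:]
    if k is None:
        features = list(range(len(X[0])))
    else:
        features = select_top_k(X_train, y_train, k)  # selection uses training rows only

    def project(rows):
        return [[row[j] for j in features] for row in rows]

    start = time.perf_counter()
    w = train_linear_svm(project(X_train), y_train, seed=seed)
    elapsed = time.perf_counter() - start
    return {
        "rows": n_rows,
        "features": features,
        "accuracy": accuracy(w, project(X_test), y_test),
        "seconds": elapsed,
    }


if __name__ == "__main__":
    for n_rows in (100, 400, 1600):
        for k in (None, 3):
            print(run_experiment(n_rows, k))
```

Selecting features on the training split only avoids leaking test labels into the ranking. Timing results from a sketch like this depend on the machine and the implementation, so they say nothing about the running times reported in the study.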

Published

2021-10-26

How to Cite

Alshammari, M. A., & Alshayeb, M. (2021). The Effect of the Dataset Size on the Accuracy of Software Defect Prediction Models: An Empirical Study. Inteligencia Artificial, 24(68), 72–88. https://doi.org/10.4114/intartif.vol24iss68pp72-88

Issue

Vol. 24 No. 68 (2021): Inteligencia Artificial (December 2021)

Section

Regular Papers

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Open Access publishing.
Lic. under Creative Commons CC-BY-NC
Inteligencia Artificial (Ed. IBERAMIA)
ISSN: 1988-3064 (on line).
(C) IBERAMIA & The Authors
