Research on feature selection techniques in software defect prediction
Zijie Zhou
Northeastern University
DOI: https://doi.org/10.59429/esta.v12i1.9658
Keywords: Software defect prediction; Feature selection; Information gain; Chi-square test
Abstract
This paper studies the application of feature selection in software defect prediction and examines how different feature selection methods affect the performance of prediction models, with the goal of improving the accuracy and efficiency of defect prediction. Three common feature selection methods are used: information gain, the chi-square test, and recursive feature elimination (RFE). They are validated experimentally with four classifiers: support vector machine (SVM), decision tree, random forest, and logistic regression. The experimental results show that random forest combined with RFE performs best across multiple metrics, with an accuracy of 0.89, precision of 0.84, recall of 0.91, F1 score of 0.87, and AUC of 0.93, outperforming the other combinations. Incorporating feature selection can significantly improve the accuracy and efficiency of models for software defect prediction: an appropriate feature selection method removes redundant features, reduces model complexity, and improves prediction accuracy.
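As a rough illustration of two of the filter methods named in the abstract, the sketch below scores discrete features by information gain and by the Pearson chi-square statistic, then keeps the top-ranked ones. It is a minimal standard-library sketch under stated assumptions, not the paper's implementation: the toy data, the feature names (`high_complexity`, `many_authors`, `has_license_hdr`), and the cutoff of two features are all hypothetical. RFE is not shown, since it requires repeatedly fitting a model such as a random forest.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(Y) - H(Y | X) for one discrete feature X and labels Y."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature/label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for x in feature_totals:
        for y in label_totals:
            expected = feature_totals[x] * label_totals[y] / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: 8 software modules, label 1 = defective.
defective = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "high_complexity": [1, 1, 1, 1, 0, 0, 0, 0],  # matches the label exactly
    "many_authors":    [1, 1, 1, 0, 1, 0, 0, 0],  # partly informative
    "has_license_hdr": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
}

ig_scores = {name: information_gain(col, defective) for name, col in features.items()}
chi_scores = {name: chi_square(col, defective) for name, col in features.items()}

# Rank by information gain and keep the two highest-scoring features
# (the cutoff of 2 is an arbitrary choice for this illustration).
ranked = sorted(features, key=ig_scores.get, reverse=True)
top_k = ranked[:2]

for name in ranked:
    print(f"{name}: IG={ig_scores[name]:.3f}, chi2={chi_scores[name]:.3f}")
print("selected:", top_k)
```

On this toy data both scores rank the features in the same order, and the feature that is independent of the label scores zero under both, which is the kind of redundant feature a filter method would discard before training a classifier.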