Abstract: The correlation-based feature selection (CFS) algorithm measures association only through the linear correlation coefficient in regression tasks and through symmetrical uncertainty in classification tasks. To address this limitation, a CFS algorithm based on the maximum information coefficient (MIC), named MICCFS, is presented. It replaces both the linear correlation coefficient used for regression and the symmetrical uncertainty used for classification with the MIC measure, and it searches for the feature subset with the best-first search algorithm. Experiments compare MICCFS with CFS and with other commonly used feature selection methods (SVMRFE, Lasso, MIM, ReliefF, and Chi-Square) on eleven real-world regression datasets and ten classification datasets from the UCI Machine Learning Repository, using a support vector machine (SVM), the k-nearest neighbor algorithm (k-NN), a naive Bayes model (NB), and a decision tree classifier (DT) as learners. The results show that MICCFS outperforms the compared methods.
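To make the CFS framework mentioned in the abstract concrete, the following is a minimal sketch of the standard CFS merit score, k * r_cf / sqrt(k + k(k-1) * r_ff), with a pluggable association measure. It is not the authors' implementation. The function names are hypothetical, `abs_pearson` is only a stand-in for the point where MICCFS would plug in a MIC estimate, and a greedy forward search is used here as a simplification of best-first search.

```python
import math
from itertools import combinations


def abs_pearson(x, y):
    """Stand-in association measure in [0, 1]: absolute Pearson correlation.

    Assumption: MICCFS would use a MIC estimate here instead.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no association
    return abs(sxy) / math.sqrt(sxx * syy)


def cfs_merit(subset, features, target, assoc=abs_pearson):
    """CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    r_cf is the mean feature-target association over the subset,
    r_ff is the mean pairwise feature-feature association.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(assoc(features[j], target) for j in subset) / k
    if k == 1:
        return r_cf
    pairs = list(combinations(subset, 2))
    r_ff = sum(assoc(features[i], features[j]) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_forward_search(features, target, assoc=abs_pearson):
    """Greedy forward selection (a simplification of best-first search).

    Adds the feature that most improves the merit and stops when no
    candidate gives a strict improvement.
    """
    selected, best = [], 0.0
    remaining = set(range(len(features)))
    while remaining:
        candidate, cand_score = None, best
        for j in sorted(remaining):
            score = cfs_merit(selected + [j], features, target, assoc)
            if score > cand_score:
                candidate, cand_score = j, score
        if candidate is None:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = cand_score
    return selected, best


# Toy data: feature 1 duplicates the information in feature 0,
# feature 2 is only weakly related to the target.
target = [1, 2, 3, 4, 5, 6]
features = [
    [2, 4, 6, 8, 10, 12],
    [1, 2, 3, 4, 5, 6],
    [1, -1, 1, -1, 1, -1],
]
selected, merit = greedy_forward_search(features, target)
```

On this toy data the search keeps feature 0 alone: adding the redundant feature 1 leaves the merit unchanged, and adding the weak feature 2 lowers it. Swapping in a different `assoc` callable changes the association measure without touching the merit or search code, which is the substitution the abstract describes.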
罗幼喜,谢昆明,胡超竹,李翰芳. 基于最大信息系数的关联性特征选择算法:MICCFS[J]. 华中师范大学学报(自然科学版), 2023, 57(6): 777-785.
LUO Youxi, XIE Kunming, HU Chaozhu, LI Hanfang. MICCFS: a correlation-based feature selection algorithm based on maximum information coefficient[J]. Journal of Central China Normal University (Natural Sciences), 2023, 57(6): 777-785.