Abstract: To improve classifier performance when only a small number of labeled samples is available, a semi-supervised sample selection method based on the collaboration of multiple classifiers is proposed. The method uses unlabeled samples to enhance the training set and improve the classifier's generalization ability. It relies on mutual supervision among multiple classifiers and on the principle that their labels must be consistent: the labeled samples serve as the training set, and two classifiers, SVM and RF, are co-trained. The category labels and certainty values produced by the classifiers are used as constraints to select the most representative samples from the unlabeled sample set, which form the enhanced sample set. Accuracy is used as the evaluation criterion to assess the effect of the algorithm on the classifier's generalization performance. The algorithm is tested on a handwritten digit dataset (the MNIST character library) and the Landsat soil dataset. The experimental results show that, compared with a classifier built from the small number of original training samples, the classifier trained on the enhanced sample set predicts every category more accurately. The overall accuracy on the two datasets increases by 5.97% and 7.02%, respectively. In the MNIST dataset, the digit 5 shows the largest gain (an increase of 11.9%, from 79.3% to 91.2%); in the Landsat soil dataset, the accuracy of soil 3 improves most markedly (an increase of 15.8%, from 73.5% to 89.3%). These results indicate that the algorithm has a certain degree of robustness. In addition, compared with the classic KNN, Co-training, and Co-forest algorithms, the proposed algorithm makes the fullest use of unlabeled sample information and achieves the best accuracy, which demonstrates its advantages.
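The selection constraint described in the abstract (consistent labels plus certainty values) can be illustrated with a minimal sketch. The abstract does not give the exact rule, so everything specific here is an assumption made for illustration: the function name `select_confident_agreements`, the 0.9 threshold, and the choice to require that both classifiers' certainties reach the threshold. The inputs stand in for the labels and certainty values that the trained SVM and RF would produce on the unlabeled samples.

```python
from typing import List, Sequence, Tuple


def select_confident_agreements(
    labels_a: Sequence[int],
    conf_a: Sequence[float],
    labels_b: Sequence[int],
    conf_b: Sequence[float],
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> List[Tuple[int, int]]:
    """Return (sample_index, pseudo_label) pairs for unlabeled samples on which
    both classifiers predict the same label and both certainties reach threshold."""
    if not (len(labels_a) == len(conf_a) == len(labels_b) == len(conf_b)):
        raise ValueError("all inputs must have the same length")
    selected = []
    for i, (la, ca, lb, cb) in enumerate(zip(labels_a, conf_a, labels_b, conf_b)):
        # Keep the sample only if the labels agree and the weaker certainty
        # still meets the threshold (hypothetical form of the constraint).
        if la == lb and min(ca, cb) >= threshold:
            selected.append((i, la))
    return selected


# Made-up predictions for four unlabeled samples (illustrative values only).
svm_labels = [5, 3, 5, 8]
svm_conf = [0.97, 0.95, 0.60, 0.99]
rf_labels = [5, 2, 5, 8]
rf_conf = [0.93, 0.96, 0.98, 0.91]

picked = select_confident_agreements(svm_labels, svm_conf, rf_labels, rf_conf)
print(picked)  # [(0, 5), (3, 8)]
```

In this sketch, sample 1 is rejected because the two classifiers disagree, and sample 2 is rejected because one certainty falls below the threshold. The selected samples and their pseudo-labels would then be appended to the labeled training set before retraining.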