A semi-supervised image classification algorithm based on collaborative training


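Abstract: To improve classifier performance when only a few labeled samples are available, this paper proposes a semi-supervised sample selection method based on the cooperation of multiple classifiers. The method uses unlabeled samples to augment the training set and thereby improve the classifier's generalization ability. Relying on mutual supervision among classifiers and on the principle that the classifiers' labels must agree, it takes the labeled samples as the training set and co-trains two classifiers, SVM and RF (random forest). With the class labels and certainty values of both classifiers as constraints, it selects the most representative samples from the unlabeled set to form an augmented sample set, and it uses accuracy as the evaluation criterion to measure the algorithm's effect on the classifier's generalization performance. The algorithm was tested on the handwritten digit dataset (MNIST) and the Landsat soil dataset. Experimental results show that, compared with a classifier built from the small original training set, the classifier built from the augmented samples achieves higher accuracy on every class. Overall accuracy on the two datasets increases by 5.97% and 7.02%, respectively. In MNIST, the digit 5 class shows the largest gain (11.9 percentage points, from 79.3% to 91.2%); in the Landsat soil dataset, soil class 3 shows the largest gain (15.8 percentage points, from 73.5% to 89.3%). These results show that the algorithm markedly improves the classifier's generalization performance. In addition, compared with the classic KNN, Co-training, and Co-forest algorithms, the proposed algorithm makes the fullest use of the information in the unlabeled samples and achieves the best accuracy, demonstrating its advantages.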
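The selection rule described in the abstract (keep an unlabeled sample only when the SVM and the RF assign it the same class label and both are sufficiently certain) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the certainty measure, the threshold `min_certainty`, the repeated retrain-and-select loop, and all function names are hypothetical. The two classifiers are abstracted as trainer callables so the sketch runs without any ML library.

```python
from typing import Callable, List, Sequence, Tuple

# (label, certainty) for one sample. "Certainty" stands in for the paper's
# certainty value; here it is assumed to be, e.g., the top class probability.
Prediction = Tuple[int, float]
# A trained model (hypothetical interface): samples -> one Prediction each.
Predictor = Callable[[List[Sequence[float]]], List[Prediction]]
# A trainer fits a model on (X, y) and returns its Predictor.
Trainer = Callable[[List[Sequence[float]], List[int]], Predictor]


def select_confident_agreements(
    svm_preds: Sequence[Prediction],
    rf_preds: Sequence[Prediction],
    min_certainty: float = 0.9,  # hypothetical threshold
) -> List[Tuple[int, int]]:
    """Return (index, label) pairs for samples on which both classifiers
    predict the same label, each with certainty >= min_certainty."""
    selected = []
    for i, ((svm_label, svm_c), (rf_label, rf_c)) in enumerate(zip(svm_preds, rf_preds)):
        # Constraint 1: labels agree. Constraint 2: both models are confident.
        if svm_label == rf_label and min(svm_c, rf_c) >= min_certainty:
            selected.append((i, svm_label))
    return selected


def augment_by_agreement(
    X: List[Sequence[float]],
    y: List[int],
    unlabeled: List[Sequence[float]],
    train_svm: Trainer,
    train_rf: Trainer,
    min_certainty: float = 0.9,
    max_rounds: int = 5,  # assumption: the abstract does not give a pass count
) -> Tuple[List[Sequence[float]], List[int]]:
    """Grow (X, y) with pseudo-labeled samples that both models agree on."""
    X, y, pool = list(X), list(y), list(unlabeled)
    for _ in range(max_rounds):
        if not pool:
            break
        # Retrain both models on the current (possibly augmented) set.
        svm, rf = train_svm(X, y), train_rf(X, y)
        picked = select_confident_agreements(svm(pool), rf(pool), min_certainty)
        if not picked:
            break  # no sample passes both constraints: stop early
        taken = set()
        for i, label in picked:
            X.append(pool[i])
            y.append(label)
            taken.add(i)
        # Drop the newly pseudo-labeled samples from the unlabeled pool.
        pool = [s for j, s in enumerate(pool) if j not in taken]
    return X, y
```

Setting `max_rounds=1` gives a single selection pass, which may be closer to a one-shot reading of the abstract. In practice, each trainer could wrap scikit-learn's `SVC(probability=True)` or `RandomForestClassifier`, taking `classes_[argmax]` of `predict_proba` as the label and the maximum probability as the certainty; that mapping is an assumption, not something the abstract specifies. Requiring both agreement and a certainty floor is a conservative choice that trades fewer added samples for fewer wrong pseudo-labels.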

Authors: WANG Yu, LI Yanhui

Keywords: semi-supervised learning, co-training, support vector machine, random forest, sample augmentation

Key words: semi-supervised classification, collaborative training, SVM, RF, image classification

Received: 2021-12-15

Cite this article:

WANG Yu, LI Yanhui. A semi-supervised image classification algorithm based on collaborative training[J]. Journal of Central China Normal University (Natural Sciences), 2021, 55(6): 1020-1029.

Link to this article:

http://journal.ccnu.edu.cn/zk//CN/ or http://journal.ccnu.edu.cn/zk//CN/Y2021/V55/I6/1020