Offline handwritten text line segmentation based on high-order correlation clustering
YIN Yalin¹, LIU Aimin², ZHOU Xiangdong³
1. Department of Digital Media Technology, Jianghan University, Wuhan 430056; 2. Laboratory and Equipment Department, Central China Normal University, Wuhan 430079; 3. Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714
Abstract: Text line segmentation of handwritten document images is an important preprocessing step in document image analysis. It remains challenging, however, because handwritten text lines are often multi-skewed, curved, and overlapping. This paper proposes a novel handwritten text line segmentation method based on high-order correlation clustering. First, a hypergraph is constructed in which each node corresponds to a connected component and each edge connects at least two connected components. Then, under a learned similarity measure, pairs of connected components are labeled as belonging or not belonging to the same text line. Finally, the connected components are merged into text lines using the union-find algorithm. In experiments on HIT-MW, a database of 803 unconstrained handwritten Chinese document images, the proposed method achieves a correct rate of 99.05% and an error rate of 1.96%.
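The final merging step can be illustrated with a minimal union-find sketch. This is an illustration under stated assumptions, not the authors' implementation: connected components are assumed to be indexed 0..n-1, the pairs labeled as belonging to the same text line are assumed to be already available as a list of index pairs, and the names `find` and `merge_components` are hypothetical.

```python
from collections import defaultdict


def find(parent, x):
    """Return the representative of x, compressing the path on the way."""
    root = x
    while parent[root] != root:
        root = parent[root]
    # Second pass: point every node on the path directly at the root.
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def merge_components(num_components, same_line_pairs):
    """Group component indices into text lines (hypothetical helper).

    same_line_pairs: pairs (a, b) assumed to have been labeled
    "same text line" by some upstream classifier.
    """
    parent = list(range(num_components))
    for a, b in same_line_pairs:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[rb] = ra  # union the two groups
    lines = defaultdict(list)
    for c in range(num_components):
        lines[find(parent, c)].append(c)
    return sorted(lines.values())


if __name__ == "__main__":
    # Six components; pairs (0,1), (1,2) and (3,4) share a line.
    print(merge_components(6, [(0, 1), (1, 2), (3, 4)]))
```

Because union-find only needs the positive pairwise labels, components linked through a chain of "same line" pairs end up in one text line even if they were never compared directly, and a component with no positive pair remains a line of its own.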