Abstract: Human beings have continually created language technologies to support the use of language and improve linguistic life, from knotted-cord records and ideograms to the creation of writing systems, the application of printing, and the popularization of broadcasting and film; we have now entered the stage of modern language technology represented by the Internet and linguistic intelligence. Direct “human-human” communication is gradually declining, while indirect “human-machine-human” communication is becoming the norm, and we are entering an era of “human-machine symbiosis” in which humans are equipped with AI assistants. Language models, represented by ChatGPT, are the peak of human language technology to date and demonstrate the power of big data, especially language data. However, the knowledge deficiencies that such models show in their language output stem from the lack of online “special domain data” covering specialized fields, special populations, special scenarios, and less commonly used languages. Data, including language data, has become a key element in the development of new science and technology and a production factor of the modern economy. It is necessary to govern data through laws, regulations, norms, and standards; to promote the production, circulation, and use of data through the data market; and to make up for the gaps in Internet data by having data companies gather “special domain data” in a planned way. It is also necessary to improve citizens' ability to adapt to AI assistants through language intelligence education, and to move the labor force into the new positions created by new technologies in a timely manner through a job market forecasting mechanism.
Data management should balance leniency and stringency, so as to promote the development of linguistic intelligence as far as possible while ensuring that the technology serves the good and advances on an ethical track.
李宇明. 人机共生时代的语言数据问题[J]. 华中师范大学学报(人文社会科学版), 2023, 62(5): 135-143.
Li Yuming. On the Issues of Language Data in the Era of Human-Machine Symbiosis. Journal of Central China Normal University (Humanities and Social Sciences), 2023, 62(5): 135-143.