Speech as Data
The original speech database comes from a source that provides an extensive collection of articles, reports, and editorials reflecting the socio-political landscape of China.
Our project uses these data neutrally and without bias.
As a corpus, it encompasses a variety of semantic perspectives and political topics, providing a holistic view of the discourse within China.
This database makes it possible to build a rich repository of textual data for training LLMs with a focus on China-specific knowledge.
If you are interested in our research team, please contact Dr. Hsuanlei Shao or the Laboratory of Computational China Studies (LCCS).
Our Related Research
Zhang, Chang, Dechun Zhang, and Hsuan Lei Shao. "The softening of Chinese digital propaganda: Evidence from the People’s Daily Weibo account during the pandemic." Frontiers in Psychology 14 (2023): 1049671.
Shao, Hsuan-Lei, Yu-Ying Huang, and Sieh-Chuen Huang. "Prediction Model for Drunk Driving Sentencing: Applying TextCNN to Chinese Judgement Texts." Available at SSRN 4306412 (2021).
Yang, Tzu-Ying, et al. "Knowledge Prediction by Graph Embedding and Machine Learning." 人工知能学会全国大会論文集 第36回 (2022). 一般社団法人 人工知能学会, 2022.
Shao, Hsuan-Lei. "Machine Learning: An Application of Text Mining to Xi's Grand External Propaganda Strategy." 中國大陸研究 62.4 (2019): 133-157.
China-Studies Database and Knowledge Graph
This project has been instrumental in constructing a 'China Studies Research Paper Database' curated by Taiwanese scholars. The database was built systematically by aggregating research papers published in Taiwanese academic journals, using 'China Studies', 'Mainland Studies', and 'Chinese Communist Party' as the collection keywords.
This endeavor yielded a substantial corpus of 1,367 papers.
Subsequently, the database was structured using essential metadata fields such as 'paper titles', 'keywords', and 'abstracts'. This foundational metadata serves as the backbone of the database, facilitating the organization and retrieval of information.
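The collection and structuring steps described above can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the `Paper` class, the `matches_collection` and `build_database` helpers, the case-insensitive substring matching rule, and the sample records are all assumptions introduced for illustration. Only the three collection keywords and the three metadata fields come from the description.

```python
from dataclasses import dataclass, field

# Collection keywords named in the description above.
COLLECTION_KEYWORDS = ("China Studies", "Mainland Studies", "Chinese Communist Party")


@dataclass
class Paper:
    """One record holding the three metadata fields (hypothetical class name)."""
    title: str
    keywords: list[str] = field(default_factory=list)
    abstract: str = ""


def matches_collection(paper: Paper, terms=COLLECTION_KEYWORDS) -> bool:
    """Return True if any collection keyword occurs in the title, keywords, or abstract.

    Case-insensitive substring matching is an assumed rule, not the documented one.
    """
    haystack = " ".join([paper.title, *paper.keywords, paper.abstract]).lower()
    return any(term.lower() in haystack for term in terms)


def build_database(candidates: list[Paper]) -> list[Paper]:
    """Keep only the candidate papers that match at least one collection keyword."""
    return [paper for paper in candidates if matches_collection(paper)]


# Hypothetical sample records, not taken from the real database.
sample = [
    Paper("Cross-Strait Trade Policy", ["Mainland Studies", "trade"],
          "Reviews trade regulation since 2000."),
    Paper("Protein Folding Methods", ["biology"],
          "A survey of folding algorithms."),
    Paper("Party Organization Reform", ["governance"],
          "Examines the Chinese Communist Party after 2012."),
]

database = build_database(sample)
for paper in database:
    print(paper.title, "|", ", ".join(paper.keywords))
```

Keeping each record as a small typed structure lets the title, keywords, and abstract be searched together or separately, which is the kind of organization and retrieval the metadata fields are meant to support.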
The corpus of literature amassed in this database represents a significant knowledge repository for Taiwanese scholars engaged in China Studies.