
Project Introduction

Speech as Data

The original speech database is drawn from a source that provides an extensive collection of articles, reports, and editorials reflecting the socio-political landscape of China.
Our project uses these data neutrally, without bias.
As a corpus, it encompasses a variety of semantic perspectives and political topics, providing a holistic view of the discourse within China.
This database can serve as a rich repository of textual data for training LLMs with a focus on China-specific knowledge.
If you are interested in our research team, please contact Dr. Hsuanlei Shao or the Laboratory of Computational China Studies (LCCS).

Our Related Research

Zhang, Chang, Dechun Zhang, and Hsuan Lei Shao. "The softening of Chinese digital propaganda: Evidence from the People’s Daily Weibo account during the pandemic." Frontiers in Psychology 14 (2023): 1049671.

Shao, Hsuan-Lei, Yu-Ying Huang, and Sieh-Chuen Huang. "Prediction Model for Drunk Driving Sentencing: Applying TextCNN to Chinese Judgement Texts." Available at SSRN 4306412 (2021).

Yang, Tzu-Ying, et al. "Knowledge Prediction by Graph Embedding and Machine Learning." 人工知能学会全国大会論文集 第 36 回 (2022). 一般社団法人 人工知能学会, 2022.

Shao, Hsuan-Lei. "Machine Learning: An Application of Text Mining to Xi's Grand External Propaganda Strategy." 中國大陸研究 62.4 (2019): 133-157.

China-Studies Database and Knowledge Graph

This project has constructed a 'China Studies Research Paper Database' curated by Taiwanese scholars. The database was built systematically: research papers published in Taiwanese academic journals were collected using 'China Studies', 'Mainland Studies', and 'Chinese Communist Party' as the collection keywords. This yielded a corpus of 1,367 papers.
The database is structured around three essential metadata fields: 'paper titles', 'keywords', and 'abstracts'. These fields form the backbone of the database and support the organization and retrieval of information.
The corpus of literature amassed in this database represents a significant knowledge repository for Taiwanese scholars engaged in China Studies.
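
As a rough illustration of the structure described above, the following Python sketch models one database entry with the three metadata fields and filters a list of entries by the three collection keywords. It is not the project's actual code: the names `PaperRecord`, `COLLECTION_KEYWORDS`, and `matches_collection`, the case-insensitive substring matching rule, and the sample entries are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# The three collection keywords named in the text.
# How they were actually matched is not stated; substring matching is an assumption.
COLLECTION_KEYWORDS = ("China Studies", "Mainland Studies", "Chinese Communist Party")


@dataclass
class PaperRecord:
    """One hypothetical database entry with the metadata fields named in the text."""
    title: str
    keywords: List[str] = field(default_factory=list)
    abstract: str = ""


def matches_collection(record: PaperRecord, terms=COLLECTION_KEYWORDS) -> bool:
    """Return True if any collection term appears in the title, keywords, or abstract."""
    haystack = " ".join([record.title, *record.keywords, record.abstract]).lower()
    return any(term.lower() in haystack for term in terms)


# Made-up placeholder entries, not records from the real database.
sample = [
    PaperRecord("Cross-Strait Trade Policy", ["Mainland Studies", "trade"], "An illustrative abstract."),
    PaperRecord("Party Organization Reform", ["Chinese Communist Party"], "An illustrative abstract."),
    PaperRecord("Urban Planning in Taipei", ["urban planning"], "An illustrative abstract."),
]

# Keep only the entries that match at least one collection keyword.
selected = [record for record in sample if matches_collection(record)]
print([record.title for record in selected])
```

Keeping the three fields in one small record type means the same filter can later be reused for retrieval, for example by passing a different tuple of search terms to `matches_collection`.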

Team Members

邵軒磊, Hsuan-lei SHAO

Project Director, Corresponding Author

hlshao2@gmail.com

https://orcid.org/0000-0002-7101-5272

王唯馨, Wei-hsin Wang

Project Assistant, IT Engineer

黃柏瑄, Po-Hsuan Huang

Project Assistant, IT Engineer

Team Watch Man, The Cat

Lab Member