Develop a predictor for detecting sentiments toward China in online texts
Background
Understanding foreign public sentiment toward China is strategically important. Short social media texts offer real-time insight but are hard to analyze because of their brevity, colloquial language, and cultural nuance. This study uses large-scale pre-trained language models to model and predict overseas sentiment toward China, providing a tool for public opinion analysis and policy evaluation.
Tasks
This study comprises two classification tasks:
- Topic Classification of online texts from Platform X
- Sentiment Classification toward China in online texts from Platform X
Considering the inherent characteristics of the data and the complexity of the tasks, we have designed distinct technical implementation approaches for each task.
Methods
Topic Classification Task
Step 1: Collect online text data.
Step 2: Extract features from the collected texts and construct the text dataset.
Step 3: Fine-tune several pre-trained BERT models, using the topic labels that come with the data as supervision.
Step 4: Compare the fine-tuned models by Accuracy on the validation set and select the best-performing one.
Step 5: Evaluate the selected model comprehensively on the test set using Accuracy, Precision, Recall, and F1-score.
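Steps 4 and 5 can be sketched in plain Python as follows. This is a minimal illustration, not the study's actual evaluation code: the function names `evaluate_predictions` and `select_best` are hypothetical, and macro averaging of Precision, Recall, and F1-score is an assumption, since the averaging scheme is not stated here.

```python
from collections import Counter


def evaluate_predictions(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging (an assumption) weights every class equally,
    regardless of how many examples it has.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n_labels = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n_labels,
        "recall": sum(recalls) / n_labels,
        "f1": sum(f1s) / n_labels,
    }


def select_best(val_accuracy):
    """Return the candidate with the highest validation accuracy.

    `val_accuracy` maps a candidate, e.g. a (model name, learning rate)
    tuple, to its accuracy on the validation set.
    """
    if not val_accuracy:
        raise ValueError("val_accuracy must not be empty")
    return max(val_accuracy, key=val_accuracy.get)
```

In this sketch, `select_best` would be called once with the validation accuracies of all fine-tuned candidates, and `evaluate_predictions` would then be applied only to the selected model's predictions on the test set, so that the test set plays no part in model selection.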
Sentiment Prediction Task Toward China
The sentiment prediction task is more complex: it combines text encoding, LLM-based data augmentation, and fine-tuning of several pre-trained BERT models. A simplified diagram is shown below:
Results and Future Directions
For the topic classification task, the best model after tuning was BERTweet-base with a learning rate of 5e-5, achieving a test set accuracy of 0.936.
For the China-sentiment prediction task, the best model was BERTweet-base with a learning rate of 2e-5, reaching a test set accuracy of 0.786.
Future directions: For the China-sentiment classification task, improving the consistency of human annotations and the model's ability to discriminate between fine-grained categories should enable more accurate and stable automated sentiment detection.
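One common way to quantify annotation consistency is Cohen's kappa, which corrects the raw agreement between two annotators for the agreement expected by chance. The sketch below is an illustration only: the study does not state which agreement measure it uses, and the function name `cohens_kappa` is hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected if both annotators labeled at random
    according to their own label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)
```

A kappa of 1 means perfect agreement and a value near 0 means agreement no better than chance, so tracking kappa across annotation rounds gives a simple signal of whether revised labeling guidelines are making the labels more consistent.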