Completed

Taylor Swift Engagement Analysis

Independent Research • 2025.08
YouTube API
Gemini LLM
Sentiment Analysis
ELM
Bayesian Updating
Python

Project Overview

Analyzed 5,700+ YouTube comments on Taylor Swift’s engagement news to explore how public sentiment evolves online. This project combined machine learning with persuasion theory to reveal dynamics of digital discourse.

What I Did

  • Collected 5,700+ comments using the YouTube Data API, then applied deduplication, regex cleaning, and semantic normalization to construct a research dataset.
  • Applied the Elaboration Likelihood Model (ELM): classified comments into central (rational) vs peripheral (emotional) routes using the Gemini LLM.
  • Conducted sentiment analysis (positive / neutral / negative) with the Gemini API, validated against 100 manually labeled samples.
  • Tracked sentiment evolution using Bayesian updating; applied rolling variance for polarization and herding tests for collective dynamics.
  • Found negative voices increasingly dominant; peripheral (emotional) routes far exceeded central (rational) ones, resembling financial markets’ 'overreaction–correction' cycle.
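
To make the temporal step above concrete, here is a minimal sketch of how Bayesian updating and a rolling-variance polarization signal could be computed. It is an illustration under stated assumptions, not the project's actual code: the Dirichlet-multinomial model, the uniform prior, the numeric sentiment scores (+1 / 0 / -1), the function names, and the toy batches are all hypothetical and were chosen only to show the mechanics.

```python
from collections import Counter
from statistics import pvariance

LABELS = ("positive", "neutral", "negative")
# Assumed numeric encoding, used only for the polarization signal.
SCORE = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def update_posterior(alpha, batch_labels):
    """Dirichlet-multinomial update: add the batch's label counts
    to the current pseudo-counts."""
    counts = Counter(batch_labels)
    return {label: alpha[label] + counts.get(label, 0) for label in LABELS}


def posterior_mean(alpha):
    """Expected share of each sentiment class under the current posterior."""
    total = sum(alpha.values())
    return {label: alpha[label] / total for label in LABELS}


def rolling_variance(labels, window):
    """Population variance of sentiment scores over a sliding window;
    higher values indicate more polarized comments."""
    scores = [SCORE[label] for label in labels]
    return [
        pvariance(scores[end - window:end])
        for end in range(window, len(scores) + 1)
    ]


# Toy, time-ordered batches of comment labels (illustrative, not real data).
batches = [
    ["positive", "positive", "neutral"],
    ["negative", "positive", "negative"],
    ["negative", "negative", "neutral"],
]

alpha = {label: 1.0 for label in LABELS}  # assumed uniform prior
trajectory = []
for batch in batches:
    alpha = update_posterior(alpha, batch)
    trajectory.append(posterior_mean(alpha))

all_labels = [label for batch in batches for label in batch]
polarization = rolling_variance(all_labels, window=3)
```

Each entry in `trajectory` is the estimated sentiment mix after one more batch of comments, so a rising "negative" share across entries corresponds to the kind of drift described in the findings. The window size and the prior strength are tuning choices that would need to be justified against the real dataset.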

Reflection

Through this project, I realized that working with social media data requires both technical rigor and theoretical grounding. While the YouTube API provided large-scale comments efficiently, significant effort was needed for cleaning and normalization (duplicates, slang, emojis). This reinforced the importance of data preprocessing as a decisive stage that shapes model reliability.

On the modeling side, the Elaboration Likelihood Model (ELM) proved valuable in structuring the classification between central and peripheral persuasion pathways. Yet the lower accuracy on pathway labeling (88%) compared to sentiment classification (99%) highlighted that theoretical constructs are often harder to operationalize in noisy real-world data. It reminded me that applying social science theories to digital discourse is not only a technical task but also requires careful mapping between constructs and observable signals.

Finally, the temporal analysis using Bayesian updating and herding tests showed how public sentiment can resemble financial market dynamics: initial overreaction, herd formation, and eventual correction. This analogy broadened my perspective: social data analysis is not limited to describing trends, but can also generate insights for policy, media strategy, and platform governance. In the future, I would extend the dataset to cross-platform comparisons (e.g., TikTok, Twitter) and refine the labeling scheme with multi-annotator validation to strengthen the robustness of the conclusions.