Completed

Short Drama Text Analysis: Human-AI Dialogue Behavior Mining

Independent Data Project 2025.09
Beauty
Consumer Services
AI Training
NLP
Text Mining
Dialogue Analysis
Feature Engineering

Python · JSON Processing · NLP · Text Mining · Dialogue Metrics

Project Overview

Processed internal human–AI role-play dialogues (JSON → CSV) and built conversation-level metrics to quantify practice rhythm, engagement, diversity/novelty, and session variation.

What I Did

  • Parsed nested JSON dialogues into structured tabular datasets (conversation/message level).
  • Engineered metrics: turns, duration, interaction density, lexical diversity, repetition and novelty proxies.
  • Summarized behavioral patterns across sessions and exported clean datasets for downstream modeling.

Methodology

  • Ingestion + cleaning: role separation, timestamp normalization, text normalization, invalid record removal.
  • Conversation reconstruction: session grouping and sequencing; message-level to conversation-level aggregation.
  • Feature engineering: interpretable indicators for engagement, rhythm, and content diversity.

Reflection

Key takeaway: conversational data is unstructured by nature, so a reliable processing pipeline is the real foundation for any dialogue analytics or evaluation.