Python · JSON Processing · NLP · Text Mining · Dialogue Metrics
This project focuses on processing and analyzing human-AI role-play dialogue data generated from an internal AI training scenario. The original data was stored as nested JSON records, where each practice session contained user information, practice metadata, evaluation results, scenario details, and multiple turns of conversation logs. The goal of the project was to transform this raw and unstructured dialogue data into a clean analytical dataset that could support behavioral analysis, training evaluation, and future model or dashboard development. I designed a full data processing pipeline that converts the original JSON files into structured message-level and session-level tables. At the message level, each row represents one dialogue turn with role, content, timestamp, and sequence information. At the session level, each row represents one complete practice session with aggregated behavioral, textual, temporal, and evaluation features. Beyond basic cleaning, I built a set of interpretable metrics to describe how employees interact with the AI coach. These metrics cover practice rhythm, conversation structure, text length, question rate, lexical diversity, score progression, interaction balance, AI dominance, repetition, and novelty. The novelty measurement was designed in two layers: a lexical version based on TF-IDF or Jaccard similarity, and a semantic version based on Gemini text embeddings. This made the project not only a data cleaning task, but also an exploratory dialogue mining framework for understanding how users practice, improve, and vary their expression over time.
The most important takeaway from this project is that dialogue data cannot be analyzed directly in its raw form. A single conversation contains multiple layers of information: who participated, when the practice happened, what scenario it belonged to, how many turns occurred, how long each side spoke, whether the user asked questions, whether the AI repeated itself, and whether the employee's expression changed over time. Without a reliable processing pipeline, these signals remain hidden inside nested JSON records. This project also helped me understand that good feature engineering should connect technical processing with real training questions. For example, practice frequency and time gaps can reflect learning rhythm; score delta and moving average can reflect short-term progress; role balance and alternation rate can reflect interaction quality; lexical and semantic novelty can reflect whether employees are simply repeating old expressions or exploring new ways to communicate. Another key lesson is that surface-level text metrics are not enough. Lexical similarity can capture repeated wording, but it may miss deeper semantic repetition. That is why I added embedding-based novelty as a second layer. This made the analysis more robust because it could detect whether two sessions were similar in meaning even when the wording changed. If I continue improving this project, I would add visual dashboards for employee-level learning trajectories, cluster users by practice behavior, compare different training scenarios, and connect these behavioral features with final performance outcomes. The long-term value of this project is that it provides a reusable data foundation for AI training evaluation, coaching strategy optimization, and human-AI interaction research.