Statistical Modeling · Model Selection · Robustness & Stability Checks
Goal: identify the key physicochemical drivers behind wine quality ratings, and build interpretable models that generalize.

Dataset:
• Vinho Verde wine quality dataset (red + white variants)
• Outcome: sensory "quality" score
• Predictors: acidity, alcohol, sulphates, residual sugar, density, etc.

Approach:
• Ran full diagnostic checks (linearity, normality, heteroskedasticity, leverage/outliers).
• Applied transformations (e.g., log) where appropriate to stabilize variance and improve fit.
• Compared candidate models via selection criteria and validation logic.
• Emphasized interpretability: which variables matter most, and why.

Key takeaway: quality is not driven by a single variable; the best-performing models balance interpretability with predictive stability, and show consistent importance for alcohol and a small set of chemical indicators.
This project taught me that the "best model" depends on the goal:
• If the goal is explanation, you want a stable set of predictors and a clean story.
• If the goal is prediction, you may accept more complexity but must validate stability.

I focused on bridging both: (1) rigorous diagnostics + transformations, (2) selection with sanity checks, (3) conclusions that remain consistent across reasonable modeling choices.

If iterating further, I'd:
• add cross-validated performance comparisons,
• explore nonlinearities (splines/interactions),
• compare red vs. white with a unified model including type interactions,
• build a small interactive report (filters + coefficient explorer) for better storytelling.