QSR Daily Outlet Revenue

Course Project · University of Sydney 2026 S1
Revenue Prediction
QSR Analytics
Regression
Elastic Net
Ridge
Lasso
KNN
Cross-Validation
Feature Engineering
Python
scikit-learn
EDA

Predictive Analytics · Regression · EDA · Feature Engineering · Model Selection

Project Overview

This project built a predictive model for daily quick-service restaurant (QSR) outlet revenue using structured temporal, location-based, promotional, and competitive features. The business goal was to support operational decisions such as staffing, inventory planning, and promotion scheduling by forecasting outlet-day revenue more accurately. I treated the task as a supervised regression problem and compared multiple models using 5-fold cross-validation, with mean squared error (MSE) as the evaluation metric. The selected final model was Elastic Net, which achieved the lowest cross-validated MSE among the candidate models. The results suggested that revenue relates to the features in a largely linear way, and that mild regularisation improved stability given the outlet-level effects, engineered date features, and potentially correlated predictors.
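The model-comparison protocol described above (one shared 5-fold split, MSE as the selection criterion) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: it uses synthetic data from `make_regression` instead of the project's outlet data, and the hyperparameters (`alpha`, `l1_ratio`, `n_neighbors`) are placeholders rather than tuned values, so whichever model wins here says nothing about the project's actual ranking.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered outlet-day feature matrix
# (assumption: the real project data is not reproduced here).
X, y = make_regression(n_samples=500, n_features=20, n_informative=10,
                       noise=15.0, random_state=0)

# Candidate models. Hyperparameters are illustrative placeholders, not tuned.
candidates = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),
    "KNN": KNeighborsRegressor(n_neighbors=10),
}

# The same shuffled 5-fold split is reused for every model.
cv = KFold(n_splits=5, shuffle=True, random_state=42)

cv_mse = {}
for name, model in candidates.items():
    # Scaling sits inside the pipeline so it is refit on each training fold,
    # and the penalised models and KNN see features on a common scale.
    pipe = make_pipeline(StandardScaler(), model)
    # scikit-learn maximises scores, so MSE is returned negated.
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[name] = -scores.mean()

for name, mse in sorted(cv_mse.items(), key=lambda kv: kv[1]):
    print(f"{name:>16}: CV MSE = {mse:,.1f}")

best = min(cv_mse, key=cv_mse.get)
print("Selected:", best)
```

Keeping the scaler inside the pipeline, rather than scaling the full dataset up front, prevents information from the validation fold leaking into preprocessing.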

What I Did

  • Built an end-to-end Python/scikit-learn pipeline covering data cleaning, missing value imputation, feature engineering, model training, validation, and final prediction export.
  • Performed EDA on revenue distribution, missingness, seasonal patterns, promotional effects, downtown location effects, rainfall, and numeric correlations.
  • Engineered calendar-based features from Date, including day, day of week, week of year, month-start/end indicators, and cyclical month encoding using sine/cosine transformations.
  • Handled realistic data issues including missing values, right-skewed revenue, outliers, and outlet-specific heterogeneity through robust preprocessing choices.
  • Compared Linear Regression, Ridge, Lasso, Elastic Net, and KNN using consistent 5-fold cross-validation and MSE-based model selection.
  • Selected Elastic Net because it combines L1 and L2 regularisation, balancing feature selection (L1) with coefficient stability under correlated predictors (L2) in a high-dimensional, one-hot encoded feature space.
  • Generated a reproducible notebook and final prediction CSV in the required assignment format, with the final cell designed to compute hidden test MSE when Test.csv is provided.
  • Critically discussed model limitations, especially the tendency of the linear model to underpredict extreme high-revenue days.
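The calendar feature engineering, imputation, and Elastic Net steps listed above can be sketched as a single scikit-learn pipeline. This is a minimal sketch, assuming a pandas DataFrame with a `Date` column and an `OutletID` column; the derived feature names (`DayOfWeek`, `MonthSin`, ...), the hypothetical `Rainfall` column, the median/most-frequent imputation strategies, and the Elastic Net settings are illustrative assumptions, not the project's exact choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def add_calendar_features(df: pd.DataFrame, date_col: str = "Date") -> pd.DataFrame:
    """Derive calendar features from a date column (feature names are illustrative)."""
    out = df.copy()
    d = pd.to_datetime(out[date_col])
    out["Day"] = d.dt.day                                  # day of month
    out["DayOfWeek"] = d.dt.dayofweek                      # Monday = 0
    out["WeekOfYear"] = d.dt.isocalendar().week.astype(int)
    out["IsMonthStart"] = d.dt.is_month_start.astype(int)
    out["IsMonthEnd"] = d.dt.is_month_end.astype(int)
    # Cyclical month encoding: December (12) and January (1) end up adjacent.
    angle = 2 * np.pi * d.dt.month / 12
    out["MonthSin"] = np.sin(angle)
    out["MonthCos"] = np.cos(angle)
    return out.drop(columns=[date_col])


# "Rainfall" is a hypothetical numeric column standing in for the real predictors.
numeric_cols = ["Day", "DayOfWeek", "WeekOfYear", "IsMonthStart", "IsMonthEnd",
                "MonthSin", "MonthCos", "Rainfall"]
categorical_cols = ["OutletID"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
])

# Placeholder regularisation settings, not the project's tuned values.
model = Pipeline([("prep", preprocess),
                  ("enet", ElasticNet(alpha=0.1, l1_ratio=0.5))])

# Tiny synthetic demo frame with injected missing values.
rng = np.random.default_rng(0)
n = 120
demo = pd.DataFrame({
    "Date": pd.date_range("2024-01-01", periods=n, freq="D"),
    "OutletID": rng.choice(["A", "B", "C"], size=n),
    "Rainfall": rng.gamma(2.0, 3.0, size=n),
})
demo.loc[::10, "Rainfall"] = np.nan
revenue = 1000 + 200 * (demo["OutletID"] == "B") + 50 * rng.normal(size=n)

X_demo = add_calendar_features(demo)
model.fit(X_demo, revenue)
preds = model.predict(X_demo)
```

The sine/cosine pair is used instead of a raw month number so that a linear model does not treat December and January as maximally far apart; `handle_unknown="ignore"` keeps prediction from failing if an unseen `OutletID` appears at test time.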

Reflection

This project strengthened my understanding of how predictive analytics connects statistical modelling with real operational decision-making. A key lesson was that model accuracy alone is not enough: the modelling process must be reproducible, well-justified, and clearly communicated. The strongest insight was that a carefully engineered linear model can perform competitively on structured business data. Although Elastic Net only slightly outperformed standard linear regression and Ridge, its regularisation made it a robust final choice given the one-hot encoded OutletID features and potential multicollinearity. The project also showed the importance of honest model interpretation. The final model fitted the central revenue range reasonably well but underpredicted unusually high-revenue days. In a real QSR setting, this limitation matters because those peak days are exactly when staffing shortages and stock-outs are most costly. If iterating further, I would test interaction effects, lagged revenue features, and alternative transformations to better capture high-demand events while keeping the model interpretable.
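As a hypothetical sketch of the lagged-revenue extension mentioned above (not part of the submitted model), per-outlet lag and rolling features could be built with a grouped `shift`, so that each row only sees earlier days of the same outlet. The column names `Revenue`, `OutletID`, and `Date` are assumptions, and the row-based shift assumes one row per outlet per consecutive day; such features are also only usable at prediction time if actual past revenue is available.

```python
import numpy as np
import pandas as pd


def add_lag_features(df: pd.DataFrame, lags=(1, 7), target="Revenue",
                     group_col="OutletID", date_col="Date") -> pd.DataFrame:
    """Hypothetical helper: per-outlet lagged revenue using only past rows."""
    out = df.sort_values([group_col, date_col]).copy()
    grouped = out.groupby(group_col)[target]
    for k in lags:
        # shift(k) within each outlet, so no values leak across outlets.
        out[f"{target}_lag{k}"] = grouped.shift(k)
    # 7-day rolling mean of *previous* days; shift(1) excludes the current day.
    out[f"{target}_roll7"] = grouped.transform(lambda s: s.shift(1).rolling(7).mean())
    return out


demo = pd.DataFrame({
    "OutletID": ["A"] * 10 + ["B"] * 10,
    "Date": list(pd.date_range("2024-01-01", periods=10)) * 2,
    "Revenue": np.arange(20, dtype=float),
})
feat = add_lag_features(demo)
```

The leading rows of each outlet get `NaN` lags, so they would need to be dropped or imputed before fitting a linear model.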