Completed

Greater Sydney SA2 Resource Scoring

Group Project · University of Sydney 2025 S1
PostgreSQL
PostGIS
pandas/geopandas
ABS / GTFS / NSW POI API
Z-score
Sigmoid
Rank-based
Lasso
OLS
Choropleth

Spatial Analytics · PostgreSQL/PostGIS · Composite Scoring

Project Overview

We evaluated how “well-resourced” each SA2 in selected SA4 zones of Greater Sydney is by building a spatial database and a composite scoring system. We integrated six datasets (ABS SA2/Population/Income, Retail Business counts, Transport for NSW GTFS stops, NSW DoE school catchments, NSW POI API), standardized geometries to GDA2020 (EPSG:7844), and performed all joins in PostGIS. Each indicator was normalized to a z-score, the z-scores were summed per SA2, and the sum was passed through a sigmoid to obtain a final score between 0 and 1.

Key findings:

  • Sydney – Inner West: consistently high scores across SA2s (dense infrastructure and transport).
  • Sydney – Blacktown: largest internal disparity (high in the south, low in the north), indicating spatial inequality.
  • Sydney – Eastern Suburbs: mixed performance.
  • The Pearson correlation between score and median income is weak and slightly negative (≈ −0.08), suggesting resource access is not simply a function of income in this subset.

We also added a rank-based scoring variant as a robustness check and validated predictors with Lasso + OLS.
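The scoring step described above (z-score each indicator, sum per SA2, squash with a sigmoid) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the toy indicator values, equal weighting, and the use of the population standard deviation are all assumptions made for the example.

```python
import math
import statistics


def z_scores(values):
    """Standardize to mean 0 and unit population std (assumed; sample std also plausible)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant indicator carries no information; treat every SA2 as average.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def sigmoid(x):
    """Numerically stable logistic function mapping any real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def composite_scores(indicators):
    """indicators: dict of indicator name -> list of raw values, one per SA2, same order.

    Assumes equal weights and that a higher raw value always means better resourced.
    """
    standardized = [z_scores(column) for column in indicators.values()]
    sums = [sum(per_sa2) for per_sa2 in zip(*standardized)]
    return [sigmoid(s) for s in sums]


# Toy values for three hypothetical SA2s (not project data).
toy = {
    "retail": [10.0, 20.0, 30.0],
    "stops": [5.0, 15.0, 25.0],
    "schools": [1.0, 2.0, 3.0],
    "pois": [8.0, 12.0, 16.0],
}
scores = composite_scores(toy)
```

An SA2 that sits exactly at the mean on every indicator has a z-sum of 0 and therefore a score of 0.5; scores approach 0 or 1 quickly as the z-sum grows, which is why this path can exaggerate extremes.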

What I Did

  • Built a reproducible spatial database (PostgreSQL + PostGIS), unified all layers in GDA2020 (EPSG:7844), and created SA4 filters/views with GiST indexes.
  • ETL with pandas/geopandas: cleaned, typed, and de-duplicated the source tables; mapped GTFS stops to SA2s; intersected school catchments with SA2 boundaries; fetched NSW POIs via the API.
  • Engineered four indicators per SA2: retail business density, public transport stops, school catchment coverage, and essential POIs.
  • Standardized indicators with z-scores and summed them; applied a sigmoid to bound the final score between 0 and 1; excluded SA2s with very small populations for stability.
  • Produced choropleths and ranked bar charts to surface spatial inequality; compared dispersion within SA4 groups.
  • Checked the relationship between score and median income (weak, slightly negative; Pearson r ≈ −0.08); used Lasso for variable selection and OLS for interpretation; kept the predictors that survived cross-validation.
  • Ran a rank-based composite as a robustness check to mitigate outlier inflation and improve policy communication.
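The rank-based robustness check in the last bullet can be illustrated with a small sketch. The exact rank formula used in the project is not stated, so this assumes one common variant: each indicator is converted to a percentile rank (ties share their average rank) and the ranks are averaged per SA2. All names and values are hypothetical.

```python
def percentile_ranks(values):
    """Map values to (0, 1] as average 1-based rank divided by n; ties share a rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / n
        i = j + 1
    return ranks


def rank_composite(indicators):
    """Average the percentile ranks across indicators (equal weights assumed)."""
    columns = [percentile_ranks(column) for column in indicators.values()]
    return [sum(per_sa2) / len(columns) for per_sa2 in zip(*columns)]


# Toy data for four hypothetical SA2s; the last retail value is a deliberate outlier.
with_outlier = {
    "retail": [10.0, 20.0, 30.0, 5000.0],
    "stops": [5.0, 15.0, 25.0, 35.0],
}
without_outlier = {
    "retail": [10.0, 20.0, 30.0, 40.0],
    "stops": [5.0, 15.0, 25.0, 35.0],
}
scores = rank_composite(with_outlier)
baseline = rank_composite(without_outlier)
```

Because only the ordering matters, the outlier of 5000 produces the same scores as a value of 40 would. That is the sense in which a rank-based composite limits outlier inflation, at the cost of discarding information about how far apart SA2s actually are.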

Reflection

Two choices made the work robust and explainable: (1) keeping geospatial logic inside PostGIS (indexes, ST_Intersects/ST_Contains, and consistent SRIDs) and (2) separating indicator engineering from scoring so we could swap normalization (z-score vs. rank) without breaking the pipeline. The z-score + sigmoid path surfaced contrast clearly but can inflate extremes; the rank-based variant, while simpler, improved stability and policy communication. Model validation reminded us that a single composite index rarely “explains” socioeconomic outcomes; Lasso/OLS helped quantify those limits and justify future variables (e.g., housing cost, land use). In a further iteration, I’d expand the indicators, add time dynamics to capture “access volatility,” and publish a policy brief pairing low-scoring SA2s with actionable levers.