DATA3404: Scalable Data Management

University of Sydney 2026 S1
In Progress
Big Data
Database Systems
Distributed Data

Explores large-scale data management systems, database internals, query optimisation, distributed platforms, and performance tuning for big data.

Learning Outcomes

  • Platform tuning: Demonstrate experience using and tuning data science platforms such as Apache Spark.
  • Physical data organisation: Understand different physical data organisations, including data partitioning and data replication.
  • Indexing structures: Understand disk-based indexing structures such as B-Trees, extensible hashing, and bitmap indexes.
  • Query optimisation: Understand the principles of query processing and query optimisation.
  • Distributed platforms: Understand the principles of distributed data science platforms.
  • Sharding and replication: Understand data sharding algorithms and data replication protocols.
  • Physical design: Make effective physical data design decisions.
  • Performance tuning: Identify performance problems in distributed data processing systems and tune those systems effectively.

Takeaways

Coming soon.