Applied Machine Learning · Computer Vision · Pathology Image Classification
This project compares five machine learning models for classifying tumour cells versus immune cells in H&E pathology image patches. The client scenario was a pathologist who needed a reliable, interpretable, and clinically useful classifier. The models ranged from a simple pixel baseline through feature-engineered approaches to deep learning: Pixel KNN, HOG + Random Forest, Colour Histogram + SVM, a CNN, and ResNet50 transfer learning. The recommended model was Colour Histogram + SVM, which achieved the strongest performance of the five while remaining lightweight and interpretable.
This project helped me understand that more complex models are not always stronger models. The most important insight was that H&E staining already encodes biological information through colour: haematoxylin stains nuclei purple-blue and eosin stains cytoplasm pink. Once this domain knowledge was translated into HSV colour histogram features, the SVM outperformed both the CNN and ResNet50. This taught me that on small medical imaging datasets, interpretable and biologically motivated feature engineering can sometimes outperform end-to-end deep learning. The project also strengthened my ability to communicate model trade-offs to a non-technical client, especially around sensitivity, false negatives, interpretability, and deployment risk.
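The HSV colour histogram idea described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the bin count, the RBF kernel, the scaling step, and the helper names (rgb_to_hsv, hsv_histogram, make_toy_patches) are all assumptions, and the patches are synthetic purple-ish and pink-ish noise rather than real H&E data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def rgb_to_hsv(rgb):
    """Convert an (..., 3) float RGB array in [0, 1] to HSV in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    maxc = rgb.max(axis=-1)
    delta = maxc - rgb.min(axis=-1)
    safe_max = np.where(maxc > 0, maxc, 1.0)     # avoid division by zero
    safe_delta = np.where(delta > 0, delta, 1.0)
    s = np.where(maxc > 0, delta / safe_max, 0.0)
    h = np.where(
        maxc == r,
        ((g - b) / safe_delta) % 6.0,
        np.where(maxc == g, (b - r) / safe_delta + 2.0, (r - g) / safe_delta + 4.0),
    )
    h = np.where(delta > 0, h / 6.0, 0.0)        # grey pixels get hue 0
    return np.stack([h, s, maxc], axis=-1)


def hsv_histogram(patch, bins=16):
    """Concatenate normalised H, S and V histograms for one uint8 RGB patch."""
    hsv = rgb_to_hsv(patch.astype(np.float64) / 255.0)
    feats = []
    for channel in range(3):
        hist, _ = np.histogram(hsv[..., channel], bins=bins, range=(0.0, 1.0))
        feats.append(hist / hist.sum())          # patch size no longer matters
    return np.concatenate(feats)


def make_toy_patches(n_per_class=20, size=16, seed=0):
    """Hypothetical stand-in data: noisy purple-ish vs pink-ish patches."""
    rng = np.random.default_rng(seed)
    purple = np.array([90, 60, 160])    # assumed haematoxylin-like colour
    pink = np.array([230, 150, 180])    # assumed eosin-like colour
    patches, labels = [], []
    for label, base in enumerate((purple, pink)):
        for _ in range(n_per_class):
            noise = rng.integers(-20, 21, size=(size, size, 3))
            patches.append(np.clip(base + noise, 0, 255).astype(np.uint8))
            labels.append(label)
    return patches, np.array(labels)


patches, y = make_toy_patches()
X = np.array([hsv_histogram(p) for p in patches])

# Scale the features, then fit an SVM (kernel and C are illustrative choices).
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X, y)
train_accuracy = model.score(X, y)
```

On real patches, make_toy_patches would be replaced by the dataset loader, and the model would be judged on a held-out split with sensitivity reported alongside accuracy, since training accuracy on toy data says nothing about clinical performance. Each feature corresponds to a hue, saturation, or value range, which is what makes this representation easy to explain to a pathologist.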