2021 · Machine Learning coursework
Car Price Prediction
PCA-driven feature engineering plus a head-to-head comparison of regularized regressors, neural networks, and classifiers, evaluated with cross-validation and statistical tests.
- Python
- scikit-learn
- PyTorch
- PCA
Overview
A machine learning coursework project on predicting car prices from a real tabular dataset. The point wasn’t to find a single best model. It was to walk the full methodological pipeline: feature engineering, several model families in parallel, and statistically grounded evaluation.
Approach
- Feature engineering with PCA. Reduced and decorrelated the input space before model fitting, both as a preprocessing step and as a way to see which directions in feature space actually carried signal.
- Regularized regression. Built and tuned regressors with L1 / L2 regularization to control overfitting on the high-dimensional input.
- Neural networks. Trained feedforward networks for the regression target and compared them against the linear baselines.
- Classification framing. Reframed the problem as a discrete one (price band) and trained logistic regression and KNN classifiers.
- Evaluation. k-fold cross-validation across all model families, with paired statistical tests to compare performance.
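A minimal sketch of how the regression side of such a pipeline might be wired up in scikit-learn. The project's actual code and data are not shown here, so everything in it is an assumption: the synthetic data from `make_regression` stands in for the car dataset, and the 95% variance threshold, the `alpha` values, and the 10 folds are illustrative choices.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the car dataset (assumption: not the project's data).
X, y = make_regression(
    n_samples=300, n_features=20, n_informative=8, noise=10.0, random_state=0
)

# Inspect how much variance each principal direction carries.
pca_probe = PCA(svd_solver="full").fit(StandardScaler().fit_transform(X))
print("explained variance ratio (first 5):",
      pca_probe.explained_variance_ratio_[:5].round(3))

# One fixed set of folds shared by every model, so per-fold scores are paired.
cv = KFold(n_splits=10, shuffle=True, random_state=0)

# Illustrative regularization strengths; in practice these would be tuned.
models = {
    "ridge (L2)": Ridge(alpha=1.0),
    "lasso (L1)": Lasso(alpha=0.1),
}

fold_rmse = {}
for name, regressor in models.items():
    # Scaling and PCA live inside the pipeline, so they are refit on each
    # training fold and never see the held-out fold.
    pipeline = make_pipeline(
        StandardScaler(),
        PCA(n_components=0.95, svd_solver="full"),  # keep 95% of the variance
        regressor,
    )
    scores = cross_val_score(
        pipeline, X, y, cv=cv, scoring="neg_root_mean_squared_error"
    )
    fold_rmse[name] = -scores  # flip the sign back to a plain RMSE
    print(f"{name}: RMSE {fold_rmse[name].mean():.2f} "
          f"+/- {fold_rmse[name].std():.2f} across folds")
```

Keeping the scaler and PCA inside the pipeline matters for the evaluation: fitting them on the full dataset first would leak information from the held-out folds into training.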
What I learned
The interesting result of a project like this is rarely “model X won.” It’s how much the variance across folds dwarfs the average gap between methods. Statistical testing makes this explicit: most of the time the “best” model is only marginally better than the second-best, and you need paired tests or resampling to tell whether a difference is real or just sampling noise. That perspective stuck with me far more than any particular set of weights.
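One way to make that comparison concrete is a paired sign-flip (permutation) test on per-fold scores. The sketch below is a standard-library illustration, not the test used in the project. The function name `paired_sign_flip_test` is hypothetical, and the fold scores are made-up numbers for demonstration only, not results from this coursework.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided sign-flip test on paired per-fold score differences.

    Under the null hypothesis that both models perform equally well, each
    per-fold difference is as likely to be positive as negative, so every
    pattern of sign flips is equally probable.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both score lists must come from the same folds")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))

    # Enumerate all 2^k sign patterns (cheap for the usual k = 5 or 10 folds).
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        flipped = abs(mean(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # tolerance for float round-off
            extreme += 1
        total += 1

    return mean(diffs), extreme / total


# Made-up per-fold RMSE values for two hypothetical models (illustration only).
rmse_model_a = [2.31, 2.74, 2.05, 2.96, 2.48, 2.63, 2.19, 2.85, 2.40, 2.57]
rmse_model_b = [2.35, 2.70, 2.11, 2.93, 2.55, 2.60, 2.25, 2.88, 2.38, 2.61]

mean_diff, p_value = paired_sign_flip_test(rmse_model_a, rmse_model_b)
print(f"mean RMSE difference: {mean_diff:+.3f}, p-value: {p_value:.3f}")
```

Because the training sets of different folds overlap, per-fold scores are not fully independent, so a p-value from any paired test on cross-validation folds is better read as a rough guide than as an exact error rate.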