Confusion matrix
When you first learn Machine Learning, it is easy to assume that accuracy is the ultimate measure of model performance.
However, I learned the hard way that accuracy alone can be misleading. Tracking accuracy across training iterations helps show whether a model is truly learning or simply memorizing or guessing.
To correctly evaluate a classification model, we need to understand different performance metrics derived from the confusion matrix.
Confusion Matrix Overview
|   | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |
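To make the table concrete, here is a minimal sketch, assuming binary labels where 1 is the positive class and using made-up `y_true` / `y_pred` lists, that counts the four cells by hand:

```python
# Hypothetical ground-truth labels and model predictions (1 = positive class)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

# Count each cell of the confusion matrix
TP = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
FN = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
FP = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
TN = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

print(f"TP={TP}, FN={FN}, FP={FP}, TN={TN}")  # TP=3, FN=2, FP=1, TN=4
```

scikit-learn's `confusion_matrix` returns the same counts; note that for binary 0/1 labels it lays them out as `[[TN, FP], [FN, TP]]`, with rows as actual classes and columns as predictions.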
Key Metrics
Accuracy
\[\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}\]
Measures how many predictions are correct overall.
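Continuing the sketch above, accuracy is just the share of correct predictions:

```python
# Counts from the confusion-matrix sketch above
TP, FN, FP, TN = 3, 2, 1, 4

accuracy = (TP + TN) / (TP + TN + FP + FN)
print(f"Accuracy: {accuracy:.2f}")  # 7 correct out of 10 -> 0.70
```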
Precision
\[\text{Precision} = \frac{TP}{TP + FP}\]
Out of everything the model predicted as positive, how many were actually positive?
Recall (Sensitivity / True Positive Rate)
\[\text{Recall} = \frac{TP}{TP + FN}\]
Out of all actual positive cases, how many did the model correctly identify?
F1-Score
\[\text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}\]
The harmonic mean of Precision and Recall, useful when the dataset is imbalanced.
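Using the same hypothetical counts, precision, recall, and F1 fall straight out of the formulas above (a sketch only; real code should guard against zero denominators):

```python
# Counts from the confusion-matrix sketch above
TP, FN, FP, TN = 3, 2, 1, 4

precision = TP / (TP + FP)   # 3 / 4 = 0.75
recall    = TP / (TP + FN)   # 3 / 5 = 0.60
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.2f}")  # 0.75
print(f"Recall:    {recall:.2f}")     # 0.60
print(f"F1-score:  {f1:.2f}")         # ~0.67
```

scikit-learn's `precision_score`, `recall_score`, and `f1_score` compute the same quantities and also handle the zero-division edge cases for you.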
Why These Metrics Matter
- Accuracy can lie on imbalanced datasets (e.g., 95% accuracy while only ever predicting the majority class), as the sketch after this list demonstrates.
- The Precision vs Recall trade-off tells us whether the model is cautious (fewer false positives, at the cost of missed positives) or aggressive (fewer missed positives, at the cost of false alarms).
- F1-score balances both, giving a single metric for comparison.
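As a quick illustration of the first point, here is a hypothetical dataset with 95% negatives, where a "model" that always predicts the majority class scores 95% accuracy yet has zero recall:

```python
# 5 positives hidden among 95 negatives, and a model that always predicts 0
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100

TP = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # 0
FN = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # 5
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)        # 95

print(f"Accuracy: {correct / len(y_true):.2f}")  # 0.95
print(f"Recall:   {TP / (TP + FN):.2f}")         # 0.00 -- not a single positive is found
```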
Takeaway
Evaluating an ML model requires more than just accuracy.
Understanding precision, recall, and F1-score helps determine whether a model is reliable, fair, and actually learning.