Classification is a supervised learning task in which the variable we are trying to predict is discrete, whether binary or categorical.
Some examples of discrete data include:
Whether an email is spam or not (binary)
Whether an outcome is successful or not (binary)
Which of the ten numeric digits (0–9) is represented by some handwriting (categorical)
Which of three families a given penguin is likely to be a member of (categorical)
In this chapter, we will explore different classification models and introduce key performance metrics used to evaluate their effectiveness.
15.1 Classification Models
Classification Models in Python:
LogisticRegression from sklearn (NOTE: despite its name, this is a classification model, not a regression model)
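As a minimal sketch (assuming a generic feature matrix x and discrete labels y, rather than a specific dataset from this chapter), fitting and using a LogisticRegression classifier looks something like this:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# assumes x (features) and y (discrete labels) are already defined
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=99)

model = LogisticRegression(max_iter=1000, random_state=99)
model.fit(x_train, y_train)

y_pred = model.predict(x_test)              # predicted class labels
y_pred_proba = model.predict_proba(x_test)  # predicted probabilities for each class
```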
15.2 Classification Metrics
15.2.1 Accuracy
Accuracy measures the overall proportion of correct predictions among all predictions made.
Pros: Provides a quick and simple measure of model performance.
Cons: Can be misleading when the classes are imbalanced (e.g. rare event classification), as it does not differentiate between the types of errors (false positives vs. false negatives).
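For instance (a toy illustration, not from the chapter's data), a model that always predicts the majority class can achieve a high accuracy score on an imbalanced dataset while never detecting the rare class:

```python
from sklearn.metrics import accuracy_score

# 1 spam email out of 100, and a "model" that always predicts "not spam" (0)
y_true = [1] + [0] * 99
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))  #> 0.99
```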
15.2.2 Precision, Recall, and F1 Score
Precision measures the accuracy of positive predictions, reflecting the proportion of true positives among all instances predicted as positive.
Pros: Useful in scenarios where false positives are costly (e.g. spam detection).
Cons: Does not account for false negatives, making it less informative in cases where missing positives is a concern.
Recall, or True Positive Rate, measures the model’s ability to identify all positive instances, reflecting the proportion of true positives among actual positive instances.
Pros: Useful in scenarios where false negatives are costly (e.g. disease screening), since it rewards finding as many of the actual positives as possible.
Cons: Does not account for false positives, making it less informative in cases where false alarms are a concern.
The F1 score is the harmonic mean of precision and recall, combining both into a single metric.
Pros: Offers a balance between precision and recall, particularly useful when there is an uneven class distribution.
Cons: Can be less interpretable alone, especially if more weight should be given to precision or recall individually.
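In terms of true positives (TP), false positives (FP), and false negatives (FN), these three metrics can be written as:

$$
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$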
15.2.3 ROC-AUC
ROC stands for Receiver Operating Characteristic, and AUC stands for Area Under the Curve.
The ROC-AUC score is the area under the ROC curve, which plots the True Positive Rate on the y-axis against the False Positive Rate on the x-axis across classification thresholds; the area is typically computed numerically, as there is no closed-form formula.
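As a sketch of how the curve and its area relate (binary case, assuming a fitted model with a predict_proba method and a test set x_test, y_test), sklearn's roc_curve function returns the points on the curve, and roc_auc_score computes the area under it:

```python
from sklearn.metrics import roc_curve, roc_auc_score

# predicted probability of the positive class (binary classification)
y_pred_proba_pos = model.predict_proba(x_test)[:, 1]

# one (FPR, TPR) point per probability threshold
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba_pos)

# area under that curve, computed numerically
print("ROC-AUC:", roc_auc_score(y_test, y_pred_proba_pos))
```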
15.2.4 Confusion Matrix
In addition to using metrics to evaluate classification results, we can use techniques such as a confusion matrix to show how many observations were correctly classified versus misclassified.
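For example (a toy binary illustration, not from the chapter's data), sklearn's confusion_matrix counts actual classes along the rows and predicted classes along the columns:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

# rows = actual class, columns = predicted class:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
#> [[1 1]
#>  [1 3]]
```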
15.3 Classification Metrics in Python
For convenience, we will generally prefer to use classification metric functions from the sklearn.metrics submodule, such as accuracy_score, precision_score, recall_score, and f1_score:
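For example, a minimal sketch (assuming y_true and y_pred hold the actual and predicted labels for a binary problem):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("F1 Score:", f1_score(y_true, y_pred))
# for multi-class problems, pass an averaging strategy, e.g. average="macro"
```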
We also have access to the classification_report function which provides all of these metrics in a single report:
```python
from sklearn.metrics import classification_report

print(classification_report(y_true, y_pred))
```
In addition to these metrics, we also have evaluation tools such as the confusion_matrix function:
```python
from sklearn.metrics import confusion_matrix

confusion_matrix(y_true, y_pred)
```
When using these functions, we pass in the actual values (y_true) as well as the predicted values (y_pred). We take these values from the training set to arrive at training metrics, or from the test set to arrive at test metrics.
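For example, a sketch of computing both (assuming a fitted model and the train/test split variables x_train, y_train, x_test, y_test):

```python
from sklearn.metrics import accuracy_score

# training metrics
y_pred_train = model.predict(x_train)
print("Train Accuracy:", accuracy_score(y_train, y_pred_train))

# test metrics
y_pred_test = model.predict(x_test)
print("Test Accuracy:", accuracy_score(y_test, y_pred_test))
```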
Here is a helper function for visualizing the results of a confusion matrix, using a color-coded heatmap:
```python
from sklearn.metrics import confusion_matrix
import plotly.express as px

def plot_confusion_matrix(y_true, y_pred, height=450, showscale=False, title=None, subtitle=None):
    # https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
    # Confusion matrix whose i-th row and j-th column indicates the number of samples with
    # ... true label being i-th class (ROW)
    # ... and predicted label being j-th class (COLUMN)
    class_names = sorted(set(y_true))
    cm = confusion_matrix(y_true, y_pred, labels=class_names)

    title = title or "Confusion Matrix"
    if subtitle:
        title += f"<br><sup>{subtitle}</sup>"

    fig = px.imshow(cm, x=class_names, y=class_names, height=height,
                    labels={"x": "Predicted", "y": "Actual"},
                    color_continuous_scale="Blues", text_auto=True,
    )
    fig.update_layout(title={"text": title, "x": 0.485, "xanchor": "center"})
    fig.update_coloraxes(showscale=showscale)
    fig.show()
```
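Example usage, assuming y_test and y_pred from a fitted model (the subtitle is just an illustrative label):

```python
plot_confusion_matrix(y_test, y_pred, subtitle="Test Set")
```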
Finally, the ROC-AUC score:
```python
from sklearn.metrics import roc_auc_score

# get the predicted probabilities for each class
y_pred_proba = model.predict_proba(x_test)

# for multi-class, pass all probas and use "ovr" (one vs rest)
roc_auc = roc_auc_score(y_test, y_pred_proba, multi_class="ovr")
print("ROC-AUC:", roc_auc)
```
Helper function for ROC-AUC for binary or multi-class classification:
```python
from sklearn.metrics import roc_auc_score

def compute_roc_auc_score(y_test, y_pred_proba, is_multiclass=True):
    """NOTE: roc_auc_score uses average='macro' by default"""
    if is_multiclass:
        # all classes (for multi-class), with "one-versus-rest" strategy
        return roc_auc_score(y_true=y_test, y_score=y_pred_proba, multi_class="ovr")
    else:
        # positive class only (for binary classification)
        y_pred_proba_pos = y_pred_proba[:, 1]
        return roc_auc_score(y_true=y_test, y_score=y_pred_proba_pos)
```
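Example usage, assuming the y_pred_proba computed above:

```python
print("ROC-AUC:", compute_roc_auc_score(y_test, y_pred_proba, is_multiclass=True))
```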