This guide explains what ROC curves and AUC are and how they relate.
The ROC curve, short for receiver operating characteristic curve, is one of the most widely used tools for evaluating a classification model. Unfortunately, many data scientists simply look at an ROC curve and quote an AUC value without fully understanding what that value means or how to use it more effectively.
They often miss the range of problems that ROC curves solve, as well as key properties of AUC such as threshold invariance and scale invariance. These properties mean that the AUC value does not depend on the chosen classification threshold or on the scale of the predicted scores.
Because of these properties, AUC is very useful for evaluating binary classifiers: it lets us compare them without first committing to a classification threshold. That is why data scientists need a solid grasp of both ROC curves and AUC.
Why Not Just Use Accuracy?
Before we start on ROC curves and AUC, the first question that comes to mind is: “Why not use a simple metric like accuracy for a binary classification task?” After all, accuracy is just the percentage of predictions the model gets right.
The answer is that accuracy is neither threshold-invariant nor scale-invariant, so it does not fully describe the behavior of a probabilistic classifier. What exactly does that mean? It is easiest to explain with examples.
First, why is accuracy not threshold-invariant?
Suppose a logistic regression classifier uses a threshold of 0.5. What would you estimate its accuracy to be?
If you answered 50%, you are right: the two zeros would be misclassified as ones. That is not a good result. Is our classifier really that poor? Judged by accuracy, it seems so. But what if we change the threshold for the same example to 0.75? Now the classifier is 100% accurate.
This should lead us to ask how we can build an evaluation metric that does not depend on the threshold. In other words, we want a threshold-invariant metric.
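The effect is easy to reproduce in a few lines of Python. The original example's prediction table is not reproduced in this text, so the four scores below are hypothetical values chosen to match the description (two negatives scored 0.7, two positives scored 0.8), and accuracy_at_threshold is an illustrative helper, not a library function.

```python
# Hypothetical scores: two negatives (label 0) and two positives (label 1).
# These values are assumptions chosen to match the description, not original data.
y_true = [0, 0, 1, 1]
scores = [0.7, 0.7, 0.8, 0.8]


def accuracy_at_threshold(y_true, scores, threshold):
    """Predict 1 when score >= threshold, then return the fraction correct."""
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(preds, y_true))
    return correct / len(y_true)


for t in (0.5, 0.75):
    print(f"threshold={t}: accuracy={accuracy_at_threshold(y_true, scores, t)}")
# threshold=0.5: accuracy=0.5
# threshold=0.75: accuracy=1.0
```

The scores and the labels are identical in both runs; only the threshold moved, and accuracy doubled.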
Why is accuracy not scale-invariant?
Let’s repeat the exercise, this time with classifiers that predict different probabilities but in the same rank order. The probability values change, but their order does not: Classifier B keeps the same ranking of predictions, while Classifier C predicts on a completely different scale. Which of these classifiers is best?
Clearly, these classifiers perform essentially the same. We can reach 100% accuracy with every one of them by setting a threshold of 0.75 for Classifier A, 0.7 for Classifier B, and 68.5 for Classifier C.
A scale-invariant evaluation metric is one whose value stays the same as long as the rank order of the predictions stays the same. This property matters when a classifier outputs a score rather than a probability, because it lets us compare classifiers whose predictions are on different scales.
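The sketch below shows why accuracy cannot make that comparison on its own. The three score lists are hypothetical, chosen only so that the per-classifier thresholds quoted above (0.75, 0.7, 68.5) separate the classes; accuracy_at_threshold is again an illustrative helper.

```python
# Hypothetical scores: three classifiers rank the same four samples in the
# same order, but on different scales (Classifier C uses a 0-100 scale).
y_true = [0, 0, 1, 1]
classifiers = {
    "A": [0.70, 0.70, 0.80, 0.80],
    "B": [0.60, 0.60, 0.72, 0.72],
    "C": [60.0, 60.0, 77.0, 77.0],
}
# Thresholds quoted in the text, one per classifier.
tuned_thresholds = {"A": 0.75, "B": 0.7, "C": 68.5}


def accuracy_at_threshold(y_true, scores, threshold):
    """Predict 1 when score >= threshold and return the fraction correct."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


for name, scores in classifiers.items():
    shared = accuracy_at_threshold(y_true, scores, 0.75)  # one threshold for all
    tuned = accuracy_at_threshold(y_true, scores, tuned_thresholds[name])
    print(f"{name}: accuracy at 0.75 = {shared}, at own threshold = {tuned}")
```

With a single shared threshold of 0.75, only Classifier A scores 100%; B and C drop to 50%, even though all three rank the samples identically. Accuracy only agrees across them once each gets its own hand-picked threshold.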
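A rank-based AUC, by contrast, ignores the scale entirely. The minimal sketch below uses the pairwise (Mann-Whitney) form of AUC, the fraction of positive/negative pairs in which the positive scores higher, with ties counted as half. The labels and scores are hypothetical, and pairwise_auc is an illustrative helper. Converting probabilities to log-odds or to percentages changes every value but not the order, so the AUC does not move.

```python
import math


def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a correct ordering (rank-based, Mann-Whitney form of AUC).
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and probability-like scores.
y_true = [0, 1, 0, 1, 1, 0]
probs = [0.2, 0.9, 0.6, 0.55, 0.8, 0.1]

# Strictly increasing transforms change the scale but keep the rank order.
logits = [math.log(p / (1 - p)) for p in probs]  # unbounded scores
percent = [100 * p for p in probs]  # 0-100 scale

for name, s in [("probs", probs), ("logits", logits), ("percent", percent)]:
    print(name, pairwise_auc(y_true, s))
```

All three lines print the same value (8 of the 9 positive/negative pairs are ordered correctly).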
A bit of history: ROC curves were first used during World War II to interpret radar data. After the attack on Pearl Harbor, the US military wanted to use radar signals to detect Japanese aircraft.
ROC curves were especially useful for this purpose because they let users choose a threshold for separating positive samples from negative ones.
The ROC Curve and Its Uses
An ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) at every possible threshold, which makes it a natural tool for choosing a threshold value. The right threshold depends on how the classifier will be used. For example, if the classifier screens patients for cancer, you might pick a low threshold such as 0.16 in the example curve above, even though the FPR is fairly high, because you want to catch as many positives as possible (i.e., achieve a high TPR).
The reason is that you do not want to predict “no cancer” for someone who actually has cancer; in this situation, a false negative is very costly. A false positive, where someone without cancer tests positive, costs less than a false negative, so it is an acceptable trade-off. Many hospitals and doctors follow this approach for critical tests, and when a patient tests positive, many clinicians repeat the test to confirm the result.
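The sketch below computes the (threshold, FPR, TPR) points behind an ROC curve and picks a threshold that meets a TPR target. The screening data is hypothetical (it is not the data behind the curve referred to above), and roc_points and pick_threshold are illustrative helpers. The rule used here, “the highest threshold that still catches every positive”, is one simple way to favor recall, not the only one.

```python
# Hypothetical screening data: 1 = has the condition, 0 = does not.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.95, 0.85, 0.7, 0.4, 0.15, 0.9, 0.6, 0.3, 0.2, 0.1]


def roc_points(y_true, scores):
    """Return (threshold, FPR, TPR) for every distinct score used as a threshold.

    A sample is predicted positive when its score >= threshold.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, y_true) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, y_true) if s >= t and y == 0)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


def pick_threshold(points, min_tpr):
    """Highest threshold whose TPR reaches min_tpr.

    Raising the threshold never adds false positives, so among thresholds
    that meet the TPR target, the highest one has the lowest FPR.
    """
    eligible = [p for p in points if p[2] >= min_tpr]
    return max(eligible, key=lambda p: p[0])


for t, fpr, tpr in roc_points(y_true, scores):
    print(f"threshold={t:.2f}  FPR={fpr:.1f}  TPR={tpr:.1f}")

print(pick_threshold(roc_points(y_true, scores), min_tpr=1.0))  # (0.15, 0.8, 1.0)
```

Catching every positive in this toy data forces the threshold down to 0.15, at the price of an FPR of 0.8, which mirrors the trade-off described above.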
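One way to make “a false negative costs more than a false positive” concrete is to assign each error a cost and pick the threshold with the lowest total cost. The sketch below does that on hypothetical data; the cost values (10 for a false negative, 1 for a false positive) are arbitrary assumptions for illustration, and best_threshold_by_cost is a hypothetical helper.

```python
# Hypothetical screening data: 1 = has the condition, 0 = does not.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.95, 0.85, 0.7, 0.4, 0.15, 0.9, 0.6, 0.3, 0.2, 0.1]


def best_threshold_by_cost(y_true, scores, cost_fn, cost_fp):
    """Return (threshold, total_cost) minimising cost_fn * FN + cost_fp * FP.

    Candidate thresholds are the distinct scores, tried from highest to lowest;
    min() keeps the first minimum, so ties go to the higher threshold.
    """
    candidates = []
    for t in sorted(set(scores), reverse=True):
        fn = sum(1 for s, y in zip(scores, y_true) if s < t and y == 1)
        fp = sum(1 for s, y in zip(scores, y_true) if s >= t and y == 0)
        candidates.append((t, cost_fn * fn + cost_fp * fp))
    return min(candidates, key=lambda c: c[1])


print(best_threshold_by_cost(y_true, scores, cost_fn=1, cost_fp=1))   # (0.7, 3)
print(best_threshold_by_cost(y_true, scores, cost_fn=10, cost_fp=1))  # (0.15, 4)
```

With equal costs the best threshold is 0.7; making a missed positive ten times as expensive pushes it down to 0.15, the same low, high-recall threshold a clinician would favor.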
What Is AUC, Exactly?
AUC is the area under the ROC curve. The goal is to maximize this area, so that the classifier achieves a high TPR with a low FPR across thresholds. Because TPR and FPR both range from 0 to 1, the area always lies between 0 and 1.
Given the true labels y and the predicted scores preds, we can compute AUC with scikit-learn’s roc_auc_score(y, preds) function.
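To see what that number is, the minimal sketch below builds the ROC curve by hand and measures the area under it with the trapezoidal rule, using only the standard library. The data is hypothetical, and roc_curve_points and trapezoid_auc are illustrative helpers; on this data, scikit-learn’s roc_auc_score(y_true, scores) should return the same 0.68.

```python
# Hypothetical labels and scores (illustrative only).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.95, 0.85, 0.7, 0.4, 0.15, 0.9, 0.6, 0.3, 0.2, 0.1]


def roc_curve_points(y_true, scores):
    """(FPR, TPR) pairs from the strictest threshold to the loosest, starting at (0, 0)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, y_true) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, y_true) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


print(round(trapezoid_auc(roc_curve_points(y_true, scores)), 4))  # 0.68
```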
AUC can also be read as the probability that the classifier assigns a higher score to a randomly chosen positive instance than to a randomly chosen negative one. An AUC of 0.5, for instance, means a positive instance outranks a negative one only half the time, which is no better than random guessing. A perfect classifier with an AUC of 1 always gives positive instances higher scores than negative ones.
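This probabilistic reading can be checked directly. The sketch below, on hypothetical data, computes the exact fraction of correctly ordered positive/negative pairs (ties counted as half) and compares it with a Monte Carlo estimate obtained by repeatedly drawing one random positive and one random negative. The seed and the number of draws are arbitrary choices.

```python
import random

# Hypothetical labels and scores (illustrative only).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.95, 0.85, 0.7, 0.4, 0.15, 0.9, 0.6, 0.3, 0.2, 0.1]

pos = [s for s, y in zip(scores, y_true) if y == 1]
neg = [s for s, y in zip(scores, y_true) if y == 0]

# Exact: average over every (positive, negative) pair; ties count as 0.5.
exact = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))

# Monte Carlo: draw random pairs and count how often the positive scores higher.
rng = random.Random(0)
draws = 100_000
wins = 0.0
for _ in range(draws):
    p, n = rng.choice(pos), rng.choice(neg)
    wins += 1.0 if p > n else 0.5 if p == n else 0.0
estimate = wins / draws

print(f"exact={exact:.3f}, sampled={estimate:.3f}")
```

The exact value is 0.68, the same area the trapezoidal rule gives for this data, and the sampled estimate lands close to it, which is the sense in which AUC is a ranking probability.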