What Are Correct Values for Precision and Recall When the Denominators Are Zero?

I did an Internet search for “What are correct values for precision and recall when the denominators equal 0?” and was pointed to a StackExchange page which had been up for over 11 years — and which was somewhat ambiguous. See https://stats.stackexchange.com/questions/8025/what-are-correct-values-for-precision-and-recall-when-the-denominators-equal-0.

The source of the issue is the definitions of FP (false positives) and FN (false negatives).

Years ago, I was taught that TP (true positives) are actual class positive (1) items that are predicted correctly. FP (false positives) are actual class positive (1) items that are predicted incorrectly. TN (true negatives) are actual class negative (0) items that are predicted correctly. FN (false negatives) are actual class negative (0) items that are predicted incorrectly. These definitions make sense from an English point of view.

However, a more common definition nowadays is that FP is an actual negative that’s predicted incorrectly, and FN is an actual positive that’s predicted incorrectly. Put another way, FP is the count of those that are “falsely predicted as positive”, and FN is the count of those that are “falsely predicted as negative.”

Precision and recall are defined as:

p (precision) = TP / (TP + FP)
r (recall)    = TP / (TP + FN)

Here’s an example using the modern, more common definitions:

actual  predicted
  0       0        TN  
  0       0        TN
  0       1        FP
  0       1        FP
  1       0        FN
  1       1        TP
  1       1        TP
  1       1        TP

p = TP / (TP + FP) = 3 / (3+2) = 3/5

r = TP / (TP + FN) = 3 / (3+1) = 3/4

OK. But what if either denominator is 0? For precision, if TP + FP = 0, then TP = 0 and also FP = 0. The only way this can happen is if all predictions are negative (0), for example:

actual  predicted
  0       0        TN  
  0       0        TN
  0       0        TN
  0       0        TN
  1       0        FN
  1       0        FN
  1       0        FN
  1       0        FN

In other words, all predictions are negative. This should raise a warning that the prediction system could be broken. The StackExchange page states, “If (true positives + false positives) = 0 then all cases have been predicted to be negative.” This is true if the more common definitions of FP and FN are used.

Now for recall, if TP + FN = 0, then TP = 0 and also FN = 0. This could happen like this:

actual  predicted
  0       0        TN  
  0       0        TN
  0       1        FP
  0       1        FP
  0       0        TN
  0       0        TN
  0       0        TN
  0       0        TN

All actual labels are negative. If any actual data item were positive, there'd be at least one TP (if predicted correctly) or one FN (if predicted incorrectly). This scenario (no positive items in the data) should never happen with a sensible dataset. The StackExchange page states, “If (true positives + false negatives) = 0 then no positive cases in the input data.”

Bottom line:

Using the common definitions of FP and FN:

1.) If TP + FP = 0, a warning should be printed that the prediction system is likely flawed because it always predicts class negative.

2.) If TP + FN = 0, a warning should be printed that the data is flawed because there are no actual positives.
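The two rules above can be sketched as Python functions. Returning 0.0 when a denominator is zero is one defensible convention, but the fallback value is debatable; the point here is the warning, not the number. The function names are my own.

```python
import warnings

def precision(tp, fp):
    # Rule 1: TP + FP = 0 means every item was predicted negative.
    if tp + fp == 0:
        warnings.warn("TP + FP = 0: the model predicted negative for "
                      "every item; the prediction system is likely flawed")
        return 0.0
    return tp / (tp + fp)

def recall(tp, fn):
    # Rule 2: TP + FN = 0 means the data contains no actual positives.
    if tp + fn == 0:
        warnings.warn("TP + FN = 0: there are no actual positives; "
                      "the data is likely flawed")
        return 0.0
    return tp / (tp + fn)
```

With the counts from the first worked example, precision(3, 2) gives 0.6 and recall(3, 1) gives 0.75; precision(0, 0) and recall(0, 0) each return 0.0 and print a warning.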

Do not believe everything you read on the Internet without questioning it.



I’m not sure why, but I always associate the word “precision” with high quality wrist watches. Here are three watches that don’t immediately bring the idea of precision to mind. Left: This one is a-mazing. Center: I don’t know why all wrist watches don’t have built-in lighters. What could go wrong? Right: Very stylish, but the vacuum tubes might not be necessary.


This entry was posted in Machine Learning.

1 Response to What Are Correct Values for Precision and Recall When the Denominators Are Zero?

  1. Thorsten Kleppe says:

    This is my latest fully clickable version, and it is still confusing for me.

    https://raw.githubusercontent.com/grensen/ML-Art/master/2023/gg_2023_confusion_matrix.png

    To avoid dividing by 0, I added 1 to each side and then subtracted 1 again. This was somehow the most okay way, well… idk.

Comments are closed.