Metrics

Sometimes it’s hard to tell the difference between precision, accuracy, recall, and so on, especially for newbies like me.

But let’s try to distinguish them with stories. In this article, you will see some commonly used metrics, including:

  • Metrics for binary classification: accuracy, precision, recall, \(F_1\) score, and so on

Binary classification

The following metrics for binary classification will be introduced:

  1. accuracy (精度)
  2. error (rate) (错误率)
  3. precision (准确率,查准率)
  4. recall (召回率,查全率)
  5. \(F_1\) score
  6. \(F_\beta\) score

(The Chinese names above follow Machine Learning (《机器学习》) by Zhou Zhihua.)1

First, we should learn some terminology from a table.

|                | Predicted Positive | Predicted Negative |
| -------------- | ------------------ | ------------------ |
| Truth Positive | True Positive      | False Negative     |
| Truth Negative | False Positive     | True Negative      |

The table above is called a 2x2 contingency table or confusion matrix (混淆矩阵).2

  • True Positive (TP): We predict it is positive, and it’s right.
  • True Negative (TN): We predict it is negative, and it’s right.
  • False Positive (FP): We predict it is positive, but it’s wrong. In fact, it’s negative.
  • False Negative (FN): We predict it is negative, but it’s wrong. In fact, it’s positive.
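
The four counts can be tallied directly from labels. Below is a minimal sketch in Python, with made-up label lists just for illustration (1 means positive, 0 means negative); scikit-learn’s `sklearn.metrics.confusion_matrix` produces the same table as a 2x2 array.

```python
# Made-up labels for illustration: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # the truth
y_pred = [1, 0, 1, 1, 1, 0, 1, 0]  # our predictions

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # predicted positive, truly positive
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # predicted negative, truly negative
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # predicted positive, truly negative
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # predicted negative, truly positive

print(tp, tn, fp, fn)  # 3 2 2 1
```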

Accuracy

Accuracy is the proportion of your predictions that are correct.

The total number of your predictions:

$$TP+TN+FP+FN$$

The number of your correct predictions:

$$TP+TN$$

So the accuracy is:

$$ accuracy = \frac{TP+TN}{TP+TN+FP+FN} $$
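
Continuing with the counts tallied in the sketch above:

```python
accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct predictions over all predictions
print(accuracy)  # (3 + 2) / 8 = 0.625
```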

Error (rate)

Error is the proportion of your predictions that are wrong.

So from the formula for accuracy, we can get the error rate:

$$ error = 1 - accuracy = \frac{FP+FN}{TP+TN+FP+FN} $$
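
In code, this is just one line on top of the accuracy above:

```python
error = 1 - accuracy  # 0.375, equivalently (fp + fn) / (tp + tn + fp + fn)
```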

Precision

Precision is the proportion of your positive predictions that are correct.

Try to understand it using this example from Zhou’s Machine Learning:

Consider you have trained a model to help you choose good watermelons.
You want to know the ratio of really good ones among the watermelons chosen by your model.
That is precision.

So the number of positive predictions is:

$$TP+FP$$

The number of correct positive predictions is:

$$TP$$

We can get the precision with:

$$ precision = \frac{TP}{TP+FP} $$
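
With the same illustrative counts as before:

```python
precision = tp / (tp + fp)  # correct positive predictions over all positive predictions
print(precision)  # 3 / (3 + 2) = 0.6
```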

Recall

Recall is the proportion of the truly positive cases that you predict correctly. The focus is not on your predictions but on the actual cases.

This time, let’s try to understand it with an example I saw on Quora.

Consider you have a girlfriend, and she has given you a birthday surprise every year for the last 10 years.
However, one day she asks you: sweetie, do you remember all the birthday surprises from me? You need to recall all 10 surprises, or you are in danger.

In this situation, all you need is to remember all 10 surprises correctly, no matter how many guesses it takes. You may guess many times before you get the right one for a given year, so you can’t get 100% precision.
But that’s OK: in the end, you luckily remember them all, and you get a recall of 100%.

So the number of truly positive cases is:

$$TP+FN$$

The number of them you predict correctly is:

$$TP$$

The recall is:

$$ recall = \frac{TP}{TP+FN} $$
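
Again with the illustrative counts:

```python
recall = tp / (tp + fn)  # found positives over all truly positive cases
print(recall)  # 3 / (3 + 1) = 0.75
```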

\(F_1\) score

Usually, we can’t get high precision and high recall simultaneously.

Girlfriend example again.
Assume you can’t remember all the surprises clearly, so you have to guess to get the correct answers. You immediately have two strategies:

  1. Guess as much as you can so that every surprise is covered correctly.
  2. Only tell the surprises you clearly remember and don’t say anything more.

But I think both will put you in danger.
The first one gets a high recall, and the second gets a high precision. In fact, you should find a middle point.

The \(F_1\) score is the harmonic mean of precision and recall:

$$ F_1 = \frac{2 \cdot precision \cdot recall}{precision + recall} $$
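
Using the precision (0.6) and recall (0.75) from the running example:

```python
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # 2 * 0.6 * 0.75 / (0.6 + 0.75) ≈ 0.667
```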

\(F_\beta\) score

Sometimes we think precision is more important than recall, or the other way around. The \(F_1\) score doesn’t satisfy us then, so we need the more general \(F_\beta\) score.

This can be understood with two examples.

  1. Consider spam detection. It’s unbearable for us when important mails are flagged as spam, so we want everything we flag as spam to really be spam. Getting a high precision is more important.
  2. Consider catching criminals. Police officers don’t want to let any suspicious person go, so getting a high recall is more important.

The definition of the \(F_\beta\) score is:

$$ F_\beta = (1+\beta^2) \cdot \frac{precision \cdot recall}{\beta^2 \cdot precision + recall} $$

Two commonly used F measures are the \(F_2\) measure, which weights recall higher than precision (more emphasis on \(\frac1{recall}\)), and the \( F_{0.5} \) measure, which weights precision higher than recall (more emphasis on \(\frac1{precision}\)).3
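
Here is a small sketch of the general formula as a helper function (the function name is mine, just for illustration; scikit-learn’s `fbeta_score` computes the same thing from raw labels):

```python
def f_beta(precision, recall, beta):
    """Weighted harmonic mean: beta > 1 favors recall, beta < 1 favors precision."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_beta(0.6, 0.75, beta=1))    # ≈ 0.667, identical to the F1 score
print(f_beta(0.6, 0.75, beta=2))    # ≈ 0.714, leans toward recall
print(f_beta(0.6, 0.75, beta=0.5))  # 0.625, leans toward precision
```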