Understanding mAP in Multi-class Classification | Patrick's Space

Understanding mAP in Multi-class Classification

patrickcty, March 11, 2020

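In multi-class classification tasks, mAP (Mean Average Precision) is sometimes used to measure classification quality, as in VOC. The reason is that mAP evaluates how well the classifier ranks its predictions, whereas the more common accuracy metric tends to be dominated by the most frequent class.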

To compute mAP for a multi-class problem, first compute the AP of each class.

How AP is computed in VOC

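In short, for a given class, take the n predicted probabilities for that class and sort them from high to low. Then, for each k from 1 to n, compute the Precision and Recall over the top k predictions. Each distinct Recall value is assigned one Precision value (the largest Precision at that Recall or any higher one, which keeps the P-R curve monotonically non-increasing), and the average of these Precision values is the AP. Note that there is no @k here, because all n predictions are always used.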
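The description above can be written compactly. This is a sketch in notation that does not appear in the original post: p(k) and r(k) are the Precision and Recall over the top k ranked predictions, TP(k) is the number of positives among them, N_pos is the number of positive samples for the class, and p̃ is the interpolated Precision.

```latex
\begin{aligned}
p(k) &= \frac{\mathrm{TP}(k)}{k}, \qquad
r(k) = \frac{\mathrm{TP}(k)}{N_{\text{pos}}}, \qquad k = 1, \dots, n \\
\tilde{p}(r) &= \max_{k \,:\, r(k) \ge r} p(k) \\
\mathrm{AP} &= \sum_{j=1}^{N_{\text{pos}}} \left(r_j - r_{j-1}\right)\tilde{p}(r_j),
\qquad r_j = \frac{j}{N_{\text{pos}}}, \quad r_0 = 0
\end{aligned}
```

Because every step r_j - r_{j-1} equals 1/N_pos, the sum is simply the mean of the interpolated Precision over the N_pos distinct Recall levels.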

Consider a four-class problem with the following labels and predictions:

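import numpy as np

# Integer class labels for six samples, shape (6, 1)
y_true = np.array([[2], [1], [0], [3], [0], [1]]).astype(np.int64)
# Predicted class probabilities, shape (6, 4); each row sums to 1
y_pred = np.array([[0.1, 0.2, 0.6, 0.1],
                   [0.8, 0.05, 0.1, 0.05],
                   [0.3, 0.4, 0.1, 0.2],
                   [0.6, 0.25, 0.1, 0.05],
                   [0.1, 0.2, 0.6, 0.1],
                   [0.9, 0.0, 0.03, 0.07]]).astype(np.float32)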

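Take class 3 as an example. Sorted from high to low, the six predicted probabilities for this class are [0.2 0.1 0.1 0.07 0.05 0.05], and the labels at the corresponding positions are [0. 0. 0. 0. 0. 1.], where 1 means the sample's true label is class 3 and 0 means it is not. This gives the following table: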

top-k	Precision	Recall
1	0/1	0/1
2	0/2	0/1
3	0/3	0/1
4	0/4	0/1
5	0/5	0/1
6	1/6	1/1

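After making the P-R curve monotonically non-increasing, average the Precision over the distinct Recall values. There is only one positive sample here, so the only Recall value is 1 and AP = (1/6) / 1 = 1/6.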
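The table and the resulting AP for class 3 can be reproduced with a few lines of NumPy. This is a minimal sketch rather than the reference VOC code; the variable names are made up for illustration, and it assumes tied scores keep their original sample order (hence the stable sort).

```python
import numpy as np

# Class-3 scores for the six samples, and whether each sample's true label is class 3.
scores = np.array([0.1, 0.05, 0.2, 0.05, 0.1, 0.07])
is_class3 = np.array([0, 0, 0, 1, 0, 0])

# Stable sort so tied scores keep their original order (an assumption about tie-breaking).
order = np.argsort(-scores, kind="stable")
hits = is_class3[order]

k = np.arange(1, len(hits) + 1)
tp = np.cumsum(hits)
precision = tp / k
recall = tp / hits.sum()

for ki, p, r in zip(k, precision, recall):
    print(f"top-{ki}: precision={p:.3f}, recall={r:.3f}")

# Interpolated precision: the best precision at this rank or any later one.
interp = np.maximum.accumulate(precision[::-1])[::-1]

# Average the interpolated precision at the ranks where a positive is found.
ap_class3 = interp[hits == 1].mean()
print("AP for class 3:", ap_class3)
```

The only positive sample is ranked last, so the single interpolated Precision value is 1/6, which is also the AP.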

See this article for details.

The APs of the other classes follow in the same way, giving [1/3, 1/3, 1.0, 1/6] for the four classes. Their mean is mAP = 11/24 ≈ 0.458.

A NumPy implementation, taken from this issue:

import numpy

# y_true is one-hot with shape (n_samples, n_classes); y_pred has the same shape.
# Integer labels must be converted first, e.g. numpy.eye(n_classes)[labels.ravel()]
_, classes = y_true.shape

average_precisions = []

for index in range(classes):
    # row indices that sort this class's scores from high to low
    row_indices_sorted = numpy.argsort(-y_pred[:, index])

    # labels and scores for this class, reordered by score
    y_true_cls = y_true[row_indices_sorted, index]
    y_pred_cls = y_pred[row_indices_sorted, index]

    tp = (y_true_cls == 1)
    fp = (y_true_cls == 0)

    # position i holds the cumulative fp and tp counts for the top-(i+1) predictions
    fp = numpy.cumsum(fp)
    tp = numpy.cumsum(tp)

    # total number of positive samples for this class
    npos = numpy.sum(y_true_cls)

    # recall at each top-i cutoff
    rec = tp*1.0 / npos

    # avoid divide by zero in case the first detection matches a difficult
    # ground truth
    # precision at each top-i cutoff
    prec = tp*1.0 / numpy.maximum((tp + fp), numpy.finfo(numpy.float64).eps)

    # pad with sentinel values at both ends
    mrec = numpy.concatenate(([0.], rec, [1.]))
    mpre = numpy.concatenate(([0.], prec, [0.]))

    # compute the precision envelope
    # (makes the P-R curve monotonically non-increasing)
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = numpy.maximum(mpre[i - 1], mpre[i])

    # to calculate area under PR curve, look for points
    # where X axis (recall) changes value
    i = numpy.where(mrec[1:] != mrec[:-1])[0]

    # and sum (\Delta recall) * prec
    # (equivalent to averaging the precision over the distinct recall values)
    average_precisions.append(numpy.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]))

# mAP is the mean of the per-class APs
mean_average_precision = numpy.mean(average_precisions)
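The implementation above expects one-hot labels, while the example labels are integer class indices. The following self-contained sketch wraps the same steps in a function (the name `voc_average_precisions` is hypothetical), converts the labels to one-hot, and runs it on the example data. It assumes tied scores keep their original sample order, which is why it asks for a stable sort; a different tie-breaking rule can change the per-class APs for this data.

```python
import numpy as np

def voc_average_precisions(y_true_onehot, y_pred):
    """Per-class AP with the precision envelope, following the steps above."""
    aps = []
    for c in range(y_true_onehot.shape[1]):
        # Stable sort keeps tied scores in their original order (an assumption).
        order = np.argsort(-y_pred[:, c], kind="stable")
        hits = y_true_onehot[order, c]
        tp = np.cumsum(hits)
        fp = np.cumsum(1 - hits)
        rec = tp / hits.sum()
        prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
        mrec = np.concatenate(([0.0], rec, [1.0]))
        mpre = np.concatenate(([0.0], prec, [0.0]))
        # Precision envelope: running maximum taken from right to left.
        mpre = np.maximum.accumulate(mpre[::-1])[::-1]
        # Indices where recall changes value.
        i = np.where(mrec[1:] != mrec[:-1])[0]
        aps.append(float(np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])))
    return aps

y_true = np.array([[2], [1], [0], [3], [0], [1]]).astype(np.int64)
y_pred = np.array([[0.1, 0.2, 0.6, 0.1],
                   [0.8, 0.05, 0.1, 0.05],
                   [0.3, 0.4, 0.1, 0.2],
                   [0.6, 0.25, 0.1, 0.05],
                   [0.1, 0.2, 0.6, 0.1],
                   [0.9, 0.0, 0.03, 0.07]]).astype(np.float32)

# Convert integer labels of shape (n, 1) to one-hot of shape (n, classes).
y_true_onehot = np.eye(y_pred.shape[1], dtype=np.int64)[y_true.ravel()]

aps = voc_average_precisions(y_true_onehot, y_pred)
mean_ap = float(np.mean(aps))
print(aps, mean_ap)
```

With this tie-breaking the per-class APs come out as 1/3, 1/3, 1.0 and 1/6, and their mean is 11/24 ≈ 0.458.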

AP in TensorFlow

TensorFlow provides the following function for computing mAP:

tf.compat.v1.metrics.average_precision_at_k(
    labels, predictions, k, weights=None, metrics_collections=None,
    updates_collections=None, name=None
)

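However, this function does not compute mAP in the classification sense; it computes mAP for retrieval. In retrieval, @k always has to be taken into account, that is, only the top k retrieved results are considered.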

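Using the same example, TF treats the predictions as 6 retrieval queries. Each query has one target object (the true label) and produces four probability values that sum to one. Note that in typical retrieval settings a query often has more than one target, and the scores it produces do not necessarily sum to one.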

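For the prediction below, when k = 1, the top score 0.8 belongs to label 0, so Precision = 0/1 and Recall = 0/1. When k = 2, the next score 0.1 belongs to label 2, so Precision = 0/2 and Recall = 0/1. When k = 3, the score 0.05 belongs to label 1, so Precision = 1/3 and Recall = 1/1. When k = 4, the target has already been retrieved and the fourth result is not relevant, so it contributes nothing: the AP stays at 1/3 and Recall = 1 (the raw Precision at this cutoff would be 1/4).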

pred = [0.8, 0.05, 0.1, 0.05]
true = [1]
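The per-k values for this single query can be checked with a short NumPy sketch. It is not TensorFlow's code: it assumes the usual retrieval definition, in which AP@k sums the Precision at each relevant position within the top k and divides by min(k, number of relevant labels), and it assumes ties are broken in favour of the lower label index.

```python
import numpy as np

pred = np.array([0.8, 0.05, 0.1, 0.05])
true = [1]

# Rank labels by score, highest first; ties keep the lower label index first
# (an assumption about tie-breaking).
ranked_labels = np.argsort(-pred, kind="stable")
relevant = np.isin(ranked_labels, true).astype(float)

ap_at_k = {}
for k in range(1, len(pred) + 1):
    hits = np.cumsum(relevant[:k])
    precision_at_i = hits / np.arange(1, k + 1)
    # Only positions that hold a relevant label contribute to the sum.
    ap_at_k[k] = float(np.sum(precision_at_i * relevant[:k]) / min(k, len(true)))
    print(f"k={k}: precision={hits[-1] / k:.3f}, "
          f"recall={hits[-1] / len(true):.3f}, AP@k={ap_at_k[k]:.3f}")
```

The target label is ranked third, so AP@1 and AP@2 are 0, while AP@3 and AP@4 are both 1/3.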

The other predictions are handled in the same way, so mAP@3 = (1 + 1/3 + 1/2 + 0 + 1/3 + 0) / 6 ≈ 0.3611.
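The whole batch can be checked the same way. The sketch below is a plain NumPy re-implementation of retrieval-style mAP@k for the special case of exactly one relevant label per row; the function name `mean_ap_at_k` is hypothetical, and it is an illustration of the arithmetic rather than a call into TensorFlow. Ties are assumed to favour the lower label index.

```python
import numpy as np

def mean_ap_at_k(labels, predictions, k):
    """Retrieval-style mAP@k for rows with exactly one relevant label each."""
    # Top-k label indices per row, highest score first (stable for ties).
    top_k = np.argsort(-predictions, axis=1, kind="stable")[:, :k]
    relevant = (top_k == labels.reshape(-1, 1)).astype(np.float64)
    precision_at_i = np.cumsum(relevant, axis=1) / np.arange(1, k + 1)
    # One relevant label per row, so the normaliser min(k, 1) is 1.
    ap_per_row = np.sum(precision_at_i * relevant, axis=1)
    return ap_per_row, float(ap_per_row.mean())

y_true = np.array([[2], [1], [0], [3], [0], [1]]).astype(np.int64)
y_pred = np.array([[0.1, 0.2, 0.6, 0.1],
                   [0.8, 0.05, 0.1, 0.05],
                   [0.3, 0.4, 0.1, 0.2],
                   [0.6, 0.25, 0.1, 0.05],
                   [0.1, 0.2, 0.6, 0.1],
                   [0.9, 0.0, 0.03, 0.07]]).astype(np.float32)

ap_per_row, map_at_3 = mean_ap_at_k(y_true, y_pred, k=3)
print(ap_per_row, map_at_3)
```

For one relevant label per row, AP@k reduces to 1/rank when the true label appears within the top k and 0 otherwise, which gives the six terms in the sum above.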