Keras: How is accuracy calculated for multi-label classification?

Problem description:

I'm doing the Toxic Comment Text Classification Kaggle challenge. There are 6 classes: ['threat', 'severe_toxic', 'obscene', 'insult', 'identity_hate', 'toxic']. A comment can be multiple of these classes so it's a multi-label classification problem.

I built a basic neural network with Keras as follows:

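from keras.models import Sequential
from keras.layers import Dense, Embedding, Flatten

model = Sequential()
# Vocabulary of 10,000 tokens, 128-dim embeddings, input sequences of length 250
model.add(Embedding(10000, 128, input_length=250))
model.add(Flatten())
model.add(Dense(100, activation='relu'))
# One sigmoid unit per class, so each label is predicted independently
model.add(Dense(len(classes), activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])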

I run the following line:

model.fit(X_train, train_y, validation_split=0.5, epochs=3)

and get 99.11% accuracy after 3 epochs.

However, 99.11% accuracy is a good bit higher than the best Kaggle submission. This makes me think I'm a) overfitting, b) misusing Keras's accuracy, or possibly both.

1) Seems a bit hard to overfit when I'm using 50% of my data as a validation split and only 3 epochs.

2) Is accuracy here just the percentage of the time the model gets each class correct?

So if I output [0, 0, 0, 0, 0, 1] and the correct output was [0, 0, 0, 0, 0, 0], my accuracy would be 5/6?

After a bit of thought, I sort of think the accuracy metric here is just looking at the class my model predicts with highest confidence and comparing vs. ground truth.

So if my model outputs [0, 0, 0.9, 0, 0, 0], it will compare the class at index 2 ('obscene') with the true value. Do you think this is what's happening?

Thanks for any help you can offer!

For multi-label classification, I think it is correct to use sigmoid as the activation and binary_crossentropy as the loss.

If the output is sparse multi-label, meaning only a few labels are positive and most are negative, the Keras accuracy metric will be inflated by the correctly predicted negative labels. If I remember correctly, Keras does not choose the label with the highest probability. Instead, as in binary classification, each output is thresholded at 50%. So in your example the prediction would be [0, 0, 0, 0, 0, 1], and if the actual labels were [0, 0, 0, 0, 0, 0], the accuracy would be 5/6. You can test this hypothesis by creating a model that always predicts the negative label and looking at its accuracy.
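The thresholding behaviour described here can be sketched in plain NumPy. This is a rough re-implementation for illustration, not Keras's own code: the helper name binary_accuracy, the 0.5 threshold as a parameter, and the made-up label matrix with a 3% positive rate are all assumptions, not figures from the Kaggle data.

```python
import numpy as np

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of individual label slots predicted correctly after thresholding."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))

# The example from the question: one wrong label out of six gives 5/6.
example_acc = binary_accuracy([[0, 0, 0, 0, 0, 0]],
                              [[0, 0, 0, 0, 0, 0.9]])

# Hypothetical sparse labels: 1000 comments, 6 labels, ~3% positives (made up).
rng = np.random.default_rng(0)
y_true = (rng.random((1000, 6)) < 0.03).astype(int)

# A "model" that always predicts the negative label for every class.
always_negative = np.zeros(y_true.shape, dtype=float)
baseline_acc = binary_accuracy(y_true, always_negative)

print(f"example accuracy:  {example_acc:.4f}")
print(f"all-negative base: {baseline_acc:.4f}")
```

If the all-negative baseline on the real labels already lands close to the reported 99.11%, the high number says more about label sparsity than about the model.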

If that's indeed the case, you may try a different metric such as top_k_categorical_accuracy.
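As one illustration of a stricter metric (not the one named above, and not something the original poster used), here is a sketch of exact-match accuracy, where a comment only counts as correct if all six labels are right. The function name and the toy arrays are assumptions made up for this example.

```python
import numpy as np

def exact_match_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose entire label vector is predicted correctly."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(np.all(y_pred == np.asarray(y_true), axis=1)))

# Two toy comments with six labels each (made-up values).
y_true = np.array([[0, 0, 0, 0, 0, 0],
                   [0, 0, 1, 0, 0, 1]])
y_prob = np.array([[0.1, 0.0, 0.2, 0.1, 0.0, 0.9],   # one false positive
                   [0.1, 0.0, 0.8, 0.1, 0.0, 0.7]])  # all six labels right

subset_acc = exact_match_accuracy(y_true, y_prob)
# Per-label accuracy on the same data, for comparison.
elementwise_acc = float(np.mean((y_prob > 0.5).astype(int) == y_true))

print(f"exact-match accuracy: {subset_acc:.4f}")
print(f"per-label accuracy:   {elementwise_acc:.4f}")
```

The per-label figure stays high even though half of the comments have a wrong label vector, which is the same effect suspected in the question.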

Another remote possibility I can think of is your training data. Are the labels y somehow "leaked" into x? Just a wild guess.