如何使用keras实现多标签分类神经网络
我正在尝试使用Keras实现一个神经网络,该网络涉及一个涉及多标签分类的问题.我知道解决该问题的一种方法是将其转换为几个二进制分类问题.我已经实现了其中之一,但是不确定如何继续进行其他工作,主要是如何将它们结合起来?我的数据集有5个输入变量和5个标签.通常,单个数据样本将具有1-2个标签.很少有两个以上的标签.
I am attempting to implement a neural network using Keras with a problem that involves multilabel classification. I understand that one way to tackle the problem is to transform it to several binary classification problems. I have implemented one of these, but am not sure how to proceed with the others, mostly how do I go about combining them? My data set has 5 input variables and 5 labels. Generally a single sample of data would have 1-2 labels. It is rare to have more than two labels.
这是我的代码(感谢machinelearningmastery.com):
Here is my code (thanks to machinelearningmastery.com):
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataframe = pandas.read_csv("Realdata.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:5].astype(float)
Y = dataset[:,5]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# baseline model
def create_baseline():
# create model
model = Sequential()
model.add(Dense(5, input_dim=5, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
scores = model.evaluate(X, encoded_Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
#Make predictions....change the model.predict to whatever you want instead of X
predictions = model.predict(X)
# round predictions
rounded = [round(x[0]) for x in predictions]
print(rounded)
return model
# evaluate model with standardized dataset
estimator = KerasClassifier(build_fn=create_baseline, epochs=100, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, encoded_Y, cv=kfold)
print("Results: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
您所指的方法是一对多策略用于多标签分类.但是,当使用神经网络时,具有5个标签的多标签分类问题的最简单解决方案是使用具有5个输出节点的单个模型.与喀拉拉邦:
The approach you are referring to is the one-versus-all or the one-versus-one strategy for multi-label classification. However, when using a neural network, the easiest solution for a multi-label classification problem with 5 labels is to use a single model with 5 output nodes. With keras:
model = Sequential()
model.add(Dense(5, input_dim=5, kernel_initializer='normal', activation='relu'))
model.add(Dense(5, kernel_initializer='normal', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='sgd')
您可以将训练标签提供为长度为5的二进制编码向量.例如,对应于2类和3类的示例将具有标签 [0 1 1 0 0]
.
You can provide the training labels as binary-encoded vectors of length 5. For instance, an example that corresponds to classes 2 and 3 would have the label [0 1 1 0 0]
.