WEKA classification likelihoods
I would like to know whether WEKA can output a number of "best guesses" for a classification.
My scenario: I classify the data, with cross-validation for instance, and in WEKA's output I would like to see something like "these are the 3 best guesses for the classification of this instance". In other words, even if an instance isn't correctly classified, I want the 3 or 5 best guesses for that instance.
Example:
Classes: A, B, C, D, E Instances: 1...10
And the output would be: instance 1 is 90% likely to be class A, 75% likely to be class B, 60% likely to be class C...
Thanks.
Weka's API has a method called Classifier.distributionForInstance() that can be used to get the classification prediction distribution. You can then sort the distribution by decreasing probability to get your top-N predictions.
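The sorting step can be sketched in plain Java as follows. This is only an illustration, not part of Weka: the class name TopNPredictions, the helper topN(), and the sample labels and probabilities are all made up for the example. It assumes you already have the double[] returned by distributionForInstance(), where index i corresponds to class value i.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Weka): ranks class indices by probability.
final class TopNPredictions {

    // Returns the indices of the n most probable classes, most probable first.
    // If n exceeds the number of classes, all class indices are returned.
    static int[] topN(double[] distribution, int n) {
        Integer[] indices = new Integer[distribution.length];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = i;
        }
        // Sort class indices by decreasing probability. The sort is stable,
        // so tied classes keep their original (lower index first) order.
        Arrays.sort(indices, (a, b) -> Double.compare(distribution[b], distribution[a]));

        int count = Math.max(0, Math.min(n, indices.length));
        int[] top = new int[count];
        for (int i = 0; i < count; i++) {
            top[i] = indices[i];
        }
        return top;
    }

    public static void main(String[] args) {
        // Made-up labels and distribution standing in for a real prediction.
        String[] labels = {"A", "B", "C", "D", "E"};
        double[] distribution = {0.05, 0.50, 0.10, 0.30, 0.05};

        for (int classIndex : topN(distribution, 3)) {
            System.out.printf("%s: %.3f%n", labels[classIndex], distribution[classIndex]);
        }
    }
}
```

With the sample values above, the three best guesses are listed as B, D, and C, in that order. In real code the label for each returned index would come from the dataset's class attribute, e.g. classAttribute().value(classIndex).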
Below is a function that prints out: (1) the test instance's ground truth label; (2) the predicted label from classifyInstance(); and (3) the prediction distribution from distributionForInstance(). I have used this with J48, but it should work with other classifiers.
The input parameters are the serialized model file (which you can create during the model training phase by applying the -d option) and the test file in ARFF format.
    import weka.classifiers.Classifier;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public void test(String modelFileSerialized, String testFileARFF)
        throws Exception
    {
        // Deserialize the classifier.
        Classifier classifier =
            (Classifier) weka.core.SerializationHelper.read(
                modelFileSerialized);

        // Load the test instances.
        Instances testInstances = DataSource.read(testFileARFF);

        // Mark the last attribute in each instance as the true class.
        testInstances.setClassIndex(testInstances.numAttributes() - 1);

        int numTestInstances = testInstances.numInstances();
        System.out.printf("There are %d test instances\n", numTestInstances);

        // Loop over each test instance.
        for (int i = 0; i < numTestInstances; i++)
        {
            // Get the true class label from the instance's own classIndex.
            String trueClassLabel =
                testInstances.instance(i).toString(testInstances.classIndex());

            // Make the prediction here.
            double predictionIndex =
                classifier.classifyInstance(testInstances.instance(i));

            // Get the predicted class label from the predictionIndex.
            String predictedClassLabel =
                testInstances.classAttribute().value((int) predictionIndex);

            // Get the prediction probability distribution.
            double[] predictionDistribution =
                classifier.distributionForInstance(testInstances.instance(i));

            // Print out the true label, predicted label, and the distribution.
            System.out.printf("%5d: true=%-10s, predicted=%-10s, distribution=",
                              i, trueClassLabel, predictedClassLabel);

            // Loop over all the prediction labels in the distribution.
            for (int predictionDistributionIndex = 0;
                 predictionDistributionIndex < predictionDistribution.length;
                 predictionDistributionIndex++)
            {
                // Get this distribution index's class label.
                String predictionDistributionIndexAsClassLabel =
                    testInstances.classAttribute().value(
                        predictionDistributionIndex);

                // Get the probability.
                double predictionProbability =
                    predictionDistribution[predictionDistributionIndex];

                System.out.printf("[%10s : %6.3f]",
                                  predictionDistributionIndexAsClassLabel,
                                  predictionProbability);
            }

            // End the output line for this instance.
            System.out.printf("\n");
        }
    }