WEKA classification likelihoods

Question:

I would like to know if there is a way in WEKA to output a number of 'best-guesses' for a classification.

My scenario is: I classify the data with cross-validation, for instance, and then in Weka's output I get something like "these are the 3 best guesses for the classification of this instance". In other words, even if an instance isn't correctly classified, I want an output of the 3 or 5 best guesses for that instance.

Example:

Classes: A, B, C, D, E. Instances: 1...10

And the output would be: instance 1 is 90% likely to be class A, 75% likely to be class B, 60% likely to be class C, ...

Thanks.

Answer:

Weka's API has a method called Classifier.distributionForInstance() that returns the predicted probability distribution over the class values: element k of the returned array is the estimated probability of the k-th value of the class attribute. You can then sort the distribution by decreasing probability to get your top-N predictions.
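A minimal sketch of that sorting step, written in plain Java so it does not depend on Weka itself. It assumes the double[] has the layout described above (one probability per class index); the class name TopN, the method topN and the sample distribution are hypothetical, made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

class TopN {
    /**
     * Returns the indices of the n largest entries of dist, ordered by
     * decreasing probability. dist is assumed to be an array such as the one
     * returned by distributionForInstance(), indexed by class value.
     */
    static int[] topN(double[] dist, int n) {
        // Box the class indices so they can be sorted with a Comparator.
        Integer[] order = new Integer[dist.length];
        for (int k = 0; k < dist.length; k++) {
            order[k] = k;
        }
        // Sort class indices by their probability, highest first.
        // Arrays.sort on objects is stable, so ties keep class-index order.
        Arrays.sort(order,
            Comparator.comparingDouble((Integer k) -> dist[k]).reversed());
        int count = Math.min(n, dist.length);
        int[] result = new int[count];
        for (int r = 0; r < count; r++) {
            result[r] = order[r];
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical distribution over classes A..E.
        String[] labels = {"A", "B", "C", "D", "E"};
        double[] dist = {0.10, 0.55, 0.05, 0.25, 0.05};
        // Prints the three most probable classes: B, D, then A.
        for (int k : topN(dist, 3)) {
            System.out.printf("%s : %.3f%n", labels[k], dist[k]);
        }
    }
}
```

With Weka, the labels would come from testInstances.classAttribute().value(k) rather than a hard-coded array. Sorting the whole array is fine here because the number of classes is usually small.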

Below is a function that prints out: (1) the test instance's ground truth label; (2) the predicted label from classifyInstance(); and (3) the prediction distribution from distributionForInstance(). I have used this with J48, but it should work with other classifiers.

The input parameters are the serialized model file (which you can create during the model training phase by applying the -d option) and the test file in ARFF format.

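import weka.classifiers.Classifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public void test(String modelFileSerialized, String testFileARFF) 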
    throws Exception
{
    // Deserialize the classifier.
    Classifier classifier = 
        (Classifier) weka.core.SerializationHelper.read(
            modelFileSerialized);

    // Load the test instances.
    Instances testInstances = DataSource.read(testFileARFF);

    // Mark the last attribute in each instance as the true class.
    testInstances.setClassIndex(testInstances.numAttributes()-1);

    int numTestInstances = testInstances.numInstances();
    System.out.printf("There are %d test instances\n", numTestInstances);

    // Loop over each test instance.
    for (int i = 0; i < numTestInstances; i++)
    {
        // Get the true class label from the instance's own classIndex.
        String trueClassLabel = 
            testInstances.instance(i).toString(testInstances.classIndex());

        // Make the prediction here.
        double predictionIndex = 
            classifier.classifyInstance(testInstances.instance(i)); 

        // Get the predicted class label from the predictionIndex.
        String predictedClassLabel =
            testInstances.classAttribute().value((int) predictionIndex);

        // Get the prediction probability distribution.
        double[] predictionDistribution = 
            classifier.distributionForInstance(testInstances.instance(i)); 

        // Print out the true label, predicted label, and the distribution.
        System.out.printf("%5d: true=%-10s, predicted=%-10s, distribution=", 
                          i, trueClassLabel, predictedClassLabel); 

        // Loop over all the prediction labels in the distribution.
        for (int predictionDistributionIndex = 0; 
             predictionDistributionIndex < predictionDistribution.length; 
             predictionDistributionIndex++)
        {
            // Get this distribution index's class label.
            String predictionDistributionIndexAsClassLabel = 
                testInstances.classAttribute().value(
                    predictionDistributionIndex);

            // Get the probability.
            double predictionProbability = 
                predictionDistribution[predictionDistributionIndex];

            System.out.printf("[%10s : %6.3f]", 
                              predictionDistributionIndexAsClassLabel, 
                              predictionProbability );
        }

        // End the output line for this instance.
        System.out.println();
    }
}
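If you only want the N best guesses per instance instead of the full distribution, one option is to replace the inner loop's output with a helper that keeps a bounded min-heap of size N. This is a swapped-in technique, not part of the answer above, and the names TopNFormatter and formatTopN are hypothetical. It assumes labels[k] holds testInstances.classAttribute().value(k) for the same index k as the distribution, and it reuses the "[%10s : %6.3f]" format from the function.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class TopNFormatter {
    /**
     * Formats the n most probable classes as "[label : prob]" pairs,
     * highest probability first. dist and labels share the same indexing.
     */
    static String formatTopN(double[] dist, String[] labels, int n) {
        // Min-heap on probability: the root is the weakest of the current top n.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>((a, b) -> Double.compare(dist[a], dist[b]));
        for (int k = 0; k < dist.length; k++) {
            heap.add(k);
            if (heap.size() > n) {
                heap.poll(); // drop the least probable candidate seen so far
            }
        }
        // Draining the heap yields ascending probability; reverse for highest first.
        List<Integer> best = new ArrayList<>();
        while (!heap.isEmpty()) {
            best.add(heap.poll());
        }
        Collections.reverse(best);
        StringBuilder sb = new StringBuilder();
        for (int k : best) {
            sb.append(String.format("[%10s : %6.3f]", labels[k], dist[k]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical distribution over classes A..E.
        String[] labels = {"A", "B", "C", "D", "E"};
        double[] dist = {0.10, 0.55, 0.05, 0.25, 0.05};
        System.out.println(formatTopN(dist, labels, 3));
    }
}
```

The heap keeps at most N indices at a time, which only matters when there are many class values; for a handful of classes, sorting the whole distribution is just as good. Ties between equal probabilities may come out in either order.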