Weka: printing a sparse ARFF file

Problem description:

I was trying out the sparse representation of the ARFF file as shown here. In my program I am able to print the class label "B", but for some reason it is not printing "A".

    attVals = new FastVector();
    attVals.addElement("A");
    attVals.addElement("B");
    atts.addElement(new Attribute("class", attVals));

    vals[index] = attVals.indexOf("A");

The output of the program looks like this:

 {0 6,2 8}      ---  I should get {0 6,2 8,3 A}

But when I do this:

    vals[index] = attVals.indexOf("B");

I get the proper output:

 {0 6,2 8,3 B}

For some reason it is not taking the index 0. Can someone tell me why this is happening?

This is a very common problem. By definition, the sparse format does not store 0 values, and "A" is the first declared label, so its internal value is 0.

The Weka ARFF format page clearly states:


Warning: There is a known problem saving SparseInstance objects from datasets that have string attributes. In Weka, string and nominal data values are stored as numbers; these numbers act as indexes into an array of possible attribute values (this is very efficient). However, the first string value is assigned index 0: this means that, internally, this value is stored as a 0. When a SparseInstance is written, string instances with internal value 0 are not output, so their string value is lost (and when the arff file is read again, the default value 0 is the index of a different string value, so the attribute value appears to change). To get around this problem, add a dummy string value at index 0 that is never used whenever you declare string attributes that are likely to be used in SparseInstance objects and saved as Sparse ARFF files.

You have to put a dummy value first in the list of nominal values, so that it takes index 0 and "A" moves to index 1. Just modify your code to:

    attVals = new FastVector();
    attVals.addElement("dummy");
    attVals.addElement("A");
    attVals.addElement("B");
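To see why the dummy value helps, here is a minimal plain-Java sketch of the idea. It is not Weka's actual code: the class `SparseRowSketch` and the methods `toSparseRow` and `rowFor` are hypothetical names made up for this illustration, and the row layout (four attributes, class at index 3) is assumed from the output in the question. It only models the rule that a sparse writer skips every attribute whose internal value is 0.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of a sparse row writer (an illustration, NOT Weka's code). */
public class SparseRowSketch {

    /**
     * Formats one row in sparse notation: only attributes whose internal
     * value is non-zero are written, as "index value" pairs.
     *
     * @param vals       internal values, one per attribute
     * @param classIndex position of the nominal class attribute
     * @param labels     declared labels of the class attribute, in order
     */
    public static String toSparseRow(double[] vals, int classIndex, List<String> labels) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < vals.length; i++) {
            if (vals[i] == 0.0) {
                continue; // zeros are implicit in the sparse format
            }
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append(i).append(' ');
            if (i == classIndex) {
                sb.append(labels.get((int) vals[i])); // nominal: stored index -> label
            } else {
                sb.append((long) vals[i]); // numeric, printed as an integer for brevity
            }
        }
        return sb.append('}').toString();
    }

    /** Builds a row like the one in the question (assumed layout: class at index 3). */
    public static String rowFor(List<String> labels, String label) {
        double[] vals = new double[4];
        vals[0] = 6;
        vals[2] = 8;
        vals[3] = labels.indexOf(label); // same trick as attVals.indexOf(...)
        return toSparseRow(vals, 3, labels);
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("A", "B");
        List<String> withDummy = Arrays.asList("dummy", "A", "B");

        System.out.println(rowFor(original, "A"));  // {0 6,2 8}      ("A" has index 0, so it is skipped)
        System.out.println(rowFor(original, "B"));  // {0 6,2 8,3 B}
        System.out.println(rowFor(withDummy, "A")); // {0 6,2 8,3 A}  ("A" now has index 1)
    }
}
```

With the dummy label occupying index 0, no real class label is ever stored as 0, so every label you actually use gets written out.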

Let me know if you need any further help.