Error in predict.svm: test data does not match model

Problem description:

I have a data frame of about 500 rows and 170 columns. I am trying to fit a classification model with svm from the e1071 package. The response variable is called 'SEGMENT', a factor with 6 levels. There are three other factor variables in the data frame, and the rest are numeric.

library(e1071)

data <- my.data.frame
# Split into training and testing sets, training.data and testing.data
# ...
fit <- svm(SEGMENT ~ ., data = training.data, cost = 1, kernel = 'linear',
           probability = TRUE, type = 'C-classification')

The model fits fine.

Parameters:
SVM-Type:  C-classification 
SVM-Kernel:  linear 
   cost:  1 
   gamma:  0.0016 

Number of Support Vectors:  77

( 43 2 19 2 2 9 )

Number of Classes:  6 

Levels: 
EE JJ LL RR SS WW

The problem arises when I try to test the model on testing.data, which is structured exactly like the training set:

x <- predict(fit, testing.data, decision.values = T, probability = T)

And then things blow up rather spectacularly:

Error in predict.svm(fit, newdata = testing, decision.values = T, probability = T) : 
test data does not match model !

Ideas most welcome.

This happens when the columns in the test and training data are not the same. Compare str(training.data) and str(testing.data): they should contain the same variables, except for the one being predicted. When training the svm model, include only the variables you want to use for prediction.

For example:

# Assumes SEGMENT is column 6 and columns 1:5 are the predictors
fit <- svm(SEGMENT ~ ., data = training.data[, 1:6], cost = 1, kernel = 'linear',
           probability = TRUE, type = 'C-classification')

x <- predict(fit, testing.data[, 1:5], decision.values = TRUE, probability = TRUE)
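
Comparing the two str() outputs by eye is tedious with 170 columns, so the check can be scripted. The following is a minimal base-R sketch, not part of the original answer. The toy data frames, the columns region, x1 and extra, and the helper names missing.in.test, extra.in.test, level.mismatch and testing.aligned are all made up for illustration. It also checks factor levels, on the assumption that a factor with different levels in the two sets can lead to a different number of dummy columns and so, most likely, to the same error.

```r
# Toy stand-ins for training.data and testing.data (hypothetical data)
training.data <- data.frame(
  SEGMENT = factor(c("EE", "JJ", "LL", "EE")),
  region  = factor(c("north", "south", "east", "north")),
  x1      = c(1.0, 2.5, 3.1, 0.7)
)
testing.data <- data.frame(
  SEGMENT = factor(c("EE", "JJ")),
  region  = factor(c("north", "south")),  # level "east" is absent here
  x1      = c(1.2, 2.2),
  extra   = c(0, 1)                       # column the model never saw
)

response   <- "SEGMENT"
predictors <- setdiff(names(training.data), response)

# Columns present in one set but not the other
missing.in.test <- setdiff(predictors, names(testing.data))
extra.in.test   <- setdiff(names(testing.data), names(training.data))

# Factor predictors whose level sets differ between the two data frames
shared      <- intersect(predictors, names(testing.data))
factor.cols <- shared[vapply(training.data[shared], is.factor, logical(1))]
level.mismatch <- factor.cols[!vapply(factor.cols, function(v) {
  identical(levels(training.data[[v]]), levels(testing.data[[v]]))
}, logical(1))]

# Keep only the training predictors and re-level factors to the training levels
testing.aligned <- testing.data[, shared, drop = FALSE]
for (v in factor.cols) {
  testing.aligned[[v]] <- factor(testing.aligned[[v]],
                                 levels = levels(training.data[[v]]))
}

print(list(missing = missing.in.test, extra = extra.in.test,
           level.mismatch = level.mismatch))
```

If all three vectors come back empty, the mismatch lies elsewhere. Note that re-leveling turns any test value not seen in training into NA, so such rows need separate handling before calling predict.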