caret and GBM: task 1 failed - "arguments imply differing number of rows"
I'm trying to run a GBM with caret with the code below:
library(caret)
library(doParallel)
detectCores()
registerDoParallel(detectCores() - 1)
set.seed(668)
in.train <- createDataPartition(y = dat$target, p = 0.80, list = T)
ctrl <- trainControl(method = 'cv', number = 2, classProbs = T, verboseIter = T,
                     summaryFunction = LogLossSummary2)
gbm.grid <- expand.grid(interaction.depth = 10,
                        n.trees = (2:7) * 50,
                        shrinkage = 0.1)
Sys.time()
set.seed(1234)
gbm.fit <- train(target ~., data = otto.new[in.train, ],
                 method = 'gbm', distribution = 'multinomial',
                 metric = 'LogLoss', maximize = F,
                 tuneGrid = gbm.grid, trControl = ctrl,
                 n.minobsinnode = 4, bag.fraction = 0.9)
Sys.time()
However, it fails with the error:
Error in { :
task 1 failed - "arguments imply differing number of rows: 0, 24754"
In addition: Warning messages:
1: package ‘gbm’ was built under R version 3.0.3
2: package ‘survival’ was built under R version 3.0.3
3: package ‘plyr’ was built under R version 3.0.3
Here is my session info:
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] splines parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] plyr_1.8.1 gbm_2.1.1 survival_2.38-1 doParallel_1.0.8 iterators_1.0.7 foreach_1.4.2 data.table_1.9.4
[8] caret_6.0-41 ggplot2_1.0.1 Revobase_7.1.0 RevoMods_7.1.0 RevoScaleR_7.1.0 lattice_0.20-27 rpart_4.1-5
loaded via a namespace (and not attached):
[1] BradleyTerry2_1.0-6 brglm_0.5-9 car_2.0-25 chron_2.3-45 class_7.3-12 codetools_0.2-11
[7] colorspace_1.2-6 compiler_3.0.2 digest_0.6.8 e1071_1.6-4 grid_3.0.2 gtable_0.1.2
[13] gtools_3.4.1 lme4_1.1-7 MASS_7.3-37 Matrix_1.1-5 mgcv_1.8-5 minqa_1.2.4
[19] munsell_0.4.2 nlme_3.1-120 nloptr_1.0.4 nnet_7.3-9 pbkrtest_0.4-2 proto_0.3-10
[25] quantreg_5.11 Rcpp_0.11.5 reshape2_1.4.1 scales_0.2.4 SparseM_1.6 stringr_0.6.2
[31] tools_3.0.2
I've noticed that this issue happens intermittently and seems to occur less often when I make sure the number of rows in my dataset is a multiple of the number of folds (k). (In the case above, my dataset has 49506 rows.) Nonetheless, it still crops up every now and then. Has anyone else encountered this and found a way to prevent it?
I was facing the same problem, then I realized that one of my rows had an "NA". The model did not make a prediction for that row, so the output was one row short when I ran predict(). After two days of wasted time, I imputed the NA and reran the exact same script. It worked fine. So please try imputing the NAs or nulls in your data. (BTW: I also had some "null" values, which were being read in as factors, so please look for those as well.) Please let us know whether this solves the problem for you.
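For anyone wanting to try this, here is a minimal sketch of the check-and-impute step in base R. The data frame and its column names (feat_1, feat_2) are made up for illustration, and median imputation is just one simple choice, not necessarily what suits your data.

```r
# Hypothetical toy data standing in for the real training set:
# feat_1 contains an NA, feat_2 contains a literal "null" string.
dat <- data.frame(
  feat_1 = c(1, 2, NA, 4, 5, 6),
  feat_2 = c("3", "null", "5", "6", "7", "8"),
  target = factor(c("a", "b", "a", "b", "a", "b")),
  stringsAsFactors = FALSE
)

# Count real missing values per column.
na_counts <- colSums(is.na(dat))

# Count literal "null" strings per column (works for character and factor columns).
null_counts <- sapply(dat, function(col) {
  sum(as.character(col) == "null", na.rm = TRUE)
})

# Turn "null" strings into real NAs, then restore the numeric type.
dat$feat_2[dat$feat_2 == "null"] <- NA
dat$feat_2 <- as.numeric(dat$feat_2)

# Replace each NA in a numeric vector with that vector's median.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Apply the imputation to every numeric column.
num_cols <- sapply(dat, is.numeric)
dat[num_cols] <- lapply(dat[num_cols], impute_median)

# Every row should now be complete before it goes to train() or predict().
stopifnot(!anyNA(dat))
```

Running colSums(is.na(dat)) on the real data before calling train() is a quick way to see whether this applies to you at all. caret's preProcess() also offers imputation methods such as "medianImpute" if you would rather keep the step inside the caret workflow.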