如何创建循环(水平不与参考重叠时)
我已经用R编写了一些代码.此代码获取一些数据并将其分为训练集和测试集.然后,我拟合了生存随机森林".训练集上的模型.之后,我使用该模型预测测试集中的观测结果.
I have written some code in R. This code takes some data and splits it into a training set and a test set. Then, I fit a "survival random forest" model on the training set. After, I use the model to predict observations within the test set.
由于我要处理的问题类型(生存分析"),必须为每个唯一时间"创建一个混淆矩阵.(在文件"unique.death.time"内部).对于在每个唯一时间制作的每个混淆矩阵,我对相应的敏感度"感兴趣.值(例如,敏感性_1001,敏感性_2005等).我正在尝试获取所有这些敏感度值:我想用它们作图(对比唯一的死亡时间)并确定平均敏感度值.
Due to the type of problem I am dealing with ("survival analysis"), a confusion matrix has to be made for each "unique time" (inside the file "unique.death.time"). For each confusion matrix made for each unique time, I am interested in the corresponding "sensitivity" value (e.g. sensitivity_1001, sensitivity_2005, etc.). I am trying to get all these sensitivity values : I would like to make a plot with them (vs unique death times) and determine the average sensitivity value.
为此,我需要在"unique.death.times"中重复计算每个时间点的灵敏度.我尝试手动执行此操作,这需要很长时间.
In order to do this, I need to repeatedly calculate the sensitivity for each time point in "unique.death.times". I tried doing this manually and it is taking a long time.
有人可以告诉我如何使用循环"吗?
Could someone please show me how to do this with a "loop"?
我在下面发布了我的代码:
I have posted my code below:
#load libraries
library(survival)
library(data.table)
library(pec)
library(ranger)
library(caret)
#load data
data(cost)
#split data into train and test
ind <- sample(1:nrow(cost),round(nrow(cost) * 0.7,0))
cost_train <- cost[ind,]
cost_test <- cost[-ind,]
#fit survival random forest model
ranger_fit <- ranger(Surv(time, status) ~ .,
data = cost_train,
mtry = 3,
verbose = TRUE,
write.forest=TRUE,
num.trees= 1000,
importance = 'permutation')
#optional: plot training results
plot(ranger_fit$unique.death.times, ranger_fit$survival[1,], type = 'l', col = 'red') # for first observation
lines(ranger_fit$unique.death.times, ranger_fit$survival[21,], type = 'l', col = 'blue') # for twenty first observation
#predict observations test set using the survival random forest model
ranger_preds <- predict(ranger_fit, cost_test, type = 'response')$survival
ranger_preds <- data.table(ranger_preds)
colnames(ranger_preds) <- as.character(ranger_fit$unique.death.times)
从这里,以前帖子()建议如何创建循环:
From here, another user (Justin Singh) from a previous post (R: how to repeatedly "loop" the results from a function?) suggested how to create a loop:
sensitivity <- list()
for (time in names(ranger_preds)) {
prediction <- ranger_preds[which(names(ranger_preds) == time)] > 0.5
real <- cost_test$time >= as.numeric(time)
confusion <- confusionMatrix(as.factor(prediction), as.factor(real), positive = 'TRUE')
sensitivity[as.character(i)] <- confusion$byclass[1]
}
但是由于此循环中使用了一些观察结果,因此出现以下错误:
But due to some of the observations used in this loop, I get the following error:
Error in confusionMatrix.default(as.factor(prediction), as.factor(real), :
The data must contain some levels that overlap the reference.
有人知道如何解决此问题吗?谢谢
Does anyone know how to fix this? Thanks
预测
和/或 real
中的某些值中只有1个唯一值.确保因素水平相同.
Certain values in prediction
and/or real
have only 1 unique value in them. Make sure the levels of the factors are the same.
sapply(names(ranger_preds), function(x) {
prediction <- factor(ranger_preds[[x]] > 0.5, levels = c(TRUE, FALSE))
real <- factor(cost_test$time >= as.numeric(x), levels = c(TRUE, FALSE))
confusion <- caret::confusionMatrix(prediction, real, positive = 'TRUE')
confusion$byClass[1]
}, USE.NAMES = FALSE) -> result
result