Square brackets with multiple columns in R

Problem description:

I am flummoxed. I am trying to isolate certain rows of a data frame according to the values in two columns. As always, I tried this on practice data first, and my code works fine:

data1<-df2[df2$fruit=="kiwi" |  df2$fruit=="orange" | df2$fruit=="apple"  & (df2$dates>= "2010-04-01" & df2$dates<  "2010-10-01"), ]

When I try the same code on my real data, it doesn't work. It collects the "fruits" I need, but ignores my date range:
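The symptom is consistent with operator precedence: in R, & binds more tightly than |, so a chain of | comparisons followed by & (date range) attaches the date condition only to the last comparison. A minimal illustration with plain logical vectors (the variable names and values are made up for the example):

```r
# Hypothetical flags for four rows: which rows match code A, which match
# code B, and which fall inside the date range.
is_a     <- c(TRUE,  FALSE, TRUE,  FALSE)
is_b     <- c(FALSE, TRUE,  FALSE, FALSE)
in_range <- c(TRUE,  TRUE,  FALSE, FALSE)

# Without grouping, R reads this as is_a | (is_b & in_range),
# so row 3 (code A, outside the range) is still selected.
ungrouped <- is_a | is_b & in_range

# Grouping the | chain first applies the date range to every code.
grouped <- (is_a | is_b) & in_range

print(ungrouped)  # [1]  TRUE  TRUE  TRUE FALSE
print(grouped)    # [1]  TRUE  TRUE FALSE FALSE
```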

 data1<-lti_first[lti_first$hai_atc=="C10AA01" | lti_first$hai_atc=="C10AA03" | lti_first$hai_atc=="C10AA04" | lti_first$hai_atc=="C10AA05" | lti_first$hai_atc=="C10AA07" | lti_first$hai_atc=="C10AB02" |lti_first$hai_atc=="C10AA04" |lti_first$hai_atc=="C10AB08" | lti_first$hai_atc=="C10AX09" & (lti_first$date_of_claim >= "2010-04-01" & lti_first$date_of_claim<"2010-10-01"), ]

The structure of the variables is exactly the same in my practice data and my real data: fruit/hai_atc are factors in both data frames, and dates/date_of_claim are Date objects (created with as.Date) in both.
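With those column types, the individual comparisons are not the problem: == against a string compares a factor's labels, and R's comparison method for Date objects converts a character string such as "2010-04-01" with as.Date() before comparing. A small check with made-up values of the same types:

```r
# Made-up values mirroring the column types described above.
atc   <- factor(c("C10AA01", "A11JC94", "C10AA05"))
claim <- as.Date(c("2010-03-15", "2010-05-20", "2010-11-02"))

# Comparing a factor with a string compares the labels.
print(atc == "C10AA01")           # [1]  TRUE FALSE FALSE

# The character bound is converted to Date, so both forms agree.
by_string <- claim >= "2010-04-01"
by_date   <- claim >= as.Date("2010-04-01")
print(identical(by_string, by_date))  # [1] TRUE
```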

To get around this I tried subset() instead, but that doesn't work on my real data either (although it does work on the practice data):

x<-subset(lti_first, hai_atc=="V07AY03" | hai_atc=="A11JC94" & (date_of_claim>="2010-04-01" & date_of_claim<"2010-10-01"))

What am I doing wrong? To me, the code looks identical!

Sample df:

names<-c("tom", "mary", "tom", "john", "mary",
 "tom", "john", "mary", "john", "mary", "tom", "mary", "john", "john")
dates<-as.Date(c("2010-02-01", "2010-05-01", "2010-03-01", 
"2010-07-01", "2010-07-01", "2010-06-01", "2010-09-01",
 "2010-07-01", "2010-11-01", "2010-09-01", "2010-08-01", 
"2010-11-01", "2010-12-01", "2011-01-01"))
fruit<-as.character(c("apple", "orange", "banana", "kiwi",
 "apple", "apple", "apple", "orange", "banana", "apple",
 "kiwi", "apple", "orange", "apple"))
age<-as.numeric(c(60,55,60,57,55,60,57,55,57,55,60,55, 57,57))
sex<-as.character(c("m","f","m","m","f","m","m",
 "f","m","f","m","f","m", "m"))
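# stringsAsFactors = TRUE keeps names, sex and fruit as factors, matching the dput() output
# below (the default changed to FALSE in R 4.0.0)
df2<-data.frame(names,dates, age, sex, fruit, stringsAsFactors = TRUE)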
df2


dput(df2)
structure(list(names = structure(c(3L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 1L, 2L, 3L, 2L, 1L, 1L), .Label = c("john", "mary", "tom"
), class = "factor"), dates = structure(c(14641, 14730, 14669, 
14791, 14791, 14761, 14853, 14791, 14914, 14853, 14822, 14914, 
14944, 14975), class = "Date"), age = c(60, 55, 60, 57, 55, 60, 
57, 55, 57, 55, 60, 55, 57, 57), sex = structure(c(2L, 1L, 2L, 
2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L), .Label = c("f", 
"m"), class = "factor"), fruit = structure(c(1L, 4L, 2L, 3L, 
1L, 1L, 1L, 4L, 2L, 1L, 3L, 1L, 4L, 1L), .Label = c("apple", 
"banana", "kiwi", "orange"), class = "factor")), .Names = c("names", 
"dates", "age", "sex", "fruit"), row.names = c(NA, -14L), class = "data.frame")

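Running the question's bracket expression on the sample data shows the grouping effect. This sketch rebuilds only the dates and fruit columns of the sample df2 above (the other columns play no part in the filter) and compares the expression as written with one that parenthesizes the fruit conditions. Row 13, an orange dated 2010-12-01, passes the original expression, which suggests the practice code was not applying the date range to kiwi and orange either; most of those rows simply happen to fall inside the range.

```r
dates <- as.Date(c("2010-02-01", "2010-05-01", "2010-03-01", "2010-07-01",
                   "2010-07-01", "2010-06-01", "2010-09-01", "2010-07-01",
                   "2010-11-01", "2010-09-01", "2010-08-01", "2010-11-01",
                   "2010-12-01", "2011-01-01"))
fruit <- factor(c("apple", "orange", "banana", "kiwi", "apple", "apple",
                  "apple", "orange", "banana", "apple", "kiwi", "apple",
                  "orange", "apple"))
df2 <- data.frame(dates, fruit)

# Question's expression: parsed as kiwi | orange | (apple & in range).
as_written <- df2[df2$fruit == "kiwi" | df2$fruit == "orange" |
                  df2$fruit == "apple" &
                  (df2$dates >= "2010-04-01" & df2$dates < "2010-10-01"), ]

# Grouped: any of the three fruits AND inside the date range.
grouped <- df2[(df2$fruit == "kiwi" | df2$fruit == "orange" |
                df2$fruit == "apple") &
               (df2$dates >= "2010-04-01" & df2$dates < "2010-10-01"), ]

print(rownames(as_written))  # includes "13": orange dated 2010-12-01
print(rownames(grouped))     # row 13 is gone
```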
The real data is too big to post with dput(), so here is str() of a 20-row sample instead:

str(sample_lti_first)
'data.frame':   20 obs. of  5 variables:
 $ hai_dispense_number: Factor w/ 53485 levels "Patient HAI0000017",..: 22260 22260 2527 24311 24311 24311 24311 13674 13674 13674 ...
 $ sex                : Factor w/ 4 levels "F","M","U","X": 2 2 2 1 1 1 1 1 1 1 ...
 $ hai_age            : int  18 18 27 40 40 40 40 28 28 28 ...
 $ date_of_claim      : Date, format: "2009-10-09" "2009-10-09" "2009-10-18" ...
 $ hai_atc            : Factor w/ 1038 levels "","A01AA01","A01AB03",..: 144 76 859 80 1009 1009 859 81 1008 859 ...

Answer:

Does this work?

data1 <- subset(lti_first,
  (hai_atc %in% c("C10AA01", "C10AA03", "C10AA04", "C10AA05", "C10AA07",
                  "C10AB02", "C10AA04", "C10AB08", "C10AX09")) & 
  (date_of_claim >= as.Date("2010-04-01") & date_of_claim < as.Date("2010-10-01")))

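Note the use of %in% and as.Date. %in% replaces the long chain of == comparisons, and the whole hai_atc condition sits inside its own parentheses before it is combined with the date range. That grouping is the key change: in R, & has higher precedence than |, so the original a | b | ... | c & (date range) is parsed as a | b | ... | (c & date range). The date restriction then applies only to the last code, and rows with any other code are returned whatever their date.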
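A self-contained sketch of the answer's approach on a few made-up rows shaped like the str() output: a factor hai_atc column and a Date date_of_claim column. The data frame lti_toy and the vector atc_codes are hypothetical names for illustration. The code list is defined once, and the duplicated C10AA04 from the question is listed only once, which makes no difference to %in%.

```r
# Hypothetical rows with the same column types as lti_first.
lti_toy <- data.frame(
  hai_atc       = factor(c("C10AA01", "C10AA05", "A11JC94", "C10AB08", "C10AX09")),
  date_of_claim = as.Date(c("2010-05-01", "2010-12-15", "2010-06-01",
                            "2010-03-31", "2010-09-30"))
)

atc_codes <- c("C10AA01", "C10AA03", "C10AA04", "C10AA05", "C10AA07",
               "C10AB02", "C10AB08", "C10AX09")

# Parenthesizing each condition and joining them with & applies the
# date range to every code; %in% matches the factor's labels.
data1 <- subset(lti_toy,
  (hai_atc %in% atc_codes) &
  (date_of_claim >= as.Date("2010-04-01") & date_of_claim < as.Date("2010-10-01")))

print(data1)  # keeps the C10AA01 and C10AX09 rows only
```

Rows with a matching code but a date outside the range (C10AA05, C10AB08) and rows with a non-matching code (A11JC94) are both dropped.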