How to combine multiple conditions to subset a data frame using "OR"?
I have a data.frame in R. I want to test two different conditions on two different columns, but I want the conditions to be inclusive. Therefore, I would like to use "OR" to combine them. I have used the following syntax with a lot of success when I wanted to use the "AND" condition.
my.data.frame <- data[(data$V1 > 2) & (data$V2 < 4), ]
But I don't know how to use an "OR" in the above.
my.data.frame <- subset(data , V1 > 2 | V2 < 4)
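A minimal sketch of the OR filter on a made-up data frame. The values in `data` are invented for illustration and only the column names V1 and V2 come from the question. It also shows the bracket form, which mirrors the AND syntax from the question with `&` swapped for `|`:

```r
# Hypothetical example data; column names V1 and V2 follow the question.
data <- data.frame(V1 = c(1, 3, 5, 2), V2 = c(5, 6, 1, 9))

# Keep rows where V1 > 2 OR V2 < 4 (either condition is enough).
by.subset <- subset(data, V1 > 2 | V2 < 4)

# The same filter in bracket form: replace & with |.
by.bracket <- data[(data$V1 > 2) | (data$V2 < 4), ]

print(by.subset)   # rows 2 and 3 of the example data
```

Both forms agree here because the example data contain no missing values.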
An alternative solution that mimics the behavior of subset() and would be more appropriate for inclusion within a function body:
new.data <- data[ which( data$V1 > 2 | data$V2 < 4) , ]
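To see what which() changes, here is a small sketch on invented data with missing values. The data frame and the helper variable `cond` are assumptions made for the example:

```r
# Hypothetical data with a missing value in each column.
data <- data.frame(V1 = c(1, NA, 5, NA), V2 = c(5, 6, NA, 1))

cond <- data$V1 > 2 | data$V2 < 4   # FALSE, NA, TRUE, TRUE

# Plain logical indexing turns the NA in cond into a row filled with NA.
bare <- data[cond, ]

# which() keeps only the positions where cond is TRUE.
with.which <- data[which(cond), ]

nrow(bare)        # 3 rows, the first of them all NA
nrow(with.which)  # 2 rows
```

Note that rows 3 and 4 are kept even though one of their columns is NA, because the other column already makes the OR condition TRUE.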
Some people criticize the use of which as unnecessary, but it does prevent NA values from returning unwanted results. The equivalent of the two options demonstrated above (i.e., not returning NA rows for any NAs in V1 or V2), written without which, would be:
new.data <- data[ !is.na(data$V1 > 2 | data$V2 < 4) & ( data$V1 > 2 | data$V2 < 4) , ]
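A quick check, on invented data, that one way of guarding the combined condition with !is.na() selects the same rows as which() and subset(). The variable `cond` is introduced here only for readability and is not part of the original answer:

```r
# Hypothetical data with missing values.
data <- data.frame(V1 = c(1, NA, 5, NA), V2 = c(5, 6, NA, 1))

cond <- data$V1 > 2 | data$V2 < 4

# FALSE & NA is FALSE, so rows where cond is NA are dropped.
guarded <- data[!is.na(cond) & cond, ]

# Compare the selected row names against the other two approaches.
same.as.which  <- identical(rownames(guarded), rownames(data[which(cond), ]))
same.as.subset <- identical(rownames(guarded),
                            rownames(subset(data, V1 > 2 | V2 < 4)))
```

Storing the condition once also avoids evaluating the same comparison twice.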
Note: I want to thank the anonymous contributor who attempted to fix the error in the code immediately above, a fix that was rejected by the moderators. There was actually an additional error that I noticed when I was correcting the first one. The conditional clause that checks for NA values needs to be first if it is to be handled as I intended, since ...
> NA & 1
[1] NA
> 0 & NA
[1] FALSE
With '&', the result for an NA operand depends on the value it is paired with: a false value (0 or FALSE) gives FALSE, while a true value gives NA.
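The console output above can be extended into a small table of how `&` treats NA in base R. This is a sketch to run interactively. In these cases the outcome follows the value of the non-missing operand, in either argument order:

```r
# Logical AND with a missing value, in both argument orders.
NA & TRUE    # NA
TRUE & NA    # NA
NA & FALSE   # FALSE
FALSE & NA   # FALSE

# Numbers are coerced to logical first: 0 is FALSE, any other number is TRUE.
NA & 1       # NA
0 & NA       # FALSE
```

This is why a guard such as !is.na(x) & x drops missing entries: wherever x is NA the guard is FALSE, and FALSE combined with NA gives FALSE.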