Select columns with a range of values in R

Problem description:

I have a dataset as follows:

date    A   B   C   D   E   F   G   H   I   J   K   L   M   N
2001    2   3   5   9   2   24  50  2   11  37  9   2   24  50
2002    3   14  14  5   2   21  28  3   14  14  2   3   2   8
2003    0   12  2   3   4   29  30  0   12  2   3   4   3   30
2004    1   3   3   2   2   1   4   1   3   3   2   2   1   4
2005    0   0   2   0   2   1   1   0   0   2   0   2   1   1
2006    0   0   0   0   0   0   0   0   0   0   0   0   0   0
2007    0   1   0   1   0   1   0   0   1   0   1   0   1   0
2008    0   0   1   1   0   0   0   0   0   1   1   0   0   0
2009    0   0   0   1   0   0   0   0   0   0   1   0   0   0
2010    0   0   0   0   0   1   0   0   0   0   0   0   1   0

From this set I want to select only those columns that have at least one value greater than 20. My desired set is as follows:

date    F   G   J   M   N
2001    24  50  37  24  50
2002    21  28  14  2   8
2003    29  30  2   3   30
2004    1   4   3   1   4
2005    1   1   2   1   1
2006    0   0   0   0   0
2007    1   0   0   1   0
2008    0   0   1   0   0
2009    0   0   0   0   0
2010    1   0   0   1   0

I tried using

mydf<-mydf[,apply(mydf,2,function(z) any(z>20))]

but I'm not getting the desired result. My dataset contains more than 500 columns.

How can I filter columns with a specific range of values?

This may be a slightly safer version if you want to keep the non-numeric columns:

mydf[, sapply(mydf, function(col) !is.numeric(col) || any(col > 20)), drop = FALSE]

And if you really want just the numeric columns:

mydf[, sapply(mydf, function(col) is.numeric(col) && any(col > 20)), drop = FALSE]
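
For reference, here is a small reproducible sketch of both variants. It uses only a few columns from the question's data, plus a character column named `site` that is purely hypothetical and added to show the non-numeric case. It applies the strict `> 20` comparison stated in the question, and swaps `sapply()` for `vapply()` with `na.rm = TRUE` as a precaution; neither change is part of the original answer. One likely reason the `apply()` attempt misbehaves is that `apply()` first converts the data frame to a matrix, so a single non-numeric column turns every value into a character string.

```r
# A few columns from the question; `site` is a hypothetical character
# column that is not in the original data.
mydf <- data.frame(
  date = 2001:2010,
  A    = c(2, 3, 0, 1, 0, 0, 0, 0, 0, 0),
  F    = c(24, 21, 29, 1, 1, 0, 1, 0, 0, 1),
  J    = c(37, 14, 2, 3, 2, 0, 0, 1, 0, 0),
  site = rep("x", 10),
  stringsAsFactors = FALSE
)

# Variant 1: keep every non-numeric column, plus numeric columns
# that contain at least one value above 20.
keep_mixed <- vapply(
  mydf,
  function(col) !is.numeric(col) || any(col > 20, na.rm = TRUE),
  logical(1)
)
mixed <- mydf[, keep_mixed, drop = FALSE]

# Variant 2: keep only numeric columns with at least one value above 20.
keep_numeric <- vapply(
  mydf,
  function(col) is.numeric(col) && any(col > 20, na.rm = TRUE),
  logical(1)
)
numeric_only <- mydf[, keep_numeric, drop = FALSE]

names(mixed)         # "date" "F" "J" "site"
names(numeric_only)  # "date" "F" "J"
```

Note that `date` survives in both variants because the years themselves are numeric values above 20. If that column should be kept regardless of its contents, set its entry in the logical vector to `TRUE` before subsetting.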