Selecting columns by value range in R
Problem description:
I have a dataset like the following:
date A B C D E F G H I J K L M N
2001 2 3 5 9 2 24 50 2 11 37 9 2 24 50
2002 3 14 14 5 2 21 28 3 14 14 2 3 2 8
2003 0 12 2 3 4 29 30 0 12 2 3 4 3 30
2004 1 3 3 2 2 1 4 1 3 3 2 2 1 4
2005 0 0 2 0 2 1 1 0 0 2 0 2 1 1
2006 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2007 0 1 0 1 0 1 0 0 1 0 1 0 1 0
2008 0 0 1 1 0 0 0 0 0 1 1 0 0 0
2009 0 0 0 1 0 0 0 0 0 0 1 0 0 0
2010 0 0 0 0 0 1 0 0 0 0 0 0 1 0
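To experiment with this, the table can be read straight into a data frame. This sketch assumes the whitespace-separated layout shown above and uses `mydf`, the name from the question's own code:

```r
# Parse the question's table; header = TRUE takes the first line as column names
mydf <- read.table(header = TRUE, text = "
date A B C D E F G H I J K L M N
2001 2 3 5 9 2 24 50 2 11 37 9 2 24 50
2002 3 14 14 5 2 21 28 3 14 14 2 3 2 8
2003 0 12 2 3 4 29 30 0 12 2 3 4 3 30
2004 1 3 3 2 2 1 4 1 3 3 2 2 1 4
2005 0 0 2 0 2 1 1 0 0 2 0 2 1 1
2006 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2007 0 1 0 1 0 1 0 0 1 0 1 0 1 0
2008 0 0 1 1 0 0 0 0 0 1 1 0 0 0
2009 0 0 0 1 0 0 0 0 0 0 1 0 0 0
2010 0 0 0 0 0 1 0 0 0 0 0 0 1 0
")

# Every column, including date, is parsed as integer
str(mydf)
```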
From this dataset I want to select only the columns that have at least one value greater than 20. My desired result is as follows:
date F G J M N
2001 24 50 37 24 50
2002 21 28 14 2 8
2003 29 30 2 3 30
2004 1 4 3 1 4
2005 1 1 2 1 1
2006 0 0 0 0 0
2007 1 0 0 1 0
2008 0 0 1 0 0
2009 0 0 0 0 0
2010 1 0 0 1 0
I tried using
mydf<-mydf[,apply(mydf,2,function(z) any(z>20))]
but I'm not getting the expected result. My real dataset has more than 500 columns.
How can I filter columns by a specific range of values?
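The question doesn't show how `mydf` was created, so this is only one plausible cause (an assumption): `apply()` first converts the data frame to a matrix, and if even one column is non-numeric (for example a date read as text), the whole matrix becomes character and `z > 20` compares strings instead of numbers. A minimal sketch with a hypothetical data frame `df`:

```r
# Hypothetical data frame in which `date` was read as text, not numbers
df <- data.frame(date = c("2001", "2002"), A = c(3, 25), B = c(3, 4),
                 stringsAsFactors = FALSE)

# apply() coerces its input with as.matrix(); one character column is
# enough to turn every cell into a string
m <- as.matrix(df)
typeof(m)      # "character"

# Strings compare character by character, so "3" sorts after "20"
"3" > "20"     # TRUE
```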
Answer
Testing column by column with sapply() avoids apply()'s conversion of the whole data frame to a single-type matrix. This may be a slightly safer version if you want to keep the non-numeric columns:
# drop = FALSE keeps a data frame even if only one column is selected
mydf[, sapply(mydf, function(col) !is.numeric(col) || any(col > 20, na.rm = TRUE)), drop = FALSE]
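A quick check of this first version on a small hypothetical data frame whose `date` column is stored as text, using `> 20` as the question asks. The character column is kept unconditionally and the numeric columns are tested on their values:

```r
# Hypothetical data: `date` stored as text, three numeric columns
mydf <- data.frame(date = c("2001", "2002", "2003"),
                   A = c(2, 3, 0),
                   F = c(24, 21, 29),
                   G = c(50, 28, 30),
                   stringsAsFactors = FALSE)

# Keep every non-numeric column, plus numeric columns with a value > 20;
# na.rm = TRUE stops a missing value from producing an NA index
keep <- sapply(mydf, function(col) !is.numeric(col) || any(col > 20, na.rm = TRUE))
res <- mydf[, keep, drop = FALSE]
names(res)     # "date" "F" "G"
```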
And if you really want only the numeric columns:
mydf[, sapply(mydf, function(col) is.numeric(col) && any(col > 20, na.rm = TRUE)), drop = FALSE]
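With the question's data, the numeric-only version keeps `date` only because the years are numeric and all exceed 20. A `date` stored as text or as a `Date` object (for which `is.numeric()` returns `FALSE`) would be dropped. The sketch below, on a hypothetical small data frame, shows that and one way to keep `date` by name instead. It assumes the column is literally called `date`:

```r
# Hypothetical numeric data: `date` holds years, as in the question
mydf <- data.frame(date = 2001:2003,
                   A = c(2, 3, 0),
                   F = c(24, 21, 29),
                   K = c(9, 2, 3))

# Numeric-only version: `date` survives simply because every year is > 20
keep <- sapply(mydf, function(col) is.numeric(col) && any(col > 20, na.rm = TRUE))
names(mydf[, keep, drop = FALSE])   # "date" "F"

# Keep `date` by name, whatever its type, and test only the other columns
other <- setdiff(names(mydf), "date")
hits <- other[sapply(mydf[other], function(col) is.numeric(col) && any(col > 20, na.rm = TRUE))]
res <- mydf[, c("date", hits), drop = FALSE]
names(res)                          # "date" "F"
```

Here both approaches select the same columns. The second one keeps giving that result if `date` is later converted to text or `Date`.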