Conditionally counting the number of values in a column with R

Problem description:

I have two vectors:

x <- c(1,1,1,1,1, 2,2,2,3,3,  3,3,3,4,4,  5,5,5,5,5 )
y <- c(2,2,1,3,2, 1,4,2,2,NA, 3,3,3,4,NA, 1,4,4,2,NA)

The earlier question (Conditional calculating the numbers of values in column with R, part2) discussed how to find the number of values in w (not counting NA) for each x (1 to 5) and each y (1 to 4).

Let's split x into groups: group I if x <= 2, group II if 2 < x <= 3, and group III if 3 < x <= 5. For each group and each value of y, I need the number of distinct values of x. I also need the mean of x for the same groups. The output should be in this format:

y x     Result 1 (number of distinct values of x); Result 2 (mean of x)
1 I     ...
1 II    ...
1 III   ...
...
4 I     ...
4 II    ...
4 III   ...
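A minimal sketch of the grouping rule, assuming base R's cut() with its default right-closed intervals; the labels I, II and III are chosen here only to match the desired output:

```r
x <- c(1,1,1,1,1, 2,2,2,3,3,  3,3,3,4,4,  5,5,5,5,5)

# cut() uses right-closed intervals by default:
# (-Inf, 2] -> "I", (2, 3] -> "II", (3, 5] -> "III"
x.grp <- cut(x, breaks = c(-Inf, 2, 3, 5), labels = c("I", "II", "III"))

# Number of observations in each group: I = 8, II = 5, III = 7
table(x.grp)
```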


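# Load the data.table package
library(data.table)
data <- data.table(x, y)

# Mean of x for each y and each x group; cut() labels the groups
# (-Inf,2], (2,3] and (3,5], i.e. groups I, II and III
data[, list(x = mean(x, na.rm = TRUE)), by =
       list(y, x.grp = cut(x, c(-Inf, 2, 3, 5, Inf)))][order(y, x.grp)]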
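The data.table call above returns only the mean (Result 2). Below is a base-R sketch that also returns Result 1, the number of distinct x values in each (y, group) cell. It swaps data.table for aggregate() and merge(), labels the groups I/II/III, and drops rows where y is NA on the assumption that only y = 1 to 4 should be reported. The names dat, n.distinct, x.n and x.mean are illustrative, not from the original answer.

```r
x <- c(1,1,1,1,1, 2,2,2,3,3,  3,3,3,4,4,  5,5,5,5,5)
y <- c(2,2,1,3,2, 1,4,2,2,NA, 3,3,3,4,NA, 1,4,4,2,NA)

# Assign the x groups, then drop rows whose y is NA
dat <- data.frame(x = x, y = y,
                  grp = cut(x, breaks = c(-Inf, 2, 3, 5), labels = c("I", "II", "III")))
dat <- dat[!is.na(dat$y), ]

# Result 1: number of distinct x values per (y, group)
n.distinct <- aggregate(x ~ y + grp, data = dat, FUN = function(v) length(unique(v)))
# Result 2: mean of all x values per (y, group)
x.mean <- aggregate(x ~ y + grp, data = dat, FUN = mean)

# Both tables have a column named x, so merge() renames them x.n and x.mean
res <- merge(n.distinct, x.mean, by = c("y", "grp"), suffixes = c(".n", ".mean"))
res <- res[order(res$y, res$grp), ]
res
```

Only (y, group) combinations that occur in the data appear in res. For example, the row for y = 4, group III covers the x values 4, 5 and 5, giving 2 distinct values and a mean of 14/3 (about 4.67).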

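If you'd like a group's mean to be NA whenever that group contains an NA in x, just remove na.rm=TRUE from mean(). With the sample data this gives the same means, because every NA is in y rather than x (rows with y = NA form their own groups):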

data[, list(x = mean(x)), by = 
       list(y, x.grp = cut(x, c(-Inf,2,3,5,Inf)))][order(y,x.grp)]
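A small sketch of how mean() treats NA, to make the na.rm choice concrete; v is a hypothetical vector used only for illustration:

```r
# mean() returns NA if any input is NA, unless na.rm = TRUE drops the NAs first
v <- c(4, 5, NA)
mean(v)                # NA
mean(v, na.rm = TRUE)  # 4.5

# In the sample data all NAs are in y, so na.rm does not change the means of x
x <- c(1,1,1,1,1, 2,2,2,3,3,  3,3,3,4,4,  5,5,5,5,5)
anyNA(x)               # FALSE
```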