dplyr::group_by_ with character string input of several variable names

Problem description:

I'm writing a function in which the user specifies one or more grouping variables in the function call. The data is then grouped using dplyr. This works as expected with a single grouping variable, but I haven't figured out how to do it with multiple grouping variables.

Example:

x <- c("cyl")
y <- c("cyl", "gear")
dots <- list(~cyl, ~gear)

library(dplyr)
library(lazyeval) 

mtcars %>% group_by_(x)             # groups by cyl
mtcars %>% group_by_(y)             # groups only by cyl (not gear)
mtcars %>% group_by_(.dots = dots)  # groups by cyl and gear, this is what I want.

I tried to turn y into a list using:

mtcars %>% group_by_(.dots = interp(~var, var = list(y)))
#Error: is.call(expr) || is.name(expr) || is.atomic(expr) is not TRUE

How can I use a user-defined character vector of more than one variable name (like y in the example) to group the data using dplyr?

(This question is somewhat related to this one, but it is not answered there.)

There is no need for interp here; just use as.formula to convert the strings to formulas:

dots = sapply(y, . %>% {as.formula(paste0('~', .))})
mtcars %>% group_by_(.dots = dots)
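
To see why converting the strings works, it can help to inspect what the conversion produces. The following is a minimal sketch in base R only. It uses lapply with an anonymous function instead of the magrittr pipeline above (a stylistic substitution, not the answer's exact code) and checks that each element is a one-sided formula whose right-hand side is a column symbol, like the hand-written list(~cyl, ~gear) in the question.

```r
# Base R only; `y` is the character vector from the question.
y <- c("cyl", "gear")

# Build one one-sided formula per name (same idea as the sapply call above).
dots <- lapply(y, function(v) as.formula(paste0("~", v)))

# Each formula prints like the hand-written ones in the question.
vapply(dots, deparse, character(1))                    # "~cyl" "~gear"

# The right-hand side of each formula is a symbol (a column name), not a string.
vapply(dots, function(f) is.name(f[[2]]), logical(1))  # TRUE TRUE
```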

The reason your interp approach doesn't work is that the expression gives you back the following:

~list(c("cyl", "gear"))

This is not what you want. You could, of course, sapply interp over y, which would be similar to using as.formula above:

# as.name() turns each string into a symbol, so the formula refers to the column
dots1 = sapply(y, . %>% {interp(~var, var = as.name(.))})

But, in fact, you can also pass y directly:

mtcars %>% group_by_(.dots = y)

The dplyr vignette on non-standard evaluation goes into more detail and explains the differences between these approaches.
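
For readers on a newer dplyr: the underscored verbs such as group_by_ have been deprecated since dplyr 0.7.0. Below is a minimal sketch of the same idea wrapped in a function, assuming dplyr 1.0.0 or later (for across) and tidyselect's all_of. The helper name group_by_names and its argument names are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Hypothetical helper (not from the thread): groups `data` by the column
# names supplied in the character vector `cols`.
group_by_names <- function(data, cols) {
  # all_of() selects exactly the named columns and errors if one is missing.
  data %>% group_by(across(all_of(cols)))
}

y <- c("cyl", "gear")
grouped <- group_by_names(mtcars, y)

# Inspect the grouping variables that were applied.
group_vars(grouped)  # "cyl" "gear"
```

The same function works for a single name, such as group_by_names(mtcars, "cyl"), so the one-variable and multi-variable cases need no special handling. An alternative that should behave the same is group_by(!!!rlang::syms(cols)).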