在mutate语句中动态引用列名-dplyr

问题描述:

我为这个很长的问题道歉,但是过了一段时间我自己也找不到解决方法。

I stat apologize for the long question, but after quite a while I couldn't figure out a solution myself.

我有这个玩具数据框

set.seed(23)
df <- tibble::tibble(
  id = paste0("00", 1:6),
  cond = c(1, 1, 2, 2, 3, 3),
  A_1 = sample(0:9, 6, replace = TRUE), A_2 = sample(0:9, 6, replace = TRUE), A_3 = sample(0:9, 6, replace = TRUE),
  B_1 = sample(0:9, 6, replace = TRUE), B_2 = sample(0:9, 6, replace = TRUE), B_3 = sample(0:9, 6, replace = TRUE),
  C_1 = sample(0:9, 6, replace = TRUE), C_2 = sample(0:9, 6, replace = TRUE), C_3 = sample(0:9, 6, replace = TRUE)
)

# A tibble: 6 x 11
#   id     cond   A_1   A_2   A_3   B_1   B_2   B_3   C_1   C_2   C_3
#   <chr> <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1 001       1     6     3     9     5     0     5     6     0     6
# 2 002       1     4     5     0     8     5     0     1     6     6
# 3 003       2     4     2     8     8     8     6     5     2     5
# 4 004       2     4     4     0     7     2     6     7     5     7
# 5 005       3     1     7     0     9     9     0     5     7     8
# 6 006       3     3     8     7     0     2     5     0     9     4

我想创建三个变量 A_def B_def C_def 仅采用相应变量< LETTER_NUMBER &gt之一的值;取决于其后缀等于变量 cond 的条件。

I would like to create three variables A_def, B_def, C_def that take the values of only one of the corresponding variables <LETTER_NUMBER> depending on the condition that their suffix is equal to variable cond.

例如,对于其中的行cond == 1 A_def 的值应来自 A_1 B_def 的值应为 B_1 C_def 的值应为 C_1 。同样,如果 cond == 2 ,则 * _ def 列应具有来自相应的值* _2 变量。

For instance, for the rows where cond == 1, A_def should have values from A_1, B_def should have values from B_1, C_def should have values from C_1. Likewise, if cond == 2, the *_def columns should have values from the respective *_2 variables.

我设法通过两种方式实现所需的输出:一种是硬编码的(可以避免 cond 包含许多值),并且其中一个使用 tidyr 的枢轴函数。

I managed to achieve my desired output in two ways: one hard-coded (possibly to avoid if cond contains many values) and one using tidyr's pivoting functions.

硬编码解决方案:

df %>% 
  mutate(
    A_def = ifelse(cond == 1, A_1, ifelse(cond == 2, A_2, A_3)),
    B_def = ifelse(cond == 1, B_1, ifelse(cond == 2, B_2, B_3)),
    C_def = ifelse(cond == 1, C_1, ifelse(cond == 2, C_2, C_3))
  ) %>% 
  select(id, cond, contains("_def"))

tidyr 的解决方案:

df %>% 
  pivot_longer(cols = contains("_")) %>% 
  mutate(
    number = gsub("[A-Za-z_]", "", name),
    name = gsub("[^A-Za-z]", "", name)
  ) %>% 
  filter(cond == number) %>% 
  pivot_wider(id_cols = c(id, cond), names_from = name, values_from = value, names_glue = "{name}_def")

两种情况下的输出

# A tibble: 6 x 5
#   id     cond A_def B_def C_def
#   <chr> <dbl> <int> <int> <int>
# 1 001       1     6     5     6
# 2 002       1     4     8     1
# 3 003       2     2     8     2
# 4 004       2     4     2     5
# 5 005       3     0     0     8
# 6 006       3     7     5     4




现在,我想知道是否可以使用 mutate 和/或 cross 以动态方式获得相同的输出(也许在 mutate 内使用 ifelse 语句?)。我尝试了以下代码段,但结果与预期不符。在其中之一中,我尝试在 ifelse 语句中将变量名设置为符号,但出现错误。


Now, I was wondering whether it is possible to obtain the same output using mutate and/or across in a dynamic fashion (maybe using ifelse statements inside mutate?). I tried the following code snippets but the results were not as expected. In one of them I tried to make the variable names as symbols within ifelse statements, but I got an error.

df %>% 
  mutate(across(paste0(c("A", "B", "C"), "_1"),
                ~ifelse(cond == 1, cur_column(), 
                        ifelse(cond == 2, cur_column(), paste0(gsub("[^A-Za-z]", "", cur_column()), "_3"))))) %>% 
  select(id, cond, contains("_1"))

df %>% 
  mutate_at(paste0(c("A", "B", "C"), "_1"),
            ~ifelse(cond == 1, ., ifelse(cond == 2, ., paste0(., "_2")))) %>% 
  select(id, cond, contains("_1"))

df %>% 
  mutate_at(paste0(c("A", "B", "C"), "_1"),
            ~ifelse(cond == 1, !!!rlang::syms(paste0(c("A", "B", "C"), "_1")),
                    ifelse(cond == 2, !!!rlang::syms(paste0(c("A", "B", "C"), "_2")),
                           !!!rlang::syms(paste0(c("A", "B", "C"), "_3")))))

问题:是否有一种方法可以使用 dplyr 的语句,例如 mutate (或其取代的范围变体)和/或

Question: is there a way to obtain the same desired output as above using dplyr's statements such as mutate (or its superseded scoped variants) and/or across?

我同意 tidyr 使代码更具可读性的其他评论,但这是 pmap $ c的另一种方法$ c>:

I agree with the other comments that tidyr makes for more readable code, but here's an alternative approach with pmap:

library(purrr)
library(rlang)
pmap_dfr(df, ~with(list(...), 
               set_names(c(id, cond, 
                           map_dbl(c("A","B","C"),
                                 ~ eval_tidy(parse_expr(paste(.x,cond,sep = "_"))))),
                          c("id","cond","A_def","B_def","C_def"))
               ))
# A tibble: 6 x 5
     id  cond A_def B_def C_def
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     6     5     6
2     2     1     4     8     1
3     3     2     2     8     2
4     4     2     4     2     5
5     5     3     0     0     8
6     6     3     7     5     4