在mutate语句中动态引用列名-dplyr
我为这个很长的问题道歉,但是过了一段时间我自己也找不到解决方法。
I stat apologize for the long question, but after quite a while I couldn't figure out a solution myself.
我有这个玩具数据框
set.seed(23)
df <- tibble::tibble(
id = paste0("00", 1:6),
cond = c(1, 1, 2, 2, 3, 3),
A_1 = sample(0:9, 6, replace = TRUE), A_2 = sample(0:9, 6, replace = TRUE), A_3 = sample(0:9, 6, replace = TRUE),
B_1 = sample(0:9, 6, replace = TRUE), B_2 = sample(0:9, 6, replace = TRUE), B_3 = sample(0:9, 6, replace = TRUE),
C_1 = sample(0:9, 6, replace = TRUE), C_2 = sample(0:9, 6, replace = TRUE), C_3 = sample(0:9, 6, replace = TRUE)
)
# A tibble: 6 x 11
# id cond A_1 A_2 A_3 B_1 B_2 B_3 C_1 C_2 C_3
# <chr> <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1 001 1 6 3 9 5 0 5 6 0 6
# 2 002 1 4 5 0 8 5 0 1 6 6
# 3 003 2 4 2 8 8 8 6 5 2 5
# 4 004 2 4 4 0 7 2 6 7 5 7
# 5 005 3 1 7 0 9 9 0 5 7 8
# 6 006 3 3 8 7 0 2 5 0 9 4
我想创建三个变量 A_def
, B_def
, C_def
仅采用相应变量< LETTER_NUMBER >之一的值;取决于其后缀等于变量 cond
的条件。
I would like to create three variables A_def
, B_def
, C_def
that take the values of only one of the corresponding variables <LETTER_NUMBER> depending on the condition that their suffix is equal to variable cond
.
例如,对于其中的行cond == 1
, A_def
的值应来自 A_1
, B_def
的值应为 B_1
, C_def
的值应为 C_1
。同样,如果 cond == 2
,则 * _ def
列应具有来自相应的值* _2
变量。
For instance, for the rows where cond == 1
, A_def
should have values from A_1
, B_def
should have values from B_1
, C_def
should have values from C_1
. Likewise, if cond == 2
, the *_def
columns should have values from the respective *_2
variables.
我设法通过两种方式实现所需的输出:一种是硬编码的(可以避免 cond
包含许多值),并且其中一个使用 tidyr
的枢轴函数。
I managed to achieve my desired output in two ways: one hard-coded (possibly to avoid if cond
contains many values) and one using tidyr
's pivoting functions.
硬编码解决方案:
df %>%
mutate(
A_def = ifelse(cond == 1, A_1, ifelse(cond == 2, A_2, A_3)),
B_def = ifelse(cond == 1, B_1, ifelse(cond == 2, B_2, B_3)),
C_def = ifelse(cond == 1, C_1, ifelse(cond == 2, C_2, C_3))
) %>%
select(id, cond, contains("_def"))
tidyr
的解决方案:
df %>%
pivot_longer(cols = contains("_")) %>%
mutate(
number = gsub("[A-Za-z_]", "", name),
name = gsub("[^A-Za-z]", "", name)
) %>%
filter(cond == number) %>%
pivot_wider(id_cols = c(id, cond), names_from = name, values_from = value, names_glue = "{name}_def")
两种情况下的输出
# A tibble: 6 x 5
# id cond A_def B_def C_def
# <chr> <dbl> <int> <int> <int>
# 1 001 1 6 5 6
# 2 002 1 4 8 1
# 3 003 2 2 8 2
# 4 004 2 4 2 5
# 5 005 3 0 0 8
# 6 006 3 7 5 4
现在,我想知道是否可以使用 mutate
和/或 cross
以动态方式获得相同的输出(也许在 mutate
内使用 ifelse
语句?)。我尝试了以下代码段,但结果与预期不符。在其中之一中,我尝试在 ifelse
语句中将变量名设置为符号,但出现错误。
Now, I was wondering whether it is possible to obtain the same output using mutate
and/or across
in a dynamic fashion (maybe using ifelse
statements inside mutate
?). I tried the following code snippets but the results were not as expected. In one of them I tried to make the variable names as symbols within ifelse
statements, but I got an error.
df %>%
mutate(across(paste0(c("A", "B", "C"), "_1"),
~ifelse(cond == 1, cur_column(),
ifelse(cond == 2, cur_column(), paste0(gsub("[^A-Za-z]", "", cur_column()), "_3"))))) %>%
select(id, cond, contains("_1"))
df %>%
mutate_at(paste0(c("A", "B", "C"), "_1"),
~ifelse(cond == 1, ., ifelse(cond == 2, ., paste0(., "_2")))) %>%
select(id, cond, contains("_1"))
df %>%
mutate_at(paste0(c("A", "B", "C"), "_1"),
~ifelse(cond == 1, !!!rlang::syms(paste0(c("A", "B", "C"), "_1")),
ifelse(cond == 2, !!!rlang::syms(paste0(c("A", "B", "C"), "_2")),
!!!rlang::syms(paste0(c("A", "B", "C"), "_3")))))
问题:是否有一种方法可以使用 dplyr
的语句,例如 mutate
(或其取代的范围变体)和/或跨
?
Question: is there a way to obtain the same desired output as above using dplyr
's statements such as mutate
(or its superseded scoped variants) and/or across
?
我同意 tidyr
使代码更具可读性的其他评论,但这是 pmap $ c的另一种方法$ c>:
I agree with the other comments that tidyr
makes for more readable code, but here's an alternative approach with pmap
:
library(purrr)
library(rlang)
pmap_dfr(df, ~with(list(...),
set_names(c(id, cond,
map_dbl(c("A","B","C"),
~ eval_tidy(parse_expr(paste(.x,cond,sep = "_"))))),
c("id","cond","A_def","B_def","C_def"))
))
# A tibble: 6 x 5
id cond A_def B_def C_def
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 6 5 6
2 2 1 4 8 1
3 3 2 2 8 2
4 4 2 4 2 5
5 5 3 0 0 8
6 6 3 7 5 4