带有pivot_longer 的整洁数据集:将多列分成两列
我大家,
我目前正在学习 R 并尝试使用 tidyverse 包中的 pivot_longer() 整理数据集.
I am currently learning R and trying to tidy up a dataset using pivot_longer() from the tidyverse package.
我有这个小玩意
title actor_1 actor_2 actor_3 actor_1_FB_likes actor_2_FB_likes actor_3_FB_likes
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 Avatar CCH Pound… Joel Davi… Wes Studi 1000 936 855
2 Pirates of the Car… Johnny De… Orlando B… Jack Daven… 40000 5000 1000
3 The Dark Knight Ri… Tom Hardy Christian… Joseph Gor… 27000 23000 23000
4 John Carter Daryl Sab… Samantha … Polly Walk… 640 632 530
5 Spider-Man 3 J.K. Simm… James Fra… Kirsten Du… 24000 11000 4000
6 Tangled Brad Garr… Donna Mur… M.C. Gainey 799 553 284
我想把它改成如下格式:
I want to change it into the following format:
title actor_name num_likes
<chr> <chr> <dbl>
1 Avatar CCH Pounder 1000
2 Avatar Joel David Moore 936
2 Avatar Wes Studi 855
等等......不幸的是我被卡住了.无论我尝试什么,我最终都会得到这样的格式:
and so on... unfortunately I am stuck. No matter what I try I somehow end up with something like this format:
title actor_num actor_name actor_likes num_likes
<chr> <chr> <chr> <chr> <dbl>
1 Avatar actor_1 CCH Pounder actor_1_FB_likes 1000
2 Avatar actor_1 CCH Pounder actor_2_FB_likes 936
3 Avatar actor_1 CCH Pounder actor_3_FB_likes 855
4 Avatar actor_2 Joel David Moore actor_1_FB_likes 1000
5 Avatar actor_2 Joel David Moore actor_2_FB_likes 936
6 Avatar actor_2 Joel David Moore actor_3_FB_likes 855
7 Avatar actor_3 Wes Studi actor_1_FB_likes 1000
8 Avatar actor_3 Wes Studi actor_2_FB_likes 936
9 Avatar actor_3 Wes Studi actor_3_FB_likes 855
我的最后一次尝试包括以下步骤:
My last attempt consisted of the following steps:
exercise8 <- exercise8 %>% pivot_longer(cols= actor_1:actor_3, names_to='actor_num', values_to='actor_name')
exercise8 <- exercise8 %>% pivot_longer(cols= actor_1_FB_likes:actor_3_FB_likes, names_to='actor_likes', values_to='num_likes')
我当然可以删除 actor_num 和 actor_likes 列,但这仍然不会产生所需的格式.
I can of course delete the columns actor_num and actor_likes but that would still not result in the desired format.
有人可以帮忙吗?我是完全错误的开始还是缺少最后一步?提前致谢!
Can anyone help? Am I starting completely wrong or is there a last step I am missing? Thank you in advance!
如果 name 和 likes 列的列名末尾的数字值一致,可能会更容易.
It might be easier if you have the number values at the end of the column names consistently for both names and likes columns.
添加一行以重命名列名称,并在结尾处始终使用_1"、_2"等.
Add a line to rename your column names, and consistently use "_1", "_2", etc. at ending.
然后 pivot_longer
使用正则表达式模式在末尾假设数字.
Then pivot_longer
with regex pattern to assume number at end.
library(tidyverse)
names(df) <- sub("(\\d+)_(\\w*)", "\\2_\\1", names(df))
df %>%
pivot_longer(starts_with("actor"),
names_to = c(".value", "group"),
names_pattern = "(\\w+)_(\\d+)$")
输出
# A tibble: 18 x 4
title group actor actor_FB_likes
<chr> <chr> <chr> <int>
1 Avatar 1 CCH_Pound… 1000
2 Avatar 2 Joel_Davi… 936
3 Avatar 3 Wes_Studi 855
4 Pirates_of_the_Car… 1 Johnny_De… 40000
5 Pirates_of_the_Car… 2 Orlando_B… 5000
6 Pirates_of_the_Car… 3 Jack_Daven… 1000
7 The_Dark_Knight_Ri… 1 Tom_Hardy 27000
8 The_Dark_Knight_Ri… 2 Christian… 23000
9 The_Dark_Knight_Ri… 3 Joseph_Gor… 23000
10 John_Carter 1 Daryl_Sab… 640
11 John_Carter 2 Samantha_… 632
12 John_Carter 3 Polly_Walk… 530
13 Spider-Man_3 1 J.K._Simm… 24000
14 Spider-Man_3 2 James_Fra… 11000
15 Spider-Man_3 3 Kirsten_Du… 4000
16 Tangled 1 Brad_Garr… 799
17 Tangled 2 Donna_Mur… 553
18 Tangled 3 M.C._Gainey 284
数据
df <- structure(list(title = c("Avatar", "Pirates_of_the_Car…", "The_Dark_Knight_Ri…",
"John_Carter", "Spider-Man_3", "Tangled"), actor_1 = c("CCH_Pound…",
"Johnny_De…", "Tom_Hardy", "Daryl_Sab…", "J.K._Simm…",
"Brad_Garr…"), actor_2 = c("Joel_Davi…", "Orlando_B…",
"Christian…", "Samantha_…", "James_Fra…", "Donna_Mur…"
), actor_3 = c("Wes_Studi", "Jack_Daven…", "Joseph_Gor…",
"Polly_Walk…", "Kirsten_Du…", "M.C._Gainey"), actor_1_FB_likes = c(1000L,
40000L, 27000L, 640L, 24000L, 799L), actor_2_FB_likes = c(936L,
5000L, 23000L, 632L, 11000L, 553L), actor_3_FB_likes = c(855L,
1000L, 23000L, 530L, 4000L, 284L)), class = "data.frame", row.names = c(NA,
-6L))