带有pivot_longer 的整洁数据集:将多列分成两列

问题描述:

我大家,

我目前正在学习 R 并尝试使用 tidyverse 包中的 pivot_longer() 整理数据集.

I am currently learning R and trying to tidy up a dataset using pivot_longer() from the tidyverse package.

我有这个小玩意

title               actor_1    actor_2    actor_3     actor_1_FB_likes actor_2_FB_likes actor_3_FB_likes
   <chr>               <chr>      <chr>      <chr>                  <dbl>            <dbl>            <dbl>
 1 Avatar              CCH Pound… Joel Davi… Wes Studi               1000              936              855
 2 Pirates of the Car… Johnny De… Orlando B… Jack Daven…            40000             5000             1000
 3 The Dark Knight Ri… Tom Hardy  Christian… Joseph Gor…            27000            23000            23000
 4 John Carter         Daryl Sab… Samantha … Polly Walk…              640              632              530
 5 Spider-Man 3        J.K. Simm… James Fra… Kirsten Du…            24000            11000             4000
 6 Tangled             Brad Garr… Donna Mur… M.C. Gainey              799              553              284

我想把它改成如下格式:

I want to change it into the following format:

      title          actor_name    num_likes   
       <chr>            <chr>      <dbl>
     1 Avatar        CCH Pounder   1000
     2 Avatar        Joel David Moore    936
     2 Avatar        Wes Studi      855    

等等......不幸的是我被卡住了.无论我尝试什么,我最终都会得到这样的格式:

and so on... unfortunately I am stuck. No matter what I try I somehow end up with something like this format:

title          actor_num actor_name       actor_likes      num_likes
   <chr>       <chr>     <chr>            <chr>                <dbl>
 1 Avatar      actor_1   CCH Pounder      actor_1_FB_likes      1000
 2 Avatar      actor_1   CCH Pounder      actor_2_FB_likes       936
 3 Avatar      actor_1   CCH Pounder      actor_3_FB_likes       855
 4 Avatar      actor_2   Joel David Moore actor_1_FB_likes      1000
 5 Avatar      actor_2   Joel David Moore actor_2_FB_likes       936
 6 Avatar      actor_2   Joel David Moore actor_3_FB_likes       855
 7 Avatar      actor_3   Wes Studi        actor_1_FB_likes      1000
 8 Avatar      actor_3   Wes Studi        actor_2_FB_likes       936
 9 Avatar      actor_3   Wes Studi        actor_3_FB_likes       855

我的最后一次尝试包括以下步骤:

My last attempt consisted of the following steps:

exercise8 <- exercise8 %>% pivot_longer(cols= actor_1:actor_3, names_to='actor_num', values_to='actor_name')
exercise8 <- exercise8 %>% pivot_longer(cols= actor_1_FB_likes:actor_3_FB_likes, names_to='actor_likes', values_to='num_likes')

我当然可以删除 actor_num 和 actor_likes 列,但这仍然不会产生所需的格式.

I can of course delete the columns actor_num and actor_likes but that would still not result in the desired format.

有人可以帮忙吗?我是完全错误的开始还是缺少最后一步?提前致谢!

Can anyone help? Am I starting completely wrong or is there a last step I am missing? Thank you in advance!

如果 name 和 likes 列的列名末尾的数字值一致,可能会更容易.

It might be easier if you have the number values at the end of the column names consistently for both names and likes columns.

添加一行以重命名列名称,并在结尾处始终使用_1"、_2"等.

Add a line to rename your column names, and consistently use "_1", "_2", etc. at ending.

然后 pivot_longer 使用正则表达式模式在末尾假设数字.

Then pivot_longer with regex pattern to assume number at end.

library(tidyverse)

names(df) <- sub("(\\d+)_(\\w*)", "\\2_\\1", names(df))

df %>%
  pivot_longer(starts_with("actor"), 
               names_to = c(".value", "group"),
               names_pattern = "(\\w+)_(\\d+)$")

输出

# A tibble: 18 x 4
   title               group actor       actor_FB_likes
   <chr>               <chr> <chr>                <int>
 1 Avatar              1     CCH_Pound…            1000
 2 Avatar              2     Joel_Davi…             936
 3 Avatar              3     Wes_Studi              855
 4 Pirates_of_the_Car… 1     Johnny_De…           40000
 5 Pirates_of_the_Car… 2     Orlando_B…            5000
 6 Pirates_of_the_Car… 3     Jack_Daven…           1000
 7 The_Dark_Knight_Ri… 1     Tom_Hardy            27000
 8 The_Dark_Knight_Ri… 2     Christian…           23000
 9 The_Dark_Knight_Ri… 3     Joseph_Gor…          23000
10 John_Carter         1     Daryl_Sab…             640
11 John_Carter         2     Samantha_…             632
12 John_Carter         3     Polly_Walk…            530
13 Spider-Man_3        1     J.K._Simm…           24000
14 Spider-Man_3        2     James_Fra…           11000
15 Spider-Man_3        3     Kirsten_Du…           4000
16 Tangled             1     Brad_Garr…             799
17 Tangled             2     Donna_Mur…             553
18 Tangled             3     M.C._Gainey            284

数据

df <- structure(list(title = c("Avatar", "Pirates_of_the_Car…", "The_Dark_Knight_Ri…", 
"John_Carter", "Spider-Man_3", "Tangled"), actor_1 = c("CCH_Pound…", 
"Johnny_De…", "Tom_Hardy", "Daryl_Sab…", "J.K._Simm…", 
"Brad_Garr…"), actor_2 = c("Joel_Davi…", "Orlando_B…", 
"Christian…", "Samantha_…", "James_Fra…", "Donna_Mur…"
), actor_3 = c("Wes_Studi", "Jack_Daven…", "Joseph_Gor…", 
"Polly_Walk…", "Kirsten_Du…", "M.C._Gainey"), actor_1_FB_likes = c(1000L, 
40000L, 27000L, 640L, 24000L, 799L), actor_2_FB_likes = c(936L, 
5000L, 23000L, 632L, 11000L, 553L), actor_3_FB_likes = c(855L, 
1000L, 23000L, 530L, 4000L, 284L)), class = "data.frame", row.names = c(NA, 
-6L))