How to combine multiple columns of an R data frame into a single list column

Question:

I would like to combine multiple columns that I have in a data frame into one column in that data frame that is a list. For example, I have the following data frame ingredients:

name1  name2    imgID  attr1  attr2      attr3 ...
Item1  ItemID1  Img1   water  chocolate  soy   ...
Item2  ItemID2  Img2   cocoa  spice      milk  ...

I would like to combine the attr columns into one column that is a comma-separated list of those items and if possible have them appear in the following format:

name1  name2    imgID  attrs
Item1  ItemID1  Img1   c("water", "chocolate", "soy", ...)
Item2  ItemID2  Img2   c("cocoa", "spice", "milk", ...)

Is there a succinct way to write the code using a paste or join that allows me to call the columns of the data frame as ingredients[4:50] rather than each one by name? Is there also a way to not include NA or NULL values in that list?

Answer:

You could use tidyr::nest, though you'll probably want to simplify the nested data frames to character vectors afterwards, e.g.

library(tidyverse)

items <- tibble(name1 = c("Item1", "Item2"), 
                name2 = c("ItemID1", "ItemID2"), 
                imgID = c("Img1", "Img2"), 
                attr1 = c("water", "cocoa"), 
                attr2 = c("chocolate", "spice"), 
                attr3 = c("soy", "milk"))

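items_nested <- items %>% 
    # tidyr >= 1.0 syntax; older versions used nest(contains('attr'), .key = 'attr')
    nest(attr = contains('attr')) %>% 
    # flatten each nested one-row data frame into a plain character vector
    mutate(attr = map(attr, ~ unlist(.x, use.names = FALSE)))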

items_nested
#> # A tibble: 2 x 4
#>   name1 name2   imgID attr     
#>   <chr> <chr>   <chr> <list>   
#> 1 Item1 ItemID1 Img1  <chr [3]>
#> 2 Item2 ItemID2 Img2  <chr [3]>

Other options include reshaping to long with tidyr::gather, grouping by all but the new columns, and aggregating the value column into a list in a more dplyr-focused style:

items %>% 
    gather(attr_num, attr, contains('attr')) %>% 
    group_by_at(vars(-attr_num, -attr)) %>% 
    summarise(attr = list(attr)) %>% 
    ungroup()

or uniting the attr* columns and then separating them within a list column with strsplit in a more string-focused style:

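items %>% 
    # unite() joins the values with "_" by default
    unite(attr, contains('attr')) %>% 
    # split on the same separator; assumes no value itself contains "_"
    mutate(attr = strsplit(attr, '_'))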

or using purrr::transpose and tidyselect in a list-focused style:

items %>% 
    mutate(attr = transpose(select(., contains('attr')))) %>% 
    select(-matches('attr.'))

All options return the same thing, at least on the sample data. Further cleanup, e.g. dropping NAs, can be done by iterating over the new column with lapply/purrr::map.
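
As a rough illustration of that cleanup step, here is a minimal base R sketch. It assumes a small made-up ingredients data frame with one missing attribute (the NA is not in the original example), selects the attribute columns by position as the question asks, and drops NA values from each list element with lapply. The names attr_cols, attrs and result are only illustrative.

```r
# Hypothetical sample data: same shape as the question, with one NA added
ingredients <- data.frame(
  name1 = c("Item1", "Item2"),
  name2 = c("ItemID1", "ItemID2"),
  imgID = c("Img1", "Img2"),
  attr1 = c("water", "cocoa"),
  attr2 = c("chocolate", NA),
  attr3 = c("soy", "milk"),
  stringsAsFactors = FALSE
)

# Select the attribute columns by position (4:50 in the question's data)
attr_cols <- 4:ncol(ingredients)

# Build one character vector per row from the selected columns
attrs <- lapply(seq_len(nrow(ingredients)), function(i) {
  unlist(ingredients[i, attr_cols], use.names = FALSE)
})

# Drop NA entries from each element of the list
attrs <- lapply(attrs, function(x) x[!is.na(x)])

# Keep the identifying columns and attach the cleaned list column
result <- ingredients[, -attr_cols]
result$attrs <- attrs

str(result$attrs)
```

With one of the tidyverse versions above, the same cleanup could be written as mutate(attr = map(attr, ~ .x[!is.na(.x)])) on the new list column.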
