How to combine multiple columns of an R data frame into a single list column
I would like to combine multiple columns of a data frame into a single column of that data frame, where the new column is a list. For example, I have the following data frame, ingredients:
name1 name2 imgID attr1 attr2 attr3...
Item1 ItemID1 Img1 water chocolate soy...
Item2 ItemID2 Img2 cocoa spice milk...
I would like to combine the attr columns into one column that holds a comma-separated list of those items and, if possible, have them appear in the following format:
name1 name2 imgID attrs
Item1 ItemID1 Img1 c("water", "chocolate", "soy", ...)
Item2 ItemID2 Img2 c("cocoa", "spice", "milk", ...)
Is there a succinct way to write the code using a paste or join that allows me to refer to the columns of the data frame as ingredients[4:50] rather than each one by name? Is there also a way to not include NA or NULL values in that list?
You could use tidyr::nest, though you'll probably want to simplify the nested data frames to character vectors afterwards, e.g.
library(tidyverse)

# Sample data mirroring the question
items <- tibble(name1 = c("Item1", "Item2"),
                name2 = c("ItemID1", "ItemID2"),
                imgID = c("Img1", "Img2"),
                attr1 = c("water", "cocoa"),
                attr2 = c("chocolate", "spice"),
                attr3 = c("soy", "milk"))

# Nest the attr* columns into a list column of one-row data frames,
# then simplify each of those to a character vector.
# (tidyr >= 1.0 spells the nest step as nest(attr = contains('attr')).)
items_nested <- items %>%
    nest(contains('attr'), .key = 'attr') %>%
    mutate(attr = map(attr, simplify))

items_nested
#> # A tibble: 2 x 4
#>   name1 name2   imgID attr
#>   <chr> <chr>   <chr> <list>
#> 1 Item1 ItemID1 Img1  <chr [3]>
#> 2 Item2 ItemID2 Img2  <chr [3]>
Other options include reshaping to long form with tidyr::gather, grouping by all but the new columns, and aggregating the value column into a list, in a more dplyr-focused style:
items %>%
    gather(attr_num, attr, contains('attr')) %>%
    group_by_at(vars(-attr_num, -attr)) %>%
    summarise(attr = list(attr)) %>%
    ungroup()
or uniting the attr* columns with unite and then splitting them into a list column with strsplit, in a more string-focused style:
items %>%
    unite(attr, contains('attr')) %>%
    mutate(attr = strsplit(attr, '_'))
or using purrr::transpose and tidyselect, in a list-focused style:
items %>%
    mutate(attr = transpose(select(., contains('attr')))) %>%
    select(-matches('attr.'))
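The question also asks about referring to the columns by position (ingredients[4:50]) rather than by name. As a rough base R sketch of that idea, not part of the answer above: the data frame below is a made-up three-attribute stand-in for ingredients, and attr_cols, result and attrs are illustrative names only.

```r
# Hypothetical stand-in for the question's `ingredients` data frame.
ingredients <- data.frame(
  name1 = c("Item1", "Item2"),
  name2 = c("ItemID1", "ItemID2"),
  imgID = c("Img1", "Img2"),
  attr1 = c("water", "cocoa"),
  attr2 = c("chocolate", "spice"),
  attr3 = c("soy", "milk"),
  stringsAsFactors = FALSE
)

# Column positions to combine; with the real data this would be 4:50.
attr_cols <- 4:6

# Keep the non-attribute columns, then add one character vector per row
# as a list column.
result <- ingredients[-attr_cols]
result$attrs <- lapply(seq_len(nrow(ingredients)), function(i) {
  unlist(ingredients[i, attr_cols], use.names = FALSE)
})

result$attrs[[1]]
```

Looping over rows like this is simple but may be slow on large data frames; the tidyverse options above are likely a better fit there.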
All options return the same thing, at least on the sample data. Further cleanup, e.g. dropping NAs, can be done by iterating over the new column with lapply/purrr::map.
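As a minimal sketch of that cleanup step, assuming the list column holds character vectors: the attrs list below is toy data invented for illustration, and attrs_clean is just an illustrative name.

```r
# Toy list column standing in for the combined attr column
# (values are made up; NA marks a missing attribute).
attrs <- list(c("water", NA, "soy"),
              c("cocoa", "spice", NA))

# Keep only the non-missing entries of each element.
attrs_clean <- lapply(attrs, function(x) x[!is.na(x)])

attrs_clean
```

Inside a dplyr pipeline the same function could presumably be applied with mutate and purrr::map in place of lapply.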