Repeat rows of a data frame based on the values in a specific column

Problem description:

I would like to repeat entire rows of a data frame, with the number of copies of each row given by the samples column.

My input:

df <- 'chr start end samples
        1   10   20    2
        2   4    10    3'
df <- read.table(text=df, header=TRUE)

My expected output:

df <- 'chr start end  samples
        1   10   20   1-10-20-s1
        1   10   20   1-10-20-s2
        2   4    10   2-4-10-s1
        2   4    10   2-4-10-s2
        2   4    10   2-4-10-s3'

Any idea how to do this cleanly?

We can use expandRows to expand the rows based on the value in the 'samples' column, then convert the result to a data.table with setDT. Grouped by 'chr', we use sprintf to paste the columns together with the within-group row sequence (1:.N) and assign the result to the 'samples' column.

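library(splitstackshape)  # provides expandRows()
library(data.table)       # provides setDT(), := and .N
# Repeat each row 'samples' times, then rebuild 'samples' as a label per 'chr' group
setDT(expandRows(df, "samples"))[,
     samples := sprintf("%d-%d-%d-s%d", chr, start, end, 1:.N), by = chr][]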
#  chr start end    samples
#1:   1    10  20 1-10-20-s1
#2:   1    10  20 1-10-20-s2
#3:   2     4  10  2-4-10-s1
#4:   2     4  10  2-4-10-s2
#5:   2     4  10  2-4-10-s3

NOTE: In the setup used here, data.table is loaded when we load splitstackshape. If setDT is not found in your session, load it explicitly with library(data.table).
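
For comparison, here is a minimal base R sketch of the same idea, not part of the approach above. It assumes the numeric columns are read as integers (as read.table does for this input) and numbers the copies per original row rather than per 'chr' group, which gives the same result here because each 'chr' value appears in only one input row. The names idx, copy and out are illustrative.

```r
# Same input as in the question
df <- read.table(text = "chr start end samples
1 10 20 2
2 4 10 3", header = TRUE)

# Repeat each row index 'samples' times, then subset by the repeated indices
idx <- rep(seq_len(nrow(df)), times = df$samples)
out <- df[idx, ]
rownames(out) <- NULL

# Number the copies of each original row: 1, 2 for the first; 1, 2, 3 for the second
copy <- sequence(df$samples)

# Overwrite 'samples' with the label chr-start-end-s<copy number>
out$samples <- sprintf("%d-%d-%d-s%d", out$chr, out$start, out$end, copy)
out
```

If the same 'chr' value can occur in several input rows and the numbering should continue across them, grouping by 'chr' as in the data.table version is the closer match.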