Repeat rows of a data frame based on the value in a specific column
Question:
I would like to repeat entire rows in a data frame based on the value in the 'samples' column.
My input:
df <- 'chr start end samples
1 10 20 2
2 4 10 3'
df <- read.table(text=df, header=TRUE)
My expected output:
df <- 'chr start end samples
1 10 20 1-10-20-s1
1 10 20 1-10-20-s2
2 4 10 2-4-10-s1
2 4 10 2-4-10-s2
2 4 10 2-4-10-s3'
Any idea how to do this efficiently?
Answer:
We can use expandRows (from splitstackshape) to expand the rows based on the value in the 'samples' column, then convert the result to a data.table with setDT. Grouped by 'chr', we use sprintf to paste the columns together with the within-group row sequence and update the 'samples' column.
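library(splitstackshape)
library(data.table)  # explicit load, in case splitstackshape does not attach it
# expand each row 'samples' times, then number the copies within each 'chr'
setDT(expandRows(df, "samples"))[,
  samples := sprintf("%d-%d-%d-s%d", chr, start, end, seq_len(.N)), by = chr][]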
# chr start end samples
#1: 1 10 20 1-10-20-s1
#2: 1 10 20 1-10-20-s2
#3: 2 4 10 2-4-10-s1
#4: 2 4 10 2-4-10-s2
#5: 2 4 10 2-4-10-s3
NOTE: data.table is loaded when we load splitstackshape (at least in the version used here). If setDT is not found, load it explicitly with library(data.table).
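For comparison, the same expand-then-label idea can be sketched in base R without any packages. This is only an illustrative alternative, not the answer's method: it assumes the input columns are integers, as in the question, and the names idx and out are made up for the example. Because sequence() restarts its counter for every original row, this version does not depend on each 'chr' value appearing in only one input row, which the grouping by 'chr' above does assume.

```r
# Input from the question
df <- read.table(text = "chr start end samples
1 10 20 2
2 4 10 3", header = TRUE)

# Repeat each row index 'samples' times: 1 1 2 2 2
idx <- rep(seq_len(nrow(df)), times = df$samples)
out <- df[idx, ]

# sequence() restarts the counter for every original row: 1 2 1 2 3
out$samples <- sprintf("%d-%d-%d-s%d", out$chr, out$start, out$end,
                       sequence(df$samples))

# Drop the duplicated-row names (1, 1.1, 2, ...) left by the indexing
rownames(out) <- NULL
out
```

Here rep() does the job of expandRows, and sequence() plays the role of the per-group row counter in the data.table version.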