Adding column sums in a dataframe row-wise, conditionally
I would like to add up the columns of my dataframe one row at a time, conditional on another column that holds a binary variable.
So for each row, I would like to compute the sum of the column over that row and all rows above it where the binary variable has the same value.
Here is an example:
dummy var1 var2
1 x1 y1
0 x2 y2
0 x3 y3
1 x4 y4
My goal is to obtain this:
dummy var1 var2
1 x1 y1
0 x2 y2
0 x3+x2 y3+y2
1 x4+x1 y4+y1
I previously asked a simplified version of this question (Adding columns sums in dataframe row wise), where I just add all of the values above without the condition. Is there a way to incorporate this condition?
data.table::rleid will give you the grouping you want. If you convert your data frame to a data.table, it looks like this:
(Note: this assumes that your text is accurate and your example incorrect: it groups by consecutive equal values in the dummy column.)
library(data.table)
setDT(your_data)
your_data[, id := rleid(dummy)][
, c("var1", "var2") := .(cumsum(var1), cumsum(var2)), by = id
]
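To make the effect of the run-length grouping concrete, here is a minimal sketch on made-up numbers. The table `toy` and its values are hypothetical stand-ins for the symbolic x1..x4 and y1..y4 in the question; only `rleid`, `cumsum` and `:=` come from the answer above.

```r
library(data.table)

# Hypothetical toy data mirroring the layout in the question
toy <- data.table(
  dummy = c(1, 0, 0, 1),
  var1  = c(10, 20, 30, 40),
  var2  = c(1, 2, 3, 4)
)

# rleid() starts a new group id every time dummy changes value,
# so the two rows with dummy == 1 end up in different groups
toy[, id := rleid(dummy)]

# The cumulative sums restart at the beginning of each run
toy[, c("var1", "var2") := .(cumsum(var1), cumsum(var2)), by = id]

print(toy)
```

With this grouping the last row keeps its own values (40 and 4) because it begins a new run, whereas the question's example table would add the first row to it (x4+x1). That difference is exactly what the note above is flagging.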
If you need to do this to a bunch of columns, set the id as above, define your vector of columns, and then:
cols = c("var1", "var2", "var3", ...)
your_data[, (cols) := lapply(.SD, cumsum), by = id, .SDcols = cols]
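When there are many columns, the vector of names does not have to be typed by hand. The sketch below assumes, purely for illustration, that the columns to accumulate all start with "var"; the table `wide` and its values are hypothetical.

```r
library(data.table)

# Hypothetical data with three value columns
wide <- data.table(
  dummy = c(1, 1, 0, 0),
  var1  = c(1, 2, 3, 4),
  var2  = c(10, 20, 30, 40),
  var3  = c(100, 200, 300, 400)
)

# Group consecutive equal values of dummy, as above
wide[, id := rleid(dummy)]

# Assumption: every column to accumulate is named var<something>
cols <- grep("^var", names(wide), value = TRUE)

# .SDcols restricts .SD to the chosen columns; (cols) on the left-hand
# side overwrites those same columns in place
wide[, (cols) := lapply(.SD, cumsum), by = id, .SDcols = cols]

print(wide)
```

Wrapping `cols` in parentheses on the left of `:=` matters: without them, data.table would create a single new column literally named `cols`.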
If you just want to group by the dummy column, ignoring consecutiveness, then your question is an exact duplicate of this one, and you can do it like this:
setDT(your_data)
your_data[, c("var1", "var2") := .(cumsum(var1), cumsum(var2)), by = dummy]
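As a quick check that grouping by dummy directly reproduces the table in the question, here is a small sketch with made-up numbers. The table `toy2` and its values are hypothetical.

```r
library(data.table)

# Hypothetical toy data with the same dummy pattern as the question
toy2 <- data.table(
  dummy = c(1, 0, 0, 1),
  var1  = c(10, 20, 30, 40),
  var2  = c(1, 2, 3, 4)
)

# Grouping by dummy itself ignores whether equal values are adjacent;
# := writes each group's results back to that group's original rows
toy2[, c("var1", "var2") := .(cumsum(var1), cumsum(var2)), by = dummy]

print(toy2)
```

Here the fourth row becomes 50 and 5 (its own values plus those of the first row), which corresponds to x4+x1 and y4+y1 in the desired output. Row order is preserved because `:=` updates by reference rather than returning a regrouped table.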