dplyr: lead() and lag() wrong when used with group_by()
I want to find the lead() and lag() elements within each group, but I get some wrong results.
For example, the data look like this:
library(dplyr)
df = data.frame(name  = rep(c('Al', 'Jen'), 3),
                score = rep(c(100, 80, 60), 2))
df
Data:
name score
1 Al 100
2 Jen 80
3 Al 60
4 Jen 100
5 Al 80
6 Jen 60
Now I try to find the lead() and lag() scores for each person. If I sort the data with arrange() first, I get the correct answer:
df %>%
  arrange(name) %>%
  group_by(name) %>%
  mutate(next.score   = lead(score),
         before.score = lag(score))
OUTPUT1:
Source: local data frame [6 x 4]
Groups: name
name score next.score before.score
1 Al 100 60 NA
2 Al 60 80 100
3 Al 80 NA 60
4 Jen 80 100 NA
5 Jen 100 60 80
6 Jen 60 NA 100
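Why the sorted version gives sensible values: sorting by name only moves rows between groups, and R's ordering keeps tied rows in their original order, so within each person the score sequence is unchanged (Al: 100, 60, 80; Jen: 80, 100, 60). A minimal base-R sketch of that check, assuming only base R (it uses `order()` as a stand-in for `arrange()`, not dplyr itself):

```r
# Same example data as in the question
df <- data.frame(name  = rep(c("Al", "Jen"), 3),
                 score = rep(c(100, 80, 60), 2))

# Base-R analogue of arrange(name); order() leaves ties in their original order
sorted <- df[order(df$name), ]

# Per-person score sequences before and after sorting
split(sorted$score, sorted$name)
split(df$score, df$name)
```

Both `split()` calls return the same per-person sequences, which is why OUTPUT1 contains the values you would expect from a grouped lead/lag on the unsorted data.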
Without arrange(), the result is wrong:
df %>%
  group_by(name) %>%
  mutate(next.score   = lead(score),
         before.score = lag(score))
OUTPUT2:
Source: local data frame [6 x 4]
Groups: name
name score next.score before.score
1 Al 100 80 NA
2 Jen 80 60 NA
3 Al 60 100 80
4 Jen 100 80 60
5 Al 80 NA 100
6 Jen 60 NA 80
For example, in the first row, Al's next.score should be 60 (his score in the 3rd row), but the result is 80, which is Jen's score from the 2nd row.
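To pin down what the grouped result should be without reordering the rows, here is a dplyr-free sketch in base R. The helpers `lead1` and `lag1` are hypothetical, introduced only for this illustration; `ave()` applies a function within each group and returns the results aligned to the original rows.

```r
df <- data.frame(name  = rep(c("Al", "Jen"), 3),
                 score = rep(c(100, 80, 60), 2))

# Hypothetical helpers: shift a vector by one position, padding with NA
lead1 <- function(x) c(x[-1], NA)
lag1  <- function(x) c(NA, x[-length(x)])

# ave() splits score by name, applies the helper to each group,
# and writes the results back in the original row order
df$next.score   <- ave(df$score, df$name, FUN = lead1)
df$before.score <- ave(df$score, df$name, FUN = lag1)
df
```

The first row gets next.score 60, as expected, and the two new columns match the values in the answer's output below row for row.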
Does anybody know why this happens? Why does arrange() affect the result (the values themselves, not just the row order)? Thanks!
It seems you have to pass an additional argument, order_by, to lag() and lead(). When I run your code without arrange() but with order_by added, everything seems to be OK:
df %>%
  group_by(name) %>%
  mutate(next.score   = lead(score, order_by = name),
         before.score = lag(score, order_by = name))
Output:
name score next.score before.score
1 Al 100 60 NA
2 Jen 80 100 NA
3 Al 60 80 100
4 Jen 100 60 80
5 Al 80 NA 60
6 Jen 60 NA 100
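Why `order_by = name` still gives the expected values: inside each group `name` is constant, so ordering by it does not move any rows. Roughly, an order_by argument means "sort by the key, apply the shift, then undo the sort". The sketch below illustrates that idea in base R; `shift_with_order` and `lead1` are hypothetical helpers, and this is a simplified illustration, not dplyr's actual source.

```r
# Hypothetical helper: apply `fun` to x after sorting by `order_by`,
# then restore the original row order
shift_with_order <- function(x, order_by, fun) {
  ord  <- order(order_by)             # row positions in sorted order
  undo <- match(seq_along(ord), ord)  # inverse permutation
  fun(x[ord])[undo]
}

# Hypothetical helper: shift forward by one, padding with NA
lead1 <- function(x) c(x[-1], NA)

# Al's scores in their original row order
al_scores <- c(100, 60, 80)

# A constant key: order() returns 1, 2, 3 (ties keep their original order),
# so the result is the same as a plain shift: 60, 80, NA
shift_with_order(al_scores, rep("Al", 3), lead1)

# Ordering by the scores themselves changes the answer: NA, 80, 100
shift_with_order(al_scores, al_scores, lead1)
```

So within a group, `order_by = name` is effectively a no-op on the ordering; the values come out the same as a grouped lead/lag on the rows in their original order.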
My sessionInfo():
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250 LC_MONETARY=Polish_Poland.1250
[4] LC_NUMERIC=C LC_TIME=Polish_Poland.1250
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.4.1
loaded via a namespace (and not attached):
[1] assertthat_0.1 DBI_0.3.1 lazyeval_0.1.10 magrittr_1.5 parallel_3.1.1 Rcpp_0.11.5
[7] tools_3.1.1