How to add a number of observations per group and use the group mean in a ggplot2 boxplot?
I am doing a basic boxplot where y = age and x = patient groups:
age <- ggplot(data, aes(factor(group2), age)) + ylim(15, 80)
age + geom_boxplot(fill = "grey80", colour = "#3366FF")
I was hoping you could help me out with a few things:
1) Is it possible to include the number of observations per group above each group's boxplot (but NOT on the X axis where my group labels are) without having to do this in Paint :)? I have tried using:
age + annotate("text", x = "CON", y = 60, label = "25")
where CON is the 1st group and y = 60 is roughly just above the boxplot for this group. However, the command didn't work. I assume this is because x is read as a continuous rather than a categorical variable.
2) Also, although there are plenty of questions about using the mean rather than the median for boxplots, I still haven't found code that works for me.
3) On the same matter, is there a way to include the group mean in the boxplot? Perhaps using
age + stat_summary(fun.y=mean, colour="red", geom="point")
which, however, only adds a dot where the mean lies. Or again using
age + annotate("text", x = "CON", y = 30, label = "30")
where CON is the 1st group and y = 30 is roughly the group's mean age.
Knowing how flexible and rich ggplot2 syntax is, I was hoping that there is a more elegant way of using the real stats output rather than annotate.
Any suggestions/links would be much appreciated!
Thanks!!
Is this anything like what you're after? With stat_summary, as requested:
library(ggplot2)

# label function: number of observations, placed just above the median
give.n <- function(x) {
  # experiment with the multiplier to find the perfect position
  return(c(y = median(x) * 1.05, label = length(x)))
}

# label function: group mean (2 decimals), placed just below the median
mean.n <- function(x) {
  # experiment with the multiplier to find the perfect position
  return(c(y = median(x) * 0.97, label = round(mean(x), 2)))
}

# plot: fun.data alone supplies both the y position and the label text
ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot(fill = "grey80", colour = "#3366FF") +
  stat_summary(fun.data = give.n, geom = "text") +
  stat_summary(fun.data = mean.n, geom = "text", colour = "red")
The black number is the number of observations and the red number is the mean value. joran's answer shows you how to put the numbers at the top of the boxes.
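If you want the labels above the boxes instead, one possible variation (a sketch only, and not necessarily the approach joran's answer takes) is to anchor each label at the group maximum rather than the median. The helper names n_at_top and mean_at_top and the offsets 1 and 2.5 are made up for illustration. The offsets are in mpg units and would need tuning for other data, such as the age variable in the question.

```r
library(ggplot2)

# Hypothetical helper: group size, placed just above the group's largest value.
# stat_summary() calls fun.data with the y values of one group at a time.
n_at_top <- function(x) {
  data.frame(y = max(x) + 1, label = length(x))
}

# Hypothetical helper: group mean (2 decimals), placed a little higher still.
mean_at_top <- function(x) {
  data.frame(y = max(x) + 2.5, label = round(mean(x), 2))
}

p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot(fill = "grey80", colour = "#3366FF") +
  stat_summary(fun.data = n_at_top, geom = "text") +
  stat_summary(fun.data = mean_at_top, geom = "text", colour = "red")

print(p)
```

Because the positions are computed from the data in each group, the labels follow the boxes when groups are added or the data change, which is the main advantage over hard-coded annotate() calls.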