如何在同一图上用 ggplot2 绘制两个变量的直线?

如何在同一图上用 ggplot2 绘制两个变量的直线?

问题描述:

这是一个很新的问题,但我有这样的数据:

test_data <-
  data.frame(
    var0 = 100 + c(0, cumsum(runif(49, -20, 20))),
    var1 = 150 + c(0, cumsum(runif(49, -10, 10))),
    date = seq(as.Date("2002-01-01"), by="1 month", length.out=100)
  )

如何用ggplot2、同时借助x-axis上的数据,来绘制var0和var1这两个时间序列?如果你能用不同的颜色加以区分还是一个加分项! 我相信这很简单,但我找不到任何例子。

For a small number of variables, you can use build up the plot manually yourself:

ggplot(test_data, aes(date)) + 
  geom_line(aes(y = var0, colour = "var0")) + 
  geom_line(aes(y = var1, colour = "var1"))

Using your data:

test_data <- data.frame(
var0 = 100 + c(0, cumsum(runif(49, -20, 20))),
var1 = 150 + c(0, cumsum(runif(49, -10, 10))),
Dates = seq.Date(as.Date("2002-01-01"), by="1 month", length.out=100))

I create a stacked version which is what ggplot() would like to work with:

stacked <- with(test_data,
                data.frame(value = c(var0, var1),
                           variable = factor(rep(c("Var0","Var1"),
                                                 each = NROW(test_data))),
                           Dates = rep(Dates, 2)))

In this case producing stacked was quite easy as we only had to do a couple of manipulations, but reshape() and the reshape and reshape2 might be useful if you have a more complex real data set to manipulate.

Once the data are in this stacked form, it only requires a simple ggplot() call to produce the plot you wanted with all the extras (one reason why higher-level plotting packages like lattice and ggplot2 are so useful):

require(ggplot2)
p <- ggplot(stacked, aes(Dates, value, colour = variable))
p + geom_line()

I'll leave it to you to tidy up the axis labels, legend title etc.

HTH

The general approach is to convert the data to long format (using melt() from package reshape or reshape2) or gather() from the tidyr package:

library("reshape2")
library("ggplot2")

test_data_long <- melt(test_data, id="date")  # convert to long format

ggplot(data=test_data_long,
       aes(x=date, y=value, colour=variable)) +
       geom_line()

ggplot2 output

You need the data to be in "tall" format instead of "wide" for ggplot2. "wide" means having an observation per row with each variable as a different column (like you have now). You need to convert it to a "tall" format where you have a column that tells you the name of the variable and another column that tells you the value of the variable. The process of passing from wide to tall is usually called "melting". You can use tidyr::gather to melt your data frame:

library(ggplot2)
library(tidyr)

test_data <-
  data.frame(
    var0 = 100 + c(0, cumsum(runif(49, -20, 20))),
    var1 = 150 + c(0, cumsum(runif(49, -10, 10))),
    date = seq(as.Date("2002-01-01"), by="1 month", length.out=100)
  )
test_data %>%
    gather(key,value, var0, var1) %>%
    ggplot(aes(x=date, y=value, colour=key)) +
    geom_line()

multiple series ggplot2

Just to be clear the data that ggplot is consuming after piping it via gather looks like this:

date        key     value
2002-01-01  var0    100.00000
2002-02-01  var0    115.16388 
...
2007-11-01  var1    114.86302
2007-12-01  var1    119.30996