Splitting a total row into individual rows in R

Problem description:

I have a set of data where annual totals for specific values are stored in one row (observation) per year. I'd like to tidy the data in R so that each total row is broken out by month using a simple equation (total/12), storing the annual total divided by 12 in each of 12 rows as a monthly total. I'm trying to do this in R but am a beginner and not quite sure where to start. An example is below:

Date | Total
2015 | 12,000 

Some R function to convert to:

   Date    | Total
01-01-2015 | 1,000
02-01-2015 | 1,000
03-01-2015 | 1,000

Here is an example dataset with multiple years, which I presume is your desired use case:

myData <-
  data.frame(
    Date = 2011:2015
    , Total = (1:5)*1200
  )

Then, I split the data.frame up by the Date column (this assumes the years are unique), looped through the pieces with lapply to build the monthly values you wanted, and stitched them back together with bind_rows from dplyr. Loading dplyr also provides the pipe (%>%), which it re-exports from magrittr.

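library(dplyr)  # provides bind_rows() and the %>% pipe

myData %>%
  split(.$Date) %>%               # one single-row data.frame per year
  lapply(function(x){
    data.frame(
      # first day of each month, e.g. "2011-01-01"
      Date = as.Date(paste(x$Date
                           , formatC(1:12, width = 2, flag = "0")
                           , "01"
                           , sep = "-"))
      , Total = x$Total / 12      # annual total spread evenly over 12 months
    )
  }) %>%
  bind_rows()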

Gives (note, using ... to indicate skipped rows)

         Date Total
1  2011-01-01   100
2  2011-02-01   100
3  2011-03-01   100
...
16 2012-04-01   200
17 2012-05-01   200
18 2012-06-01   200
...
29 2013-05-01   300
30 2013-06-01   300
31 2013-07-01   300
...
43 2014-07-01   400
44 2014-08-01   400
45 2014-09-01   400
...
58 2015-10-01   500
59 2015-11-01   500
60 2015-12-01   500
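
For comparison, the same expansion can be sketched in base R alone, which also makes it easy to check that the monthly values add back up to the annual totals. This is only an illustrative alternative to the pipeline above; the names `monthly` and `yearlySums` are made up for this example.

```r
# Example data as above: one row per year holding an annual total
myData <- data.frame(Date = 2011:2015, Total = (1:5) * 1200)

# Repeat every year 12 times and pair it with months 1..12
years  <- rep(myData$Date, each = 12)
months <- rep(1:12, times = nrow(myData))

monthly <- data.frame(
  Date  = as.Date(sprintf("%d-%02d-01", years, months)),  # first day of each month
  Total = rep(myData$Total / 12, each = 12)               # even monthly share
)

# Sanity check: sum the monthly values by year and compare with the input
yearlySums <- tapply(monthly$Total, format(monthly$Date, "%Y"), sum)
stopifnot(isTRUE(all.equal(as.vector(yearlySums), myData$Total)))
```

Because `rep()` and `sprintf()` are vectorised, no explicit loop over years is needed here; the split/lapply version above is easier to adapt when each year needs different handling.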

If, as suggested in the comments, you need to divide by a different number of months each year, I would create a vector of the divisors named for the years. Here, I am using dplyr for n_distinct and the magrittr pipe to ease naming of the common case. Then, add/change the entry for the different year:

toDivide <-
  rep(12, n_distinct(myData$Date)) %>%
  setNames(sort(unique(myData$Date)))

toDivide["2016"] <- 9

Then, instead of 12 as above, you can use this vector. So, replace this:

Total = x$Total / 12

within the lapply loop, with this:

Total = x$Total / toDivide[as.character(x$Date)]

and it should divide by the entry for that year, instead of always dividing by the same thing.
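
Putting the pieces together, a minimal end-to-end sketch might look like the following. The data here is hypothetical (a 2016 row is added so the partial-year divisor has something to act on). The sketch also goes one step beyond the replacement above by generating only as many monthly rows as the divisor, on the assumption that a partial year covers its first n months. That assumption is not part of the original answer, so adjust it if the months fall elsewhere in the year.

```r
library(dplyr)  # n_distinct(), bind_rows(), and the %>% pipe

# Hypothetical data: 2016 is assumed to be a partial year of 9 months
myData <- data.frame(Date = 2014:2016, Total = c(1200, 2400, 900))

# Divisor of 12 for every year by default, named by year
toDivide <-
  rep(12, n_distinct(myData$Date)) %>%
  setNames(sort(unique(myData$Date)))

toDivide["2016"] <- 9  # override for the partial year

monthlyData <- myData %>%
  split(.$Date) %>%
  lapply(function(x){
    # look up this year's divisor by name
    nMonths <- toDivide[[as.character(x$Date)]]
    data.frame(
      # assumption: the year covers months 1..nMonths
      Date = as.Date(paste(x$Date
                           , formatC(seq_len(nMonths), width = 2, flag = "0")
                           , "01"
                           , sep = "-"))
      , Total = x$Total / nMonths
    )
  }) %>%
  bind_rows()
```

With this data, `monthlyData` has 12 rows each for 2014 and 2015 and 9 rows for 2016, and each year's rows sum back to that year's total.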