Grouping datetime data by date, but with days running from 4 PM to 4 PM
I have Tweets about companies posted at various times of day, and I want to group them by day. I have already done this. However, I want each day to run not from 00:00 to 23:59 but from 16:00 to 15:59 (because of the NYSE opening hours).
Tweets (Negative, Neutral and Positive are the sentiment counts):
Company,Datetime_UTC,Negative,Neutral,Positive,Volume
AXP,2013-06-01 16:00:00+00:00,0,2,0,2
AXP,2013-06-01 17:00:00+00:00,0,2,0,2
AXP,2013-06-02 05:00:00+00:00,0,1,0,1
AXP,2013-06-02 16:00:00+00:00,0,2,0,2
My code:
Tweets$Datetime_UTC <- as.Date(Tweets$Datetime)
Sent <- aggregate(list(Tweets$Negative, Tweets$Neutral, Tweets$Positive), by=list(Tweets$Company, Tweets$Datetime_UTC), sum)
colnames(Sent) <- c("Company", "Date", "Negative", "Neutral", "Positive")
Sent <- Sent[order(Sent$Company),]
Output of that code:
Company,Date,Negative,Neutral,Positive
AXP,2013-06-01,0,4,0
AXP,2013-06-02,0,3,0
How I'd want it to be (considering that a day should start at 16:00):
Company,Date,Negative,Neutral,Positive
AXP,2013-06-02,0,5,0
AXP,2013-06-03,0,2,0
As you can see, my code almost works. I just want to group by a different time window.
How can I do this? One idea would be to add 8 hours to every Datetime_UTC value, which would turn 16:00 into 00:00 of the next day. After that, I could just use my existing code. Would that be possible?
Thanks in advance!! :-)
Effectively, what you're doing is redefining a day to start at 16:00 instead of 00:00. One option is to convert to epoch time (seconds since 1970-01-01 00:00:00+00:00) and simply slide your data forward by eight hours.
You can convert to POSIXct (which stores the time as epoch seconds), add eight hours' worth of seconds (8 * 3600 = 28800), and then convert back to Date class, all in one line. Then you would just aggregate as you had been.
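# Parse as UTC, shift forward 8 hours (28800 s), then keep only the date part
Tweets$Datetime_UTC <- as.Date(as.POSIXct(Tweets$Datetime_UTC, tz = "UTC") + 28800, tz = "UTC")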
Replace your first line of code with that and it should do the trick.
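To check the idea end to end, here is a minimal sketch that runs the shift on the four sample rows from the question. The helper column name Date and the variable stamp are my own choices for illustration, and the explicit format string is an assumption about how the timestamps are stored (strptime ignores anything after the part the format covers, so the trailing +00:00 is dropped).

```r
# Sample rows from the question, read from inline text
Tweets <- read.csv(text = "Company,Datetime_UTC,Negative,Neutral,Positive,Volume
AXP,2013-06-01 16:00:00+00:00,0,2,0,2
AXP,2013-06-01 17:00:00+00:00,0,2,0,2
AXP,2013-06-02 05:00:00+00:00,0,1,0,1
AXP,2013-06-02 16:00:00+00:00,0,2,0,2", stringsAsFactors = FALSE)

# Parse the timestamps as UTC; the format covers the first 19 characters
stamp <- as.POSIXct(Tweets$Datetime_UTC, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Shift forward by 8 hours so that 16:00 becomes 00:00 of the next day,
# then keep only the calendar date (evaluated in UTC)
Tweets$Date <- as.Date(stamp + 8 * 3600, tz = "UTC")

# Sum the sentiment counts per company and shifted date
Sent <- aggregate(cbind(Negative, Neutral, Positive) ~ Company + Date,
                  data = Tweets, FUN = sum)
Sent <- Sent[order(Sent$Company, Sent$Date), ]
print(Sent)
```

On these rows the Neutral totals come out as 5 for 2013-06-02 and 2 for 2013-06-03, which matches the desired output in the question. Keeping the shifted date in a separate column leaves the original timestamps intact in case they are needed later.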