How to convert a specific time format to a timestamp in R?
I am working on the "Localization Data for Person Activity Data Set" dataset from UCI, and in this data set there is a single column holding both date and time in the following format:
27.05.2009 14:03:25:777
27.05.2009 14:03:25:183
27.05.2009 14:03:25:210
27.05.2009 14:03:25:237
...
I am wondering if there is any way to convert this column to a timestamp using R.
First of all, we need to replace the colon separating the milliseconds from the seconds with a dot, otherwise the final step won't work (thanks to Dirk Eddelbuettel for this one). Since R will print the result with its own separators anyway, the quickest route is to replace all the colons with dots:
x <- "27.05.2009 14:03:25:777" # this is a simplified version of your data
y <- gsub(":", ".", x) # this is your vector with the aforementioned substitution
By the way, this is how your vector should look after gsub:
> y
[1] "27.05.2009 14.03.25.777"
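If you would rather leave the other colons alone, a variant is to replace only the last one. This is a minimal sketch, assuming every value ends in a colon followed by exactly three digits; the vector name stamps is made up for illustration and tz = "UTC" is an assumption about the data:

```r
# Hypothetical vector standing in for the date-time column
stamps <- c("27.05.2009 14:03:25:777", "27.05.2009 14:03:25:183")

# Replace only the final colon (the one before the milliseconds) with a dot
fixed <- sub(":([0-9]{3})$", ".\\1", stamps)

# The time part still uses colons, so the format string uses colons too
parsed <- strptime(fixed, "%d.%m.%Y %H:%M:%OS", tz = "UTC")
```

Because sub and strptime are both vectorised, the same two calls should work on a whole data frame column.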
Now, in order to have it show the milliseconds, you first need to adjust an R option and then use a function called strptime, which will convert your date vector to POSIXlt (an R-friendly) format. Just do the following:
> options(digits.secs = 3) # this tells R you want it to consider 3 digits for seconds.
> strptime(y, "%d.%m.%Y %H.%M.%OS") # the format uses dots because gsub replaced every colon in y
[1] "2009-05-27 14:03:25.777"
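If by "timestamp" you mean a number rather than a date-time object, the POSIXlt result can be converted further. The sketch below assumes the times are in UTC (swap in the time zone the data was actually recorded in) and uses a format string with dots so that it matches y after the gsub step:

```r
options(digits.secs = 3)
y <- "27.05.2009 14.03.25.777"

# tz = "UTC" is an assumption, not something stated in the data set
lt <- strptime(y, "%d.%m.%Y %H.%M.%OS", tz = "UTC")

ct <- as.POSIXct(lt)          # seconds since 1970-01-01, stored as a double
secs <- as.numeric(ct)        # plain numeric timestamp in seconds
millis <- round(secs * 1000)  # the same instant in whole milliseconds
```

POSIXct is usually the more convenient class for a data frame column, since it stores one number per value instead of a list of components.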
I've learned this nice trick here. This other answer also says you can skip the options setting and use, for example, strptime(y, "%d.%m.%Y %H:%M:%OS3"), but it doesn't work for me. Henrik noted that the function's help page, ?strptime, states that the %OS3 bit is OS-dependent. I'm using an updated Ubuntu 13.04 and using %OS3 yields NA.
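As far as I can tell from ?strptime, %OSn with a digit is described for output, while plain %OS is what reads fractional seconds on input. So one way to show milliseconds without touching options is to parse with %OS and format with %OS3. A small sketch, again assuming UTC and a y that has already been through gsub:

```r
y <- "27.05.2009 14.03.25.777"

# Parse with plain %OS (input side)
parsed <- strptime(y, "%d.%m.%Y %H.%M.%OS", tz = "UTC")

# Ask for three decimal places only when formatting (output side)
shown <- format(parsed, "%Y-%m-%d %H:%M:%OS3")
shown
```

The help page says the output is truncated rather than rounded, so the last digit can occasionally come out one lower than expected.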
When using strptime (or other POSIX-related functions such as as.Date), keep in mind some of the most common conversions used (edited for brevity, as suggested by DWin; the complete list is at ?strptime):
- %a: Abbreviated weekday name in the current locale.
- %A: Full weekday name in the current locale.
- %b: Abbreviated month name in the current locale.
- %B: Full month name in the current locale.
- %d: Day of the month as decimal number (01–31).
- %H: Hours as decimal number (00–23). Times such as 24:00:00 are accepted for input.
- %I: Hours as decimal number (01–12).
- %j: Day of year as decimal number (001–366).
- %m: Month as decimal number (01–12).
- %M: Minute as decimal number (00–59).
- %p: AM/PM indicator in the locale. Used in conjunction with %I and not with %H.
- %S: Second as decimal number (00–61), allowing for up to two leap-seconds (but POSIX-compliant implementations will ignore leap seconds).
- %U: Week of the year as decimal number (00–53) using Sunday as the first day 1 of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.
- %w: Weekday as decimal number (0–6, Sunday is 0).
- %W: Week of the year as decimal number (00–53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.
- %y: Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19.
- %Y: Year with century. Note that whereas there was no zero in the original Gregorian calendar, ISO 8601:2004 defines it to be valid (interpreted as 1BC).
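To see a few of these conversions in action, here is a short sketch that formats the date from the question in different ways. The variable names are only for illustration, and the last line depends on your locale, so its output is not shown:

```r
d <- as.Date("2009-05-27")

short   <- format(d, "%d/%m/%y")      # "27/05/09": day, month, year without century
day_no  <- format(d, "%j")            # "147": day of the year
weekday <- format(d, "%w")            # "3": weekday number, Sunday is 0
long    <- format(d, "%A, %d %B %Y")  # weekday and month names follow the locale
```

The same specifications work in the other direction, as the format argument of strptime or as.Date.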