Converting a string to a date in R

Problem description:

The data I'm trying to convert is supposed to be a date, but it is formatted as mmddyyyy with no dashes or slashes as separators. To work with dates in R, I would like to have it formatted as mm-dd-yyyy or mm/dd/yyyy.

I think I might need to use grep(), but I'm not sure how to use it to reformat all of the dates that are in the mmddyyyy format.

Updated: Improved with @Richard Scriven's colClasses and simpler as.Date() suggestions.

Here are two similar methods that worked for me, going from a csv containing dates in mmddyyyy format to having them recognized by R as date objects.

Starting with a simple file, tv.csv:

Series,FirstAir
Quantico,09272015
Muppets,09222015



Method 1: All as string

Once within R,

> t = read.csv('tv.csv', colClasses = 'character')




  • imports tv.csv as a data frame named t
  • the colClasses = 'character' option causes every column to be read as the character data type (instead of as Factor or int)

Examining its initial structure:

> str(t)
'data.frame':   2 obs. of  2 variables:
 $ Series  : chr  "Quantico" "Muppets"
 $ FirstAir: chr  "09272015" "09222015"




  • R has imported every column as strings of characters, indicated here as type chr

The chr values (strings of characters) are then easily converted into dates:

> t$FirstAir = as.Date(t$FirstAir, "%m%d%Y")




  • as.Date() performs the string-to-date conversion
  • "%m%d%Y" specifies how to interpret the input in t$FirstAir. These format codes are documented in R under ?strptime; on Linux, at least, they are also listed in the manual for the date program (run man date). For example, it says %m month (01..12)
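As a self-contained sketch of Method 1, the snippet below reads the same two rows from an inline string rather than from tv.csv. The text = argument and the names csv_text and tv are choices made for this illustration and are not part of the original answer. It then shows one way to get the mm/dd/yyyy display the question asked for; note that format() returns character strings, so keep the Date column itself for any date arithmetic.

```r
# Inline copy of tv.csv so the example does not depend on a file on disk.
csv_text <- "Series,FirstAir
Quantico,09272015
Muppets,09222015"

# Read every column as character so the leading zero in "09" survives.
tv <- read.csv(text = csv_text, colClasses = "character")

# Parse the mmddyyyy strings into Date objects.
tv$FirstAir <- as.Date(tv$FirstAir, format = "%m%d%Y")
str(tv$FirstAir)

# Display-only version with slashes; this is a character vector, not a Date.
first_air_display <- format(tv$FirstAir, "%m/%d/%Y")
print(first_air_display)
```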
Method 2: Fix only the date column

If for some reason you don't want a blanket conversion of every column to character on import, for example because the file has many variables and you want to keep R's automatic type recognition while merely "fixing" the one date variable, follow this method.

Once within R,

> t = read.csv('tv.csv')




  • imports tv.csv as a data frame named t

Examining its initial structure:

> str(t)
'data.frame':   2 obs. of  2 variables:
 $ Series  : Factor w/ 2 levels "Muppets","Quantico": 2 1
 $ FirstAir: int  9272015 9222015
>




  • R tries its best to guess the type of each variable
  • An immediate problem is visible: R has imported 09272015 in the FirstAir variable as an int (integer) and dropped the leading zero. The 0 in 09 matters for the date conversion later, so we need to restore it.

This can be done in a single command, but for clarity I have broken it into two steps. First,

> t$FirstAir = sprintf("%08d", t$FirstAir)




  • sprintf is a formatting function
  • 0 means pad with zeroes
  • 8 means ensure a width of 8 characters, because mmddyyyy is 8 characters in total
  • d is used when the input is an integer, which it currently is; recall that the str() output reported t$FirstAir as int
  • t$FirstAir is the variable we are both setting and using as input

Checking the result:

> str(t$FirstAir)
 chr [1:2] "09272015" "09222015"




  • it was successfully converted from int to chr; for example, 9272015 became "09272015"
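To see the padding step in isolation, here is a small sketch with made-up values (raw and padded are names chosen for this illustration). A 7-digit integer gains a leading zero, while a value that already has 8 digits is left unchanged.

```r
# Hypothetical integers as read.csv would import them:
# September 27, 2015 lost its leading zero; December 25, 2015 did not have one.
raw <- c(9272015L, 12252015L)

# "%08d": format as an integer, zero-padded to a width of 8 characters.
padded <- sprintf("%08d", raw)
print(padded)

# Every element is now an 8-character string, ready for "%m%d%Y".
print(nchar(padded))
```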
Now that it is a string (chr type), we can convert it, the same as in Method 1.

> t$FirstAir = as.Date(strptime(t$FirstAir, "%m%d%Y"))



Result

We do a final check:

> str(t$FirstAir)
 Date[1:2], format: "2015-09-27" "2015-09-22"
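For reference, the two steps of the second method can be collapsed into one command. This is a minimal sketch under the same assumptions as the example file, read here from an inline string (csv_text and tv are illustrative names, not from the original answer).

```r
# Inline copy of tv.csv; with default settings FirstAir is imported as int.
csv_text <- "Series,FirstAir
Quantico,09272015
Muppets,09222015"
tv <- read.csv(text = csv_text)
was_integer <- is.integer(tv$FirstAir)

# Pad back to 8 digits and parse as a date in a single expression.
tv$FirstAir <- as.Date(sprintf("%08d", tv$FirstAir), format = "%m%d%Y")
str(tv$FirstAir)
```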

In both cases, what were originally values in a text file have now been successfully converted into R date objects.