Can't figure out how to get the time attribute

Problem description:

I'm having trouble using the ImportXML() function in a Google spreadsheet. I would like to retrieve the posting time of several Reddit posts, but I can only get the displayed text, 28 Apr 2012, rather than the value of the datetime attribute, 2012-04-28T02:19:06.348481+00:00, which is what I want.

For example, on this web page, I look in the source and see the following:

<div class='spacer'><div class="linkinfo">
    <div class="date">
        <span>this post was submitted on &#32;</span>
            <time datetime="2012-04-28T02:19:06.348481+00:00">28 Apr 2012</time>
    </div>
<div class="score">

However, this is the only line I can get to do anything:

=ImportXML(
"http://www.reddit.com/r/BuyItForLife/comments/jtjuz/bi4l_mission_statement_rules_etc/",
"//div[@class='date']")

Any suggestions? I've been searching and trying and searching and trying, and nothing is working.

IMPORTXML requires the document to be well-formed XML/XHTML to work correctly. This page does not appear to comply, so instead of IMPORTXML, use IMPORTDATA and then QUERY and REGEXEXTRACT.

Example:

A1: Add the URL http://www.reddit.com/r/BuyItForLife/comments/jtjuz/bi4l_mission_statement_rules_etc/
A2: Add the following formula

=REGEXEXTRACT(QUERY(transpose(QUERY(importdata(A1),,1E+100)),,1E+100),
"datetime=""(.*?)""")

Result: 2011-08-25T01:32:23+00:00
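To see what the regular expression in the formula captures, here is a minimal Python sketch that applies the same pattern to the HTML fragment from the question. It is only an illustration: Python's `re` module is not the engine Sheets uses for REGEXEXTRACT, and the variable names `html` and `timestamp` are made up for this example.

```python
import re

# HTML fragment copied from the page source shown in the question.
html = '''<div class="date">
    <span>this post was submitted on &#32;</span>
    <time datetime="2012-04-28T02:19:06.348481+00:00">28 Apr 2012</time>
</div>'''

# Same pattern as in the formula. In Sheets the quotes are doubled ("")
# because they sit inside a string literal; here a raw string is enough.
# (.*?) is non-greedy, so the capture stops at the first closing quote.
match = re.search(r'datetime="(.*?)"', html)
timestamp = match.group(1) if match else None
print(timestamp)
```

The capture group returns only the attribute value, not the visible text between the `<time>` tags.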

Explanation about the use of QUERY:

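IMPORTDATA returns a 2D array. QUERY is used twice to concatenate the contents of the array: the inner call joins all the rows into one, and, after TRANSPOSE, the outer call does the same for all the columns. Passing 1E+100 as the headers argument makes QUERY treat every row as a header row, and multi-row headers are merged into a single row with the cell values separated by spaces.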

CONCATENATE and JOIN cannot be used because they have a 50000 character limit.
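The two-pass QUERY trick can be mimicked outside Sheets to make the order of operations easier to follow. The Python sketch below is a simplified model, not how Sheets implements QUERY: it assumes a rectangular grid of strings, and the helpers `join_rows`, `transpose`, and `flatten` are hypothetical names chosen for this example.

```python
def join_rows(grid):
    """Collapse all rows into a single row by joining each column's cells with spaces."""
    return [[" ".join(column) for column in zip(*grid)]]


def transpose(grid):
    """Swap rows and columns."""
    return [list(row) for row in zip(*grid)]


def flatten(grid):
    """Model of QUERY(transpose(QUERY(data,,1E+100)),,1E+100)."""
    one_row = join_rows(grid)        # inner QUERY: all rows become one row
    one_column = transpose(one_row)  # TRANSPOSE: that row becomes one column
    return join_rows(one_column)[0][0]  # outer QUERY: one column becomes one cell


# Tiny stand-in for the 2D array that IMPORTDATA would return.
print(flatten([["a", "b"], ["c", "d"]]))
```

Note that the cells are joined column by column, so the text ends up in column-major order. That does not matter here, because REGEXEXTRACT only needs the `datetime="..."` attribute to survive intact inside the combined string.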