Real-world problem:
I have my app hosted on Heroku, which (to my knowledge) does not offer a way to run a headless (GUI-less) browser, such as HtmlUnit, to generate HTML snapshots for Googlebot to index my AJAX content.
My proposed solution:
If you haven't already, I suggest you read Google's full specification for Making AJAX Applications Crawlable.
Imagine I have:
http://example.com
http://example.com#!tab=TabA&subtab=SubTab3
then client-side Javascript takes the location.hash
and loads in TabA, SubTab3 content via AJAX.
Note: the hashbang (#!) is part of Google's specification.
I would like to build a simple "web service" hosted on Google App Engine (GAE) that:
http://htmlsnapshot.appspot.com?url=http://example.com#!tab=TabA&subtab=SubTab3
(url param should be URLEncoded)
http://example.com#!tab=TabA&subtab=SubTab3
and run the client-side JavaScript on the server.
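To illustrate why the url param needs encoding: everything after `#` in a URL is a fragment, and clients do not send fragments to the server, so an unencoded request would lose the `#!tab=...` part before it ever reached the snapshot service. Below is a minimal Python sketch of building and decoding such a request. The function names are hypothetical, and `htmlsnapshot.appspot.com` is only the illustrative address used in this question, not an existing service.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Illustrative endpoint from the question; an assumption, not a live service.
SNAPSHOT_SERVICE = "http://htmlsnapshot.appspot.com"


def snapshot_request_url(page_url: str) -> str:
    """Build the service URL with the target page percent-encoded in `url`."""
    # safe="" also encodes ':', '/', '#', '!', '&' and '=' so the whole
    # page URL travels as a single query parameter value.
    return f"{SNAPSHOT_SERVICE}?url={quote(page_url, safe='')}"


def page_url_from_request(request_url: str) -> str:
    """Recover the target page URL on the service side."""
    query = urlsplit(request_url).query
    return parse_qs(query)["url"][0]


request_url = snapshot_request_url("http://example.com#!tab=TabA&subtab=SubTab3")
print(request_url)
print(page_url_from_request(request_url))

# Without encoding, the hashbang part is parsed as a fragment of the
# service URL itself and would never be sent to the service.
unencoded = SNAPSHOT_SERVICE + "?url=http://example.com#!tab=TabA&subtab=SubTab3"
print(urlsplit(unencoded).query)
print(urlsplit(unencoded).fragment)
```

Rendering the page and running its JavaScript (the HtmlUnit part) is deliberately left out of this sketch.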
My http://example.com
app would need to manage the call to http://htmlsnapshot.appspot.com
... basically:
http://example.com/?_escaped_fragment_=tab=TabA%26subtab=SubTab3
(Googlebot crawler escapes certain characters, e.g. %26 = &).
http://htmlsnapshot.appspot.com?url=http://example.com#!tab=TabA&subtab=SubTab3
(url param should be URLEncoded)
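The mapping from the crawler's `_escaped_fragment_` request back to the original hashbang URL could be sketched roughly as follows. This is a Python illustration using only the standard library; `hashbang_url` is a hypothetical helper name, and the sketch assumes the crawler puts the whole escaped fragment in a single `_escaped_fragment_` query parameter, as in the example URL above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def hashbang_url(crawler_url: str) -> str:
    """Turn a ?_escaped_fragment_=... URL back into its #! form."""
    parts = urlsplit(crawler_url)
    fragment = None
    remaining = []
    # parse_qsl splits each pair on the first '=' and percent-decodes it,
    # so 'tab=TabA%26subtab=SubTab3' becomes 'tab=TabA&subtab=SubTab3'.
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "_escaped_fragment_":
            fragment = value
        else:
            remaining.append((key, value))
    if fragment is None:
        # Not a crawler request; leave the URL untouched.
        return crawler_url
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(remaining), "!" + fragment)
    )


crawler_url = "http://example.com/?_escaped_fragment_=tab=TabA%26subtab=SubTab3"
print(hashbang_url(crawler_url))
```

The resulting URL is what would then be URL-encoded and passed to the snapshot service as its url param.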
I don't have any experience with Google App Engine or Java or HTMLUnit.
I might be able to figure it out... and will post my results if I do.
Otherwise I feel this is a VERY good opportunity for someone to write a kick-ass blog post that outlines a novice's step-by-step guide to setting up a web service like this.
This will introduce more people to the excellent (and free!) Google App Engine. It will also undoubtedly encourage more people to adopt Google's specs for crawlable AJAX content... something we can all benefit from!
As Google's specification gains more acceptance, the "hurdle" of setting up a headless browser is going to send many devs Googling for answers! Get in now with an answer for fame and glory! (edit: at the very least I will sing your praises).
Hit me up on twitter @_chrisjacob
if you would like to discuss solutions.