Browser automation and cross-site scripting
I'm trying to write some web-based automation. The sites I'm hitting aren't on the same domain as my automation, so the browser's cross-site scripting restrictions (the same-origin policy) make it impossible to access the DOM on the target website.
I don't want to use a proxy or deal with proxifying the target websites (like Selenium does, for example). Cross-platform support is nice to have, but it isn't a must. I'll go Windows-only if I'm forced to.
I realize I could simply write a Windows program that runs a WebBrowser control and my own set of scripts, but I don't want my users to have to download an EXE from my webpage or apply any registry overrides to disable cross-domain checking. It has to be extremely easy to use: no extra software downloads or anything.
I tried to write an ActiveX control that hosts the MS WebBrowser control, so I could have a "browser-in-a-browser", so to speak. This didn't work; I ended up with winocc.cpp assertion failures.
What other options do I have? Would a Java applet work? I'd need a Java-based browser. Would I have to look at using JRex or Lobo?
There has just got to be a better way.
You could use a server-side language to obtain the external page with a screen scrape. I've done this in PHP and also in C#/.NET, but you could use pretty much any server-side language to make a web request that returns the whole chunk of HTML from the target page.
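As a rough illustration of that server-side request, here is a minimal sketch in Python using only the standard library. The function name `fetch_html`, the `User-Agent` string, and the timeout value are assumptions chosen for the example, not anything prescribed by the approach above; the same idea applies in PHP or C#.

```python
import urllib.request


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a page from the server side and return its HTML as a string.

    The request is made by the server, not by the visitor's browser, so the
    browser's same-origin policy never comes into play.
    """
    # Hypothetical User-Agent; some sites reject the default urllib one.
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0 (compatible; example-fetcher)"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset declared in the Content-Type header, if any;
        # fall back to UTF-8 (an assumption) when none is declared.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_html("https://example.com/")` would return that page's markup as one string. Note that this only retrieves the HTML the server sends; anything the target page builds later with its own JavaScript will not be in the result.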
Once you have the HTML, you can do what you want with it, as it's just a string that you're going to manipulate in some way and then write out on your own page.
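To make the "it's just a string" point concrete, here is one possible manipulation, sketched in Python. It inserts a `<base>` tag so that relative links and images in the scraped markup still resolve against the original site when the markup is served from your own domain. The helper name `add_base_tag` and the choice of this particular manipulation are assumptions for illustration; a real page may need a proper HTML parser rather than a regular expression.

```python
import re
from html import escape


def add_base_tag(page_html: str, base_url: str) -> str:
    """Return page_html with a <base href=...> tag pointing at base_url.

    The tag goes right after the first opening <head> tag. If the markup
    has no <head> at all, the tag is simply prepended.
    """
    base_tag = f'<base href="{escape(base_url, quote=True)}">'
    # \b keeps the pattern from matching <header>; count=1 touches only
    # the first <head ...> tag.
    new_html, replaced = re.subn(
        r"(<head\b[^>]*>)",
        lambda match: match.group(1) + base_tag,
        page_html,
        count=1,
        flags=re.IGNORECASE,
    )
    return new_html if replaced else base_tag + page_html
```

The returned string can then be written into the response for your own page, where your scripts can reach its DOM because it is now served from your domain.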