Extracting a single value from an HTML page using Java:
I am continuing work on a project that I've been at for some time now, and I have been struggling to pull some data from a website. The website has an iframe that pulls in some data from an unknown source. The data is in the iframe in a tag something like this:
<DIV id="number_forecast"><LABEL id="lblDay">9,000</LABEL></DIV>
There is a BUNCH of other crap above it but this div id / label is totally unique and is not used anywhere else in the code.
jsoup is probably what you want; it excels at extracting data from an HTML document.
There are many examples available showing how to use the API: http://jsoup.org/cookbook/extracting-data/selector-syntax
The process will be in two steps:
- parse the page and find the URL of the iframe
- parse the iframe's content and extract the information you need
The code looks like this:
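// needs: java.net.URL, org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element
// let's find the iframe
// (inputstream is an InputStream of the outer page and url is its address; both are assumed to exist already)
Document document = Jsoup.parse(inputstream, "iso-8859-1", url);
Element iframe = document.select("iframe").first();
// now load the iframe (15000 is the timeout in milliseconds)
URL iframeUrl = new URL(iframe.absUrl("src"));
document = Jsoup.parse(iframeUrl, 15000);
// extract the div, then read the text of the label inside it
Element div = document.getElementById("number_forecast");
String value = div.select("label").text();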
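The text inside the label ("9,000" in the sample markup) is still a string with a thousands separator, so it probably needs converting before you can do arithmetic with it. Here is a minimal sketch using only the standard library. The class name ForecastValue and the method parseCount are made up for illustration, and it assumes the site formats numbers US-style, with a comma as the grouping separator:

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class ForecastValue {
    // Hypothetical helper: converts label text such as "9,000" to an int.
    // Assumption: US-style grouping (comma as thousands separator).
    static int parseCount(String labelText) throws ParseException {
        NumberFormat format = NumberFormat.getIntegerInstance(Locale.US);
        return format.parse(labelText.trim()).intValue();
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parseCount("9,000")); // prints 9000
    }
}
```

If the site uses a different number format, swap Locale.US for the matching locale.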