如何解析具有多个表的页面

问题描述:

关于如何抓取具有多个表的网页的任何想法? 我正在连接到网页

Any idea on how to scrape a web page with multiple tables? I am connecting to the web page

这是一个表,但是在同一网页上有多个表

This is one table but on the same web page there are multiple tables

我也想不通如何阅读表格...

I also cant figure out how to read the table...

XML:

    <p><a href="/fantasy_news/feature/?ID=49818"><strong>Top 300 Overall Fantasy Rankings</strong></a></p> 
<div class="storyStats"> 
<table> 
<thead> 
<tr> 
<th>RANK</th> 
<th>CENTRES</th> 
<th>TEAM</th> 
<th>POS</th> 
<th>GP</th> 
<th>G</th> 
<th>A</th> 
<th>PTS</th> 
<th>+/-</th> 
<th>PIM</th> 
<th>PPP</th> 
</tr> 
</thead> 
<tbody> 
<tr class="bg1"> 
<td>1.</td> 
<td><a href="/nhl/teams/players/?name=steven+stamkos">Steven&nbsp;Stamkos</a></td> 

<td>Tampa Bay</td> 
<td>C</td> 
<td align="right">81</td> 
<td align="right">50</td> 
<td align="right">51</td> 
<td align="right">101</td> 
<td align="right">-2</td> 
<td align="right">56</td> 
<td align="right">38</td> 
</tr> 


Iterator<Element> trSIter = doc.select("table")
            .iterator();
    while (trSIter.hasNext()) {
        Element trEl = trSIter.next().child(0);
        Elements tdEls = trEl.children();
        Iterator<Element> tdIter = tdEls.select("tr").iterator();
        System.out.println("><1><><"+tdIter);
        boolean firstRow = true;
        while (tdIter.hasNext()) {

            Element tr = (Element) tdIter.next();


            while (tdIter.hasNext()) {
                int tdCount = 1;
                Element tdEl = tdIter.next();
                //name = tdEl.getElementsByClass("playertablePlayerName").get(0).text();

                Elements tdsEls = tdEl.select("td");
                System.out.println("><2><><"+tdsEls);
                Iterator<Element> columnIt = tdsEls.iterator();

                while (columnIt.hasNext()) {

                    Element column = columnIt.next();
                    switch (tdCount++) {
                    case 1:
                        name =column.select("a").first().text();

                        break;
                    case 2:
                        stat2 = Double.parseDouble(column.text());
                        break;
                    case 3:
                        stat3 = Double.parseDouble(column.text());
                        break;
                    case 4:
                        stat4 = Double.parseDouble(column.text());
                        break;
                    case 5:
                        stat5 = Double.parseDouble(column.text());
                        break;
                    case 6:
                        stat6 = Double.parseDouble(column.text());
                        break;
                    case 7:
                        stat7 = Double.parseDouble(column.text());
                        break;
                    case 8:
                        stat8 = Double.parseDouble(column.text());
                        break;

这应该可以帮助您入门.每个表都有一个空白记录,您将不得不考虑.您还必须找出所需的统计信息以及它们在表中的位置.您可以通过tds.get()获得统计信息.让我知道它如何为您工作.

This should get you started. Each table has a blank record you will have to account for. You will also have to figure out which stats you want and where they are in the table. You get the stats with tds.get(). Let me know how it works for you.

    Document doc = Jsoup.connect("http://www.tsn.ca/fantasy_news/feature/?ID=49815").get();

    for (Element table : doc.select("div.storyStats").select("table")) {
        for (Element row : table.select("tr")) {
            Elements tds = row.select("td");
            if (tds.size() > 0) {
                System.out.println(tds.get(1).text() + ":" + tds.get(5).text());
            }
        }
    }