Extracting data from an HTML TBODY using C#
Question:
I am using C#'s WebClient to download an HTML string.
A small example of the HTML being returned is:
<tbody class='resultBody ' id='Tbody2'>
<tr id='Tr2' class='firstRow'>
<td class='cbrow tier_Gold' rowspan='4'>
<input type='checkbox' name='listingId' value='452' id='Checkbox2' />
</td>
<td class='resNum' rowspan='4'>
<div class='node'>
B</div>
</td>
<td class='datarow busName' id='Td2'>
</td>
<td rowspan='2' class='resLinks'>
</td>
<td class="hoops" rowspan='2'>
</td>
</tr>
<tr>
<td class="datarow">
<dl class="addrBlock">
<dd class="bizAddr">
123 ABC St</dd>
</dl>
</td>
</tr>
</tbody>
<tbody class='resultBody ' id='Tbody3'>
<tr id='Tr3' class='firstRow'>
<td class='cbrow tier_Gold' rowspan='4'>
<input type='checkbox' name='listingId' value='99' id='Checkbox3' />
</td>
<td class='resNum' rowspan='4'>
<div class='node'>
B</div>
</td>
<td class='datarow busName' id='Td3'>
</td>
<td rowspan='2' class='resLinks'>
</td>
<td class="hoops" rowspan='2'>
</td>
</tr>
<tr>
<td class="datarow">
<dl class="addrBlock">
<dd class="bizAddr">
1111 Some St</dd>
</dl>
</td>
</tr>
</tbody>
I am interested in two elements of the HTML, but I have no idea what the best way to get to them is. What would be the best way for me to get the value from one element and the inner HTML from the other?
Any suggestions would be great!
Answer:
- download the HTML Agility Pack (free)
- create a new HtmlDocument
- call LoadHtml with the downloaded string
- use DOM navigation or an XPath query (SelectSingleNode etc.) to find the elements
- access InnerHtml of the elements you want
The API is similar to XmlDocument, but it works on HTML that isn't XHTML.
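The steps above could be sketched roughly as follows. The question does not say which two elements are wanted, so this sketch assumes they are the value attribute of the listingId checkbox and the inner HTML of the dd with class bizAddr. The class name ListingExtractor and the method Extract are hypothetical names chosen for illustration, and the HtmlAgilityPack NuGet package is assumed to be installed.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

// Hypothetical helper, not part of the HTML Agility Pack itself.
public static class ListingExtractor
{
    // Returns one (ListingId, Address) pair per <tbody class='resultBody '>.
    // Assumption: the two values of interest are the checkbox's value attribute
    // and the inner HTML of <dd class="bizAddr">.
    public static List<(string ListingId, string Address)> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<(string ListingId, string Address)>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var bodies = doc.DocumentNode.SelectNodes("//tbody[contains(@class,'resultBody')]");
        if (bodies == null)
        {
            return results;
        }

        foreach (var body in bodies)
        {
            // The leading "." keeps each query relative to the current tbody.
            var input = body.SelectSingleNode(".//input[@name='listingId']");
            var addr = body.SelectSingleNode(".//dd[@class='bizAddr']");

            string id = input?.GetAttributeValue("value", string.Empty) ?? string.Empty;
            string address = addr?.InnerHtml.Trim() ?? string.Empty;

            results.Add((id, address));
        }

        return results;
    }
}
```

Calling ListingExtractor.Extract with the string returned by the download would then give one pair per result block. Grouping by tbody first keeps each checkbox value paired with the address from the same block, and missing elements yield empty strings rather than exceptions.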