How to extract the following content from HTML with Python XPath

[Help] How do I extract the following content from HTML with Python XPath?
Here is the HTML snippet:

<h2 class="Left-title" id="baseParamInfo">

    	    	    <a class="jishu jishu2" target="_self" href="http://product.cnmo.com/1622/1621678/canshu.shtml">图文式<em></em></a>

    	    <a class="jishu cur" target="_self" href="javascript:void(0);">列表式</a>

    	        LG G4产品概要</h2>

I want to extract the "body" text: LG G4产品概要 ("LG G4 product overview")

////////////////////////////////////////////////////////
////////////////////////////////////////////////////////

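The Python code I'm using:

from lxml.html.soupparser import fromstring  # soupparser needs BeautifulSoup installed

content = ...  # placeholder: crawl the page and read its HTML into a string

root = fromstring(content)

desc_list = root.xpath('//*[@id="baseParamInfo"]')

title = desc_list[0].text  # Problem: title comes back as an "empty" string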

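Some clues:
desc_list comes back as a list, and its element has only two children,
corresponding to the "图文式" (graphic view) and "列表式" (list view) links in the HTML,
but the "LG G4产品概要" text never shows up.

How can I get the "LG G4产品概要" text?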
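The clue can be checked without crawling anything. Below is a minimal repro, assuming the standard library's xml.etree.ElementTree as a stand-in for lxml's soupparser (the snippet happens to be well-formed XML, and both libraries share the same .text and child-iteration behaviour). The wrapping <div> is added only so that findall can match the <h2> as a descendant. It shows that the XPath matches the single <h2>, and that it is this element that has the two <a> children.

```python
import xml.etree.ElementTree as ET

# Snippet from the question, wrapped in a <div> (an assumption for the repro)
# so the <h2> is a descendant that findall can match by id
SNIPPET = """<div><h2 class="Left-title" id="baseParamInfo">
    <a class="jishu jishu2" target="_self" href="http://product.cnmo.com/1622/1621678/canshu.shtml">图文式<em></em></a>
    <a class="jishu cur" target="_self" href="javascript:void(0);">列表式</a>
    LG G4产品概要</h2></div>"""

root = ET.fromstring(SNIPPET)
desc_list = root.findall(".//*[@id='baseParamInfo']")

v = desc_list[0]
print(len(desc_list))        # one match: the <h2> itself
print([a.text for a in v])   # its two child <a> elements
print(repr(v.text))          # only the whitespace before the first <a>
```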

------Solution----------------------
Below, v refers to desc_list[0].

help(v) shows:

  tail
      Text after this element's end tag, but before the next sibling
      element's start tag. This is either a string or the value None, if
      there was no text.

  text
      Text before the first subelement. This is either a string or
      the value None, if there was no text.

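So text is only what comes before the first <a>. Here that is just whitespace, which is why it looks empty; "LG G4产品概要" is the tail of the second <a>.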
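To see the two attributes side by side, here is a small sketch on made-up markup. It uses the standard library's xml.etree.ElementTree so it runs without extra packages (an assumption for portability; lxml elements follow the same text/tail model).

```python
import xml.etree.ElementTree as ET

# text = the text before the first child; tail = the text after a child's end tag
p = ET.fromstring("<p>a<b>B</b>c<i>I</i>d</p>")
print(repr(p.text))                 # 'a'
print([child.text for child in p])  # ['B', 'I']
print([child.tail for child in p])  # ['c', 'd']

# With no text at all, both attributes are None rather than ''
empty = ET.fromstring("<p><b>B</b></p>")
print(empty.text, empty[0].tail)    # None None
```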

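The following collects all the text that is not inside a child element (Python 3; a child's tail can be None, so each part falls back to ''):

text = (v.text or '') + ''.join(child.tail or '' for child in v)
print(text.strip())
# output: LG G4产品概要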
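The same idea can be wrapped in a reusable helper. This is a sketch: own_text is a hypothetical name, and the example runs on a compact copy of the snippet with the stdlib ElementTree so it works without lxml. itertext() is shown for contrast, because it also returns the text inside the <a> children. With lxml itself, the XPath text() step selects the same direct text nodes, e.g. ''.join(root.xpath('//*[@id="baseParamInfo"]/text()')).strip().

```python
import xml.etree.ElementTree as ET


def own_text(el):
    """Return the text that belongs directly to el, skipping text inside children.

    Uses only .text and .tail, which xml.etree and lxml elements both provide;
    None values are treated as empty strings.
    """
    parts = [el.text or ""]
    parts += [child.tail or "" for child in el]
    return "".join(parts).strip()


# Compact copy of the question's <h2> (attributes trimmed for brevity)
h2 = ET.fromstring(
    '<h2 id="baseParamInfo"><a href="#">图文式<em></em></a>'
    '<a href="#">列表式</a>LG G4产品概要</h2>'
)

print(own_text(h2))            # LG G4产品概要
print("".join(h2.itertext()))  # 图文式列表式LG G4产品概要 (includes the children's text)
```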
