lxml: XPath lookup starts from the root instead of the current element
Question:
I want to do the same thing I do in Beautiful Soup: find_all elements, then iterate through them to find some other elements inside each one, i.e.:
import bs4

soup = bs4.BeautifulSoup(source, 'html.parser')
articles = soup.find_all('div', class_='v-card')
for article in articles:
    name = article.find('span', itemprop='name').text
    address = article.find('p', itemprop='address').text
Now I try to do the same thing in lxml:
from lxml import html

tree = html.fromstring(source)
items = tree.xpath('//div[@class="v-card"]')
for item in items:
    name = item.xpath('//span[@itemprop="name"]/text()')
    address = item.xpath('//p[@itemprop="address"]/text()')
...but this finds all matches in the tree, regardless of whether they are under the current item. How can I approach this?
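A minimal reproduction of the behaviour described above, using a small hypothetical two-card document as a stand-in for source (the markup and the names SOURCE and per_item are illustrative assumptions, not from the original question):

```python
from lxml import html

# Hypothetical stand-in for `source`: two cards, each with its own name.
SOURCE = """
<html><body>
  <div class="v-card"><span itemprop="name">Alice</span></div>
  <div class="v-card"><span itemprop="name">Bob</span></div>
</body></html>
"""

tree = html.fromstring(SOURCE)

# A leading '//' is an absolute path: even when called on a card element,
# it searches the whole document, so every card reports every name.
per_item = [
    item.xpath('//span[@itemprop="name"]/text()')
    for item in tree.xpath('//div[@class="v-card"]')
]
print(per_item)
```

Each card gets both names back, which is exactly the "finds all matches in the tree" symptom.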
Answer:
Don't use // as the prefix in the follow-up queries: it explicitly tells XPath to start from the document root rather than from your current element. Instead, use .// for relative queries:
for item in tree.xpath('//div[@class="v-card"]'):
    name = item.xpath('.//span[@itemprop="name"]/text()')
    address = item.xpath('.//p[@itemprop="address"]/text()')
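A self-contained sketch of the relative-query fix, run against a small hypothetical document shaped like the question's markup (the sample HTML and the records list are illustrative assumptions):

```python
from lxml import html

# Hypothetical sample markup mirroring the question's structure.
SOURCE = """
<html><body>
  <div class="v-card">
    <span itemprop="name">Alice</span>
    <p itemprop="address">1 Main St</p>
  </div>
  <div class="v-card">
    <span itemprop="name">Bob</span>
    <p itemprop="address">2 Side Rd</p>
  </div>
</body></html>
"""

tree = html.fromstring(SOURCE)
records = []
for item in tree.xpath('//div[@class="v-card"]'):
    # './/' anchors the search at the current item, so each card
    # only sees its own descendants.
    name = item.xpath('.//span[@itemprop="name"]/text()')
    address = item.xpath('.//p[@itemprop="address"]/text()')
    records.append((name, address))

print(records)
```

Note that xpath(...) with /text() returns a list of strings rather than a single string as Beautiful Soup's .text does, so you may want name[0] (after checking the list is non-empty) to get the same shape of result.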