Python: extract the text after </span> and before <br/>

Problem description:

Here is the HTML file I am going to handle:

<span class="pl">Countries:</span> USA <br/>
<span class="pl">Language:</span> English <br/>

And here is my Python code:

from bs4 import BeautifulSoup

record = []
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly
spans = soup.find_all('span')
for span in spans:
    record.append(span.text)  # only the text inside each <span>

What I end up with is:

Countries: Language:

The result is missing some important information: "USA" and "English". How can I get that text?

Use the .next_sibling attribute. The text after each </span> is not inside the span; it is a separate text node that follows it:

soup.find("span", text="Countries:").next_sibling
soup.find("span", text="Language:").next_sibling
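
Note that the sibling is returned with its surrounding whitespace (for example " USA "), so you will probably want to strip it. As a minimal sketch of how this could be applied to every label at once, assuming the markup looks like the snippet in the question and that each label span carries class="pl" (the dictionary layout and the variable names are my own choice, not something from the question):

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """
<span class="pl">Countries:</span> USA <br/>
<span class="pl">Language:</span> English <br/>
"""

soup = BeautifulSoup(html, "html.parser")

record = {}
for span in soup.find_all("span", class_="pl"):
    # Label text inside the span, without the trailing colon.
    label = span.get_text(strip=True).rstrip(":")
    # The node right after </span>; here it is the text before <br/>.
    sibling = span.next_sibling
    # Guard against a missing sibling or a tag instead of a text node.
    value = sibling.strip() if isinstance(sibling, str) else ""
    record[label] = value

print(record)  # {'Countries': 'USA', 'Language': 'English'}
```

The isinstance check is there because next_sibling can be None or another tag when a span is not followed directly by text; in that case this sketch just stores an empty string.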