Python, Beautiful Soup: get all class names
Question:
Given some HTML, say:
<div class="class1">
    <span class="class2">some text</span>
    <span class="class3">some text</span>
    <span class="class4">some text</span>
</div>
How can I retrieve all the class names, i.e. ['class1', 'class2', 'class3', 'class4']?
I tried:
soup.find_all(class_=True)
But it retrieves the whole tags, and I would then need to run a regex over each tag's string.
Answer:
You can treat each Tag instance found as a dictionary when it comes to retrieving attributes. Note that the class attribute value is a list, since class is a special "multi-valued" attribute:
classes = []
for element in soup.find_all(class_=True):
    classes.extend(element["class"])
Or, as a list comprehension:
classes = [value
           for element in soup.find_all(class_=True)
           for value in element["class"]]
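To see why element["class"] is iterated rather than used as a single string, here is a small sketch on made-up markup (the html string and the variable names first and second are illustrative, not from the question). It assumes bs4 is installed and uses the built-in "html.parser":

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one tag carries two classes, the other carries none.
html = '<p class="lead intro">hi</p><p>no class</p>'
soup = BeautifulSoup(html, "html.parser")

first, second = soup.find_all("p")
print(first["class"])       # a list with one entry per class name
print(second.get("class"))  # None, because the attribute is absent
```

Indexing with second["class"] would raise KeyError here, which is why the snippets above filter with class_=True before reading the attribute.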
Demo:
In [1]: from bs4 import BeautifulSoup
In [2]: data = """
   ...: <div class="class1">
   ...: <span class="class2">some text</span>
   ...: <span class="class3">some text</span>
   ...: <span class="class4">some text</span>
   ...: </div>"""
In [3]: soup = BeautifulSoup(data, "html.parser")
In [4]: classes = [value
   ...:            for element in soup.find_all(class_=True)
   ...:            for value in element["class"]]
In [5]: print(classes)
['class1', 'class2', 'class3', 'class4']
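In the demo every class occurs once, but on a real page the same class is usually reused, so the flat list would contain repeats. If only the distinct names are wanted, one possible approach (a sketch on hypothetical markup, not part of the original answer) is to deduplicate while keeping first-seen order with dict.fromkeys:

```python
from bs4 import BeautifulSoup

# Hypothetical markup in which the class "label" is used twice.
html = """
<div class="box">
  <span class="label big">a</span>
  <span class="label">b</span>
</div>"""
soup = BeautifulSoup(html, "html.parser")

# Same comprehension as in the answer: one entry per class per tag.
all_classes = [value
               for element in soup.find_all(class_=True)
               for value in element["class"]]

# dict keys are unique and keep insertion order, so this drops repeats
# without reordering the names.
unique_classes = list(dict.fromkeys(all_classes))
print(unique_classes)
```

A plain set(all_classes) would also work when the order of the names does not matter.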