Can Beautiful Soup also trigger webpage events?
Beautiful Soup is a Python library for pulling data out of HTML and XML files. I will use it to extract webpage data, but I haven't found any way to click the buttons and anchor links, which in my case are used for page navigation. So do I have to use something else for this, or does Beautiful Soup have a capability I'm not aware of?
Please advise!
To answer your tags/comment: yes, you can use them together (Selenium and BeautifulSoup), and no, you can't directly use BeautifulSoup to execute events (clicking, etc.). Although I haven't used them together in the same situation myself, a hypothetical scenario could involve using Selenium to navigate to a target page via a certain path (i.e. click() these options and then click() the button to the next page), and then using BeautifulSoup to read driver.page_source (where driver is the Selenium driver you created to 'drive' the browser). Since driver.page_source is the HTML of the page, you can use BeautifulSoup as you are used to, parsing out whatever information you need.
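A minimal sketch of that hypothetical navigate-then-parse workflow, assuming Selenium 4 (for find_elements(By.LINK_TEXT, ...)) with Firefox and geckodriver installed. The helper names page_title and crawl_with_clicks, and the link text "More", are illustrative placeholders rather than part of either library. Selenium performs the clicks; BeautifulSoup only ever sees the resulting driver.page_source string.

```python
from bs4 import BeautifulSoup


def page_title(html):
    """Parse an HTML string (e.g. driver.page_source) and return its <title> text."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text() if soup.title else None


def crawl_with_clicks(start_url, link_text="More", pages=3):
    """Open start_url in Firefox, then click the link labelled link_text
    until `pages` pages have been read, handing each page to BeautifulSoup.

    Assumes Selenium 4 and a working Firefox/geckodriver install; the
    default link text "More" is a hypothetical pagination anchor.
    """
    # Imported here so page_title() works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    titles = []
    try:
        driver.get(start_url)
        while True:
            # Selenium has loaded the page; BeautifulSoup just reads its HTML.
            titles.append(page_title(driver.page_source))
            if len(titles) >= pages:
                break
            links = driver.find_elements(By.LINK_TEXT, link_text)
            if not links:
                break  # no further page to navigate to
            links[0].click()  # the event BeautifulSoup itself cannot fire
    finally:
        driver.quit()  # always close the browser, even on errors
    return titles
```

For example, crawl_with_clicks('http://news.ycombinator.com') would try to collect the titles of up to three pages; adjust link_text to match your site's pagination link. If a click triggers JavaScript instead of a full page load, you may need an explicit wait (e.g. Selenium's WebDriverWait) before reading page_source.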
Simple example:
from bs4 import BeautifulSoup
from selenium import webdriver
# Create your driver
driver = webdriver.Firefox()
# Get a page
driver.get('http://news.ycombinator.com')
# Feed the source to BeautifulSoup (naming the parser avoids a bs4 warning)
soup = BeautifulSoup(driver.page_source, 'html.parser')
print(soup.title)  # <title>Hacker News</title>
# Close the browser when you are done
driver.quit()
The main idea is that any time you need to read the source of a page, you can pass driver.page_source to BeautifulSoup in order to read whatever you want.
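To illustrate the reading step on its own, here is a small sketch in which a static HTML string stands in for driver.page_source (the markup and the extract_links helper are hypothetical). It also shows the limit raised in the question: BeautifulSoup can read an anchor's href or a button's onclick attribute, but only as text; it cannot fire them.

```python
from bs4 import BeautifulSoup


def extract_links(page_source):
    """Return (text, href) pairs for every <a href=...> in an HTML string.

    page_source stands in for driver.page_source; any HTML string works,
    because BeautifulSoup never talks to the browser itself.
    """
    soup = BeautifulSoup(page_source, "html.parser")
    return [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]


# A static snippet standing in for driver.page_source (hypothetical markup).
sample = """
<html><body>
  <a href="item?id=1">First story</a>
  <a href="?p=2" rel="next">More</a>
  <button onclick="loadMore()">Load more</button>
</body></html>
"""

links = extract_links(sample)
print(links)

# The button's handler is just an attribute string to BeautifulSoup; it cannot run it.
handler = BeautifulSoup(sample, "html.parser").find("button")["onclick"]
print(handler)
```

For a plain anchor you could follow the extracted href with a separate HTTP request, but anything driven by JavaScript (like the onclick above) still needs a real browser such as the one Selenium controls.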