Unable to fetch dates from tabular content
Problem description:
I've written a script in Python with Selenium to parse some dates from a table on a webpage. The table is located under the header NPL Victoria Betting Odds, and the tabular data are within the element with the id tournamentTable. The table shows three dates: 10 Aug 2018, 11 Aug 2018 and 12 Aug 2018. I wish to parse them and pair each date with its times, as in my expected output below.
This is my attempt so far:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

link = "find the link above"

def get_content(driver,url):
    driver.get(url)
    for items in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"#tournamentTable tr"))):
        try:
            idate = items.find_element_by_css_selector("th span[class^='datet']").text
        except Exception: idate = ""
        try:
            itime = items.find_element_by_css_selector("td.table-time").text
        except Exception: itime = ""
        print(f'{idate}--{itime}')

if __name__ == '__main__':
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver,10)
    try:
        get_content(driver,link)
    finally:
        driver.quit()
Currently, my output is as follows:
--
10 Aug 2018--
--
--09:30
--10:15
11 Aug 2018--
--
--05:00
--05:00
--09:00
12 Aug 2018--
--
--06:00
--06:00
My expected output:
10 Aug 2018--09:30
10 Aug 2018--10:15
11 Aug 2018--05:00
11 Aug 2018--05:00
11 Aug 2018--09:00
12 Aug 2018--06:00
12 Aug 2018--06:00
Answer
Try the following code:
def get_content(driver,url):
    driver.get(url)
    # Count the date header rows first, then look each one up again by index.
    dates = len(wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"#tournamentTable tr.center.nob-border"))))
    for d in range(dates):
        item = driver.find_elements_by_css_selector("#tournamentTable tr.center.nob-border")[d]
        try:
            idate = item.find_element_by_css_selector("th span[class^='datet']").text
        except Exception: idate = ""
        # Select the time cells that follow this date row but precede the next one:
        # a cell qualifies only if fewer than d + 2 date rows come before it.
        for time_td in item.find_elements_by_xpath(".//following::td[contains(@class, 'table-time') and not((preceding::tr[@class='center nob-border'])[%d])]" % (d + 2)):
            try:
                itime = time_td.text
            except Exception: itime = ""
            print(f'{idate}--{itime}')
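
If the XPath feels hard to follow, another possible approach is to keep the single loop over all rows from the question and simply remember the most recent date. The sketch below shows only that pairing logic on plain data, without Selenium. The function name pair_dates_with_times is hypothetical, and the sample rows are the (date, time) pairs taken from the question's current output, so it assumes the rows arrive in the same order as in the table.

```python
def pair_dates_with_times(rows):
    """Attach the most recent non-empty date to each non-empty time.

    `rows` is an iterable of (date_text, time_text) pairs in table order,
    where either element may be an empty string.
    """
    current_date = ""
    paired = []
    for date_text, time_text in rows:
        if date_text:
            # A date header row: remember it for the rows that follow.
            current_date = date_text
        if time_text:
            # A match row: emit it together with the remembered date.
            paired.append(f"{current_date}--{time_text}")
    return paired


# Sample rows copied from the question's current output.
rows = [
    ("", ""),
    ("10 Aug 2018", ""),
    ("", ""),
    ("", "09:30"),
    ("", "10:15"),
    ("11 Aug 2018", ""),
    ("", ""),
    ("", "05:00"),
    ("", "05:00"),
    ("", "09:00"),
    ("12 Aug 2018", ""),
    ("", ""),
    ("", "06:00"),
    ("", "06:00"),
]

for line in pair_dates_with_times(rows):
    print(line)
```

In the original loop this would amount to updating a variable only when idate is non-empty and printing only when itime is non-empty.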