Unable to get dates from tabular content

Problem description:

I've written a script in Python, using Selenium, to parse some dates from a table on a webpage. The table is located under the header NPL Victoria Betting Odds, and the tabular data are inside the element with the id tournamentTable. Three dates appear there: 10 Aug 2018, 11 Aug 2018 and 12 Aug 2018. I want to pair each date with the times listed under it, as in my expected output below.

Link to the webpage

This is my attempt so far:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

link = "find the link above"

def get_content(driver,url):
    driver.get(url)
    for items in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"#tournamentTable tr"))):
        try:
            idate = items.find_element(By.CSS_SELECTOR, "th span[class^='datet']").text
        except Exception:
            idate = ""
        try:
            itime = items.find_element(By.CSS_SELECTOR, "td.table-time").text
        except Exception:
            itime = ""

        print(f'{idate}--{itime}')

if __name__ == '__main__':
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver,10)
    try:
        get_content(driver,link)
    finally:
        driver.quit()

Currently, my output looks like this:

--
10 Aug 2018--
--
--09:30
--10:15
11 Aug 2018--
--
--05:00
--05:00
--09:00
12 Aug 2018--
--
--06:00
--06:00

My expected output:

10 Aug 2018--09:30
10 Aug 2018--10:15
11 Aug 2018--05:00
11 Aug 2018--05:00
11 Aug 2018--09:00
12 Aug 2018--06:00
12 Aug 2018--06:00

Try the following code:

def get_content(driver,url):
    driver.get(url)
    # Each date header is a row matching tr.center.nob-border; count them once the table has loaded
    date_rows = "#tournamentTable tr.center.nob-border"
    dates = len(wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, date_rows))))
    for d in range(dates):
        # Re-locate the header rows on each pass and pick the d-th one
        item = driver.find_elements(By.CSS_SELECTOR, date_rows)[d]
        try:
            idate = item.find_element(By.CSS_SELECTOR, "th span[class^='datet']").text
        except Exception:
            idate = ""
        # Time cells after this header that are not yet preceded by the next date header (header number d + 2)
        for time_td in item.find_elements(By.XPATH, ".//following::td[contains(@class, 'table-time') and not((preceding::tr[@class='center nob-border'])[%d])]" % (d + 2)):
            try:
                itime = time_td.text
            except Exception:
                itime = ""
            print(f'{idate}--{itime}')
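
A possibly simpler alternative to the XPath above, offered only as a sketch: because every date row comes before the time rows that belong to it, the original row-by-row loop can remember the most recent date and reuse it. The helper name fill_dates is hypothetical, and the sample pairs below are simply the (idate, itime) values from the output shown in the question, so the idea can be checked without a browser.

```python
def fill_dates(rows):
    """Yield (date, time) pairs, carrying the most recent date forward.

    `rows` is an iterable of (idate, itime) pairs in table order, where
    either value may be an empty string (as in the question's output).
    """
    current_date = ""
    for idate, itime in rows:
        if idate:
            # Date header row: remember it for the rows that follow
            current_date = idate
        if itime:
            # Match row: pair its time with the last date seen
            yield current_date, itime


# (idate, itime) pairs exactly as printed by the loop in the question
scraped = [
    ("", ""), ("10 Aug 2018", ""), ("", ""), ("", "09:30"), ("", "10:15"),
    ("11 Aug 2018", ""), ("", ""), ("", "05:00"), ("", "05:00"), ("", "09:00"),
    ("12 Aug 2018", ""), ("", ""), ("", "06:00"), ("", "06:00"),
]

for idate, itime in fill_dates(scraped):
    print(f"{idate}--{itime}")
```

In the Selenium script the same effect could be had by keeping a variable outside the for loop, updating it only when a row yields a non-empty date, and printing only for rows that have a time.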