


I've written a script in Python to scrape the titles of the items listed on the right-hand side, next to the map, on each landing page. I've used two links in my script: one has pagination and the other doesn't.

When I execute my script, it first checks for pagination links. If it finds one, it passes the page links to the get_paginated_info() function, which prints the results. If it finds none, it passes the soup object to the get_info() function, which prints the results instead. At the moment the script works exactly as described.

How can I make my script print the results from within get_info() only, whether or not the link has pagination, while keeping the logic I've already applied? I'd like to remove the get_paginated_info() function from my script.


This is my attempt so far:

import requests 
from bs4 import BeautifulSoup
from urllib.parse import urljoin

urls = (
    # the two links from the original post are missing here
)

def get_names(link):
    r = requests.get(link)
    soup = BeautifulSoup(r.text,"lxml")
    items = soup.select_one(".pagination a.next_page")
    if items:
        npagelink = items.find_previous_sibling().get("href").split("/")[-1]
        return [get_paginated_info(link + "/page/{}".format(page)) for page in range(1, int(npagelink) + 1)]
    else:
        return [get_info(soup)]

def get_info(soup):
    print("================links without pagination==============")
    for items in soup.select("td[class='table-row-price']"):
        item = items.select_one("h2 a").text
        print(item)

def get_paginated_info(url):
    r = requests.get(url)
    sauce = BeautifulSoup(r.text,"lxml")
    print("================links with pagination==============")
    for content in sauce.select("td[class='table-row-price']"):
        title = content.select_one("h2 a").text
        print(title)

if __name__ == '__main__':
    for url in urls:
        get_names(url)

Any better design capable of dealing with different links would be highly appreciated.

I have changed the logic slightly. Now, whether or not pagination is present, get_names calls get_info for every page. When there is no pagination, the loop runs only once.

import requests 
from bs4 import BeautifulSoup
from urllib.parse import urljoin

urls = (
    # the two links from the original post are missing here
)

def get_names(link):
    r = requests.get(link)
    soup = BeautifulSoup(r.text,"lxml")
    items = soup.select_one(".pagination a.next_page")
    try:
        npagelink = items.find_previous_sibling().get("href").split("/")[-1]
    except AttributeError:
        # No pagination link on the page: treat it as a single page
        npagelink = 1
    return [get_info(link + "/page/{}".format(page)) for page in range(1, int(npagelink) + 1)]

def get_info(url):
    r = requests.get(url)
    sauce = BeautifulSoup(r.text,"lxml")
    for content in sauce.select("td[class='table-row-price']"):
        title = content.select_one("h2 a").text
        print(title)

if __name__ == '__main__':
    for url in urls:
        get_names(url)

Please double-check the output to make sure everything works as expected.
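
One way to check the page-count logic without hitting the site is to pull the URL building out into its own small function. The sketch below is only an illustration: the names last_page_number and page_urls and the example.com address are hypothetical, and it assumes the site keeps the link + "/page/N" pattern used in the script. In the real script, last_href would be the href taken from the sibling of the "next page" link, or None when that link is absent.

```python
def last_page_number(last_href):
    """Return the page number at the end of a pagination href, or 1 if there is none.

    Hypothetical helper: last_href is expected to look like ".../page/3".
    """
    if not last_href:
        return 1
    tail = last_href.rstrip("/").split("/")[-1]
    # Fall back to a single page if the last path segment is not a number
    return int(tail) if tail.isdigit() else 1


def page_urls(link, last_href=None):
    """Build one URL per page, assuming the link + "/page/N" pattern."""
    base = link.rstrip("/")
    count = last_page_number(last_href)
    return [f"{base}/page/{page}" for page in range(1, count + 1)]


if __name__ == "__main__":
    # example.com is a placeholder, not one of the links from the question
    for page_url in page_urls("https://example.com/listing", "/listing/page/3"):
        print(page_url)
```

With this split, get_names would only need to loop over page_urls(...) and hand each URL to get_info, so a link without pagination naturally yields a single URL.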