Python Week 2 (Day 9): My Python Growth Diary. Master Python Data Mining in One Month! (16) - The Scrapy Framework

The Scrapy framework

Parsing the response

>>> response.css('title::text').extract()
['Quotes to Scrape']

There are two things to note here:
  (1) The first is that we've added ::text to the CSS query, meaning we want to select only the text directly inside the <title> element. If we don't specify ::text, we get the full title element, including its tags: ['<title>Quotes to Scrape</title>'].
  (2) The other is that the result of calling .extract() is a list, because we're dealing with an instance of SelectorList.
When you know you just want the first result, as in this case, you can do:
>>> response.css('title::text').extract_first()
'Quotes to Scrape'

Besides the extract() and extract_first() methods, you can also use the re() method to extract using regular expressions:
>>> response.css('title::text').re(r'Quotes.*')
['Quotes to Scrape']
>>> response.css('title::text').re(r'Q\w+')
['Quotes']
>>> response.css('title::text').re(r'(\w+) to (\w+)')
['Quotes', 'Scrape']
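
To check what these patterns match without running a spider, the sketch below uses only Python's standard `re` module on the title string. The helper `flat_findall` is hypothetical, not part of Scrapy; it only approximates how `.re()` returns capture groups as one flat list.

```python
import re

def flat_findall(pattern, text):
    """Hypothetical helper: re.findall with capture groups flattened into one list."""
    results = []
    for match in re.findall(pattern, text):
        if isinstance(match, tuple):   # several groups -> tuple of strings
            results.extend(match)
        else:                          # zero or one group -> plain string
            results.append(match)
    return results

title = "Quotes to Scrape"

whole = flat_findall(r"Quotes.*", title)         # everything from "Quotes" on
word = flat_findall(r"Q\w+", title)              # "Q" followed by word characters
groups = flat_findall(r"(\w+) to (\w+)", title)  # one entry per capture group

# \w is a character class; a literal "w" after "Q" does not occur in the title.
literal_w = flat_findall(r"Qw+", title)

print(whole, word, groups, literal_w)
```

Using raw strings (the `r'...'` prefix) keeps the backslash in `\w` intact when the pattern reaches the regex engine.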