Scrapy: cannot use return and yield together
Here is my code:
def parse(self, response):
    soup = BeautifulSoup(response.body)
    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//div[@class="row"]')
    items = []
    for site in sites[:5]:
        item = TestItem()
        item['username'] = "test5"
        request = Request("http://www.example.org/profile.php", callback=self.parseUserProfile)
        request.meta['item'] = item
        yield item    # <-- the line in question

    mylinks = soup.find_all("a", text="Next")
    if mylinks:
        nextlink = mylinks[0].get('href')
        yield Request(urljoin(response.url, nextlink), callback=self.parse)

def parseUserProfile(self, response):
    item = response.meta['item']
    item['image_urls'] = "test3"
    return item
The above works, but I am not getting the value of item['image_urls'] = "test3"; it comes back empty.

If I use return request instead of yield item, I get the error "cannot use return with generator".

If I remove this line:

    yield Request(urljoin(response.url, nextlink), callback=self.parse)

then my code works fine and I can get image_urls, but then I cannot follow the links.

So is there any way I can use return request and yield together, so that I get image_urls?
Looks like you have a mechanical error. Instead of:
for site in sites[:5]:
    item = TestItem()
    item['username'] = "test5"
    request = Request("http://www.example.org/profile.php", callback=self.parseUserProfile)
    request.meta['item'] = item
    yield item    # <-- the request is never yielded, so parseUserProfile never runs
you need:
for site in sites[:5]:
    item = TestItem()
    item['username'] = "test5"
    request = Request("http://www.example.org/profile.php", callback=self.parseUserProfile)
    request.meta['item'] = item
    yield request
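
For completeness, here is a minimal sketch of how the whole spider could look with that one change applied, assuming the same older HtmlXPathSelector/BeautifulSoup setup your snippet uses. The spider name, start URL and the import path for TestItem are placeholders; everything else mirrors your code. parse only yields Request objects (one per profile, plus one for the "Next" page), and parseUserProfile is where the finished item comes out.

    from urlparse import urljoin                 # Python 2; on Python 3 use urllib.parse

    from bs4 import BeautifulSoup
    from scrapy.http import Request
    from scrapy.selector import HtmlXPathSelector
    from scrapy.spider import BaseSpider

    from myproject.items import TestItem         # placeholder import path


    class ProfileSpider(BaseSpider):
        name = "profiles"                                    # placeholder spider name
        start_urls = ["http://www.example.org/list.php"]     # placeholder start URL

        def parse(self, response):
            soup = BeautifulSoup(response.body)
            hxs = HtmlXPathSelector(response)
            sites = hxs.select('//div[@class="row"]')

            for site in sites[:5]:
                item = TestItem()
                item['username'] = "test5"
                request = Request("http://www.example.org/profile.php",
                                  callback=self.parseUserProfile)
                request.meta['item'] = item
                # Yield the request; the half-filled item rides along in meta.
                yield request

            mylinks = soup.find_all("a", text="Next")
            if mylinks:
                nextlink = mylinks[0].get('href')
                # A generator may yield any number of requests, so following the
                # pagination link coexists with the profile requests above.
                yield Request(urljoin(response.url, nextlink), callback=self.parse)

        def parseUserProfile(self, response):
            item = response.meta['item']
            item['image_urls'] = "test3"
            # This callback contains no yield, so a plain return is fine here
            # (yield item would work too).
            return item

The "cannot use return with generator" error you saw only appears because parse already contains yield and is therefore a generator; on Python 2 a generator cannot return a value, so yielding the request (rather than returning it) is the way to go.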