Unable to produce custom output after scraping data from a webpage
Problem description:
I'm trying to append data to a dictionary while scraping it from a webpage. The output I'm getting at the moment is not arranged the way I want. The page is the one whose URL appears in the code.
What I've tried:
import requests
from bs4 import BeautifulSoup
from pprint import pprint

url = 'https://elllo.org/english/grammar/L1-01-AimeeTodd-Intros-BeVerb.htm'
data = []
r = requests.get(url)
soup = BeautifulSoup(r.text, "lxml")
for item in soup.select("#transcript p"):
    d = {}
    if "Aimee:" in item.text:
        d['Aimee'] = item.text.replace("Aimee:", "").strip()
    elif "Todd:" in item.text:
        d['Todd'] = item.text.replace("Todd:", "").strip()
    data.append(d)
pprint(data)
The output is as follows:
[{'Aimee': 'So Todd, where are you from?'},
 {'Todd': "I am from the U.S., I am from San Francisco. It's on the west "
          'coast.'},
 {'Aimee': 'And what do you do?'},
 {'Todd': "I'm an English teacher. Also, I create Elllo. I work on Elllo a "
          'lot.'}
Expected output:
[{'Aimee': 'So Todd, where are you from?',
  'Todd': "I am from the U.S., I am from San Francisco. It's on the west "
          'coast.'},
 {'Aimee': 'And what do you do?',
  'Todd': "I'm an English teacher. Also, I create Elllo. I work on Elllo a "
          'lot.'},
How can I produce the second output?
Answer:
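import requests
from bs4 import BeautifulSoup
from pprint import pprint

url = 'https://elllo.org/english/grammar/L1-01-AimeeTodd-Intros-BeVerb.htm'
data = []
r = requests.get(url)
soup = BeautifulSoup(r.text, "lxml")
d = {}  # created outside the loop so one dict can hold both speakers
for item in soup.select("#transcript p"):
    if "Aimee:" in item.text:
        d['Aimee'] = item.text.replace("Aimee:", "").strip()
    elif "Todd:" in item.text:
        d['Todd'] = item.text.replace("Todd:", "").strip()
        # Todd's reply completes the pair: store it and start a new dict
        data.append(d)
        d = {}
pprint(data)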
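
The pairing logic can also be tried without hitting the network. The following is only a sketch: `pair_turns` and the `sample` list are hypothetical names introduced here for illustration, and it assumes each paragraph's text begins with the speaker's name followed by a colon, as in the output shown in the question.

```python
def pair_turns(paragraphs, first="Aimee", second="Todd"):
    """Group alternating speaker lines into one dict per exchange."""
    pairs = []
    current = {}
    for text in paragraphs:
        for speaker in (first, second):
            prefix = speaker + ":"
            if text.startswith(prefix):
                # Drop the "Name:" prefix and surrounding whitespace.
                current[speaker] = text[len(prefix):].strip()
                break
        # An exchange is complete once the second speaker has replied.
        if second in current:
            pairs.append(current)
            current = {}
    # Keep a trailing line from the first speaker that got no reply.
    if current:
        pairs.append(current)
    return pairs


# Hypothetical sample data mirroring the lines quoted in the question.
sample = [
    "Aimee: So Todd, where are you from?",
    "Todd: I am from the U.S., I am from San Francisco. It's on the west coast.",
    "Aimee: And what do you do?",
    "Todd: I'm an English teacher. Also, I create Elllo. I work on Elllo a lot.",
]
result = pair_turns(sample)
```

With the scraped page, the same function could presumably be fed `[p.get_text(strip=True) for p in soup.select("#transcript p")]`, provided the paragraphs really do start with the speaker label.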