Extracting content (title, time, and url) from a web page into a list with Python

Problem description:


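import requests
from bs4 import BeautifulSoup

# Download the page; requests.get returns a Response object, not a URL
response = requests.get('http://money.163.com/special/pinglun/')
html = response.content
# Parse the raw bytes with the parser from the standard library
soup = BeautifulSoup(html, 'html.parser')
# extract163Data is the function that still has to be written:
# it should return a list of dicts with the keys 'title', 'time' and 'url'
extract163Data(soup)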

list1 = [{'title': '贾跃亭的成功意味着实体失败?', 'time': '2016-04-25 14:28:18', 'url': 'http://money.163.com/16/0425/14/BLGM1PH5002551G6.html'}, {'title': '海尔模式为何在西方叫好不叫座', 'time': '2016-04-22 15:00:23', 'url': 'http://money.163.com/16/0422/15/BL90MCB400253G87.html'}, {'title': '有前科就不能开网约车?', 'time': '2016-04-12 15:30:49', 'url': 'http://money.163.com/16/0412/15/BKFAETGB002552IJ.html'}, {'title': '影业公司能助网络视频抬身价吗', 'time': '2016-03-31 13:43:27', 'url': 'http://money.163.com/16/0331/13/BJG7HME600253G87.html'}, {'title': '美的收购东芝究竟值不值?', 'time': '2016-03-31 08:48:45', 'url': 'http://money.163.com/16/0331/08/BJFMM2AB00253G87.html'}, {'title': '日本家电企业真的不行了吗?', 'time': '2016-03-18 16:40:02', 'url': 'http://money.163.com/16/0318/16/BIF2FM7A002551G6.html'}, {'title': '淘宝只是中国制造乱象的镜子', 'time': '2016-03-16 09:56:58', 'url': 'http://money.163.com/16/0316/09/BI96K6L000253G87.html'}, {'title': 'iPhone 6s太失败? 苹果需创新', 'time': '2016-01-26 14:45:14', 'url': 'http://money.163.com/16/0126/14/BE8V83A500253G87.html'}, {'title': '从贴吧事件看大公司如何担责', 'time': '2016-01-18 16:02:05', 'url': 'http://money.163.com/16/0118/16/BDKGF2C000253G87.html'}, {'title': '销量不佳股价跌 苹果错在哪里', 'time': '2016-01-11 14:49:43', 'url': 'http://money.163.com/16/0111/14/BD2BHH85002551G6.html'}, {'title': '视频网站为何对快播痛下杀手?', 'time': '2016-01-11 14:30:31', 'url': 'http://money.163.com/16/0111/14/BD2AEC0E002551G6.html'}, {'title': '黎万强重振小米是个伪命题?', 'time': '2016-01-05 13:51:55', 'url': 'http://money.163.com/16/0105/13/BCIPRCDP002551G6.html'}, {'title': '手机厂商频死亡 将大洗牌?', 'time': '2015-12-31 12:14:33', 'url': 'http://money.163.com/15/1231/12/BC5O9GEI002551G6.html'}, {'title': '2015三星与苹果暗战胜负几何?', 'time': '2015-12-29 14:55:41', 'url': 'http://money.163.com/15/1229/14/BC0SN3OC002551G6.html'}, {'title': '宝能作为门口野蛮人是坏人吗', 'time': '2015-12-19 12:31:57', 'url': 'http://money.163.com/15/1219/12/BB6SGNBI002551G6.html'}]

assert extract163Data(soup) == list1

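Just inspect the page with your browser's developer tools to see which tags hold the title, time, and link. For how to use bs4, see the official documentation; it is available in Chinese and is very thorough.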
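
As a rough illustration of that advice, here is a minimal sketch of what extract163Data could look like. The markup in SAMPLE_HTML (a div with class "list_item" containing an h2 link and a span with class "time") is purely an assumption made up for this example, not the real structure of the 163 page; the selectors need to be replaced with whatever the developer tools actually show. It runs on an inline sample string so it does not depend on the network.

```python
from bs4 import BeautifulSoup

# ASSUMPTION: hypothetical markup, invented for illustration only.
# The real page must be inspected in the browser's developer tools.
SAMPLE_HTML = """
<div class="list_item">
  <h2><a href="http://money.163.com/16/0425/14/BLGM1PH5002551G6.html">贾跃亭的成功意味着实体失败?</a></h2>
  <span class="time">2016-04-25 14:28:18</span>
</div>
<div class="list_item">
  <h2><a href="http://money.163.com/16/0422/15/BL90MCB400253G87.html">海尔模式为何在西方叫好不叫座</a></h2>
  <span class="time">2016-04-22 15:00:23</span>
</div>
<div class="list_item">
  <h2><a href="http://money.163.com/incomplete.html">entry without a time</a></h2>
</div>
"""


def extract163Data(soup):
    """Collect one dict per article block: title, time, and url."""
    results = []
    # "div.list_item", "h2 a" and "span.time" are assumed selectors
    for item in soup.select("div.list_item"):
        link = item.select_one("h2 a")
        time_tag = item.select_one("span.time")
        # Skip blocks that lack a link, an href, or a timestamp
        if link is None or time_tag is None or not link.get("href"):
            continue
        results.append({
            "title": link.get_text(strip=True),
            "time": time_tag.get_text(strip=True),
            "url": link["href"],
        })
    return results


sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
sample_result = extract163Data(sample_soup)
for entry in sample_result:
    print(entry)
```

Once the selectors match the real page, the same function can be called on the soup built from the requests response, and its return value compared with list1.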

I already know which university you are from.