How do I write all of these rows to a CSV file for a given range of years?
The purpose of the code below is to web-scrape the Oxford English Dictionary for words that were "invented" in each year within a range of years. The scraping itself works as intended.
import csv
import os
import re
import requests
import urllib2

year_start = 1550
year_end = 1552
subject_search = ['Law']

for year in range(year_start, year_end + 1):
    path = '/Applications/Python 3.5/Economic'
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
    urllib2.install_opener(opener)
    user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
    header = {'User-Agent': user_agent}
    resultPath = os.path.join(path, 'OED_table.csv')
    htmlPath = os.path.join(path, 'OED.html')
    request = urllib2.Request('http://www.oed.com/search?browseType=sortAlpha&case-insensitive=true&dateFilter=' + str(year) + '&nearDistance=1&ordered=false&page=1&pageSize=100&scope=ENTRY&sort=entry&subjectClass=' + str(subject_search) + '&type=dictionarysearch', None, header)
    page = opener.open(request)
    with open(resultPath, 'wb') as outputw, open(htmlPath, 'w') as outputh:
        urlpage = page.read()
        outputh.write(urlpage)
        new_words = re.findall(r'<span class=\"hwSect\"><span class=\"hw\">(.*?)</span>', urlpage)
        print new_words
        csv_writer = csv.writer(outputw)
        if csv_writer.writerow([year] + new_words):
            csv_writer.writerow([year, word])
However, when I actually run the code, the only thing written to the CSV file is the very last year in the range. So my CSV file ends up as a single row like this:
1552, word1, word2, word3, etc....
I basically want a separate row for each year in the range of years. How do I go about this?
You keep overwriting the file: it is reopened with 'w' on every pass through the loop, and again every time you run the code. Open it once, outside the loop, and open it with 'a' instead of 'w' so that each run of the code adds to the existing data rather than overwriting it:
with open("/Applications/Python 3.5/Economic/OED_table.csv", 'a') as outputw, open("/Applications/Python 3.5/Economic/OED.html", 'a') as outputh:
    for year in range(year_start, year_end +1):
        .....................
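
For what it's worth, here is a minimal Python 3 sketch of the same idea: open the CSV once, before the loop, and write one row per year inside it. The names write_year_rows, extract_words and fetch_page are made up for illustration and are not from the question. fetch_page is assumed to be any callable that returns the HTML for a given year, standing in for the download step. The regex is the one from the question.

```python
import csv
import re

# Same pattern as in the question's re.findall call.
HEADWORD_RE = re.compile(r'<span class="hwSect"><span class="hw">(.*?)</span>')


def extract_words(html):
    """Return the headwords found in one results page."""
    return HEADWORD_RE.findall(html)


def write_year_rows(csv_path, year_start, year_end, fetch_page, mode="w"):
    """Write one CSV row per year: the year followed by its words.

    fetch_page is a hypothetical callable (year -> HTML text) that stands
    in for the download step from the question.
    """
    # The file is opened once, before the loop, so rows written for
    # earlier years are not truncated when the next year is processed.
    with open(csv_path, mode, newline="", encoding="utf-8") as output:
        writer = csv.writer(output)
        for year in range(year_start, year_end + 1):
            writer.writerow([year] + extract_words(fetch_page(year)))
```

With the default mode="w" the file is recreated on each run but still gets every year in the range. Pass mode="a" if you want later runs to keep adding rows to what is already there, as suggested above.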