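Beginning Python (Python基础教程) Notes - Project 1 - Instant Markup - Day 2
Yesterday I mostly got familiar with generators. Today I start by testing the lines generator with a file: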
    import sys

    def lines(file):
        # yield every line, then one extra blank line to mark the end
        for line in file:
            yield line
        yield '\n'

    for i in lines(sys.stdin):
        if i:
            print(i)
            print('---')
Test file test_input.txt:

    hello

    how are you
    how do you do

    fine
Run the script with test_input.txt on standard input and redirect the output to a file.
Output, test_output.txt:
    hello

    ---


    ---
    how are you

    ---
    how do you do

    ---


    ---
    fine
    ---


    ---
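Notice that hello is followed by a blank line, and only then by "---". I think the reason is as follows.
First, when read line by line, test_input.txt is effectively this list:

    test_input = ['hello\n', '\n', 'how are you\n', 'how do you do\n', '\n', 'fine']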
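Second, print appends a newline of its own to whatever it prints:

    print('1')
    print('2')

The result is:

    1
    2

That is, 1 is printed, then a newline, then 2, then another newline, and then the program ends.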
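The same thing happens in test_output.txt:
first 'hello\n' is printed, followed by the newline that print adds, then '---', then '\n', then another newline, then '---', and so on.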
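To see the doubled newline directly, one can capture exactly what print writes. The sketch below is written for Python 3 (print as a function, plus io.StringIO and contextlib.redirect_stdout from the standard library). The helper name printed is made up for this illustration.

```python
import io
from contextlib import redirect_stdout

def printed(text):
    """Return exactly what print(text) writes to standard output."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        print(text)
    return buffer.getvalue()

# print always appends its own '\n' to the text it is given
print(repr(printed('hello\n')))   # 'hello\n\n'
print(repr(printed('\n')))        # '\n\n'
print(repr(printed('---')))       # '---\n'
```

So a line that already ends in '\n' produces one blank line after it, and a line that is only '\n' produces two.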
Next, the blocks generator:

    def blocks(file):
        block = []
        for line in lines(file):
            if line.strip():
                # non-blank line: collect it into the current block
                block.append(line)
            elif block:
                # blank line after some content: emit the finished block
                yield ''.join(block).strip()
                block = []

    test_input = ['hello\n', '\n', 'how are you\n', 'how do you do\n', '\n', 'fine']
    for i in blocks(test_input):
        if i:
            print(i)
            print('---')
Result:

    hello
    ---
    how are you
    how do you do
    ---
    fine
    ---
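strip() removes whitespace characters such as '\n' from a string and returns the result. It only removes them from the two ends of the string and never from the middle: '\nhello\nhello'.strip() returns 'hello\nhello'. append adds an element to the end of a list. ''.join(block) concatenates the elements of block using the empty string '' as the separator and returns the joined string.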
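The three operations can be tried in isolation. This is a small Python 3 sketch. The variable names text, stripped, block and joined are chosen only for the demonstration.

```python
# strip() only touches the two ends of the string
text = '\nhello\nhello'
stripped = text.strip()
print(repr(stripped))            # 'hello\nhello'

# append() adds each line to the end of the list
block = []
block.append('how are you\n')
block.append('how do you do\n')

# ''.join() glues the list elements together with nothing in between
joined = ''.join(block)
print(repr(joined))              # 'how are you\nhow do you do\n'
print(repr(joined.strip()))      # 'how are you\nhow do you do'

# a line holding only whitespace strips to '', which counts as false
print(bool('\n'.strip()))        # False
```

The last line is the reason `if line.strip():` works as a test for non-blank lines.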
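Execution flow:
First line = 'hello\n'. line.strip() is truthy, so after the if, block is ['hello\n'].
Next line = '\n'. line.strip() returns '', which is falsy, so we reach the elif. block is ['hello\n'], so 'hello' is yielded and block is reset to [].
Next line = 'how are you\n'. The if test is true and block becomes ['how are you\n'].
Then line = 'how do you do\n'. The if test is true again, and block is now ['how are you\n', 'how do you do\n'].
Then line = '\n'. The if test is false, so we reach the elif. ''.join(block) returns 'how are you\nhow do you do\n', and strip() turns that into 'how are you\nhow do you do' (again, strip() only removes whitespace at the two ends of the string, never in the middle).
Finally line = 'fine' is appended to block, and the extra '\n' that lines yields at the very end triggers the elif once more, so 'fine' is yielded as the last block.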
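In summary, after

    test_input = ['hello\n', '\n', 'how are you\n', 'how do you do\n', '\n', 'fine']

passes through the blocks generator, the result should be:

    ['hello', 'how are you\nhow do you do', 'fine']

In other words, the input text is returned block by block.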
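The summary above can be checked by collecting the generator into a list. The sketch below is Python 3. The add_sentinel parameter is a hypothetical switch that is not in the original code. It is added only to show what happens when the extra '\n' from lines is left out.

```python
def lines(file):
    """Yield every line of file, then one extra blank line as an end marker."""
    for line in file:
        yield line
    yield '\n'

def blocks(file, add_sentinel=True):
    """Group consecutive non-blank lines into blocks.

    add_sentinel is a hypothetical switch used only in this demo.
    """
    source = lines(file) if add_sentinel else file
    block = []
    for line in source:
        if line.strip():
            block.append(line)
        elif block:
            yield ''.join(block).strip()
            block = []

test_input = ['hello\n', '\n', 'how are you\n', 'how do you do\n', '\n', 'fine']

print(list(blocks(test_input)))
# ['hello', 'how are you\nhow do you do', 'fine']

print(list(blocks(test_input, add_sentinel=False)))
# ['hello', 'how are you\nhow do you do']
```

Without the trailing blank line the last block is collected but never yielded, which is why lines ends with `yield '\n'`.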
With the blocks generator we can already do some simple work:
util.py:
    import sys, re

    def lines(file):
        for line in file:
            yield line
        yield '\n'

    def blocks(file):
        block = []
        for line in lines(file):
            if line.strip():
                block.append(line)
            elif block:
                yield ''.join(block).strip()
                block = []

    print('<html><head><title>...</title><body>')

    title = True
    for block in blocks(sys.stdin):
        # turn *text* into <em>text</em>
        block = re.sub(r'\*(.+?)\*', r'<em>\1</em>', block)
        if title:
            # the first block becomes the heading
            print('<h1>')
            print(block)
            print('</h1>')
            title = False
        else:
            # every later block becomes a paragraph
            print('<p>')
            print(block)
            print('</p>')

    print('</body></html>')
Here re.sub performs a regular-expression substitution, replacing *XXX* with <em>XXX</em>.
For an introduction to regular expressions, see: http://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html
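The substitution can be tried on its own. In this Python 3 sketch the helper name emphasize is made up for illustration. The pattern and replacement strings are the ones used in the script.

```python
import re

def emphasize(text):
    """Replace *XXX* with <em>XXX</em> (emphasize is a made-up helper name)."""
    return re.sub(r'\*(.+?)\*', r'<em>\1</em>', text)

print(emphasize('in *many* ways'))
# in <em>many</em> ways

# the non-greedy .+? stops at the nearest closing asterisk
print(emphasize('*one* and *two*'))
# <em>one</em> and <em>two</em>

# a greedy .+ would run from the first asterisk to the last one
print(re.sub(r'\*(.+)\*', r'<em>\1</em>', '*one* and *two*'))
# <em>one* and *two</em>
```

The `\1` in the replacement refers to whatever the parenthesized group matched, and the `?` after `.+` is what keeps two emphasized phrases in one block from being merged.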
Run it with this test_input.txt:
    Welcome to World Wide Spam, Inc.

    These are the corporate web pages of *World Wide Spam*, Inc. We hope you find your stay enjoyable, and that you will sample many of our products.

    A short history of the company

    World Wide Spam was started in the summer of 2000. The business concept was to ride the dot-com wave and to make money both through bulk email and by selling canned meat online.

    After receiving several complaints from customers who weren't satisfied by their bulk email, World Wide Spam altered their profile, and focused 100% on canned goods. Today, they rank as the world's 13,892nd online supplier of SPAM.

    Destinations

    From this page you may visit several of our interesting web pages:

    - What is SPAM? (http://wwspam.fu/whatisspam)

    - How do they make it? (http://wwspam.fu/howtomakeit)

    - Why should I eat it? (http://wwspam.fu/whyeatit)

    How to get in touch with us

    You can get in touch with us in *many* ways: By phone (555-1234), by email (wwspam@wwspam.fu) or by visiting our customer feedback page (http://wwspam.fu/feedback).

Result, out.html:
    <html><head><title>...</title><body>
    <h1>
    Welcome to World Wide Spam, Inc.
    </h1>
    <p>
    These are the corporate web pages of <em>World Wide Spam</em>, Inc. We hope you find your stay enjoyable, and that you will sample many of our products.
    </p>
    <p>
    A short history of the company
    </p>
    <p>
    World Wide Spam was started in the summer of 2000. The business concept was to ride the dot-com wave and to make money both through bulk email and by selling canned meat online.
    </p>
    <p>
    After receiving several complaints from customers who weren't satisfied by their bulk email, World Wide Spam altered their profile, and focused 100% on canned goods. Today, they rank as the world's 13,892nd online supplier of SPAM.
    </p>
    <p>
    Destinations
    </p>
    <p>
    From this page you may visit several of our interesting web pages:
    </p>
    <p>
    - What is SPAM? (http://wwspam.fu/whatisspam)
    </p>
    <p>
    - How do they make it? (http://wwspam.fu/howtomakeit)
    </p>
    <p>
    - Why should I eat it? (http://wwspam.fu/whyeatit)
    </p>
    <p>
    How to get in touch with us
    </p>
    <p>
    You can get in touch with us in <em>many</em> ways: By phone (555-1234), by email (wwspam@wwspam.fu) or by visiting our customer feedback page (http://wwspam.fu/feedback).
    </p>
    </body></html>