Newbie asking for help with a question

I have a text file, test.txt.
Its contents are as follows (word + Tab + code):
工 a
了 b
以 c
自然 thqd
* thmh
结束 xfgk

The requirement is simple:
print only the entries whose word is two characters long,
that is:
自然 thqd
* thmh
结束 xfgk

Here is the code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

f = open(u'test.txt', 'r')
for i in f:
    array = i.split()
    word = array[0].decode('utf-8')
    code = array[1].decode('utf-8')
    if len(word) == 2:
        print word + '\t' + code

Problem: this code only prints the correct result when test.txt is saved as UTF-8 without a BOM. Otherwise it prints:
工 a
自然 thqd
* thmh
结束 xfgk

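In other words, the first line is wrong. Could someone suggest a good way to handle this?
The solution should not depend on how test.txt was saved (with or without a BOM).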
------Solution approach----------------------
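A likely explanation for the stray first line, assuming the file was saved as UTF-8 with a BOM: the three BOM bytes at the start of the file decode to the single character U+FEFF, so the first word comes out as two characters and passes the length check. The sketch below rebuilds that first line in memory rather than reading test.txt; the names raw, word and clean are made up for illustration.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import codecs

# Bytes of the first line as they would be stored in a UTF-8 file with a BOM
# (assumption: the line is one character, a Tab, and the code "a").
raw = codecs.BOM_UTF8 + u'\u5de5\ta'.encode('utf-8')  # u'\u5de5' is the first sample word

# Same steps as the script: split on whitespace, decode the first field.
word = raw.split()[0].decode('utf-8')
print(repr(word))
print(len(word))   # 2: U+FEFF plus the one real character

# Decoding with 'utf-8-sig' drops a leading BOM if there is one.
clean = raw.split()[0].decode('utf-8-sig')
print(len(clean))  # 1
```

The BOM only appears once, at the very start of the file, which is why only the first line is affected.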




# -*- coding: cp936 -*-
import codecs

f = open(u'test.txt', 'r')
for i in f:
    # Just strip the BOM here and the problem goes away
    line = i
    if line[:3] == codecs.BOM_UTF8:
        line = line[3:]
    array = line.split()
    word = array[0].decode('utf-8')
    code = array[1].decode('utf-8')
    if len(word) == 2:
        print word + '\t' + code



Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>> 
自然	thqd
*	thmh
结束	xfgk
>>>
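
As an alternative to slicing the BOM off by hand, the file can be opened with the standard 'utf-8-sig' codec, which skips a leading BOM if there is one and otherwise behaves like plain 'utf-8'. This is only a minimal sketch, assuming test.txt is UTF-8 with or without a BOM; it does not handle other encodings. two_char_entries is a hypothetical helper name, and io.open is used so the same code runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import io


def two_char_entries(path):
    """Return (word, code) pairs whose word is exactly two characters long.

    Hypothetical helper; assumes the file is UTF-8, with or without a BOM.
    """
    entries = []
    # 'utf-8-sig' removes a leading BOM while decoding, so no manual
    # slicing is needed and every line arrives as unicode text.
    with io.open(path, 'r', encoding='utf-8-sig') as f:
        for line in f:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            word, code = fields[0], fields[1]
            if len(word) == 2:
                entries.append((word, code))
    return entries
```

Looping over two_char_entries('test.txt') and printing word + '\t' + code for each pair should give the same lines as the session above, whichever way the file was saved.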
