Writing utf-8 data to a utf-8 xml file using ElementTree
I'm trying to write an xml file with utf-8 encoded data using ElementTree like this:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET
import codecs
testtag = ET.Element('unicodetag')
testtag.text = u'Töreboda'  # contains the non-ASCII character ö (o with two dots over it)
expfile = codecs.open('testunicode.xml',"w","utf-8-sig")
ET.ElementTree(testtag).write(expfile,encoding="UTF-8",xml_declaration=True)
expfile.close()
This blows up with the error:
Traceback (most recent call last):
  File "unicodetest.py", line 10, in <module>
    ET.ElementTree(testtag).write(expfile,encoding="UTF-8",xml_declaration=True)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 815, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 932, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "/usr/lib/python2.7/codecs.py", line 691, in write
    return self.writer.write(data)
  File "/usr/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
Using the "us-ascii" encoding instead works fine, but it doesn't preserve the unicode characters in the data. What is happening?
codecs.open expects Unicode strings to be written to the file object, and it will handle encoding to UTF-8. ElementTree's write encodes the Unicode strings to UTF-8 byte strings before sending them to the file object. Since the file object wants Unicode strings, it coerces the byte string back to Unicode using the default ascii codec, which causes the UnicodeDecodeError.
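The claim that write has already encoded the data can be checked without touching the disk. This is a minimal sketch, assuming an in-memory io.BytesIO buffer (not part of the original code) as a stand-in for a file opened in binary mode; the names buf and data are illustrative only.

```python
# -*- coding: utf-8 -*-
import io
import xml.etree.ElementTree as ET

testtag = ET.Element('unicodetag')
testtag.text = u'T\xf6reboda'  # same text as u'Töreboda', spelled with an escape

# A binary buffer accepts the byte strings that write() emits.
buf = io.BytesIO()
ET.ElementTree(testtag).write(buf, encoding="UTF-8", xml_declaration=True)

data = buf.getvalue()
# data is a byte string: ö has already become the two bytes 0xC3 0xB6
print(repr(data))
```

The byte 0xc3 "in position 1" from the traceback is the first of those two bytes, sitting right after the "T". A codecs.open file object cannot take that byte string as-is, so it tries to decode it with ascii first and fails.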
Just do this instead:
#expfile = codecs.open('testunicode.xml',"w","utf-8-sig")
ET.ElementTree(testtag).write('testunicode.xml',encoding="UTF-8",xml_declaration=True)
#expfile.close()
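Passing a filename drops the byte order mark that "utf-8-sig" would have added. If that BOM is actually wanted (an assumption; the question does not say so), one possible sketch is to open the file in binary mode, write codecs.BOM_UTF8 by hand, and then let ElementTree write its already-encoded bytes to the same file object:

```python
# -*- coding: utf-8 -*-
import codecs
import xml.etree.ElementTree as ET

testtag = ET.Element('unicodetag')
testtag.text = u'T\xf6reboda'  # same text as u'Töreboda'

# Binary mode: the file object takes byte strings and does no encoding itself.
with open('testunicode.xml', 'wb') as expfile:
    expfile.write(codecs.BOM_UTF8)  # the three BOM bytes EF BB BF
    ET.ElementTree(testtag).write(expfile, encoding="UTF-8", xml_declaration=True)

# Read the raw bytes back to inspect what ended up on disk.
with open('testunicode.xml', 'rb') as check:
    raw = check.read()
print(repr(raw[:3]))
```

The point is the same as in the fix above: only one layer does the encoding. Here that layer is ElementTree, and the file object just stores bytes.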