When a str holds the UTF-8 encoded bytes of Chinese text, how do you convert that str to Chinese?

When the content of a str is the UTF-8 encoded value of some Chinese text, how can that str be converted back to Chinese?
>>> sd = '山东'
>>> sd
'山东'
>>> sd.encode('utf-8')
b'\xe5\xb1\xb1\xe4\xb8\x9c'
>>>
>>> str = '\xe5\xb1\xb1\xe4\xb8\x9c'
>>> str
'\xe5±±\xe4\xb8\x9c'
>>> print(str)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character '\xe5' in position 0: illegal multibyte sequence

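As shown above, when the content of a str is the UTF-8 encoded value of some Chinese text, how can that str be converted back to Chinese?
In other words: what operation on the str makes print output "山东"? In Python 2 you can call decode on a str, but in Python 3 you cannot.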
------Suggested approach----------------------
Since it was encoded, it has to be decoded. In Python 2:

print '\xe5\xb1\xb1\xe4\xb8\x9c'.decode('utf-8')
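For Python 3, which is what the question uses, the same idea only works on a bytes object. A minimal sketch, assuming the data can be written as a bytes literal (the variable names are just for illustration):

```python
# The UTF-8 bytes of '山东', written as a bytes literal (note the b prefix).
raw = b'\xe5\xb1\xb1\xe4\xb8\x9c'

# bytes has decode(); this turns the UTF-8 bytes back into text.
text = raw.decode('utf-8')

# Without the b prefix, each \xNN escape is one code point, not one byte,
# so this str has six characters and no decode() method in Python 3.
not_bytes = '\xe5\xb1\xb1\xe4\xb8\x9c'

print(len(text), len(not_bytes), hasattr(not_bytes, 'decode'))
```

The last line prints the two lengths and whether the str has a decode attribute, which makes the difference between the two literals visible.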
------Suggested approach----------------------
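UnicodeEncodeError: 'gbk' codec can't encode character '\xe5' in position 0: illegal multibyte sequence
This looks like GBK encoding to me, which is a different thing from UTF-8.
I suggest finding an editor that can show the encoding type, checking what encoding the input source really uses, and only then processing it.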
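It may help to see where the 'gbk' in the traceback comes from. print() encodes text with the console's codec before writing it, and on a Chinese-locale Windows console that codec is commonly GBK. A small sketch, using only the standard library, that reproduces the same error without a console (the variable names are illustrative):

```python
import sys

# The six-character str from the question.
mojibake = '\xe5\xb1\xb1\xe4\xb8\x9c'

# The codec print() uses for this console (may differ per machine).
print(sys.stdout.encoding)

# Encode to GBK explicitly to trigger the same failure print() hit.
try:
    mojibake.encode('gbk')
    failed = False
    position = None
except UnicodeEncodeError as exc:
    failed = True
    position = exc.start  # index of the first character GBK cannot represent
```

If this reading is right, the error says something about the console encoding rather than about the encoding of the data itself.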
------Suggested approach----------------------

Quote:

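Anyone using bottle has probably run into this: when a form is submitted and it contains Chinese, the string you get from request.POST.get is the encoded sequence shown above. Can any expert answer this?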

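What Bottle returns is presumably a byte string rather than a str (the string '\xe5\xb1\xb1\xe4\xb8\x9c' is only how it looks when displayed), so it can be decoded. In addition, Bottle's FormsDict supports specifying the parameter encoding, decoding all of its contents, and similar operations: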

class FormsDict(*a, **k)
This MultiDict subclass is used to store request form data. Additionally to the normal dict-like item access methods (which return unmodified data as native strings), this container also supports attribute-like access to its values. Attributes are automatically de- or recoded to match input_encoding (default: ‘utf8’). Missing attributes default to an empty string.

input_encoding = 'utf8'
Encoding used for attribute values.

recode_unicode = True
If true (default), unicode strings are first encoded with latin1 and then decoded to match input_encoding.

decode(encoding=None)
Returns a copy with all keys and values de- or recoded to match input_encoding. Some libraries (e.g. WTForms) want a unicode dictionary.

getunicode(name, default=None, encoding=None)
Return the value as a unicode string, or the default.

Posting your code would make this easier to discuss.
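The recode_unicode step quoted above (encode with latin1, then decode with input_encoding) can be illustrated in plain Python. This is only a sketch of the documented behaviour, not Bottle's actual code; recode is a hypothetical helper and no Bottle import is involved:

```python
def recode(value, input_encoding='utf8'):
    """Hypothetical helper mirroring the documented recode step."""
    # latin1 maps code points 0-255 to the same byte values, so this
    # recovers the original bytes before decoding them properly.
    return value.encode('latin1').decode(input_encoding)

# Simulate a form value whose UTF-8 bytes were decoded as latin1.
wire_bytes = '山东'.encode('utf8')
native = wire_bytes.decode('latin1')

fixed = recode(native)
```

Here native is the same six-character string as in the question, and fixed is the two-character Chinese text.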
------Suggested approach----------------------

Quote:

...
Still, I am stuck on how to get "山东" out of the string str = '\xe5\xb1\xb1\xe4\xb8\x9c'...

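In theory you should never end up with a string like '\xe5\xb1\xb1\xe4\xb8\x9c', so no conversion should be needed. If you really do have to convert it, turn the str into a byte array and then you can call decode.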



In [24]: bytearray(map(ord, str)).decode('utf-8')

Out[24]: '山东'

Or



In [43]: import codecs


In [44]: print(codecs.encode(str, 'raw_unicode_escape').decode('utf-8'))

山东
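Both variants above rebuild the original bytes by treating each character's code point as one byte, which is what the latin-1 codec does. A minimal sketch of the same idea wrapped in a function; utf8_from_escaped is a hypothetical name, and returning the input unchanged on failure is an assumption about what the caller wants:

```python
def utf8_from_escaped(s):
    """Treat each code point of s as one byte and decode the result as UTF-8.

    Returns s unchanged when it has code points above 255 or when the
    resulting bytes are not valid UTF-8.
    """
    try:
        return s.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return s

broken = '\xe5\xb1\xb1\xe4\xb8\x9c'
repaired = utf8_from_escaped(broken)

# For code points 0-255, latin-1 encoding matches bytearray(map(ord, ...)).
same_bytes = bytes(bytearray(map(ord, broken))) == broken.encode('latin-1')
```

Text that is already correct, such as a real Chinese str, passes through untouched because it cannot be encoded as latin-1.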
