Remove non-breaking spaces from a string using Python
I am having trouble with a very basic string issue in Python that I can't figure out. Basically, I am trying to do the following:
# Read the file into a string
myString = file.read()
# Attempt to remove non-breaking spaces
myString = myString.replace("\u00A0", " ")
# However, when I print the string to the console, I get:
Foo **<C2><A0>** Bar
I thought that "\u00A0" was the escape code for the Unicode non-breaking space, but apparently I am not using it properly. Any ideas on what I am doing wrong?
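You don't have a unicode string; you have a UTF-8 encoded byte string (which is what a plain string is in Python 2.x). In a Python 2 byte string literal, "\u00A0" is not interpreted as an escape sequence, so the replace call searches for those six literal characters and never matches the two bytes 0xC2 0xA0 that encode the non-breaking space.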
Try:
myString = myString.replace("\xc2\xa0", " ")
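To see the difference between the two calls, here is a minimal sketch. It assumes Python 3 `bytes` objects as a stand-in for Python 2 strings, and the sample data in `raw` is made up for illustration:

```python
# Sample bytes as they might be read from a UTF-8 file (assumed data):
# "Foo ", then a non-breaking space encoded as 0xC2 0xA0, then " Bar".
raw = b"Foo \xc2\xa0 Bar"

# The original attempt effectively searched for six literal characters
# (backslash, u, 0, 0, A, 0), which never occur in the data.
unchanged = raw.replace(b"\\u00A0", b" ")

# Searching for the actual UTF-8 byte pair does match.
fixed = raw.replace(b"\xc2\xa0", b" ")

print(unchanged == raw)
print(fixed)
```

The first replacement leaves the data untouched, while the second swaps the two-byte sequence for a single ASCII space.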
Better would be to switch to unicode; see this article for ideas. Thus you could say:
uniString = unicode(myString, "UTF-8")
uniString = uniString.replace(u"\u00A0", " ")
This should work as well (caveat: I don't have Python 2.x available right now), although you will need to encode it back to bytes when writing it to a file or printing it to the screen.
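For comparison, the same decode, replace, encode round trip can be sketched in Python 3, where `str` is already a Unicode type. This is only an illustration under that assumption; the sample bytes in `raw` are made up rather than taken from the question's file:

```python
# Sample UTF-8 bytes containing a non-breaking space (assumed data).
raw = b"Foo \xc2\xa0 Bar"

# Decode to text so that U+00A0 becomes a single character.
text = raw.decode("utf-8")

# Replace the non-breaking space with an ordinary space.
cleaned = text.replace("\u00a0", " ")

# Encode back to bytes before writing to a binary file or stream.
encoded = cleaned.encode("utf-8")

print(repr(cleaned))
```

Doing the replacement on decoded text keeps the code independent of the file's encoding: only the `decode` and `encode` calls need to change if the input is not UTF-8.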