Removing non-breaking spaces from a string with Python

Problem description:

I am having trouble with a very basic string issue in Python that I can't figure out. Basically, I am trying to do the following:

# read file into a string
myString = file.read()

# attempt to remove non-breaking spaces
myString = myString.replace("\u00A0", " ")

# however, when I print my string to the console, I get:
Foo **<C2><A0>** Bar

I thought that "\u00A0" was the escape code for the Unicode non-breaking space, but apparently I am not doing this properly. Any ideas on what I am doing wrong?

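You don't have a unicode string, but a UTF-8 encoded byte string (which is what plain strings are in Python 2.x). In UTF-8 the non-breaking space is stored as the two bytes C2 A0, and the "\u00A0" escape is not interpreted inside a non-unicode string literal, so the replace never matches.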

Try

myString = myString.replace("\xc2\xa0", " ")
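
To see why the byte-level replacement works, here is a minimal sketch written for Python 3 (an assumption, since the answer targets Python 2.x), where the distinction between text and bytes is explicit. The sample data is made up to stand in for the file contents.

```python
# Python 3 sketch: U+00A0 (no-break space) occupies two bytes in UTF-8.
nbsp = "\u00a0"
encoded = nbsp.encode("utf-8")
print(encoded)  # b'\xc2\xa0'

# Hypothetical sample standing in for the raw bytes read from the file.
raw = "Foo \u00a0 Bar".encode("utf-8")

# Replacing the two-byte sequence removes the no-break space at the byte level.
cleaned = raw.replace(b"\xc2\xa0", b" ")
print(cleaned)  # b'Foo   Bar'
```

These are the same two bytes, C2 and A0, that show up in the console output in the question.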

It would be better to switch to unicode; see this article for ideas. You could then write

uniString = unicode(myString, "UTF-8")
uniString = uniString.replace(u"\u00A0", " ")

and it should also work (caveat: I don't have Python 2.x available right now), although you will need to encode it back to bytes when writing it to a file or printing it to the screen.
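
The decode, replace, encode round trip described above might look roughly like this in Python 3 (again an assumption; in Python 2.x the decode step would be the unicode(...) call shown earlier). The helper name strip_nbsp and the sample bytes are hypothetical, chosen only for illustration.

```python
def strip_nbsp(raw_bytes, encoding="utf-8"):
    """Hypothetical helper: replace no-break spaces in encoded bytes."""
    text = raw_bytes.decode(encoding)    # bytes -> text
    text = text.replace("\u00a0", " ")   # now the \u00A0 escape matches
    return text.encode(encoding)         # text -> bytes for file or console


# Made-up sample containing one UTF-8 encoded no-break space.
sample = b"Foo \xc2\xa0 Bar"
print(strip_nbsp(sample))
```

Working on decoded text means the replacement does not depend on how the no-break space happens to be encoded; only the decode and encode steps need to know the file's encoding.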