Printing UTF-8 from Python in the Windows XP console

Problem description:

I would like to configure my console on Windows XP to support UTF8 and to have python detect that and work with it.

So far, my attempts:

C:\Documents and Settings\Philippe>C:\Python25\python.exe
Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print u'é'
é
>>> import sys
>>> sys.stdout.encoding
'cp437'
>>> quit()

So, by default I am in cp437 and python detects that just fine.
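With the default code page, printing u'é' comes down, roughly, to encoding it with the codec named by sys.stdout.encoding. The helper below (console_bytes is a hypothetical name, not a Python API) performs that step explicitly. In cp437, 'é' is the single byte 0x82, which the console displays correctly. The output comment is what Python 3 prints.

```python
import codecs


def console_bytes(text, encoding):
    """Hypothetical helper: encode text with the named codec, roughly what
    print does for a console whose sys.stdout.encoding is `encoding`."""
    codec = codecs.lookup(encoding)  # LookupError if Python has no such codec
    encoded, _consumed = codec.encode(text)
    return encoded


print(console_bytes(u'\u00e9', 'cp437'))  # b'\x82': one byte per character in cp437
```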

C:\Documents and Settings\Philippe>chcp 65001
Active code page: 65001

C:\Documents and Settings\Philippe>python
Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'cp65001'
>>> print u'é'
C:\Documents and Settings\Philippe>

It seems like printing in UTF8 makes python crash now...

I would like to configure my console on Windows XP to support UTF8

I don't think that's going to happen.

The 65001 code page is buggy: some stdio calls behave incorrectly under it and break many tools. You can register cp65001 as an encoding manually:

import codecs

def cp65001(name):
    # Codec search function: map the name 'cp65001' onto Python's UTF-8 codec.
    if name.lower() == 'cp65001':
        return codecs.lookup('utf-8')
    # Any other name falls through (returns None) so other search functions can try.

codecs.register(cp65001)

This lets you print u'some unicode string', but it doesn't let you put non-ASCII characters in that Unicode string. You get the same odd errors (IOError 0 and the like) that you get when you try to write non-ASCII UTF-8 sequences directly as byte strings.
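The byte counts below illustrate where the line falls between what works and what fails. With a UTF-8 codec, an ASCII-only string encodes to exactly one byte per character, so what reaches the console looks like plain ASCII; any non-ASCII character becomes a multi-byte sequence, which is the kind of write reported to fail under code page 65001. The helper name utf8_byte_lengths is hypothetical, and the snippet only shows the bytes involved; it does not reproduce the console bug itself.

```python
def utf8_byte_lengths(text):
    """Hypothetical helper: return (characters, UTF-8 bytes) for text."""
    return len(text), len(text.encode('utf-8'))


print(utf8_byte_lengths(u'some unicode string'))  # (19, 19): ASCII, one byte each
print(utf8_byte_lengths(u'caf\u00e9'))            # (4, 5): 'é' takes two bytes
```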

Unfortunately, UTF-8 is a second-class citizen under Windows. NT's Unicode model was drawn up before UTF-8 existed, so you're expected to use an encoding with two bytes per code unit (UTF-16, originally UCS-2) anywhere you want consistent Unicode. Working in byte strings, as many portable apps and languages built on C's stdio do (Python among them), doesn't fit that model.
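To make "two bytes per code unit" concrete, this Python 3 sketch (code_units is a made-up helper name) counts the storage each encoding needs for a few characters. UTF-16 uses one 16-bit code unit for every character in the Basic Multilingual Plane and a surrogate pair (two units) beyond it, while UTF-8's byte count grows with the code point.

```python
def code_units(text):
    """Hypothetical helper: return (UTF-8 bytes, UTF-16 code units) for text."""
    utf8_bytes = len(text.encode('utf-8'))
    # 'utf-16-le' writes no BOM, so every 2 bytes is one 16-bit code unit.
    utf16_units = len(text.encode('utf-16-le')) // 2
    return utf8_bytes, utf16_units


for codepoint in (0x41, 0xE9, 0x20AC, 0x1F600):
    print('U+%04X' % codepoint, code_units(chr(codepoint)))
# U+0041 (1, 1)
# U+00E9 (2, 1)
# U+20AC (3, 1)
# U+1F600 (4, 2)   <- outside the BMP: a UTF-16 surrogate pair
```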

Rewriting Python to use the Windows Unicode console calls (such as WriteConsoleW) instead of the portable C stdio ones is no complete fix either: those calls only work on a real console, so they don't play well with shell tricks like piping and redirecting to a file. (Not to mention that you still have to switch from the default terminal font to a TrueType one before you can see the results working at all...)
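The console-versus-redirection tension can be sketched in Python 3 with ctypes. The helper write_text below is hypothetical (it is not part of Python or pywin32). On Windows it calls the real kernel32 functions GetStdHandle, GetConsoleMode and WriteConsoleW, but only when stdout is an actual console; otherwise it falls back to writing UTF-8 bytes (an assumed choice) so pipes and redirected files still receive output. The Windows branch is an untested sketch, and it does nothing about the console font.

```python
import io
import sys


def write_text(text, stream=None):
    """Hypothetical helper: use WriteConsoleW on a real Windows console,
    otherwise write UTF-8 bytes (an assumed choice) to the stream."""
    if stream is None:
        stream = sys.stdout
    if sys.platform == 'win32' and stream is sys.stdout:
        import ctypes
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
        kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
        kernel32.GetStdHandle.restype = wintypes.HANDLE
        kernel32.GetConsoleMode.argtypes = [wintypes.HANDLE,
                                            ctypes.POINTER(wintypes.DWORD)]
        kernel32.GetConsoleMode.restype = wintypes.BOOL
        kernel32.WriteConsoleW.argtypes = [wintypes.HANDLE, wintypes.LPCWSTR,
                                           wintypes.DWORD,
                                           ctypes.POINTER(wintypes.DWORD),
                                           wintypes.LPVOID]
        kernel32.WriteConsoleW.restype = wintypes.BOOL

        handle = kernel32.GetStdHandle(-11 & 0xFFFFFFFF)  # STD_OUTPUT_HANDLE
        mode = wintypes.DWORD()
        # GetConsoleMode fails for pipes and files, so it doubles as an
        # "is stdout a real console?" check.
        if kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
            stream.flush()  # keep ordering with text already buffered
            # WriteConsoleW counts UTF-16 code units, not Python characters.
            units = len(text.encode('utf-16-le')) // 2
            written = wintypes.DWORD()
            if kernel32.WriteConsoleW(handle, text, units,
                                      ctypes.byref(written), None):
                return 'console'
    # Pipe, file, or not Windows: write encoded bytes instead.
    data = text.encode('utf-8')
    if hasattr(stream, 'buffer'):  # text stream such as sys.stdout
        stream.flush()
        stream.buffer.write(data)
    else:  # already a binary stream
        stream.write(data)
    return 'bytes'


out = io.BytesIO()
print(write_text(u'caf\u00e9', out), out.getvalue())  # bytes b'caf\xc3\xa9'
```

Splitting on GetConsoleMode is what keeps redirection working: the Unicode console call is only attempted when there is a console to receive it.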

Ultimately, if you need a command line with working UTF-8 support for stdio-based apps, you'd probably be better off using an alternative to the Windows Console that deliberately supports it, such as Cygwin's terminal, Python's IDLE, or pywin32's PythonWin.