Using UTF-8 in Python
Since it is summer, I decided to learn a new language, and Python was my choice. What I really want to learn is how to manipulate Arabic text with Python. I have found many great resources on using Python, but when I apply what I learned to Arabic strings, I get numbers and letters mixed together.
Take English as an example:
>>> ebook = 'The American English Dictionary'
>>> ebook[2]
'e'
Now, for Arabic:
>>> abook = 'القاموس العربي'
>>> abook[2]
'\xde' #the correct output should be 'ق'
However, using print works fine, as in:
>>> print abook[2]
ق
What do I need to change so that Python always recognizes Arabic letters?
Use Unicode explicitly:
>>> s = u'القاموس العربي'
>>> s
u'\u0627\u0644\u0642\u0627\u0645\u0648\u0633 \u0627\u0644\u0639\u0631\u0628\u064a'
>>> print s
القاموس العربي
>>> print s[2]
ق
Or even character by character:
>>> for i, c in enumerate(s):
...     print i, c
...
0 ا
1 ل
2 ق
3 ا
4 م
5 و
6 س
7
8 ا
9 ل
10 ع
11 ر
12 ب
13 ي
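To see why the plain string gave '\xde' while the unicode string gave ق, it can help to compare byte indexing with character indexing directly. The sketch below is written for Python 3 as an assumption (the session above is Python 2); there, `str` plays the role of Python 2's `unicode` and `bytes` the role of Python 2's `str`. The cp1256 (Windows-1256) encoding is also an assumption: it is inferred from the '\xde' byte in the question, which that code page maps to ق, and is not something the question states.

```python
# -*- coding: utf-8 -*-
# Sketch for Python 3 (an assumption; the session above is Python 2).
# Python 3 `str` corresponds to Python 2 `unicode`, `bytes` to Python 2 `str`.

text = u'القاموس العربي'            # text string: one item per character

# Encodings chosen for illustration only (assumed, not stated in the question).
as_cp1256 = text.encode('cp1256')   # one byte per Arabic letter
as_utf8 = text.encode('utf-8')      # two bytes per Arabic letter

print(len(text), len(as_cp1256), len(as_utf8))  # 14 14 27

# Indexing text yields a whole character; slicing bytes yields raw bytes.
print(text[2] == u'\u0642')         # True: the letter qaf
print(as_cp1256[2:3])               # b'\xde', the byte seen in the question
print(as_utf8[2:3])                 # b'\xd9', only the first byte of the second letter

# Decoding with the matching codec restores per-character access.
decoded = as_cp1256.decode('cp1256')
print(decoded == text)              # True
```

With UTF-8 the mismatch is even more visible than in the question, because position 2 of the byte string falls in the middle of the second letter rather than on the third one.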
I recommend the Python Unicode page, which is short and practical.