How to encode and decode percent-encoded (URL-encoded) strings in Python?
I wrote a simple application which downloads articles from wiki pages. When I search, for example, for the first name Lech, my code returns strings like Lech_Kaczy%C5%84ski or Lech_Pozna%C5%84 instead of Lech_Kaczyński and Lech_Poznań.
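For context, here is a small Python 3 illustration that is not part of the original post. `%C5%84` is simply the two UTF-8 bytes of ń (code point U+0144), with each byte written as `%` plus two hex digits. The manual join is only there to make the rule visible; `urllib.parse.quote` applies the same rule to a whole string.

```python
from urllib.parse import quote

ch = "ń"
print(hex(ord(ch)))          # Unicode code point of ń: 0x144
utf8 = ch.encode("utf-8")    # the UTF-8 bytes for that code point
print(utf8)                  # b'\xc5\x84'

# Percent-encoding writes every byte as "%" followed by two uppercase hex digits.
manual = "".join(f"%{b:02X}" for b in utf8)
print(manual)                # %C5%84
print(quote(ch))             # %C5%84, the standard library gives the same result
```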
How can I decode those characters to ordinary Polish letters? I tried to use:
urllib.unquote(text)
but then got Lech_Kaczy\xc5\x84ski and Lech_Pozna\xc5\x84 instead of Lech_Kaczyński and Lech_Poznań.
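The `\xc5\x84` in that output is not garbage. It is the percent-decoded UTF-8 byte pair for ń: in Python 2, `urllib.unquote` on a byte string turns each `%XX` back into a single byte and stops there. A rough Python 3 analogue, using `urllib.parse.unquote_to_bytes` to reproduce that bytes-only step, shows that the missing piece is one explicit UTF-8 decode:

```python
from urllib.parse import unquote_to_bytes

# Step 1: percent-decoding alone yields raw bytes (like Python 2's urllib.unquote).
raw = unquote_to_bytes("Lech_Kaczy%C5%84ski")
print(raw)    # b'Lech_Kaczy\xc5\x84ski'

# Step 2: interpret those bytes as UTF-8 to get the Polish letter back.
text = raw.decode("utf-8")
print(text)   # Lech_Kaczyński
```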
I have in my code:
# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
But the result is the same (it simply does not work).
Try the following (Python 2):
import urllib
urllib.unquote("Lech_Kaczy%C5%84ski").decode('utf8')
This will return a unicode string:
u'Lech_Kaczy\u0144ski'
which you can then print and process as usual. For example:
print(urllib.unquote("Lech_Kaczy%C5%84ski").decode('utf8'))
will output:
Lech_Kaczyński
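In Python 3, `urllib.unquote` became `urllib.parse.unquote`, which already decodes the resulting bytes as UTF-8 by default, so no `.decode` call is needed; `urllib.parse.quote` goes the other way. A minimal round-trip sketch for the two names from the question:

```python
from urllib.parse import quote, unquote

names = ["Lech_Kaczy%C5%84ski", "Lech_Pozna%C5%84"]

# Decode: unquote() uses encoding="utf-8" by default and returns str.
decoded = [unquote(n) for n in names]
print(decoded)   # ['Lech_Kaczyński', 'Lech_Poznań']

# Encode: quote() UTF-8-encodes the str and percent-escapes non-ASCII bytes;
# "_" is an unreserved character, so it is left as-is.
encoded = [quote(n) for n in decoded]
print(encoded)   # ['Lech_Kaczy%C5%84ski', 'Lech_Pozna%C5%84']
```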