How do I encode and decode percent-encoded (URL-encoded) strings in Python?

Question:

I wrote a simple application that downloads articles from wiki pages. When I search for, say, the first name Lech, my code returns strings like Lech_Kaczy%C5%84ski or Lech_Pozna%C5%84 instead of Lech_Kaczyński and Lech_Poznań.

How can I decode those characters into ordinary Polish letters? I tried urllib.unquote(text), but then got Lech_Kaczy\xc5\x84ski and Lech_Pozna\xc5\x84 instead of Lech_Kaczyński and Lech_Poznań.

I have this in my code:

# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding("utf-8")

But the result is the same; it has no effect.

urllib.unquote returns a byte string that still holds UTF-8 encoded data, so decode it as UTF-8:

import urllib
urllib.unquote("Lech_Kaczy%C5%84ski").decode('utf8')

This returns a unicode string:

u'Lech_Kaczy\u0144ski'

which you can then print and process as usual. For example:

print(urllib.unquote("Lech_Kaczy%C5%84ski").decode('utf8'))

prints:

Lech_Kaczyński
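
The code above is written for Python 2. As a minimal sketch, assuming Python 3 (which the question does not use), the same decoding and the reverse encoding might look like the following. The variable names are only illustrative.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# Assumption: Python 3, where these functions live in urllib.parse.
encoded = "Lech_Kaczy%C5%84ski"

# unquote() interprets the percent-escapes as UTF-8 by default
# and returns a regular (Unicode) str, so no .decode() call is needed.
decoded = unquote(encoded)
print(decoded)  # Lech_Kaczyński

# unquote_to_bytes() shows the raw UTF-8 bytes behind the escapes,
# which is what the Python 2 urllib.unquote() call effectively returned.
raw = unquote_to_bytes(encoded)
print(raw)  # b'Lech_Kaczy\xc5\x84ski'

# quote() goes the other way: non-ASCII characters are encoded as UTF-8
# and each byte is written as a %XX escape.
reencoded = quote(decoded)
print(reencoded)  # Lech_Kaczy%C5%84ski
```

If the escapes were produced with a different character encoding, unquote() also accepts an encoding argument; UTF-8 is simply its default.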