NLTK: Setting a proxy server

Question:

I'm trying to learn NLTK, the Natural Language Toolkit written in Python, and I want to install a sample data set to run some examples.

My web connection uses a proxy server, and I'm trying to specify the proxy address as follows:

>>> nltk.set_proxy('http://proxy.example.com:3128' ('USERNAME', 'PASSWORD'))
>>> nltk.download()

But I get an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object is not callable

I decided to set up a ProxyBasicAuthHandler before calling nltk.download():

import urllib2

auth_handler = urllib2.ProxyBasicAuthHandler(urllib2.HTTPPasswordMgrWithDefaultRealm())
auth_handler.add_password(realm=None, uri='http://proxy.example.com:3128/', user='USERNAME', passwd='PASSWORD')
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)

import nltk
nltk.download()

But now I get HTTP Error 407 - Proxy Authentication Required.

The documentation says that if the proxy is set to None, then this function will attempt to detect the system proxy. But that isn't working.
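One way to check what "detect the system proxy" would find is to ask the standard library directly. This is only a sketch, and it assumes the detection relies on the standard library's proxy lookup (getproxies), which the documentation quoted above does not confirm. It prints whatever HTTP proxy the environment or OS settings expose, if any:

```python
# Sketch: inspect the proxy settings visible to the standard library.
# Assumption: NLTK's "system proxy" detection uses this same lookup.
try:
    from urllib.request import getproxies  # Python 3
except ImportError:
    from urllib import getproxies  # Python 2, as used with urllib2 above

detected = getproxies()            # dict mapping scheme -> proxy URL
http_proxy = detected.get('http')  # None when no HTTP proxy is configured

if http_proxy is None:
    print('No system HTTP proxy detected')
else:
    print('Detected HTTP proxy: ' + http_proxy)
```

If nothing is detected here, passing the proxy address explicitly is probably the only option.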

How can I install a sample data set for NLTK?

There is an error on the website where you got the lines of code for your first attempt (I have seen that same error).

The line with the error is

nltk.set_proxy('http://proxy.example.com:3128' ('USERNAME', 'PASSWORD'))

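You need a comma to separate the arguments. Without it, Python reads the string followed by the parenthesized tuple as an attempt to call the string, which is why you get "'str' object is not callable". The correct line should be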

nltk.set_proxy('http://proxy.example.com:3128', ('USERNAME', 'PASSWORD'))

This will work fine.
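If you want to see the difference without NLTK or a network connection, here is a minimal sketch in plain Python. record_args is a hypothetical stand-in for nltk.set_proxy (it is not part of NLTK) and simply returns whatever arguments it receives:

```python
def record_args(*args):
    """Hypothetical stand-in for nltk.set_proxy: returns the arguments it got."""
    return args

proxy = 'http://proxy.example.com:3128'
auth = ('USERNAME', 'PASSWORD')

# Without the comma, `proxy (auth)` is parsed as a call on the string,
# so the TypeError is raised before record_args is ever reached.
try:
    record_args(proxy (auth))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# With the comma, the URL and the credentials tuple are two separate arguments.
good_args = record_args(proxy, auth)

print(error_message)
print(good_args)
```

The first print shows the same message as in the traceback from the question; the second shows the URL and the credentials arriving as two distinct arguments.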