How do I scrape URL data from an intranet site using Python?

Problem description:

I need a Python warrior to help me (I'm a noob)! I'm trying to scrape certain data from an intranet site using the urllib module. However, since it is my company's website, which is available only to employees and not to the public, I think this is why I get this error:

IOError: ('http error', 401, 'Unauthorized', )

How do I get around this? It won't even read the site using htmlfile.read().

Sample code that works for a public site:

import urllib
import re

# Download the page and read its HTML
htmlfile = urllib.urlopen("http://finance.yahoo.com/q?s=AAPL")
htmltext = htmlfile.read()

# Capture the text inside the price <span>
regex = '<span id="yfs_l84_aapl">(.+?)</span>'
pattern = re.compile(regex)
price = re.findall(pattern, htmltext)

print price

Try requests together with requests_ntlm, assuming the intranet site uses NTLM (Windows) authentication:

import requests
from requests_ntlm import HttpNtlmAuth

# Replace the URL and credentials with your own
r = requests.get("http://ntlm_protected_site.com", auth=HttpNtlmAuth('domain\\username', 'password'))
print(r.text)
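To tie this back to the question's regex, here is a minimal sketch that fetches the page with NTLM credentials and then runs the same pattern over the result. It assumes Python 3 and that the site really uses NTLM. The helper names `extract_prices` and `fetch_html` are made up for illustration, and the URL and credentials are the same placeholders as above.

```python
import re

# Pattern taken from the question's sample code; the span id is specific to that page.
PRICE_PATTERN = re.compile(r'<span id="yfs_l84_aapl">(.+?)</span>')


def extract_prices(html):
    """Return every string captured by PRICE_PATTERN in the given HTML text."""
    return PRICE_PATTERN.findall(html)


def fetch_html(url, username, password):
    """Fetch a page behind NTLM authentication and return its body as text."""
    # Imported here so the parsing helper above works without these packages installed.
    import requests
    from requests_ntlm import HttpNtlmAuth

    response = requests.get(url, auth=HttpNtlmAuth(username, password), timeout=30)
    # Raise requests.HTTPError on 401 and other 4xx/5xx responses
    # instead of silently parsing an error page.
    response.raise_for_status()
    return response.text


# Example call (placeholder URL and credentials):
# html = fetch_html("http://ntlm_protected_site.com", "domain\\username", "password")
# print(extract_prices(html))
```

Calling `raise_for_status()` means a wrong password shows up as an exception rather than as an empty list of matches.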

If you need help with any specifics of this library and can't find them in the docs, leave a comment.
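If you still get a 401 with NTLM credentials, the site may expect a different scheme. A server that answers 401 normally names the schemes it accepts in the WWW-Authenticate response header, so it can help to look at that header first. The sketch below is a rough heuristic, not a full header parser: the function name `offered_schemes` and the list of schemes are my own choices for illustration, and it could be fooled by a scheme name appearing after a comma inside a quoted realm.

```python
import re

# Schemes this sketch knows how to recognise; the list is illustrative, not exhaustive.
KNOWN_SCHEMES = ("Negotiate", "NTLM", "Basic", "Digest")


def offered_schemes(www_authenticate):
    """Return the known auth schemes named in a WWW-Authenticate header value."""
    found = []
    for scheme in KNOWN_SCHEMES:
        # A scheme name opens a challenge: at the start of the value or right after a comma.
        if re.search(r"(?:^|,)\s*" + scheme + r"\b", www_authenticate, re.IGNORECASE):
            found.append(scheme)
    return found
```

With requests you could pass it `r.headers.get("WWW-Authenticate", "")` from an unauthenticated request. If the result contains "Basic", plain `auth=('username', 'password')` may be enough; for "Digest", requests ships `requests.auth.HTTPDigestAuth`.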