Creating an authenticated session on GitHub using the Python requests module

Problem description:

My goal is to create an authenticated session on GitHub so that I can use the advanced search (which limits functionality for non-authenticated users). Currently, the POST request returns a web page that says: "What? Your browser did something unexpected. Please contact us if the problem persists."

Here is the code I am using to try to accomplish this.

import requests
from lxml import html

s = requests.Session()
payload = (username, password)
_ = s.get('https://www.github.com/login')
p = s.post('https://www.github.com/login', auth=payload)

url = "https://github.com/search?l=&p=0&q=language%3APython+extension%3A.py+sklearn&ref=advsearch&type=Code"
r = s.get(url, auth=payload)
text = r.text
tree = html.fromstring(text)

Is what I'm trying to do possible? I would prefer not to use the GitHub v3 API, since it is rate limited and I want to do more of my own scraping of the advanced search. Thanks.

As mentioned in the comments, GitHub authenticates with POST form data, so your credentials belong in the data parameter rather than in auth.
The fields you have to submit are 'login', 'password', and 'authenticity_token'. The value of 'authenticity_token' is dynamic, but you can scrape it from the form on '/login'.
Finally, submit the data to /session with the same Session object, and you should have an authenticated session.

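import requests
from lxml import html

s = requests.Session()
# Fetch the login page to get the session cookies and the hidden form fields.
r = s.get('https://www.github.com/login')
tree = html.fromstring(r.content)
# Collect every named input, including the dynamic authenticity_token.
data = {i.get('name'): i.get('value') for i in tree.cssselect('input') if i.get('name')}
data['login'] = username      # your GitHub username
data['password'] = password   # your GitHub password
# Submit the form data to /session to authenticate this session.
r = s.post('https://github.com/session', data=data)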
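
If you want to check the token-scraping step without touching the network, the rough sketch below does the same form-field collection with only the standard library. The names FormFieldParser, extract_form_fields and SAMPLE_LOGIN_HTML are made up for illustration, and the sample markup is a simplified stand-in, not GitHub's actual login page, which has more fields and may change at any time.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect name/value pairs from every <input> element that has a name."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name is not None:
            # Inputs without a value attribute are stored as empty strings.
            self.fields[name] = attr_map.get("value") or ""


def extract_form_fields(page_html):
    """Return a dict of the named input fields found in page_html."""
    parser = FormFieldParser()
    parser.feed(page_html)
    return parser.fields


# Hypothetical, simplified stand-in for the login page markup.
SAMPLE_LOGIN_HTML = """
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="login">
  <input type="password" name="password">
  <input type="submit" name="commit" value="Sign in">
</form>
"""

fields = extract_form_fields(SAMPLE_LOGIN_HTML)
# Fill in the credentials on top of the scraped hidden fields (placeholders).
fields["login"] = "my-username"
fields["password"] = "my-password"
print(fields["authenticity_token"])
```

The resulting dict has the same shape as the data dict in the answer, so it could be passed as data= to s.post in the same way. Hidden fields are copied through untouched because the server expects the token it issued to come back with the credentials.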