Python HTTP status codes
I'm writing my own directory buster in Python, and I'm testing it against a web server of mine in a safe and secure environment. The script basically tries to retrieve common directories from a given website and, by looking at the HTTP status code of the response, determines whether a page is accessible or not.
As a start, the script reads a file containing all the interesting directories to be looked up, and then makes requests in the following way:
for dir in fileinput.input('utils/Directories_Common.wordlist'):
    try:
        conn = httplib.HTTPConnection(url)
        conn.request("GET", "/"+str(dir))
        toturl = 'http://'+url+'/'+str(dir)[:-1]
        print ' Trying to get: '+toturl
        r1 = conn.getresponse()
        response = r1.read()
        print ' ', r1.status, r1.reason
        conn.close()
Then, the response is parsed and if a status code equal to "200" is returned, then the page is accessible. I've implemented all this in the following way:
if r1.status == 200:
    print '\n[!] Got it! The subdirectory '+str(dir)+' could be interesting..\n\n\n'
All seems fine to me, except that the script marks as accessible pages that actually aren't. In fact, the algorithm only collects pages that return "200 OK", but when I manually browse to those pages to check, I find that they have been moved permanently or have restricted access. Something is wrong, but I cannot spot exactly where I should fix the code. Any help is appreciated.
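One thing worth checking: lines yielded by `fileinput.input()` keep their trailing newline, so the path passed to `conn.request()` above still contains it, even though the printed URL strips it with `[:-1]`. A minimal sketch of normalizing the wordlist entries before building a path (the sample entries are made up for illustration):

```python
# Entries as fileinput would yield them: each line keeps its newline.
raw_entries = ['admin\n', 'images\n', 'backup']

# strip() removes the trailing newline (and stray whitespace), so the
# path actually requested matches the URL printed and browsed manually.
paths = ['/' + entry.strip() for entry in raw_entries]

print(paths)  # ['/admin', '/images', '/backup']
```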
I did not find any problems with your code, except that it is almost unreadable. I have rewritten it into this working snippet:
import httplib

host = 'www.google.com'
directories = ['aosicdjqwe0cd9qwe0d9q2we', 'reader', 'news']

for directory in directories:
    conn = httplib.HTTPConnection(host)
    conn.request('HEAD', '/' + directory)
    url = 'http://{0}/{1}'.format(host, directory)
    print ' Trying: {0}'.format(url)
    response = conn.getresponse()
    print ' Got: ', response.status, response.reason
    conn.close()
    if response.status == 200:
        print ("[!] The subdirectory '{0}' "
               "could be interesting.").format(directory)
Output:
$ python snippet.py
Trying: http://www.google.com/aosicdjqwe0cd9qwe0d9q2we
Got: 404 Not Found
Trying: http://www.google.com/reader
Got: 302 Moved Temporarily
Trying: http://www.google.com/news
Got: 200 OK
[!] The subdirectory 'news' could be interesting.
Also, I used a HEAD HTTP request instead of GET, as it is more efficient when you do not need the contents and are only interested in the status code.
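If you also want to flag the moved or restricted pages mentioned in the question, you could branch on other status classes as well, not just 200. A rough sketch of such a classification (the categories and the helper name are just one possible choice):

```python
def classify(status):
    # Map an HTTP status code to a rough verdict for the scanner.
    if status == 200:
        return 'accessible'
    if status in (301, 302, 307, 308):
        return 'redirect'      # moved permanently or temporarily
    if status in (401, 403):
        return 'restricted'    # the path exists, but access is limited
    return 'uninteresting'

print(classify(200))  # accessible
print(classify(301))  # redirect
print(classify(403))  # restricted
print(classify(404))  # uninteresting
```

You would then call `classify(response.status)` inside the loop and report each verdict instead of only the 200 case.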