Python HTTP status codes

Problem description:

I'm writing my own directory buster in Python, and I'm testing it against a web server of mine in a safe and secure environment. The script tries to retrieve common directories from a given website and, by looking at the HTTP status code of the response, determines whether a page is accessible or not.
As a start, the script reads a file containing all the interesting directories to be looked up, and then a request is made for each of them, in the following way:

import fileinput
import httplib

url = 'www.example.com'  # placeholder; set elsewhere to the target host

for dir in fileinput.input('utils/Directories_Common.wordlist'):

    try:
        conn = httplib.HTTPConnection(url)
        # each line yielded by fileinput keeps its trailing newline, so this
        # requests "/dirname\n"; the [:-1] below strips it only for display
        conn.request("GET", "/"+str(dir))
        toturl = 'http://'+url+'/'+str(dir)[:-1]
        print '    Trying to get: '+toturl
        r1 = conn.getresponse()
        response = r1.read()
        print '   ',r1.status, r1.reason
        conn.close()
    except httplib.HTTPException:
        pass  # error handling elided in the original snippet

Then the response is parsed, and if the returned status code equals "200", the page is accessible. I've implemented this as follows:

if(r1.status == 200):
    print '\n[!] Got it! The subdirectory '+str(dir)+' could be interesting..\n\n\n'
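
As an aside, httplib already names the standard status codes (httplib.OK == 200, httplib.MOVED_PERMANENTLY == 301, and so on, plus an httplib.responses mapping from code to reason phrase), so the same check can avoid the magic number. A minimal sketch against a hypothetical host:

import httplib

# httplib.OK is the named constant for 200, so no magic number is needed.
conn = httplib.HTTPConnection('www.example.com')  # hypothetical host
conn.request('GET', '/news')
r1 = conn.getresponse()
if r1.status == httplib.OK:
    print '[!] Got it! The subdirectory could be interesting..'
conn.close()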

All seems fine to me, except that the script marks as accessible pages that actually aren't. In fact, the algorithm collects only the pages that return a "200 OK", but when I manually browse to those pages to check them, I find out they have been moved permanently or have restricted access. Something is going wrong, but I cannot spot exactly where I should fix the code. Any help is appreciated.

I did not find any problems with your code, except that it is almost unreadable. I have rewritten it into this working snippet:

import httplib

host = 'www.google.com'
directories = ['aosicdjqwe0cd9qwe0d9q2we', 'reader', 'news']

for directory in directories:
    conn = httplib.HTTPConnection(host)
    conn.request('HEAD', '/' + directory)

    url = 'http://{0}/{1}'.format(host, directory)
    print '    Trying: {0}'.format(url)

    response = conn.getresponse()
    print '    Got: ', response.status, response.reason

    conn.close()

    if response.status == 200:
        print ("[!] The subdirectory '{0}' "
               "could be interesting.").format(directory)

Output:

$ python snippet.py
    Trying: http://www.google.com/aosicdjqwe0cd9qwe0d9q2we
    Got:  404 Not Found
    Trying: http://www.google.com/reader
    Got:  302 Moved Temporarily
    Trying: http://www.google.com/news
    Got:  200 OK
[!] The subdirectory 'news' could be interesting.
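
Note the 302 for /reader in the output above: a status other than 200 does not necessarily mean the directory is inaccessible. A 3xx answer usually means it exists but redirects elsewhere, and the Location header tells you where. A minimal sketch of reading it, reusing the same host:

import httplib

conn = httplib.HTTPConnection('www.google.com')
conn.request('HEAD', '/reader')
response = conn.getresponse()

# On a 3xx answer, the Location header names the redirect target.
if 300 <= response.status < 400:
    print response.status, response.reason, '->', response.getheader('location')
conn.close()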

Also, I used a HEAD HTTP request instead of GET, as it is more efficient when you do not need the contents and are only interested in the status code.
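
One caveat is that httplib does not follow redirects for you. If you want redirects resolved automatically, so that the status refers to the page actually reached, urllib2 from the same standard library does that by default. A rough sketch of the same HEAD check (the get_method override is the usual Python 2 trick for sending HEAD):

import urllib2

request = urllib2.Request('http://www.google.com/news')
request.get_method = lambda: 'HEAD'  # urllib2 sends GET by default

try:
    # urlopen follows 3xx redirects automatically; geturl() is the final URL.
    response = urllib2.urlopen(request)
    print response.getcode(), response.geturl()
except urllib2.HTTPError as e:
    print e.code, e.msg  # 4xx/5xx responses raise HTTPError here

Note that urllib2's redirect handler re-issues the redirected request as a plain GET, so only the first hop is a true HEAD.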