Automate picture downloads from website with authentication, part two

Question:

This question derives from another question of mine, "Automate picture downloads from website with authentication", where I asked how to download a picture from a specific website that requires a login.

There are two websites from the same company, cgwallpapers.com and gamewallpapers.com. With the help of the user who answered the other question, I finally managed to automate the downloads for one of them, but I'm not able to reproduce the same steps on gamewallpapers.com.

I may be wrong about some of what follows because I'm inexperienced with web requests, so if a helper or expert has the time, please verify that the parameters and other details really are as I describe them.

On cgwallpapers.com, I basically build the query like this to download a wallpaper:

http://www.cgwallpapers.com/members/getwallpaper.php?id=100&res=1920x1080

But I found that on gamewallpapers.com I cannot use the same query parameters, because the download URL looks like this:

http://www.gamewallpapers.com/members/getwallpaper.php?wallpaper=wallpaper_ancient_space_01_1920x1080.jpg&keystr=1423106012&retry=

cgwallpapers.com is easier because I can just use an incrementing For loop over the wallpaper ids with a specific resolution. On gamewallpapers.com, I can't figure out how to automate the downloads; if I'm not mistaken, it needs a totally different treatment.
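
To make the difference concrete, here is a minimal sketch of the two URL shapes. The helper names BuildCgUrl and BuildGameUrl are hypothetical, and leaving out the keystr and retry parameters in the second URL is an assumption on my part; I have not confirmed that the server accepts the request without them.

```vbnet
Imports System

Module UrlShapes
    ' Hypothetical helper: cgwallpapers.com addresses a wallpaper by numeric id.
    Function BuildCgUrl(ByVal id As Integer, ByVal res As String) As String
        Return String.Format("http://www.cgwallpapers.com/members/getwallpaper.php?id={0}&res={1}", id, res)
    End Function

    ' Hypothetical helper: gamewallpapers.com addresses a wallpaper by filename,
    ' with the resolution embedded in the name (keystr and retry omitted as an assumption).
    Function BuildGameUrl(ByVal baseName As String, ByVal res As String) As String
        Dim fileName As String = String.Format("{0}_{1}.jpg", baseName, res)
        Return "http://www.gamewallpapers.com/members/getwallpaper.php?wallpaper=" & Uri.EscapeDataString(fileName)
    End Function

    Sub Main()
        ' Ids can simply be counted upwards...
        For id As Integer = 100 To 102
            Console.WriteLine(BuildCgUrl(id, "1920x1080"))
        Next

        ' ...but a filename has to be known in advance.
        Console.WriteLine(BuildGameUrl("wallpaper_ancient_space_01", "1920x1080"))
    End Sub
End Module
```

The first helper can be driven by a plain counter, while the second needs a name that cannot be guessed by counting, which is exactly where I'm stuck.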

So I don't know what to try, or even how to approach it.

After logging into gamewallpapers.com, this is how I'm trying to download a wallpaper. Of course it does not work, because I'm not using the proper query, but this code worked for cgwallpapers.com, so I'll show it in case it helps:

NOTE: WallpaperInfo is a non-relevant object that I use to return the downloaded wallpaper image stream. It is a lot of code, so I left it out.

''' <summary>
''' Tries to download the specified wallpaper from GameWallpapers server.
''' </summary>
''' <param name="id">The wallpaper id.</param>
''' <param name="res">The wallpaper resolution.</param>
''' <param name="cookieCollection">The cookie collection.</param>
''' <returns>A <see cref="WallpaperInfo"/> instance containing the wallpaper info and the image stream.</returns>
Private Function GetWallpaperMethod(ByVal id As String,
                                    ByVal res As String,
                                    ByRef cookieCollection As CookieCollection) As WallpaperInfo

    Dim request As HttpWebRequest
    Dim url As String = String.Format("http://www.gamewallpapers.com/members/getwallpaper.php?id={0}&res={1}", id, res)
    Dim contentDisposition As String
    Dim webResponse As WebResponse = Nothing
    Dim responseStream As Stream = Nothing
    Dim imageStream As MemoryStream = Nothing
    Dim wallInfo As WallpaperInfo = Nothing

    Try
        request = DirectCast(HttpWebRequest.Create(url), HttpWebRequest)
        With request
            .Method = "GET"
            .Headers.Add("Accept-Language", "en-US,en;q=0.5")
            .Headers.Add("Accept-Encoding", "gzip, deflate")
            .Headers.Add("Keep-Alive", "300")
            .Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
            .AllowAutoRedirect = False
            .UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0"
            .KeepAlive = True
        End With

        If cookieCollection IsNot Nothing Then
            ' Pass cookie info so that we remain logged in.
            request.CookieContainer = Me.SetCookieContainer(url, cookieCollection)
        End If

        webResponse = request.GetResponse

        Using webResponse

            contentDisposition = CType(webResponse, HttpWebResponse).Headers("Content-Disposition")

            If Not String.IsNullOrEmpty(contentDisposition) Then ' There is an image to download.

                Dim filename As String = contentDisposition.Substring(contentDisposition.IndexOf("=") + "=".Length).
                                         TrimStart(" "c).TrimEnd({" "c, ";"c})

                Try
                    imageStream = New MemoryStream
                    responseStream = webResponse.GetResponseStream

                    Using responseStream

                        Dim buffer(2047) As Byte
                        Dim read As Integer

                        Do
                            read = responseStream.Read(buffer, 0, buffer.Length)
                            imageStream.Write(buffer, 0, read)
                        Loop Until read = 0

                        responseStream.Close()

                    End Using

                Catch ex As Exception
                    Throw

                End Try

                ' This is the object that I'll return
                ' that I'm storing the url, the wallpaper id,
                ' the wallpaper resolution, the wallpaper filename
                ' and finally the downloaded MemoryStream (the wallpaper image stream)
                wallInfo = New WallpaperInfo(url:=url,
                                             id:=id,
                                             resolution:=res,
                                             filename:=filename,
                                             imageStream:=imageStream)

            End If ' String.IsNullOrEmpty(contentDisposition)

        End Using ' webResponse

    Catch ex As Exception
        Throw

    Finally
        If webResponse IsNot Nothing Then
            webResponse.Close()
        End If
        If responseStream IsNot Nothing Then
            responseStream.Close()
        End If

    End Try

    Return wallInfo

End Function

Private Function SetCookieContainer(ByVal url As String,
                                    ByVal cookieCollection As CookieCollection) As CookieContainer

    Dim cookieContainer As New CookieContainer
    Dim refDate As Date

    For Each oldCookie As Cookie In cookieCollection

        If Not DateTime.TryParse(oldCookie.Value, refDate) Then

            Dim newCookie As New Cookie
            With newCookie
                .Name = oldCookie.Name
                .Value = oldCookie.Value
                .Domain = New Uri(url).Host
                .Secure = False
            End With

            cookieContainer.Add(newCookie)

        End If

    Next oldCookie

    Return cookieContainer

End Function

Here is the full source I'm working on, with an example usage of how I expected it to work (a For loop incrementing the wallpaper ids to automate the downloads). It works perfectly when the base URL is changed from gamewallpapers.com to cgwallpapers.com, because this source only works for cgwallpapers.com; I'm just trying it with the gamewallpapers.com URL:

http://pastebin.com/eyBxHmnJ

Answer

Update:

As promised, I have come up with a "proper" solution to your question for gamewallpapers.com using the Telerik Testing Framework.

You must change the sUsername and sPassword variables to your own username and password to log into the site successfully.

Optional variables that you may want to change:

sResolutionString: Defaults to 1920x1080, which is what you specified in your original question. Change this value to any resolution accepted by the website. Be warned that I am not 100% sure all images are available in the same resolutions, so changing this value may cause some images to be skipped if they do not exist in the desired resolution.
sDownloadPath: Currently set to the same folder as the application exe. Change this to the path where you want to download your images.
sUserAgent: Defaults to the user agent for Internet Explorer 11 for Windows 7. Since the Telerik Testing Framework controls a real browser (whatever IE version you have installed on your PC in this case), it uses the "real" user agent when sending requests. This user agent string is only used when downloading wallpapers with HttpWebRequest, and the default is most likely unnecessary because the included code captures the user agent used by Telerik and saves it for later use.
nMaxSkippedFilesInSuccession: Set to 10 by default. Before downloading a wallpaper image, the app checks whether the filename already exists in your download directory. If it does, the file is not downloaded and a skip counter is incremented. If the skip counter reaches the value of nMaxSkippedFilesInSuccession, the app stops, on the assumption that you downloaded the rest of the files in a previous session. Note: In theory this value could even be set to 1 or 2, as the filenames are unique and therefore never overlap. The problem is that the toplist.php page is sorted by date: if x new images are added while this app is running, the images will be shifted by x when you go to the next page. If x is greater than nMaxSkippedFilesInSuccession, the app will most likely end prematurely, because the shift makes it try to download a number of the same images again.
nCurrentPageID: Set to 0 by default. The list page toplist.php accepts a query string argument called start, which tells the page which index to start from for your chosen search arguments. The list shows 24 images per page, so nCurrentPageID must be divisible by 24 or else you may end up skipping images. Depending on time and circumstances you may not be able to download all images in one session. If so, you can note which nCurrentPageID you left off on and set this variable accordingly to start from a different index next time (keep in mind that the images may get shifted as new wallpapers are added to the site, since the list page is sorted by wallpaper date).
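
If you save the index you stopped at and want to be sure it lands on a page boundary before reusing it, something like the following sketch could be used. AlignToPage and ImagesPerPage are hypothetical names that are not part of the code further down; the only fact taken from the site is the 24 images per page.

```vbnet
Imports System

Module PageStartSketch
    ' The list page shows 24 images per page (see nCurrentPageID above).
    Const ImagesPerPage As Integer = 24

    ' Hypothetical helper: round a saved index down to the nearest page boundary
    ' so that no image between two pages is skipped.
    Function AlignToPage(ByVal startIndex As Integer) As Integer
        If startIndex < 0 Then Return 0
        Return (startIndex \ ImagesPerPage) * ImagesPerPage   ' "\" is integer division
    End Function

    Sub Main()
        Console.WriteLine(AlignToPage(0))    ' prints 0
        Console.WriteLine(AlignToPage(50))   ' prints 48
        Console.WriteLine(AlignToPage(72))   ' prints 72
    End Sub
End Module
```

Rounding down rather than up means a few images may be re-checked, but those are skipped cheaply because the files already exist on disk.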

To use the Telerik Testing Framework you only need to run its installer and then add a reference to ArtOfTest.WebAii.dll.

One quirk of the testing framework (at least with Internet Explorer) is that it doesn't allow you to start the browser as a hidden process. I have talked to Telerik support about this and they claim that it is not possible, although other web scraping frameworks like WatiN do support this feature (I personally still prefer WatiN for this and other reasons, but it is quite old now and has not been updated since 2011). Since it is nice to run web scraping tasks in the background without being kept from using your computer, this example starts the browser minimized (which Telerik does support) and then uses Windows API calls to hide the browser window. This is a bit of a hack, but it is useful and works well in my experience.

In my original answer I mentioned that you would most likely have to crawl the toplist.php page by clicking links and building the download URL, but I was able to get this to work without clicking into any page other than toplist.php. This is only possible because the wallpaper filename (which is basically the id that you need to download with) is partially contained in the preview image's filename. I also originally thought that the keystr query string parameter was some kind of id that "protected" the download, but it is actually not required at all to get the wallpaper.
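
As a rough sketch of that idea in isolation: the preview URL below is a made-up example (I am only assuming its shape), and BuildDownloadUrl is a hypothetical helper, not something from Telerik or the site.

```vbnet
Imports System

Module DownloadUrlSketch
    Const DownloadUrl As String = "http://www.gamewallpapers.com/members/getwallpaper.php?wallpaper="

    ' Hypothetical helper: derive the full-size download URL from a preview image URL
    ' by appending the wanted resolution to the preview's base filename.
    Function BuildDownloadUrl(ByVal previewSrc As String, ByVal resolution As String) As String
        ' Work on the path part only, so a query string on the preview URL cannot end up in the name.
        Dim previewPath As String = New Uri(previewSrc).AbsolutePath
        Dim baseName As String = System.IO.Path.GetFileNameWithoutExtension(previewPath)
        Dim extension As String = System.IO.Path.GetExtension(previewPath)

        Return DownloadUrl & baseName & "_" & resolution & extension
    End Function

    Sub Main()
        ' Assumed preview location; only the filename part matters here.
        Dim previewSrc As String = "http://www.gamewallpapers.com/previews/wallpaper_ancient_space_01.jpg"
        Console.WriteLine(BuildDownloadUrl(previewSrc, "1920x1080"))
    End Sub
End Module
```

Note that no keystr parameter is appended, in line with the observation above that it is not needed.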

One last thing to mention is that the toplist.php page can be sorted by rating or by date. Rating is very volatile and can change at any moment as people vote for images, so it is not a good sort order for this type of work. We sort by date instead, because it keeps the images in the same relative order from run to run, but there is a small issue: the page doesn't seem to allow sorting in reverse order, so the newest images always appear at the top of the first page. Whenever new images are added, the existing ones shift down the list, which will most likely cause you to re-test some of the same images. For cgwallpapers.com this is not a problem, because new images receive a new (higher) id value, so we can just remember the last id we left off on and test the next ids in succession to see whether there are new images. For gamewallpapers.com we always re-run from page index 0 and keep going until we reach a certain number of consecutive skipped files, which tells us we have reached the images from the previous download.
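
The stop rule can be sketched on its own like this. CountProcessed and the sample sequence are hypothetical and only illustrate the counting; as in the full code, a success resets the counter and a failure leaves it unchanged.

```vbnet
Imports System
Imports System.Collections.Generic

Module SkipRuleSketch
    Enum DownloadResult
        Failed = 0
        Success = 1
        Skipped = 2
    End Enum

    ' Hypothetical helper: returns how many results are processed before
    ' the "too many skips in a row" rule stops the run.
    Function CountProcessed(ByVal results As IEnumerable(Of DownloadResult), ByVal maxSkipsInARow As Integer) As Integer
        Dim skipsInARow As Integer = 0
        Dim processed As Integer = 0

        For Each result As DownloadResult In results
            processed += 1

            Select Case result
                Case DownloadResult.Success
                    skipsInARow = 0      ' A new file was found, so start counting again
                Case DownloadResult.Skipped
                    skipsInARow += 1     ' The file already exists on disk
            End Select

            If skipsInARow >= maxSkipsInARow Then Exit For
        Next

        Return processed
    End Function

    Sub Main()
        ' Two new images followed by images from an earlier session.
        Dim run As DownloadResult() = {DownloadResult.Success, DownloadResult.Success,
                                       DownloadResult.Skipped, DownloadResult.Skipped,
                                       DownloadResult.Skipped, DownloadResult.Skipped}

        Console.WriteLine(CountProcessed(run, 3))   ' prints 5
    End Sub
End Module
```

With a limit of 3, the run stops at the fifth image, after the third consecutive skip.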

Here is the code. Let me know if you have any questions:

Imports ArtOfTest.WebAii.Core
Imports System.Runtime.InteropServices

Public Class Form1
    Const sUsername As String = "USERNAMEHERE"
    Const sPassword As String = "PASSWORDHERE"
    Const sMainURL As String = "http://www.gamewallpapers.com"
    Const sListURL As String = "http://www.gamewallpapers.com/members/toplist.php"
    Const sListQueryString As String = "?action=go&title=&maxage=0&latestnr=0&platform=&resolution=&cyberbabes=&membersonly2=&rating=0&minimumvotes2=0&sort=date&start="
    Const sDownloadURL As String = "http://www.gamewallpapers.com/members/getwallpaper.php?wallpaper="
    Const sResolutionString As String = "1920x1080"
    Private sDownloadPath As String = Application.StartupPath
    Private sUserAgent As String = "Mozilla/5.0 (compatible, MSIE 11, Windows NT 6.3; Trident/7.0;  rv:11.0) like Gecko"    ' Default to ie11 user agent
    Private oCookieContainerObject As New System.Net.CookieContainer
    Private nMaxSkippedFilesInSuccession As Int32 = 10
    Private nCurrentPageID As Int32 = 0 ' Only increment this value in steps of 24 or else you may miss some images

    Private Enum oDownloadResult
        Failed = 0
        Success = 1
        Skipped = 2
    End Enum

    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        StartScrape()
    End Sub

    Private Sub StartScrape()
        Dim oBrowser As Manager = Nothing

        Try
            ' Start Internet Explorer

            Dim oSettings As New Settings

            oSettings.Web.DefaultBrowser = BrowserType.InternetExplorer
            oSettings.DisableDialogMonitoring = False
            oSettings.UnexpectedDialogAction = UnexpectedDialogAction.DoNotHandle
            oSettings.Web.UseHttpProxy = True   ' This must be enabled for us to get the headers being sent and know what the user agent is dynamically

            oBrowser = New Manager(oSettings)

            oBrowser.Start()
            oBrowser.LaunchNewBrowser(oSettings.Web.DefaultBrowser, True, ProcessWindowStyle.Minimized) ' Start minimized

            ' Set up a proxy so that we can capture the request headers

            Dim li As New ArtOfTest.WebAii.Messaging.Http.RequestListenerInfo(AddressOf RequestHandler)

            oBrowser.Http.AddBeforeRequestListener(li)  ' Add proxy listener

            ' Hide the browser window

            HideBrowser(oBrowser)

            ' Load the main url

            oBrowser.ActiveBrowser.NavigateTo(sMainURL)
            oBrowser.ActiveBrowser.WaitUntilReady()

            oBrowser.Http.RemoveBeforeRequestListener(li)   ' Remove proxy listener
            oBrowser.ActiveBrowser.RefreshDomTree()

            Dim bLoggedIn As Boolean = False

            ' Wait for the main logo image to show so that we know we have the right page

            oBrowser.ActiveBrowser.WaitForElement(New HtmlFindExpression("Tagname=div", "Id=clickable_logo"), 30000, False)
            Threading.Thread.Sleep(3000)    ' Wait 3 seconds to prevent loading pages too quickly
            oBrowser.ActiveBrowser.RefreshDomTree()

            ' Check if we are logged in already or if we need to log in

            If oBrowser.ActiveBrowser.Find.ByExpression("Tagname=div", "Id=logout", "InnerText=Logout") IsNot Nothing Then
                ' Found the logout button, therefore we are already logged in
                bLoggedIn = True
            ElseIf oBrowser.ActiveBrowser.Find.ByExpression("Tagname=input", "Name=email") IsNot Nothing AndAlso oBrowser.ActiveBrowser.Find.ByExpression("Tagname=input", "Name=wachtwoord") IsNot Nothing Then
                ' Log in

                oBrowser.ActiveBrowser.RefreshDomTree()
                oBrowser.ActiveBrowser.Actions.SetText(oBrowser.ActiveBrowser.Find.ByExpression("Tagname=input", "Name=email"), sUsername)
                oBrowser.ActiveBrowser.Actions.SetText(oBrowser.ActiveBrowser.Find.ByExpression("Tagname=input", "Name=wachtwoord"), sPassword)
                oBrowser.ActiveBrowser.Actions.Click(oBrowser.ActiveBrowser.Find.ByExpression("Tagname=div", "Id=login", "InnerText=Login"))

                ' Wait for page to load

                oBrowser.ActiveBrowser.WaitUntilReady()
                oBrowser.ActiveBrowser.WaitForElement(New HtmlFindExpression("Tagname=div", "Id=logout", "InnerText=Logout"), 30000, False)   ' Wait until Logout button is loaded
                bLoggedIn = True
            Else
                ' Didn't find any controls that we were looking for. Maybe the page was updated recently?

                MessageBox.Show("Error loading page. Maybe the html changed?")
            End If

            If bLoggedIn = True Then
                Dim bStop As Boolean = False
                Dim sPreviewImageFilename As String
                Dim sPreviewImageFileExtension As String
                Dim oURI As Uri = New Uri(sMainURL)
                Dim oCookie As System.Net.Cookie
                Dim nSkippedFiles As Int32 = 0

                ' Save cookies from browser to use with HttpWebRequest later

                For c As Int32 = 0 To oBrowser.ActiveBrowser.Cookies.GetCookies(oURI.Scheme & Uri.SchemeDelimiter & oURI.Host).Count - 1
                    oCookie = New System.Net.Cookie
                    oCookie.Name = oBrowser.ActiveBrowser.Cookies.GetCookies(oURI.Scheme & Uri.SchemeDelimiter & oURI.Host)(c).Name
                    oCookie.Value = oBrowser.ActiveBrowser.Cookies.GetCookies(oURI.Scheme & Uri.SchemeDelimiter & oURI.Host)(c).Value
                    oCookie.Domain = oURI.Host
                    oCookie.Secure = False
                    oCookieContainerObject.Add(oCookie)
                Next

                Threading.Thread.Sleep(3000)    ' Wait 3 seconds to prevent loading pages too quickly

                Do Until bStop = True
                    ' Browse to the list url

                    oBrowser.ActiveBrowser.NavigateTo(sListURL & sListQueryString & nCurrentPageID)
                    oBrowser.ActiveBrowser.WaitUntilReady()

                    If oBrowser.ActiveBrowser.Find.AllByExpression("Tagname=img", "Class=toggleTooltip").Count > 0 Then
                        ' Get all preview images on the page

                        For i As Int32 = 0 To oBrowser.ActiveBrowser.Find.AllByExpression("Tagname=img", "Class=toggleTooltip").Count - 1
                            ' Convert the preview image browser element into an HtmlImage

                            Dim oHtmlImage As ArtOfTest.WebAii.Controls.HtmlControls.HtmlImage = oBrowser.ActiveBrowser.Find.AllByExpression("Tagname=img", "Class=toggleTooltip")(i).[As](Of ArtOfTest.WebAii.Controls.HtmlControls.HtmlImage)()

                            ' Extract the filename and extension from the preview image

                            sPreviewImageFilename = System.IO.Path.GetFileNameWithoutExtension(oHtmlImage.Src)
                            sPreviewImageFileExtension = System.IO.Path.GetExtension(oHtmlImage.Src)

                            ' Create a proper download url using the preview image filename and download the file in the resolution that we want using HttpWebRequest

                            Select Case DownloadImage(sDownloadURL & sPreviewImageFilename & "_" & sResolutionString & sPreviewImageFileExtension, sListURL & sListQueryString & nCurrentPageID)
                                Case Is = oDownloadResult.Success
                                    nSkippedFiles = 0   ' Reset skipped files back to zero
                                Case Is = oDownloadResult.Skipped
                                    nSkippedFiles += 1  ' Increment skipped files by one since we have already downloaded this file previously
                                Case Is = oDownloadResult.Failed
                                    ' The image didn't download properly.
                                    ' Do whatever error handling in here that you want to
                                    ' Maybe save the filename to a log file so you know which file(s) failed and download them again later?
                            End Select

                            If nSkippedFiles >= nMaxSkippedFilesInSuccession Then
                                ' We have skipped the maximum amount of files in a row so we must have downloaded them all (This should only ever happen on the 2nd+ run)
                                bStop = True
                                Exit For
                            Else
                                Threading.Thread.Sleep(3000)    ' Wait 3 seconds to prevent loading pages too quickly
                            End If
                        Next

                        ' Increment the 'Start' querystring value by 24 to simulate clicking the 'Next' button and load the next 24 images
                        nCurrentPageID += 24
                    Else
                        ' No more images were found so we stop the application
                        bStop = True
                    End If
                Loop
            End If
        Catch ex As Exception
            MessageBox.Show(ex.Message)
        Finally
            ' Ensure browser is closed when we exit
            CleanupBrowser(oBrowser)
        End Try
    End Sub

    Private Sub RequestHandler(sender As Object, e As ArtOfTest.WebAii.Messaging.Http.HttpRequestEventArgs)
        ' Save the exact user agent we are using so that we can use it with HTTPWebRequest later
        sUserAgent = e.Request.Headers("User-Agent")
    End Sub

    Private Function DownloadImage(ByVal sPage As String, sReferer As String) As oDownloadResult
        Dim req As System.Net.HttpWebRequest
        Dim oReturn As oDownloadResult

        Try
            req = DirectCast(System.Net.WebRequest.Create(sPage), System.Net.HttpWebRequest)   ' Explicit cast so this also compiles with Option Strict On
            req.Method = "GET"
            req.AllowAutoRedirect = False
            req.UserAgent = sUserAgent
            req.Referer = sReferer  ' Send the list page as the referer, as a browser would
            req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
            req.Headers.Add("Accept-Language", "en-US,en;q=0.5")
            req.Headers.Add("Accept-Encoding", "gzip, deflate")
            req.Headers.Add("Keep-Alive", "300")
            req.KeepAlive = True

            If oCookieContainerObject IsNot Nothing Then
                ' Set cookie info so that we continue to be logged in
                req.CookieContainer = oCookieContainerObject
            End If

            ' Save file to disk

            Using oResponse As System.Net.WebResponse = CType(req.GetResponse, System.Net.WebResponse)
                Dim sContentDisposition As String = CType(oResponse, System.Net.HttpWebResponse).Headers("Content-Disposition")

                If sContentDisposition IsNot Nothing Then
                    Dim sFilename As String = sContentDisposition.Substring(sContentDisposition.IndexOf("filename="), sContentDisposition.Length - sContentDisposition.IndexOf("filename=")).Replace("filename=", "").Replace("""", "").Replace(";", "").Trim
                    Dim sFullPath As String = System.IO.Path.Combine(sDownloadPath, sFilename)

                    If System.IO.File.Exists(sFullPath) = False Then
                        Using responseStream As IO.Stream = oResponse.GetResponseStream
                            Using fs As New IO.FileStream(sFullPath, System.IO.FileMode.Create, System.IO.FileAccess.Write)
                                Dim buffer(2047) As Byte
                                Dim read As Integer

                                ' Copy the response to disk in 2 KB chunks until the stream is exhausted
                                Do
                                    read = responseStream.Read(buffer, 0, buffer.Length)
                                    If read > 0 Then fs.Write(buffer, 0, read)
                                Loop Until read = 0
                            End Using   ' Flushes and closes the file
                        End Using   ' Closes the response stream

                        oReturn = oDownloadResult.Success
                    Else
                        oReturn = oDownloadResult.Skipped   ' We have downloaded this file before so skip it
                    End If
                End If

                oResponse.Close()
            End Using
        Catch exc As System.Net.WebException
            MessageBox.Show("Network Error: " & exc.Message.ToString & " Status Code: " & exc.Status.ToString & " from " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
            oReturn = oDownloadResult.Failed
        End Try

        Return oReturn
    End Function

    Private Sub HideBrowser(ByRef oBrowser As Manager)

        Dim tmp_hWnd As IntPtr

        For w As Integer = 1 To 10
            tmp_hWnd = oBrowser.ActiveBrowser.Window.Handle
            If Not tmp_hWnd.Equals(IntPtr.Zero) Then Exit For
            Threading.Thread.Sleep(100)
        Next

        If Not tmp_hWnd.Equals(IntPtr.Zero) Then
            ' use ShowWindowAsync to change app window state (minimize and hide it).
            ShowWindowAsync(tmp_hWnd, ShowWindowCommands.Minimize)
            ShowWindowAsync(tmp_hWnd, ShowWindowCommands.Hide)
        Else
            ' no window handle?
            MessageBox.Show("Error - Unable to get a window handle")
        End If
    End Sub

    Private Sub CleanupBrowser(ByRef oBrowser As Manager)
        If oBrowser IsNot Nothing AndAlso oBrowser.ActiveBrowser IsNot Nothing Then
            oBrowser.ActiveBrowser.Close()
        End If

        If oBrowser IsNot Nothing Then
            oBrowser.Dispose()
        End If

        oBrowser = Nothing
    End Sub
End Class

Module Module1
    Public Enum ShowWindowCommands As Integer
        Hide = 0
        Normal = 1
        ShowMinimized = 2
        Maximize = 3
        ShowMaximized = 3
        ShowNoActivate = 4
        Show = 5
        Minimize = 6
        ShowMinNoActive = 7
        ShowNA = 8
        Restore = 9
        ShowDefault = 10
        ForceMinimize = 11
    End Enum

    <DllImport("user32.dll", SetLastError:=True)> _
    Public Function ShowWindowAsync(hWnd As IntPtr, <MarshalAs(UnmanagedType.I4)> nCmdShow As ShowWindowCommands) As <MarshalAs(UnmanagedType.Bool)> Boolean
    End Function
End Module
