Web scraping after a website loads its content via Ajax


Problem description:

I'm trying to get colly to scrape the following page: https://www56.muenchen.de/termin/index.php?loc=BB.

Here is my code:

package main

import (
    "fmt"
    "log"

    "github.com/gocolly/colly"
)

func main() {
    c := colly.NewCollector(
        colly.IgnoreRobotsTxt(),
        colly.Async(false),
    )

    c.OnHTML("html", func(e *colly.HTMLElement) {
        fmt.Println(e.Text)
    })

    c.OnError(func(_ *colly.Response, err error) {
        log.Println("Something went wrong:", err)
    })

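    // Register OnScraped before Visit: with Async(false) the visit is synchronous,
    // so a callback added afterwards would never run.
    c.OnScraped(func(r *colly.Response) {
        fmt.Println("Finished")
    })

    if err := c.Visit("https://www56.muenchen.de/termin/index.php?loc=BB"); err != nil {
        log.Println("Visit failed:", err)
    }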
}

The problem is that the website loads some of its content after the page has been visited. I'm unsure how to tell colly to "wait" until that has happened and only then look at the result.

Looking forward to some ideas.


It can't. The content is loaded client-side by JavaScript, and colly does not execute JavaScript, so anything the page fetches via Ajax never appears in the HTML that colly parses.

To simulate a browser, you can use Selenium or PhantomJS, as the link above suggests.