Why does my code take so long to return results?

Problem description:

When running this code I have to wait 10 seconds for s.Locations to print and 60+ seconds for n.Titles to print. What is causing this?

Tips on how to troubleshoot this would be helpful, for example how to see how long certain lines of code take to complete. I'm new to Go, so I'm not sure exactly how to do this.

I've made sure I close my connections. Since everything else on my computer loads blazing fast, I don't think accessing the internet via http.Get should be slow.

package main

import (
    "encoding/xml"
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
    "strings"
)

// SitemapIndex is the root XML document.
type SitemapIndex struct {
    Locations []string `xml:"sitemap>loc"`
}

// News holds the entries of the individual category sitemaps.
type News struct {
    Titles    []string `xml:"url>news>title"`
    Keywords  []string `xml:"url>news>keywords"`
    Locations []string `xml:"url>loc"`
}

// NewsMap holds the keywords and location of a single news item.
type NewsMap struct {
    Keywords string
    Location string
}

func main() {
    var s SitemapIndex
    var n News
    // np := make(map[string]NewsMap)
    resp, err := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
    if err != nil {
        log.Fatal(err)
    }
    bytes, err := ioutil.ReadAll(resp.Body)
    resp.Body.Close()
    if err != nil {
        log.Fatal(err)
    }
    if err := xml.Unmarshal(bytes, &s); err != nil {
        log.Fatal(err)
    }

    for i := range s.Locations {
        s.Locations[i] = strings.TrimSpace(s.Locations[i])
    }

    fmt.Println(s.Locations) // slice of data

    for _, loc := range s.Locations {
        resp, err := http.Get(loc)
        if err != nil {
            log.Fatal(err)
        }
        bytes, err := ioutil.ReadAll(resp.Body)
        resp.Body.Close()
        if err != nil {
            log.Fatal(err)
        }
        // Unmarshal appends to n's slices, so n collects every sitemap.
        if err := xml.Unmarshal(bytes, &n); err != nil {
            log.Fatal(err)
        }
    }

    fmt.Println(n.Titles)
}

I do get the expected output; it just takes that long to arrive.


Start with the simple things and measure one thing at a time, as in a scientific experiment.


Use curl to measure basic response time.

$ curl https://www.google.com/robots.txt -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7246    0  7246    0     0  94103      0 --:--:-- --:--:-- --:--:-- 94103
$ curl https://www.nytimes.com/robots.txt -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   934  100   934    0     0     61      0  0:00:15  0:00:15 --:--:--   230
$ curl https://www.washingtonpost.com/robots.txt -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3360  100  3360    0     0    133      0  0:00:25  0:00:25 --:--:--   869
$

Google has no noticeable delay. The New York Times has a 15-second delay. The Washington Post has a 25-second delay.

In Go, confirm that The Washington Post has a 25-second delay.

$ go run wapo.go
25.174366651s
$ cat wapo.go
package main

import (
    "fmt"
    "io/ioutil"
    "net/http"
    "os"
    "time"
)

func main() {
    start := time.Now()
    resp, err := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        return
    }
    _, err = ioutil.ReadAll(resp.Body)
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        return
    }
    resp.Body.Close()
    fmt.Fprintln(os.Stderr, time.Since(start))
}
$

Next, try a different ISP from a different computer.

$ curl https://www.google.com/robots.txt -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7246    0  7246    0     0  27343      0 --:--:-- --:--:-- --:--:-- 27343
$ curl https://www.nytimes.com/robots.txt -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   934  100   934    0     0   2017      0 --:--:-- --:--:-- --:--:--  2017
$ curl https://www.washingtonpost.com/robots.txt -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3360  100  3360    0     0    356      0  0:00:09  0:00:09 --:--:--   840
$ curl https://www.washingtonpost.com/news-sitemaps/index.xml -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1101  100  1101    0     0    104      0  0:00:10  0:00:10 --:--:--   266
$ 

$ go run wapo.go
8.378458882s
$ 

Google has no delay. The New York Times has a small delay. The Washington Post has a delay of about 9 seconds.


The Go code and compiler are the same:

$ go version
go version devel +a25c2878c7 Sat Jul 27 23:29:18 2019 +0000 linux/amd64
$ cat wapo.go
package main

import (
    "fmt"
    "net/http"
    "os"
    "time"
)

func main() {
    start := time.Now()
    resp, err := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        return
    }
    resp.Body.Close()
    fmt.Fprintln(os.Stderr, time.Since(start))
}
$

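Since curl and Go show the same delays, and the delays change with the ISP, the slowness is not in your Go code. Therefore, focus on network and site factors.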