Handling namespaces when parsing XML in Go

Problem description:

I am trying to parse a piece of XML in Go:

package main

import (
    "encoding/xml"
    "fmt"
)

type XML struct {
    Foo string `xml:"foo"`
}

func main() {
    rawXML := []byte(`
<xml>
  <foo>A</foo>
  <ns:foo>B</ns:foo>
</xml>`)

    x := new(XML)
    xml.Unmarshal(rawXML, x)
    fmt.Printf("foo: %s\n", x.Foo)
}

This outputs:

foo: B

While I expected it to produce:

foo: A

How do I get the content of the first foo tag (i.e. the one without a namespace)?

I don't think the XML decoder lets you specify, via struct tags, that an element must have no namespace. It can, however, report each element's namespace for you, so you can post-process the data afterwards to get the same result:

package main

import (
    "encoding/xml"
    "fmt"
)

type Foo struct {
    XMLName xml.Name
    Data string `xml:",chardata"`
}

type XML struct {
    Foo []Foo `xml:"foo"`
}

func main() {
    rawXML := []byte(`
<xml>
  <foo>A</foo>
  <ns:foo>B</ns:foo>
</xml>`)

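    x := new(XML)
    if err := xml.Unmarshal(rawXML, x); err != nil {
        fmt.Println("unmarshal failed:", err)
        return
    }
    //fmt.Printf("foo: %#v\n", x)
    // Keep only the foo elements that carry no namespace.
    for _, el := range x.Foo {
        if el.XMLName.Space == "" {
            fmt.Printf("non namespaced foo %q\n", el.Data)
        }
    }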
}

http://play.golang.org/p/aDEFPmHPc0

You have two values in series in your XML document, but only room for one value in your struct. The XML parser stores the first one and then overwrites it with the second.

Change Foo to a slice in the struct and then you'll get both values.

http://play.golang.org/p/BRgsuMQ7rK

package main

import (
    "encoding/xml"
    "fmt"
)

type XML struct {
    Foo []string `xml:"foo"`
}

func main() {
    rawXML := []byte(`
<xml>
  <foo>A</foo>
  <ns:foo>B</ns:foo>
</xml>`)

    x := new(XML)
    if err := xml.Unmarshal(rawXML, x); err != nil {
        fmt.Println("unmarshal failed:", err)
        return
    }
    fmt.Printf("foo: %s\n", x.Foo[0])
    fmt.Printf("both: %v\n", x.Foo)
}