How do I get all the attributes of an XML element in Go?

Problem description:

I am trying to parse XML content, along with all the attributes of each XML element, like this:

package main

import (
  "bytes"
  "encoding/xml"
  "fmt"
)

type Node struct {
  XMLName xml.Name
  Attributes []xml.Attr `xml:",attr"`
  BodyElements string `xml:",innerxml"`
  Nodes   []Node `xml:",any"`
}

var xmldata = []byte("<div><div data-id=\"images/6C7161080\" data-imagesize=\"medium\" data-alignment=\"none\"></div></div>")

func walk(nodes []Node, f func(Node) bool) {
  for _, n := range nodes {
    if f(n) {
      walk(n.Nodes, f)
    }
  }
}


func main() {
  buf := bytes.NewBuffer(xmldata)
  dec := xml.NewDecoder(buf)

  var n Node
  err := dec.Decode(&n)
  if err != nil {
    panic(err)
  }

  walk([]Node{n}, func(n Node) bool {
    if n.XMLName.Local == "p" {
      fmt.Println(n.BodyElements)
    } else if n.XMLName.Local == "div" {
      fmt.Println(n.BodyElements)
      fmt.Println(len(n.Attributes))
    }
    return true
  })
}

But the value of len(n.Attributes) is always 0. What can I do to get all the attributes of a given element? Note: the attribute names are not fixed, because the element can be a "div" tag, an "img" tag, or something else. So I can't name each attribute in a struct tag like this:

DataId string `xml:"data-id,attr"`

Answer:

The fundamental problem is that unmarshalling the XML into your Node struct doesn't work. Your BodyElements field captures the whole content of the root node, and nothing is unmarshaled into Nodes. (By the way, adding a simple fmt.Printf would have revealed this.)
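
On the attributes specifically: a tag of the form xml:",attr" with no name makes encoding/xml look for an attribute named after the field itself (here, "Attributes"), so the data-* attributes never match it. Since Go 1.8, the package also accepts an ",any,attr" tag that collects every attribute not claimed by another field. The following is only a minimal sketch, assuming Go 1.8 or later. AttrNode and collectAttrs are hypothetical names, and the struct deliberately leaves out the innerxml field to keep the example small.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// AttrNode is a hypothetical variant of Node used only for this sketch.
type AttrNode struct {
	XMLName xml.Name
	// ",any,attr" (Go 1.8+) gathers all attributes that no other field claims.
	Attrs []xml.Attr `xml:",any,attr"`
	// ",any" gathers child elements that no other field claims.
	Nodes []AttrNode `xml:",any"`
}

// collectAttrs unmarshals data and returns the attribute list of every
// element, visiting parents before their children.
func collectAttrs(data []byte) ([][]xml.Attr, error) {
	var root AttrNode
	if err := xml.Unmarshal(data, &root); err != nil {
		return nil, err
	}
	var out [][]xml.Attr
	var visit func(n AttrNode)
	visit = func(n AttrNode) {
		out = append(out, n.Attrs)
		for _, child := range n.Nodes {
			visit(child)
		}
	}
	visit(root)
	return out, nil
}

func main() {
	data := []byte(`<div><div data-id="images/6C7161080" data-imagesize="medium" data-alignment="none"></div></div>`)
	all, err := collectAttrs(data)
	if err != nil {
		panic(err)
	}
	for i, attrs := range all {
		fmt.Printf("element %d has %d attribute(s)\n", i, len(attrs))
		for _, a := range attrs {
			fmt.Printf("  %s=%q\n", a.Name.Local, a.Value)
		}
	}
}
```

Each xml.Attr carries the attribute name in Name.Local and its text in Value, so no attribute name has to be known in advance.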

Why are you trying to write your own XML unmarshalling/parsing code? You will fail. Just use the Decoder and its Token method to parse your XML by hand, one token after another, populating your tree manually. And if your XML is actually HTML, you might want to parse it with an HTML parser such as the html package (golang.org/x/net/html).
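
The token-by-token approach might look roughly like the sketch below. It is not a definitive implementation: Elem and parseTree are hypothetical names introduced here, text content is ignored, and if the input has several top-level elements only the last one is kept as the root.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"errors"
	"fmt"
	"io"
)

// Elem is a hypothetical tree node populated by hand from decoder tokens.
type Elem struct {
	Name     string
	Attrs    []xml.Attr
	Children []*Elem
}

// parseTree reads tokens one after another and builds an Elem tree.
func parseTree(data []byte) (*Elem, error) {
	dec := xml.NewDecoder(bytes.NewReader(data))
	var root *Elem
	var stack []*Elem // currently open elements, innermost last
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			// Copy the token so the attribute slice is owned by the tree.
			e := &Elem{Name: t.Name.Local, Attrs: t.Copy().Attr}
			if len(stack) == 0 {
				root = e
			} else {
				parent := stack[len(stack)-1]
				parent.Children = append(parent.Children, e)
			}
			stack = append(stack, e)
		case xml.EndElement:
			if len(stack) > 0 {
				stack = stack[:len(stack)-1]
			}
		}
	}
	if root == nil {
		return nil, errors.New("no root element found")
	}
	return root, nil
}

// printTree writes each element with its attributes, indented by depth.
func printTree(e *Elem, depth int) {
	fmt.Printf("%*s<%s> %d attribute(s)\n", depth*2, "", e.Name, len(e.Attrs))
	for _, a := range e.Attrs {
		fmt.Printf("%*s%s=%q\n", depth*2+2, "", a.Name.Local, a.Value)
	}
	for _, c := range e.Children {
		printTree(c, depth+1)
	}
}

func main() {
	data := []byte(`<div><div data-id="images/6C7161080" data-imagesize="medium" data-alignment="none"></div></div>`)
	root, err := parseTree(data)
	if err != nil {
		panic(err)
	}
	printTree(root, 0)
}
```

Because every xml.StartElement token already carries its full Attr slice, this works for "div", "img", or any other tag without declaring attribute names up front.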