Unmarshaling a specific SOAP response in Go

Question:

I am trying to unmarshal the following SOAP response using the structs below.

var data = `<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3rg/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <soap:Body>
        <doSendResponse>
            <doSendResult>Send OK.&lt;ReturnIDs&gt;c71cf425f5;e5e4dbb5ca&lt;/ReturnIDs&gt;</doSendResult>
        </doSendResponse>
    </soap:Body>
</soap:Envelope>`

type ResponseBody struct {
    ResponseBody SendResponse `xml:"Body"`
}
type SendResponse struct {
    Result Result `xml:"doSendResponse"`
}
type Result struct {
    RawMessage string `xml:"doSendResult"`
}

All goes well until the <doSendResult> element.
This particular tag contains a message, i.e. "Send OK.", and an HTML-encoded <ReturnIDs> element. The problem isn't the HTML-encoded part; I've already seen this question and the accepted answer. My problem is that I can't manage to extract both the message and the return IDs.

I tried the approach suggested in the previously mentioned question, but it failed. Here is what I have tried so far.

package main

import (
    "encoding/xml"
    "fmt"
)

var data = `<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3rg/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <soap:Body>
        <doSendResponse>
            <doSendResult>Send OK.&lt;ReturnIDs&gt;c71cf425f5;e5e4dbb5ca&lt;/ReturnIDs&gt;</doSendResult>
        </doSendResponse>
    </soap:Body>
</soap:Envelope>`

type ResponseBody struct {
    ResponseBody SendResponse `xml:"Body"`
}
type SendResponse struct {
    Result Result `xml:"doSendResponse"`
}
type Result struct {
    RawMessage string `xml:"doSendResult"`
}
type RawMessage struct {
    IDs     string `xml:"ReturnIDs"`
}

func main() {
    var response ResponseBody
    err := xml.Unmarshal([]byte(data), &response)
    if err != nil {
        panic(err.Error())
    }
    fmt.Printf("%+v\n", response)

    var rawMessage RawMessage
    err = xml.Unmarshal([]byte(response.ResponseBody.Result.RawMessage), &rawMessage)
    if err != nil {
        panic(err.Error())
    }
    fmt.Printf("%+v\n", rawMessage)

}

Output:

{ResponseBody:{Result:{RawMessage:Send OK.<ReturnIDs>c71cf425f5;e5e4dbb5ca</ReturnIDs>}}}
{IDs:}

I also tried to unescape the response and then unmarshal it. That partially works, but there are 3 main problems with this approach:

  1. It's too slow
  2. I could only either get the ReturnIDs or the message, not both.
  3. I believe it's just an ugly hack and there must be a better way to do that (one that I'm not aware of yet).

So, how can I extract both the message (Send OK.) and the <ReturnIDs> values?

There are several ways to decode the content of the doSendResult tag. Here is one example:

play.golang.org/p/NC9YrWqK0k

I defined two types corresponding to the tags inside the body of the SOAP envelope:

type (
    SendResponse struct {
        SendResult SendResult `xml:"Body>doSendResponse>doSendResult"`
    }

    SendResult struct {
        RawMessage string   `xml:"-"`
        Text       string   `xml:"-"`
        IDS        []string `xml:"-"`
    }
)

The SendResult type has a custom unmarshal function that reads the raw message and populates the struct:

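// UnmarshalXML reads the text of <doSendResult>, then parses it a second
// time to separate the plain message from the embedded <ReturnIDs> element.
// It needs the "encoding/xml" and "strings" imports.
func (sr *SendResult) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
    // DecodeElement unescapes the entities, so raw holds
    // "Send OK.<ReturnIDs>...</ReturnIDs>".
    var raw string
    if err := d.DecodeElement(&raw, &start); err != nil {
        return err
    }

    var st struct {
        Contents  string `xml:",chardata"`
        ReturnIDs string `xml:"ReturnIDs"`
    }

    // Wrap the fragment in a root element so that both the text and the
    // <ReturnIDs> child can be captured. Return errors instead of panicking
    // so the caller of xml.Unmarshal can handle them.
    if err := xml.Unmarshal([]byte("<xml>"+raw+"</xml>"), &st); err != nil {
        return err
    }

    sr.RawMessage = raw
    sr.Text = st.Contents
    sr.IDS = strings.Split(st.ReturnIDs, ";")

    return nil
}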
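
The wrapper element in the function above is what makes both values reachable. As far as I can tell, when the bare fragment is passed to xml.Unmarshal, the decoder skips the leading text and treats <ReturnIDs> itself as the root element, which is why the attempt in the question ends up with an empty IDs field. The following is a small sketch of that difference, not part of the original answer; the names fragment, mixed, parseBare and parseWrapped are made up for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// fragment is the already-unescaped content of <doSendResult>.
const fragment = `Send OK.<ReturnIDs>c71cf425f5;e5e4dbb5ca</ReturnIDs>`

// mixed captures the loose text and the <ReturnIDs> child of one element.
type mixed struct {
	Contents  string `xml:",chardata"`
	ReturnIDs string `xml:"ReturnIDs"`
}

// parseBare unmarshals the fragment as-is. The first start element,
// <ReturnIDs>, becomes the root, so there is no ReturnIDs child to find.
func parseBare() (mixed, error) {
	var m mixed
	err := xml.Unmarshal([]byte(fragment), &m)
	return m, err
}

// parseWrapped adds a root element around the fragment, so the text and
// the <ReturnIDs> element are both children of that root.
func parseWrapped() (mixed, error) {
	var m mixed
	err := xml.Unmarshal([]byte("<xml>"+fragment+"</xml>"), &m)
	return m, err
}

func main() {
	bare, err := parseBare()
	fmt.Printf("bare:    %+v (err: %v)\n", bare, err)

	wrapped, err := parseWrapped()
	fmt.Printf("wrapped: %+v (err: %v)\n", wrapped, err)
}
```

Only the wrapped variant fills in both Contents ("Send OK.") and ReturnIDs.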

Here is how to use the SendResponse type:

const data = `<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3rg/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <soap:Body>
        <doSendResponse>
            <doSendResult>Send OK.&lt;ReturnIDs&gt;c71cf425f5;e5e4dbb5ca&lt;/ReturnIDs&gt;</doSendResult>
        </doSendResponse>
    </soap:Body>
</soap:Envelope>`

func main() {
    var sendResponse SendResponse

    err := xml.Unmarshal([]byte(data), &sendResponse)
    if err != nil {
        panic(err.Error())
    }

    fmt.Printf("%+v\n", sendResponse)
}
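
If the second XML parse is a concern (the question mentions speed), the fragment could also be split with plain string operations inside the same custom unmarshaler. This is only a sketch under some assumptions, not the method from the answer above: splitResult and decode are hypothetical helpers, the code assumes at most one <ReturnIDs> element with semicolon-separated IDs as in the sample, and strings.Cut needs Go 1.18 or later. The sample document is a trimmed copy of the response from the question.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// data is a trimmed copy of the SOAP response from the question.
const data = `<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
        <doSendResponse>
            <doSendResult>Send OK.&lt;ReturnIDs&gt;c71cf425f5;e5e4dbb5ca&lt;/ReturnIDs&gt;</doSendResult>
        </doSendResponse>
    </soap:Body>
</soap:Envelope>`

type SendResponse struct {
	SendResult SendResult `xml:"Body>doSendResponse>doSendResult"`
}

type SendResult struct {
	RawMessage string
	Text       string
	IDS        []string
}

// UnmarshalXML decodes the element text once and splits it by hand.
func (sr *SendResult) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	var raw string
	if err := d.DecodeElement(&raw, &start); err != nil {
		return err
	}
	sr.RawMessage = raw
	sr.Text, sr.IDS = splitResult(raw)
	return nil
}

// splitResult (hypothetical helper) separates the leading message from the
// IDs inside <ReturnIDs>. Without a <ReturnIDs> tag it returns the whole
// string as the message and no IDs.
func splitResult(raw string) (string, []string) {
	const openTag, closeTag = "<ReturnIDs>", "</ReturnIDs>"
	text, rest, found := strings.Cut(raw, openTag)
	if !found {
		return raw, nil
	}
	inner, _, _ := strings.Cut(rest, closeTag)
	if inner == "" {
		return text, nil
	}
	return text, strings.Split(inner, ";")
}

// decode (hypothetical helper) unmarshals a whole SOAP document.
func decode(doc string) (SendResponse, error) {
	var resp SendResponse
	err := xml.Unmarshal([]byte(doc), &resp)
	return resp, err
}

func main() {
	resp, err := decode(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", resp)
}
```

Whether this is actually faster than the wrapper approach would have to be measured; the main trade-off is that it no longer validates the fragment as XML.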

Since the contents of the doSendResult element appear to be in a "custom" format (as opposed to a well-formed document such as HTML or XML), regular expressions might be a good way to parse the result here.

For example:

type SendResult struct {
  Status    string
  ReturnIds []string
}

var doSendResultRegex = regexp.MustCompile("^Send (.*?)\\.<ReturnIDs>(.*?)</ReturnIDs>$")

func ParseSendResult(s string) *SendResult {
  ss := doSendResultRegex.FindStringSubmatch(s)
  if ss == nil {
    return nil
  }
  return &SendResult{
    Status:    ss[1],
    ReturnIds: strings.Split(ss[2], ";"),
  }
}

// ...
fmt.Printf("%#v\n", ParseSendResult(response.ResponseBody.Result.RawMessage))
// &main.SendResult{
//   Status:    "OK",
//   ReturnIds: []string{"c71cf425f5", "e5e4dbb5ca"}
// }

Of course, you may want to adjust the doSendResultRegex expression depending on other examples of that data, but the code above should illustrate the idea.
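
For completeness, the regular expression approach could be wired to the SOAP envelope roughly as follows. This is a sketch under assumptions rather than a definitive implementation: the envelope type and the parseEnvelope helper are made up for illustration, the sample document is a trimmed copy of the response from the question, and a result that does not match the pattern is turned into an error because the other possible formats are unknown.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"regexp"
	"strings"
)

// data is a trimmed copy of the SOAP response from the question.
const data = `<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
        <doSendResponse>
            <doSendResult>Send OK.&lt;ReturnIDs&gt;c71cf425f5;e5e4dbb5ca&lt;/ReturnIDs&gt;</doSendResult>
        </doSendResponse>
    </soap:Body>
</soap:Envelope>`

// envelope (hypothetical) uses a path tag to reach the text of
// <doSendResult> directly; the decoder unescapes the entities.
type envelope struct {
	RawMessage string `xml:"Body>doSendResponse>doSendResult"`
}

type SendResult struct {
	Status    string
	ReturnIds []string
}

var doSendResultRegex = regexp.MustCompile(`^Send (.*?)\.<ReturnIDs>(.*?)</ReturnIDs>$`)

// ParseSendResult returns nil when s does not match the expected format.
func ParseSendResult(s string) *SendResult {
	ss := doSendResultRegex.FindStringSubmatch(s)
	if ss == nil {
		return nil
	}
	return &SendResult{
		Status:    ss[1],
		ReturnIds: strings.Split(ss[2], ";"),
	}
}

// parseEnvelope (hypothetical helper) unmarshals the document and then
// applies the regular expression to the element text.
func parseEnvelope(doc string) (*SendResult, error) {
	var env envelope
	if err := xml.Unmarshal([]byte(doc), &env); err != nil {
		return nil, err
	}
	res := ParseSendResult(env.RawMessage)
	if res == nil {
		return nil, fmt.Errorf("unexpected doSendResult format: %q", env.RawMessage)
	}
	return res, nil
}

func main() {
	res, err := parseEnvelope(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%#v\n", res)
}
```

Checking for the nil result matters here, because any response that deviates from the sample format would otherwise go unnoticed.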