Invalid byte x of n-byte UTF-8 sequence
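The error occurred while calling getEnvelope on a SOAPMessage sent by the client.
The reported cause was: Invalid byte 1 of 1-byte UTF-8 sequence.
In other words, the parser hit an invalid first byte in a 1-byte UTF-8 sequence.
So what does a 1-byte UTF-8 sequence look like?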
One-byte codes are used only for the ASCII values 0 through 127. In this case the UTF-8 code has the same value as the ASCII code. The high-order bit of these codes is always 0.
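The bit pattern is 0xxxxxxx.
That means the byte it read had its high-order bit set to 1, so it was judged invalid.
This assumes the parser treats the bytes as UTF-8. Why would it? UTF-8 may simply be the default, or it may be specified somewhere, for example in the encoding declaration of the XML document.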
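To check what a given message actually declares, one can look at the XML declaration at the start of the raw bytes. The snippet below is only a rough sketch: declared_xml_encoding is a hypothetical helper, it ignores byte order marks, and it falls back to UTF-8 when no encoding is declared, which is the default an XML parser uses in that case.

```python
import re

# Hypothetical helper: extract the encoding named in the XML declaration, if any.
_XML_DECL = re.compile(rb'\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

def declared_xml_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the declared encoding in lower case, or the default if none is declared."""
    match = _XML_DECL.match(data)
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()

print(declared_xml_encoding(b'<?xml version="1.0" encoding="GBK"?><a/>'))  # gbk
print(declared_xml_encoding(b'<a/>'))                                      # utf-8
```

If the declared encoding and the real byte encoding disagree, the parser will trust the declaration and fail on the first byte that does not fit.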
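As for how it can decide that this is a 1-byte UTF-8 sequence in the first place, I was not sure. There may be some pre-classification step, or the byte may simply be invalid as the first byte of a UTF-8 sequence of any length, and the message just happens to be phrased this way (which makes it ambiguous).
Conclusion: the XML is not actually encoded in UTF-8; it is probably gb2312/gbk or similar. Reading it with that encoding would likely avoid the problem. Alternatively, make sure the sender always encodes the XML as UTF-8.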
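The effect is easy to reproduce outside the SOAP stack. The following is a minimal sketch, assuming the payload really is GBK; can_decode and transcode_to_utf8 are illustrative names, not part of any SOAP or XML library.

```python
def can_decode(data: bytes, encoding: str) -> bool:
    """Return True if the bytes are valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode with the real source encoding, then re-encode as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")

# "\u4e2d\u6587" is the two-character string "Chinese" written in Chinese.
gbk_bytes = "\u4e2d\u6587".encode("gbk")   # b'\xd6\xd0\xce\xc4'

print(can_decode(gbk_bytes, "utf-8"))  # False: these bytes are not valid UTF-8
print(can_decode(gbk_bytes, "gbk"))    # True
print(can_decode(transcode_to_utf8(gbk_bytes, "gbk"), "utf-8"))  # True
```

If the body of an XML document is transcoded this way, any encoding declaration in it has to be updated to match.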
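Addendum:
Working out the length of a UTF-8 sequence is actually fairly easy: the high-order bits of the first byte give the length n, and the sequence ends where the next first byte (of a sequence of any length) appears. A byte that matches none of the first-byte patterns below cannot start a sequence at all, which is most likely what gets reported as byte 1 of a 1-byte sequence.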
first byte pattern of 1-byte UTF-8 sequence: 0xxxxxxx
first byte pattern of 2-byte UTF-8 sequence: 110xxxxx
first byte pattern of 3-byte UTF-8 sequence: 1110xxxx
first byte pattern of 4-byte UTF-8 sequence: 11110xxx
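The four patterns above translate directly into bit masks. This is a small sketch with a hypothetical function name; it only checks the patterns listed here and not the further restrictions of strict UTF-8 (such as the disallowed first bytes 0xC0, 0xC1 and 0xF5 to 0xF7).

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Return the sequence length implied by a first byte, or 0 if it cannot start one."""
    if (first_byte & 0x80) == 0x00:   # 0xxxxxxx
        return 1
    if (first_byte & 0xE0) == 0xC0:   # 110xxxxx
        return 2
    if (first_byte & 0xF0) == 0xE0:   # 1110xxxx
        return 3
    if (first_byte & 0xF8) == 0xF0:   # 11110xxx
        return 4
    return 0                          # 10xxxxxx (continuation byte) or 11111xxx

print(utf8_sequence_length(0x41))  # 1
print(utf8_sequence_length(0xE4))  # 3
print(utf8_sequence_length(0xB0))  # 0
```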
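The same reasoning applies to the following exception messages. In each case the first byte announced an n-byte sequence, but the second byte does not have the continuation form 10xxxxxx: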
Invalid byte 2 of 2-byte UTF-8 sequence.
Invalid byte 2 of 3-byte UTF-8 sequence.
Invalid byte 2 of 4-byte UTF-8 sequence.
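To see which input produces which message, here is a rough sketch of a checker that mimics the wording of these exceptions. It is not the parser's actual implementation; first_utf8_error is a hypothetical name, and as a simplification it reports a truncated sequence the same way as an invalid continuation byte.

```python
from typing import Optional

def first_utf8_error(data: bytes) -> Optional[str]:
    """Return a message for the first malformed sequence, or None if none is found."""
    i = 0
    while i < len(data):
        b0 = data[i]
        if (b0 & 0x80) == 0x00:
            n = 1
        elif (b0 & 0xE0) == 0xC0:
            n = 2
        elif (b0 & 0xF0) == 0xE0:
            n = 3
        elif (b0 & 0xF8) == 0xF0:
            n = 4
        else:
            # The byte cannot be the first byte of any sequence.
            return "Invalid byte 1 of 1-byte UTF-8 sequence."
        for k in range(1, n):
            # Every following byte must look like 10xxxxxx.
            if i + k >= len(data) or (data[i + k] & 0xC0) != 0x80:
                return f"Invalid byte {k + 1} of {n}-byte UTF-8 sequence."
        i += n
    return None

print(first_utf8_error(b"\xb0\xa1"))  # Invalid byte 1 of 1-byte UTF-8 sequence.
print(first_utf8_error(b"\xd6\xd0"))  # Invalid byte 2 of 2-byte UTF-8 sequence.
print(first_utf8_error(b"abc"))       # None
```

Both failing inputs are ordinary GBK byte pairs, which fits the conclusion that the data was not UTF-8 to begin with.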
http://topic.****.net/u/20120513/13/97af0141-df0d-4758-8fab-f91dd9af01db.html?seed=973731074&r=78553396#r_78553396
http://en.wikipedia.org/wiki/UTF-8