Why is the length different after converting a byte array to a String and back to a byte array?
I have the following Java code:
byte[] signatureBytes = getSignature();
String signatureString = new String(signatureBytes, "UTF8");
byte[] signatureStringBytes = signatureString.getBytes("UTF8");
System.out.println(signatureBytes.length == signatureStringBytes.length); // prints false
Q: I'm probably misunderstanding this, but I thought that new String(byte[] bytes, String charset) and String.getBytes(charset) were inverse operations?
Q: As a follow-up, what is a safe way to transport a byte[] array as a String?
Not every byte[] is valid UTF-8. By default, invalid sequences get replaced by a fixed character (U+FFFD, which takes three bytes when encoded back to UTF-8), and I think that's the reason for the length change.
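To make the replacement behaviour concrete, here is a minimal sketch. The class name Utf8RoundTrip and the helper roundTrip are made up for illustration; the only assumption is that getSignature() can return bytes that are not well-formed UTF-8, which is typical for signatures.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    // Hypothetical helper: decodes the bytes as UTF-8, then encodes the String back to UTF-8.
    static byte[] roundTrip(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF never occurs in well-formed UTF-8, so the decoder treats it as malformed input.
        byte[] original = { (byte) 0xFF };
        byte[] result = roundTrip(original);

        System.out.println(original.length); // prints 1
        System.out.println(result.length);   // prints 3: U+FFFD encodes as EF BF BD
    }
}
```

A single invalid byte comes back as three bytes, so the lengths no longer match and the original data cannot be recovered.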
Try Latin-1 (ISO-8859-1); it should not happen there, as it's a simple encoding in which every byte[] is meaningful.
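As a rough check of the Latin-1 claim, the sketch below round-trips all 256 possible byte values. The class name Latin1RoundTrip and the helper roundTrip are illustrative, not part of the original code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    // Hypothetical helper: decodes the bytes as ISO-8859-1, then encodes the String back.
    static byte[] roundTrip(byte[] input) {
        String decoded = new String(input, StandardCharsets.ISO_8859_1);
        return decoded.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Build an array containing every possible byte value once.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }

        // ISO-8859-1 maps each byte to the char with the same numeric value, so nothing is lost.
        System.out.println(Arrays.equals(all, roundTrip(all))); // prints true
    }
}
```

Note that the resulting String may contain control characters, so this is only safe if whatever carries the String preserves every char unchanged.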
It should not happen for Windows-1252 either. That encoding has undefined sequences (in fact, undefined single bytes), but every char is encoded as a single byte. The new byte[] may differ from the original, but the lengths must be the same.
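The Windows-1252 case can be tried the same way. This is only a sketch: the class name Cp1252RoundTrip and the helper roundTrip are made up, and it assumes the running JDK supports the charset name "windows-1252". Byte 0x81 is one of the values Windows-1252 leaves undefined.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class Cp1252RoundTrip {
    // Hypothetical helper: decodes the bytes as Windows-1252, then encodes the String back.
    static byte[] roundTrip(byte[] input) {
        // Assumption: this charset name is supported by the running JDK.
        Charset cp1252 = Charset.forName("windows-1252");
        return new String(input, cp1252).getBytes(cp1252);
    }

    public static void main(String[] args) {
        // 'A', an undefined byte (0x81), and the euro sign (0x80 in Windows-1252).
        byte[] original = { 0x41, (byte) 0x81, (byte) 0x80 };
        byte[] result = roundTrip(original);

        // Every char maps to exactly one byte, so the length is preserved.
        System.out.println(original.length == result.length); // prints true

        // The content may not survive: the undefined byte can come back as a substitute.
        System.out.println(Arrays.equals(original, result));
    }
}
```

So a matching length is not proof that the data survived; compare the contents with Arrays.equals if that matters.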