Why is the length different when converting a byte array to a String and back to a byte array?

Question:

I have the following Java code:

byte[] signatureBytes = getSignature();

String signatureString = new String(signatureBytes, "UTF8");
byte[] signatureStringBytes = signatureString.getBytes("UTF8");

System.out.println(signatureBytes.length == signatureStringBytes.length); // prints false

Q: I'm probably misunderstanding this, but I thought that new String(byte[] bytes, String charset) and String.getBytes(charset) are inverse operations?

Q: As a follow up, what is a safe way to transport a byte[] array as a String?

Answer:

Not every byte[] is valid UTF-8. By default, invalid sequences get replaced by a fixed replacement character (U+FFFD), which itself takes three bytes when encoded back to UTF-8, and I think that's the reason for the length change.
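To make the replacement behaviour concrete, here is a minimal sketch. The class name Utf8RoundTripDemo and the helper roundTrip are made up for illustration, and the single byte 0xFF stands in for whatever getSignature() returns, since 0xFF can never appear in valid UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTripDemo {
    // Decodes the bytes as UTF-8, then encodes the resulting String back to UTF-8.
    // Malformed input is silently replaced by U+FFFD during decoding.
    static byte[] roundTrip(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for signature bytes: 0xFF is never valid in UTF-8.
        byte[] invalid = { (byte) 0xFF };
        byte[] result = roundTrip(invalid);

        // U+FFFD is encoded as EF BF BD, so one input byte becomes three.
        System.out.println(invalid.length + " -> " + result.length); // prints 1 -> 3
    }
}
```

Bytes that happen to form valid UTF-8 (plain ASCII, for instance) survive the round trip unchanged, which is why the problem only shows up with arbitrary binary data such as signatures.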

Try Latin-1 (ISO-8859-1) instead. It should not happen there, as it's a simple encoding in which every byte[] is meaningful: each byte maps to exactly one char.
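As a rough sketch of the Latin-1 suggestion, the following round-trips all 256 possible byte values. The class and method names (Latin1RoundTripDemo, toTransportString, fromTransportString) are invented for this example and are not from the original post.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTripDemo {
    // Maps each byte to the char with the same numeric value (U+0000..U+00FF).
    static String toTransportString(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mapping; only safe for Strings produced by toTransportString.
    static byte[] fromTransportString(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Every possible byte value, to show that none of them is rejected or replaced.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }

        byte[] back = fromTransportString(toTransportString(all));
        System.out.println(Arrays.equals(all, back)); // prints true
    }
}
```

Note that the resulting String can contain control characters, so this only helps if the channel carrying the String preserves every char. When that is uncertain, a text-safe encoding such as Base64 (java.util.Base64) is a commonly used alternative.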

It should not happen for Windows-1252 either. There are undefined sequences there (in fact, undefined bytes), but every char gets encoded in a single byte. The new byte[] may differ from the original one, but their lengths must be the same.
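The Windows-1252 case can be checked with a small sketch like the one below. It assumes the JVM ships the "windows-1252" charset (typical JDKs do, but it is not one of the charsets every Java platform is required to support), and the class name Cp1252RoundTripDemo is made up for illustration.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class Cp1252RoundTripDemo {
    // Decodes and re-encodes with Windows-1252.
    // Assumption: this charset is available on the running JVM.
    static byte[] roundTrip(byte[] input) {
        Charset cp1252 = Charset.forName("windows-1252");
        return new String(input, cp1252).getBytes(cp1252);
    }

    public static void main(String[] args) {
        // 0x81 is one of the bytes Windows-1252 leaves undefined.
        byte[] original = { 0x41, (byte) 0x81, 0x42 };
        byte[] result = roundTrip(original);

        // The length is preserved even if the undefined byte comes back changed.
        System.out.println("same length:  " + (original.length == result.length));
        System.out.println("same content: " + Arrays.equals(original, result));
    }
}
```

So a single-byte charset keeps the length stable, but only Latin-1 also guarantees that the content comes back identical.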
