Java String character encoding - French - Dutch locale

Problem description:

I have the following code:

import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class EncodingTest {
    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(Charset.defaultCharset().toString());

        String accentedE = "é";

        String utf8 = new String(accentedE.getBytes("utf-8"), Charset.forName("UTF-8"));
        System.out.println(utf8);
        utf8 = new String(accentedE.getBytes(), Charset.forName("UTF-8"));
        System.out.println(utf8);
        utf8 = new String(accentedE.getBytes("utf-8"));
        System.out.println(utf8);
        utf8 = new String(accentedE.getBytes());
        System.out.println(utf8);
    }
}

The output of the above is as follows:

windows-1252
é
?
é
é

Can someone help me understand what this does? Why do I get this output?

If you already have a String, there is no need to encode it and decode it right back; the string is already the result of someone having decoded raw bytes.

In the case of a string literal, that someone is the compiler, which reads your source file as raw bytes and decodes it using the encoding you have specified. If you have physically saved your source file in Windows-1252 and the compiler decodes it as Windows-1252, all is well. If not, you need to fix this by declaring the correct encoding for the compiler to use when compiling your source...
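One way to take the source-file encoding out of the picture while debugging is to write the character as a Unicode escape, which is plain ASCII and is therefore read the same way whatever encoding the compiler assumes (with javac, the assumed encoding is set through the `-encoding` option). The following is only a sketch; the class name `LiteralCheck` and the helper `codePoints` are made up for illustration.

```java
public class LiteralCheck {
    // The escape below denotes e with acute accent (U+00E9). Because the
    // escape is pure ASCII, it does not depend on how the file was saved.
    static final String ACCENTED_E = "\u00e9";

    // Hypothetical helper: lists the code points of a string in U+XXXX form,
    // so you can see what the compiler actually put into the literal.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(codePoints(ACCENTED_E)); // prints U+00E9
    }
}
```

If the same helper is applied to a literal typed directly as the accented character and it reports anything other than U+00E9, the file was most likely saved in a different encoding than the one the compiler used.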

String utf8 = new String(accentedE.getBytes("utf-8"), Charset.forName("UTF-8"));

Does absolutely nothing. (Encode as UTF-8, decode as UTF-8 == no-op)

utf8 = new String(accentedE.getBytes(), Charset.forName("UTF-8"));

Encodes the string as Windows-1252 (the platform default here), and then decodes it as UTF-8. The resulting bytes must only be decoded as Windows-1252 (because they were encoded as Windows-1252, duh), otherwise you will get strange results.

utf8 = new String(accentedE.getBytes("utf-8"));

Encodes the string as UTF-8, and then decodes it as Windows-1252. The same principles apply as in the previous case.
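The two mismatched cases can be reproduced on any machine by naming both charsets explicitly instead of relying on the platform default. This is a minimal sketch: `MismatchDemo` and `recode` are hypothetical names, and it assumes the JVM ships the `windows-1252` charset, which is common but not required by the Java specification.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MismatchDemo {
    // Assumption: this JVM supports windows-1252 (forName throws otherwise).
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Hypothetical helper: encode with one charset, decode the same bytes
    // with another.
    static String recode(String s, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = s.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String accentedE = "\u00e9"; // e with acute accent

        // Windows-1252 encodes the character as the single byte 0xE9, which
        // is not valid UTF-8 on its own, so the decoder substitutes U+FFFD
        // (the replacement character).
        String second = recode(accentedE, CP1252, StandardCharsets.UTF_8);
        System.out.println(second.equals("\uFFFD")); // prints true

        // UTF-8 encodes the character as the two bytes 0xC3 0xA9, and each
        // byte decodes as its own Windows-1252 character (U+00C3, U+00A9).
        String third = recode(accentedE, StandardCharsets.UTF_8, CP1252);
        System.out.println(third.equals("\u00c3\u00a9")); // prints true
    }
}
```

The `?` in the question's output is probably the console's rendering of U+FFFD, since that character has no Windows-1252 representation and `System.out` typically substitutes `?` for unmappable characters.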

utf8 = new String(accentedE.getBytes());

Does absolutely nothing. (Encode as Windows-1252, decode as Windows-1252 == no-op)

An analogy with integers that might be easier to understand:

int a = 555;
// The case of encoding as X and decoding right back as X
a = Integer.parseInt(String.valueOf(a), 10);
// a is still 555

int b = 555;
// The case of encoding as X and decoding right back as Y
b = Integer.parseInt(String.valueOf(b), 15);
// b is now 1205, i.e. a strange result

Both of these are useless because we already had what we needed before running any of the code: the integer 555.

There is a need to encode your string into raw bytes when it leaves your system, and a need to decode raw bytes into a string when they come into your system. There is no need to encode and decode right back within the system.
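As a rough illustration of encoding only at the boundary, the sketch below encodes once on the way out and decodes once on the way in, both times with an explicitly named charset. The names `BoundaryDemo`, `encodeForOutput` and `decodeFromInput` are hypothetical, and UTF-8 is just an assumed choice; what matters is that both sides agree on the same charset.

```java
import java.nio.charset.StandardCharsets;

public class BoundaryDemo {
    // Leaving the system (file, socket, ...): String -> bytes, exactly once.
    static byte[] encodeForOutput(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Entering the system: bytes -> String, exactly once, using the charset
    // the sender is known to have used.
    static String decodeFromInput(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String accentedE = "\u00e9"; // e with acute accent

        byte[] wire = encodeForOutput(accentedE);
        System.out.println(wire.length); // prints 2 (bytes 0xC3 0xA9)

        String received = decodeFromInput(wire);
        System.out.println(received.equals(accentedE)); // prints true
    }
}
```

Inside the program the value stays a `String` throughout; no `getBytes` or `new String(bytes, ...)` call is needed until the data crosses a boundary again.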