Java string replace and the NUL (NULL, ASCII 0) character?

Problem description:

Testing out someone else's code, I noticed a few JSP pages printing funky non-ASCII characters. Taking a dip into the source, I found this tidbit:

// remove any periods from first name e.g. Mr. John --> Mr John
firstName = firstName.trim().replace('.','\0');

Does replacing a character in a String with a null character even work in Java? I know that '\0' will terminate a C-string. Would this be the culprit behind the funky characters?

Answer:

Does replacing a character in a String with a null character even work in Java? I know that '\0' will terminate a C-string.

That depends on how you define "work". Does it replace all occurrences of the target character with '\0'? Absolutely!

String s = "food".replace('o', '\0');
System.out.println(s.indexOf('\0')); // "1"
System.out.println(s.indexOf('d')); // "3"
System.out.println(s.length()); // "4"
System.out.println(s.hashCode() == 'f'*31*31*31 + 'd'); // "true"

Everything seems to work fine to me! indexOf can find it, it counts toward the length, and its value in the hash code calculation is 0; everything is as specified by the JLS/API.
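
Applied to the question's own line, the same thing happens. Below is a minimal sketch, assuming "Mr. John" (from the code comment in the question) as sample input; the class and method names are hypothetical. The period is swapped for a NUL at the same index, so the length stays at 8 and the result is not "Mr John".

```java
// Hypothetical demo class; "Mr. John" is taken from the question's code comment.
public class QuestionInputDemo {
    // Same call as in the question: each '.' becomes '\0' instead of disappearing
    static String originalLogic(String firstName) {
        return firstName.trim().replace('.', '\0');
    }

    public static void main(String[] args) {
        String result = originalLogic("Mr. John");
        System.out.println(result.length());          // 8, same as "Mr. John"
        System.out.println((int) result.charAt(2));   // 0, the NUL character
        System.out.println(result.equals("Mr John")); // false
    }
}
```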

It DOESN'T work if you expect that replacing a character with the null character will somehow remove that character from the string. Of course it doesn't work like that: a null character is still a character!

String s = Character.toString('\0');
System.out.println(s.length()); // "1"
assert s.charAt(0) == 0;
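
To make the contrast with actual removal concrete, here is a small sketch (class and method names are hypothetical) comparing the char overload, which swaps in '\0', with the CharSequence overload, which swaps in the empty string. Only the latter shortens the string.

```java
// Hypothetical demo class contrasting "replace with NUL" and "remove".
public class NulIsNotRemovalDemo {
    // char overload: every 'o' is swapped for '\0', so the length is unchanged
    static String replaceWithNul(String s) {
        return s.replace('o', '\0');
    }

    // CharSequence overload: every "o" is swapped for "", so the characters are gone
    static String removeO(String s) {
        return s.replace("o", "");
    }

    public static void main(String[] args) {
        String withNul = replaceWithNul("food");
        String removed = removeO("food");
        System.out.println(withNul.length());       // 4
        System.out.println(removed);                // fd
        System.out.println(withNul.equals("fd"));   // false
        // Stripping the NULs afterwards gives the same result as removing directly
        System.out.println(withNul.replace("\0", "").equals(removed)); // true
    }
}
```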

It also DOESN'T work if you expect the null character to terminate a string. That's evident from the snippets above, but it's also clearly specified in the JLS (§10.9, An Array of Characters Is Not a String):

In the Java programming language, unlike C, an array of char is not a String, and neither a String nor an array of char is terminated by '\u0000' (the NUL character).
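
As a quick illustration of that rule, the sketch below (hypothetical class name) builds a String from a char array with a NUL in the middle. Unlike a C string, nothing is cut off at the '\0'; all three characters are copied.

```java
// Hypothetical demo class: a '\0' in the middle of a char[] does not cut the String short.
public class NoTerminatorDemo {
    static String fromChars() {
        char[] chars = {'a', '\0', 'b'};
        return new String(chars); // copies all three chars, including the NUL
    }

    public static void main(String[] args) {
        String s = fromChars();
        System.out.println(s.length());       // 3
        System.out.println(s.charAt(2));      // b
        System.out.println(s.equals("a\0b")); // true
    }
}
```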

Would this be the culprit behind the funky characters?

Now we're talking about an entirely different thing, i.e. how the string is rendered on screen. Truth is, even "Hello world!" will look funky if you use a Dingbats font. A Unicode string may look funky in one locale but not in another. Even a properly rendered Unicode string containing, say, Chinese characters may still look funky to someone from, say, Greenland.

That said, the null character will probably look funky regardless; it's usually not a character you want to display. Since the null character is not a string terminator, though, Java is more than capable of handling it one way or another.
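
For example, if NUL characters have already made it into a string, ordinary String methods can expose or remove them. This is a minimal sketch; the helper names and the "<NUL>" marker are hypothetical choices, not part of any library.

```java
// Hypothetical helpers for dealing with NUL characters that have crept into a string.
public class NulHandlingDemo {
    // Make each NUL visible (e.g. for logging) by swapping in a readable marker
    static String showNul(String s) {
        return s.replace("\0", "<NUL>");
    }

    // Drop every NUL character entirely
    static String stripNul(String s) {
        return s.replace("\0", "");
    }

    public static void main(String[] args) {
        String name = "Mr. John".replace('.', '\0'); // the question's bug
        System.out.println(showNul(name));  // Mr<NUL> John
        System.out.println(stripNul(name)); // Mr John
    }
}
```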

Now to address what we assume is the intended effect, i.e. removing all periods from a string, the simplest solution is to use the replace(CharSequence, CharSequence) overload.

System.out.println("A.E.I.O.U".replace(".", "")); // AEIOU
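
Applied to the question's snippet, the fix might look like the sketch below (the class and method names are hypothetical). The only change is passing String arguments instead of char arguments, so the period is replaced with nothing rather than with '\0'.

```java
// Hypothetical demo class: the question's line rewritten with the
// replace(CharSequence, CharSequence) overload.
public class RemovePeriodsDemo {
    static String removePeriods(String firstName) {
        return firstName.trim().replace(".", ""); // "." is literal here, not a regex
    }

    public static void main(String[] args) {
        System.out.println(removePeriods("Mr. John"));    // Mr John
        System.out.println(removePeriods(" A.E.I.O.U ")); // AEIOU
    }
}
```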

The replaceAll solution is also mentioned here, but it takes a regular expression, which is why you need to escape the dot metacharacter, and it is likely to be slower.
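
To show why the escaping matters, here is a small sketch (class and method names are hypothetical) contrasting an unescaped dot, an escaped dot, and java.util.regex.Pattern.quote. The unescaped "." matches every character, so it empties the string.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: replaceAll treats its first argument as a regular expression.
public class ReplaceAllDemo {
    // Unescaped "." matches ANY character, so everything is removed
    static String unescaped(String s) {
        return s.replaceAll(".", "");
    }

    // "\\." in Java source is the regex \. which matches only a literal period
    static String escaped(String s) {
        return s.replaceAll("\\.", "");
    }

    // Pattern.quote wraps the text so it is matched literally
    static String quoted(String s) {
        return s.replaceAll(Pattern.quote("."), "");
    }

    public static void main(String[] args) {
        String s = "A.E.I.O.U";
        System.out.println(unescaped(s).isEmpty()); // true
        System.out.println(escaped(s));             // AEIOU
        System.out.println(quoted(s));              // AEIOU
    }
}
```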