How do I use UTF-8 with dynamic strings in the console?
Most of the answers and questions here on SO put L before any UTF-8 string. I found no explanation of what it is in the source code; according to my IDE, the constant is defined in winnt.h.
This is how I use it, without knowing what it is:
std::wcout<<L"\"Přetečení zásobníku\" is Stack overflow in Czech.";
Obviously, this kind of constant concatenation cannot be applied to variables:
void printUTF8(const char* str) {
    //Does not make the slightest bit of sense
    std::wcout<<L str;
}
So what is it, and how do I add it to dynamic strings?
L is an indication to the compiler that the string literal is composed of "wide characters" (wchar_t). On Windows, these are UTF-16 code units, each 16 bits (two bytes) wide. Most characters take one unit; characters outside the Basic Multilingual Plane take two:
L"This is a wide string"
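To make the point concrete, here is a minimal sketch showing that the L prefix is part of the literal syntax and only changes the element type of the literal. The names element_t and unit_count are hypothetical helpers introduced for this illustration, and the checks run at compile time.

```cpp
#include <cstddef>
#include <type_traits>

// Element type of a string literal, with reference, array and const stripped.
template <typename Literal>
using element_t = std::remove_cv_t<
    std::remove_extent_t<std::remove_reference_t<Literal>>>;

// No prefix gives an array of char; the L prefix gives an array of wchar_t.
static_assert(std::is_same<element_t<decltype("abc")>, char>::value,
              "unprefixed literal is made of char");
static_assert(std::is_same<element_t<decltype(L"abc")>, wchar_t>::value,
              "L-prefixed literal is made of wchar_t");

// Number of code units in a literal, excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t unit_count(const CharT (&)[N]) { return N - 1; }

// U+0159 (the letter r with caron) is one wide unit but two UTF-8 bytes.
static_assert(unit_count(L"\u0159") == 1, "one wide code unit");
static_assert(unit_count(u8"\u0159") == 2, "two UTF-8 bytes");
```

Because the prefix belongs to the literal, it cannot be written in front of a variable: a string that only exists at run time has to be converted instead.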
In contrast, a UTF-8 string is always a string composed of bytes. ASCII characters (A-Z, 0-9, etc.) are encoded the way they have always been, in the range 0x00 to 0x7F (0 to 127). International characters (like ř) are encoded using multiple bytes, each in the range 0x80 to 0xFF; there is a very good explanation on Wikipedia. The advantage is that UTF-8 can be represented using ordinary C strings:
"This is an ordinary string, but also a UTF-8 string"
"This is a C cedilla in UTF-8: \xc3\x87"
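The byte values in the C cedilla example follow directly from the UTF-8 encoding rules. As a rough sketch, the hypothetical helper encode_utf8 below (not part of any library) encodes a single code point by hand, so you can see where \xc3\x87 comes from.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-8 bytes.
// Hypothetical helper for illustration; expects a valid scalar value.
std::string encode_utf8(char32_t cp) {
    // Precondition: at most U+10FFFF and not a surrogate.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        // ASCII: a single byte, unchanged.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, encode_utf8(0xC7) yields the two bytes 0xC3 0x87 used in the literal above, and encode_utf8(0x159) yields 0xC5 0x99 for ř.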
However, if you type these international characters directly into your source code, your editor needs to know that the file is UTF-8 so that it encodes the characters correctly, as with the C cedilla above. The string will then be passed correctly to your function.
In your case, your comment indicates that you are using UTF-16, in which case there are two other issues:
- The console will, by default, not output Unicode characters correctly. You need to change the font to a TrueType font such as Lucida Console.
- You also need to change the output mode to a Unicode UTF-16 one. You can do this with:
_setmode(_fileno(stdout), _O_U16TEXT);
Code sample:
#include <iostream>
#include <io.h>
#include <fcntl.h>
int wmain(int argc, wchar_t* argv[])
{
    // Switch stdout to UTF-16 text mode before writing any wide output.
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wcout << L"Přetečení zásobníku is Stack overflow in Czech." << std::endl;
}
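As for dynamic strings: since the L prefix only works on literals, a UTF-8 string that arrives at run time has to be converted to UTF-16 before it goes to std::wcout. On Windows the usual route is the MultiByteToWideChar API with CP_UTF8. The portable sketch below does the conversion by hand so the mechanics are visible. The function name utf8_to_utf16 is hypothetical, and the validation is deliberately minimal (it does not reject overlong encodings or encoded surrogates).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Convert UTF-8 bytes to UTF-16 code units.
// Hypothetical helper for illustration; only basic validation is done.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        // The lead byte tells us how long the sequence is.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;
        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");
        if (cp < 0x10000) {
            // Basic Multilingual Plane: one UTF-16 code unit.
            out += static_cast<char16_t>(cp);
        } else {
            // Beyond the BMP: split into a surrogate pair.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits wide, the resulting code units can be copied into a std::wstring (for example std::wstring w(u16.begin(), u16.end());) and written to std::wcout once the output mode has been set with _setmode as shown above.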