How do I make dynamic strings work with UTF-8 in the console?

Question:

Most of the questions and answers here on SO put L before any UTF-8 string. I found no explanation of what it is. According to my IDE, it is a constant defined in winnt.h.

This is how I use it, without knowing what it is:

std::wcout<<L"\"Přetečení zásobníku\" is Stack overflow in Czech.";

Obviously, constant concatenation cannot be applied to variables:

void printUTF8(const char* str) {
  //Does not make the slightest bit of sense
  std::wcout<<L str; 
}

So what is it, and how do I add it to dynamic strings?

The L prefix tells the C or C++ compiler that the string literal is composed of "wide characters" (wchar_t). It is part of the language syntax rather than a constant or macro. On Windows, wide strings are UTF-16: each code unit in the string is 16 bits, or two bytes, wide:

L"This is a wide string"
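One caveat is easy to miss: in UTF-16, a character outside the Basic Multilingual Plane takes two 16-bit units (a surrogate pair), so "one character, 16 bits" holds only for most text. The sketch below is not from the original answer. It uses the standard C++11 u"..." prefix, which always produces UTF-16, because the width of L"..." depends on the platform (16 bits with MSVC on Windows, typically 32 bits on Linux). The helper name countCodePoints is made up for illustration and assumes well-formed UTF-16.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts characters (code points) in a well-formed
// UTF-16 string by skipping the second half of every surrogate pair.
inline std::size_t countCodePoints(const std::u16string& s) {
    std::size_t count = 0;
    for (char16_t unit : s) {
        // Low surrogates (0xDC00..0xDFFF) only ever follow a high surrogate,
        // so they do not start a new character.
        if (unit < 0xDC00 || unit > 0xDFFF) {
            ++count;
        }
    }
    return count;
}
```

For the Czech word from the question, written with escapes as u"P\u0159ete\u010den\u00ed", the number of code units and the number of characters are the same. For a character such as U+1F600 the string holds two code units but only one character.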

In contrast, a UTF-8 string is always a string composed of bytes. ASCII characters (A-Z, 0-9, etc.) are encoded the way they have always been, in the range 0x00 to 0x7F (0 to 127). Non-ASCII characters (like ř) are encoded as multiple bytes, each in the range 0x80 to 0xFF; there is a very good explanation on Wikipedia. The advantage is that UTF-8 text can be stored in ordinary C strings:

"This is an ordinary string, but also a UTF-8 string"

"This is a C cedilla in UTF-8: \xc3\x87"

However, if you type these international characters directly into source code, your editor needs to know that you are typing UTF-8 so that it encodes the characters correctly, as with the C cedilla above. The string will then be passed correctly to your function.
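To see which bytes a UTF-8-aware editor has to write for a given character, here is a minimal sketch of the encoding rule. It is an illustration only: the function name encodeUtf8 is made up, and the sketch does not reject surrogate code points.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
inline std::string encodeUtf8(char32_t cp) {
    assert(cp <= 0x10FFFF && "not a valid Unicode code point");
    std::string out;
    if (cp < 0x80) {
        // ASCII: a single byte, unchanged.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this rule, encodeUtf8(0x00C7) yields the two bytes 0xC3 0x87 used for the C cedilla earlier, and ř (U+0159) becomes 0xC5 0x99.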

In your case, your comment indicates that you are using UTF-16. In that case there are two other issues:

  • By default, the console will not display Unicode characters correctly. You need to change its font to a TrueType font such as Lucida Console.

  • You also need to change the output mode to UTF-16. You can do this with:

_setmode(_fileno(stdout), _O_U16TEXT);

Code example:

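#include <iostream>
#include <cstdio>   // _fileno, stdout
#include <io.h>     // _setmode
#include <fcntl.h>  // _O_U16TEXT

// Windows-specific (MSVC): wmain receives the command line as wide strings.
int wmain(int argc, wchar_t* argv[])
{
    // Switch stdout to UTF-16 text mode before anything is written to it.
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wcout << L"Přetečení zásobníku is Stack overflow in Czech." << std::endl;
}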
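
The example above prints a literal. For a string that only exists at run time, as in the question's printUTF8, the L prefix is not available, because it applies only to literals in the source code. The UTF-8 bytes have to be converted to wide characters instead. On Windows, the Win32 function MultiByteToWideChar with the CP_UTF8 code page is commonly used for this. The portable sketch below shows the same idea by hand. The name utf8ToUtf16 is made up for illustration, and the code assumes well-formed UTF-8 input and performs no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-8 bytes to UTF-16 code units.
// Assumes the input is well-formed UTF-8; malformed input is not detected.
inline std::u16string utf8ToUtf16(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes to read
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;
        // Each continuation byte contributes its low six bits.
        for (; extra > 0 && i < utf8.size(); --extra, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
        }
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            // Outside the Basic Multilingual Plane: emit a surrogate pair.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits wide, the resulting code units can be copied into a std::wstring and written to std::wcout after the _setmode call shown above.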