Converting a string from UTF-8 to ISO-8859-1
Question:
I'm trying to convert a UTF-8 string to an ISO-8859-1 char* for use in legacy code. The only way I can see to do this is with iconv.
I would much prefer a purely string-based C++ solution and then just call .c_str() on the resulting string.
How do I do this? A code example, if possible, please. I'm fine using iconv if it is the only solution you know.
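For comparison, here is a minimal iconv-based sketch that returns a std::string. It assumes a POSIX system such as glibc, where iconv() takes a char** input buffer. Some libiconv builds declare that parameter as const char** instead. The function name utf8ToLatin1Iconv is hypothetical. Unlike a hand-written decoder that skips bad bytes, this version treats malformed input and characters outside ISO-8859-1 as errors (EILSEQ):

```cpp
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: UTF-8 -> ISO-8859-1 via POSIX iconv(3).
// Throws std::runtime_error on malformed or unrepresentable input.
std::string utf8ToLatin1Iconv(const std::string& in)
{
    iconv_t cd = iconv_open("ISO-8859-1", "UTF-8");  // (tocode, fromcode)
    if (cd == (iconv_t)(-1))
        throw std::runtime_error("iconv_open failed");

    // Each ISO-8859-1 character is one byte and each UTF-8 character is at
    // least one byte, so the output never exceeds the input length.
    std::string out(in.size(), '\0');
    char* inPtr = const_cast<char*>(in.data());  // assumes char** signature; input is not modified
    std::size_t inLeft = in.size();
    char* outPtr = &out[0];
    std::size_t outLeft = out.size();

    std::size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    int err = errno;  // save errno before iconv_close can change it
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error(err == EILSEQ ? "invalid or unrepresentable input"
                                               : "iconv conversion failed");
    out.resize(out.size() - outLeft);  // trim the unused tail
    return out;
}
```

Calling .c_str() on the returned string yields the char* that the legacy code needs. That pointer stays valid only while the string is alive and unmodified.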
Answer:
I'm going to modify my code from another answer to implement the suggestion from Alf.
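#include <string>

// Decodes UTF-8 and keeps only code points U+0000..U+00FF, which map
// one-to-one onto ISO-8859-1 bytes.
std::string UTF8toISO8859_1(const char * in)
{
    std::string out;
    if (in == nullptr)
        return out;
    unsigned int codepoint = 0;  // initialized: input may start with a continuation byte
    while (*in != 0)
    {
        unsigned char ch = static_cast<unsigned char>(*in);
        if (ch <= 0x7f)
            codepoint = ch;                               // ASCII byte
        else if (ch <= 0xbf)
            codepoint = (codepoint << 6) | (ch & 0x3f);   // continuation byte: 6 more bits
        else if (ch <= 0xdf)
            codepoint = ch & 0x1f;                        // lead byte of a 2-byte sequence
        else if (ch <= 0xef)
            codepoint = ch & 0x0f;                        // lead byte of a 3-byte sequence
        else
            codepoint = ch & 0x07;                        // lead byte of a 4-byte sequence
        ++in;
        // The sequence is complete once the next byte is not a continuation byte.
        if (((*in & 0xc0) != 0x80) && (codepoint <= 0x10ffff))
        {
            if (codepoint <= 255)
            {
                out.append(1, static_cast<char>(codepoint));
            }
            else
            {
                // do whatever you want for out-of-bounds characters
            }
        }
    }
    return out;
}

// Overload for std::string input; call .c_str() on the result to pass it
// to legacy code. Conversion stops at the first embedded NUL.
std::string UTF8toISO8859_1(const std::string & in)
{
    return UTF8toISO8859_1(in.c_str());
}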
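The approach works because the first 256 Unicode code points (U+0000 to U+00FF) coincide with ISO-8859-1. A decoded code point of 255 or less is therefore already the output byte. The loop's shift-and-mask steps can be checked at compile time for two sample characters, chosen here only for illustration:

```cpp
// Worked example of the decoding arithmetic in UTF8toISO8859_1.
// U+00E9 (e with acute accent) is encoded in UTF-8 as C3 A9: the lead byte
// 110xxxxx keeps its low 5 bits, the continuation byte 10xxxxxx its low 6.
constexpr unsigned int eAcute = ((0xC3u & 0x1f) << 6) | (0xA9u & 0x3f);  // (0x03 << 6) | 0x29
static_assert(eAcute == 0xE9, "fits in one byte: ISO-8859-1 0xE9");

// U+20AC (euro sign) is encoded as E2 82 AC: 4 + 6 + 6 payload bits.
constexpr unsigned int euro =
    ((((0xE2u & 0x0f) << 6) | (0x82u & 0x3f)) << 6) | (0xACu & 0x3f);
static_assert(euro == 0x20AC, "above 0xFF: not representable in ISO-8859-1");
```

So é survives as the single byte 0xE9. The euro sign decodes to 0x20AC and lands in the out-of-bounds branch.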
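Invalid UTF-8 input is not rejected. Malformed sequences, and sometimes the valid character just before them, are either dropped or decoded into the wrong characters.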
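If losing characters silently is not acceptable, a stricter decoder can reject malformed input outright. The sketch below is a hypothetical alternative, not the answer's code. Its name (utf8ToLatin1Strict), its use of std::invalid_argument for errors and its '?' substitute for characters outside ISO-8859-1 are all assumptions. It also rejects overlong encodings, UTF-16 surrogates and values above U+10FFFF:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical strict variant: throws std::invalid_argument on malformed
// UTF-8 and writes `replacement` for code points above U+00FF.
std::string utf8ToLatin1Strict(const std::string& in, char replacement = '?')
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        unsigned int cp;
        std::size_t len;
        if (lead < 0x80)                       { cp = lead;        len = 1; }
        else if (lead >= 0xc2 && lead <= 0xdf) { cp = lead & 0x1f; len = 2; }
        else if (lead >= 0xe0 && lead <= 0xef) { cp = lead & 0x0f; len = 3; }
        else if (lead >= 0xf0 && lead <= 0xf4) { cp = lead & 0x07; len = 4; }
        else  // stray continuation byte, overlong lead C0/C1, or F5..FF
            throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i < len)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k)
        {
            unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xc0) != 0x80)
                throw std::invalid_argument("missing UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3f);
        }
        // Reject overlong 3/4-byte forms, surrogates and values above U+10FFFF.
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xd800 && cp <= 0xdfff) || cp > 0x10ffff)
            throw std::invalid_argument("invalid UTF-8 code point");

        out += (cp <= 0xff) ? static_cast<char>(cp) : replacement;
        i += len;
    }
    return out;
}
```

This design keeps two failure kinds apart. Malformed input is an error, while a well-formed character that ISO-8859-1 lacks is a policy choice made by the replacement argument.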