Stumped by Unicode, Boost, C++, codecvts
In C++, I want to use Unicode to do things. So after falling down the rabbit hole of Unicode, I've managed to end up in a train wreck of confusion, headaches and locales.
But in Boost, I've run into trouble both when using Unicode file paths and when using the Boost.Program_options library with Unicode input. I've read whatever I could find on locales, codecvts, Unicode encodings and Boost.
My current attempt to get things working is a codecvt that takes a UTF-8 string and converts it to the platform's encoding (UTF-8 on POSIX, UTF-16 on Windows). I've been trying to avoid wchar_t.
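For the Windows half of that approach, here's a minimal standard-C++ sketch (no Boost, no wchar_t) that decodes UTF-8 and re-encodes it as UTF-16 in a std::u16string. The function name utf8_to_utf16 is hypothetical, and the validation is deliberately basic: truncated sequences, bad continuation bytes, surrogates and out-of-range values are rejected, but overlong encodings are not. On POSIX, where the question assumes UTF-8 is already native, no conversion would be needed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: UTF-8 -> UTF-16 using char16_t instead of wchar_t.
// Throws std::runtime_error on malformed input (overlongs are not checked).
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes after the lead byte
        if (lead < 0x80)              { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x6)  { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0xE)  { cp = lead & 0x0F; extra = 2; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (in.size() - i <= extra) throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) throw std::runtime_error("bad continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");

        if (cp < 0x10000) {
            // Basic Multilingual Plane: one UTF-16 code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary planes: encode as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Passing the result to Win32 APIs would still need a wchar_t pointer, since those APIs are declared in terms of wchar_t; that is where avoiding wchar_t entirely gets awkward.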
The closest I've actually gotten is using Boost.Locale to convert a UTF-8 string to UTF-32 on output:
#include <cwchar>
#include <iostream>
#include <locale>
#include <string>
#include <boost/locale.hpp>

int main()
{
    std::string data("Testing, 㤹");
    std::locale fromLoc = boost::locale::generator().generate("en_US.UTF-8");
    std::locale toLoc = boost::locale::generator().generate("en_US.UTF-32");
    typedef std::codecvt<wchar_t, char, std::mbstate_t> cvtType;
    cvtType const* toCvt = &std::use_facet<cvtType>(toLoc);
    // Combine the UTF-8 locale with the UTF-32 locale's wchar_t codecvt facet.
    std::locale convLoc = std::locale(fromLoc, toCvt);
    std::cout.imbue(convLoc);
    std::cout << data << std::endl;
    // Output is unconverted -- what?
    return 0;
}
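A likely reason the output is unconverted: std::codecvt<wchar_t, char, mbstate_t> converts between wchar_t and char, so only wide streams (std::wcout, std::wofstream) look it up. std::cout is a char stream, so nothing in it consults the wchar_t facet; a basic_filebuf<char> only uses std::codecvt<char, char, mbstate_t>, which the standard defines as a no-op conversion. The small standard-C++ check below illustrates this; the helper names are hypothetical.

```cpp
#include <cwchar>
#include <locale>

// True when the locale's char -> char codecvt facet performs no conversion.
// The standard std::codecvt<char, char, std::mbstate_t> facet always does this.
bool narrow_stream_conversion_is_noop(const std::locale& loc) {
    using NarrowCvt = std::codecvt<char, char, std::mbstate_t>;
    return std::use_facet<NarrowCvt>(loc).always_noconv();
}

// The wchar_t facet installed in the question exists, but it is only
// relevant to streams whose character type is wchar_t.
bool has_wide_codecvt(const std::locale& loc) {
    return std::has_facet<std::codecvt<wchar_t, char, std::mbstate_t>>(loc);
}
```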
I think I had some other kind of conversion working using wide characters, but I really don't know what I'm even doing. I don't know what the right tool for the job is at this point. Help?
Okay, after a few long months I've figured it out, and I'd like to help people in the future.
First of all, the codecvt approach was the wrong way to do it. Boost.Locale provides a simple way to convert between character sets in its boost::locale::conv namespace. Here's one example (there are others that aren't based on locales):
#include <iostream>
#include <locale>
#include <string>
#include <boost/locale.hpp>

namespace loc = boost::locale;

int main()
{
    loc::generator gen;
    std::locale blah = gen.generate("en_US.utf-32");
    std::string UTF8String = "Tésting!";
    // from_utf will also work with wide strings, as it uses the character
    // size to detect the source encoding.
    std::string converted = loc::conv::from_utf(UTF8String, blah);
    // Outputs a UTF-32 string.
    std::cout << converted << std::endl;
    return 0;
}
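To illustrate what "uses the character size to detect the encoding" means, here's a hypothetical helper (not Boost's actual code) that maps a character type to a Unicode encoding name purely from sizeof. It assumes the usual sizes: 1 byte for char, 2 for char16_t, 4 for char32_t.

```cpp
#include <string>

// Hypothetical sketch: choose UTF-8, UTF-16 or UTF-32 from the size of CharT,
// the way the comment describes from_utf picking a source encoding.
template <typename CharT>
std::string utf_encoding_for() {
    static_assert(sizeof(CharT) == 1 || sizeof(CharT) == 2 || sizeof(CharT) == 4,
                  "unsupported character size");
    switch (sizeof(CharT)) {
        case 1:  return "UTF-8";
        case 2:  return "UTF-16";
        default: return "UTF-32";
    }
}
```

One consequence of this design is that utf_encoding_for<wchar_t>() gives "UTF-16" on Windows, where wchar_t is 2 bytes, and "UTF-32" on most Linux and macOS systems, where it is 4 bytes.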
If you replace "en_US.utf-32" with an empty string (""), it'll output in the user's locale instead.
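The plain standard library follows the same convention: std::locale("") requests the user's locale from the environment (on POSIX systems this typically comes from LC_ALL/LANG). A minimal sketch, with a hypothetical function name, that falls back to the classic "C" locale because the constructor throws std::runtime_error when the environment names a locale that isn't installed:

```cpp
#include <locale>
#include <stdexcept>

// Standard C++ analogue of passing "" to the Boost.Locale generator:
// an empty name means "the user's locale". Fall back to "C" on failure.
std::locale user_locale_or_classic() {
    try {
        return std::locale("");
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}
```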
I still don't know how to make std::cout do this all the time, but Boost.Locale's translate() function does output in the user's locale.
As for using UTF-8 strings for filesystem paths cross-platform, it seems that's possible; here's a link to how to do it.
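As an alternative to the Boost.Filesystem approach behind that link, C++17's std::filesystem can build a path directly from a UTF-8 std::string with std::filesystem::u8path, which converts it to the native representation (UTF-16 on Windows). This is a swapped-in technique, not the linked one, and create_file_utf8 is a hypothetical helper. Note that u8path is deprecated in C++20 in favor of constructing a path from std::u8string, though it is still available.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Sketch: create a file whose name is given as UTF-8, then check it exists.
// fs::u8path interprets utf8_name as UTF-8 regardless of the platform.
bool create_file_utf8(const fs::path& dir, const std::string& utf8_name) {
    fs::path p = dir / fs::u8path(utf8_name);
    {
        std::ofstream out(p);  // C++17 file streams accept fs::path directly
        out << "hello\n";
    }
    return fs::exists(p);
}
```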