How to read a binary file with a Unicode filename in C++?




In the project I'm working on, I deal with quite a few string manipulations; strings are read from binary files along with their encoding (which can be single-byte or double-byte). Essentially, I read the string value as a vector<char>, read the encoding, and then convert all strings to wstring for consistency.


This works reasonably well; however, the filenames themselves can contain double-byte characters. I'm totally stumped on how to actually open the input stream. In C, I would use the _wfopen function, passing a wchar_t* path, but wifstream seems to behave differently: it is specifically designed for reading double-byte characters from a file, not for reading single bytes from a file with a double-byte filename.


What is the solution to this problem?


Edit: Searching the net, it looks like there's no support for this at all in standard C++ (e.g. see this discussion). However, I'm wondering whether C++11 actually adds anything useful in this area.


How the string you pass to open is mapped to a filename is implementation-dependent. In a Unix environment, it is passed almost literally: only '/' and '\0' are treated specially. In other environments, other rules apply, and I've had problems in the past because I'd written a file under Unix and then couldn't do anything with it under Windows (which treats a ':' in the filename specially).


Another question is where these files come from. As mentioned above, there may be absolutely no way of opening them on your system: a filename with a ':' simply cannot be opened under Windows. Under Unix, if you end up with '\0' characters in the filename itself, you probably can't read those files either, and UTF-16 filenames will appear to contain '\0' characters under Unix. Your only solution may be to use native tools on the system that generated the files to rename them.


It's less clear to me how you could get such filenames onto a Unix disk in the first place. How does an SMB server such as Samba map UTF-16 filenames when it is serving files to a Windows box? Or an NFS server? I think such things also exist under Windows.