将字符串(UTF-16)转换为C#中的UTF-8

问题描述:

我需要在C#中将字符串转换为UTF-8。我已经尝试了很多方法,但是没有我想要的。
我将我的字符串转换成一个字节数组,然后尝试将其写入一个XML文件(哪个编码是UTF-8 ....),但是我得到相同的字符串(根本没有编码)有一个字节列表是无用的....
有人面临同样的问题吗?

I need to convert a string to UTF-8 in C#. I've already try many ways but none works as I wanted. I converted my string into a byte array and then to try to write it to an XML file (which encoding is UTF-8....) but either I got the same string (not encoded at all) either I got a list of byte which is useless.... Does someone face the same issue ?

编辑:
这是一些代码我使用:

Edit : This is some of the code I used :

str= "testé";
byte[] utf8Bytes = Encoding.UTF8.GetBytes(str);
return Encoding.UTF8.GetString(utf8Bytes);

结果是testé,或者我预期会像testé...

The result is "testé" or I expected something like "testé"...

如果你想要一个UTF8字符串,其中每个字节都是正确的('Ö' - > [195,0],[150,0 ]),您可以使用以下命令:

If you want a UTF8 string, where every byte is correct ('Ö' -> [195, 0] , [150, 0]), you can use the followed:

public static string Utf16ToUtf8(string utf16String)
{
   /**************************************************************
    * Every .NET string will store text with the UTF16 encoding, *
    * known as Encoding.Unicode. Other encodings may exist as    *
    * Byte-Array or incorrectly stored with the UTF16 encoding.  *
    *                                                            *
    * UTF8 = 1 bytes per char                                    *
    *    ["100" for the ansi 'd']                                *
    *    ["206" and "186" for the russian 'κ']                   *
    *                                                            *
    * UTF16 = 2 bytes per char                                   *
    *    ["100, 0" for the ansi 'd']                             *
    *    ["186, 3" for the russian 'κ']                          *
    *                                                            *
    * UTF8 inside UTF16                                          *
    *    ["100, 0" for the ansi 'd']                             *
    *    ["206, 0" and "186, 0" for the russian 'κ']             *
    *                                                            *
    * We can use the convert encoding function to convert an     *
    * UTF16 Byte-Array to an UTF8 Byte-Array. When we use UTF8   *
    * encoding to string method now, we will get a UTF16 string. *
    *                                                            *
    * So we imitate UTF16 by filling the second byte of a char   *
    * with a 0 byte (binary 0) while creating the string.        *
    **************************************************************/

    // Storage for the UTF8 string
    string utf8String = String.Empty;

    // Get UTF16 bytes and convert UTF16 bytes to UTF8 bytes
    byte[] utf16Bytes = Encoding.Unicode.GetBytes(utf16String);
    byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);

    // Fill UTF8 bytes inside UTF8 string
    for (int i = 0; i < utf8Bytes.Length; i++)
    {
        // Because char always saves 2 bytes, fill char with 0
        byte[] utf8Container = new byte[2] { utf8Bytes[i], 0 };
        utf8String += BitConverter.ToChar(utf8Container, 0);
    }

    // Return UTF8
    return utf8String;
}

在我的情况下,DLL请求也是一个UTF8字符串,但遗憾的是UTF8字符串必须用UTF16编码('Ö' - > [195,0],[19,32])进行解释。所以ANSI' - '是150,必须转换为UTF16 - ,即8211.如果你也有这种情况,你可以使用以下代码:

In my case the DLL request is a UTF8 string too, but unfortunately the UTF8 string must be interpreted with UTF16 encoding ('Ö' -> [195, 0], [19, 32]). So the ANSI '–' which is 150 has to be converted to the UTF16 '–' which is 8211. If you have this case too, you can use the following instead:

public static string Utf16ToUtf8(string utf16String)
{
    // Get UTF16 bytes and convert UTF16 bytes to UTF8 bytes
    byte[] utf16Bytes = Encoding.Unicode.GetBytes(utf16String);
    byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);

    // Return UTF8 bytes as ANSI string
    return Encoding.Default.GetString(utf8Bytes);
}

或本机方法:

[DllImport("kernel32.dll")]
private static extern Int32 WideCharToMultiByte(UInt32 CodePage, UInt32 dwFlags, [MarshalAs(UnmanagedType.LPWStr)] String lpWideCharStr, Int32 cchWideChar, [Out, MarshalAs(UnmanagedType.LPStr)] StringBuilder lpMultiByteStr, Int32 cbMultiByte, IntPtr lpDefaultChar, IntPtr lpUsedDefaultChar);

public static string Utf16ToUtf8(string utf16String)
{
    Int32 iNewDataLen = WideCharToMultiByte(Convert.ToUInt32(Encoding.UTF8.CodePage), 0, utf16String, utf16String.Length, null, 0, IntPtr.Zero, IntPtr.Zero);
    if (iNewDataLen > 1)
    {
        StringBuilder utf8String = new StringBuilder(iNewDataLen);
        WideCharToMultiByte(Convert.ToUInt32(Encoding.UTF8.CodePage), 0, utf16String, -1, utf8String, utf8String.Capacity, IntPtr.Zero, IntPtr.Zero);

        return utf8String.ToString();
    }
    else
    {
        return String.Empty;
    }
}

如果您需要其他方法,请参阅 Utf8ToUtf16 。
希望我能帮忙。

If you need it the other way around, see Utf8ToUtf16. Hope I could be of help.