[Repost] Passing Managed Structures With Strings To Unmanaged Code Part 1

1. Introduction.

1.1 Managed structures that contain strings are a common sight. The trouble is that managed strings are non-blittable. This means that they do not have a single common representation in the unmanaged world. However, there are several standardized representations that are recognized by the CLR.

1.2 A managed string is also a reference type, which means that it cannot be directly embedded in the memory block of its containing structure. Contrast this with a value type (e.g. an integer), which can be stored directly inside a containing structure.

1.3 As such, a managed structure which contains string members cannot be directly passed to an unmanaged function. It needs to be transformed into an unmanaged representation that can be used by unmanaged code. This transformation is what makes a managed string accessible and consumable by unmanaged code. It is also part of the interop marshaling process and is the topic of this series of blogs.

2. Unmanaged String Representations.

2.1 Unmanaged strings can be represented in many different ways. Each unmanaged language can have its own internal representation.

2.2 The C/C++ language has its own rather famous NULL-terminated character array. A Pascal string intrinsically contains its length as a prefix (similar to a BSTR). Other languages may have their own internal representations. A string need not even be internally organized as an array.
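The difference between a NULL-terminated string and a length-prefixed string can be sketched in portable C++. This is only an illustration: the PascalShortString type and the helper functions below are hypothetical names made up for this sketch, not types from any real Pascal runtime or from the CLR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// C-style string: the length is not stored; it is found by scanning for the NUL.
inline std::size_t CStringLength(const char* psz)
{
    return std::strlen(psz);
}

// Hypothetical Pascal-style "short string": the first byte holds the length
// and the characters follow. No terminator is required.
struct PascalShortString
{
    unsigned char m_length;
    char          m_chars[255];
};

inline PascalShortString MakePascalString(const char* psz)
{
    PascalShortString result = {};
    std::size_t len = std::strlen(psz);
    if (len > 255) len = 255;            // a one-byte prefix caps the length at 255
    result.m_length = static_cast<unsigned char>(len);
    std::memcpy(result.m_chars, psz, len);
    return result;
}

// Reads the string back using the stored length rather than a terminator.
inline std::string ToStdString(const PascalShortString& s)
{
    return std::string(s.m_chars, s.m_length);
}
```

Both forms hold the same characters; they differ only in how a reader discovers where the string ends.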

2.3 Because of the diversity of string representations in the unmanaged world, the Interop Marshaler can only standardize on a few to represent a string contained within an unmanaged structure. These are :

  • A fixed length inline NULL-terminated character array (aka C-style string).
  • A pointer to a NULL-terminated character array (better known as LPCSTR in C/C++ lingo).
  • A BSTR.

2.4 The selected representation is indicated by using the MarshalAsAttribute. In section 4 below, we shall discuss how to represent a managed string (contained within a structure) as a fixed-length character array inside an unmanaged structure. In section 5, we will explore how to represent the string as a pointer to a NULL-terminated string. And finally in section 6, we will explore BSTRs.

2.5 Throughout this part 1, we will focus only on passing a structure from managed code to unmanaged code by value. In further parts of this series, we will study how such a structure can be passed from unmanaged code to managed code as “out” (return) parameters. In yet further parts, we will discuss how such a structure may be passed two-ways (as “in” and “out” parameters).

2.6 Throughout this series of blogs, the unmanaged code shall be based on C++.

3. Sample Structure.

3.1 Throughout this part 1, we shall use the following managed structure for illustrative purposes :

public struct TestStruct01
{
  public string m_strString;
};

3.2 Now, in order that this structure be marshalable to unmanaged code, a minimum amount of specification must be afforded to the interop marshaler in the form of the StructLayoutAttribute. This attribute indicates the general memory layout of the structure when represented in unmanaged code. The following is a typical way this structure is put to use :

[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Ansi)]
public struct TestStruct01
{
  public string m_strString;
};

3.3 The “LayoutKind” enumeration controls the memory layout of an object when exported to unmanaged code. The Sequential value indicates that the members of the structure are laid out sequentially, in the order in which they appear when exported to unmanaged memory.

3.4 The “Pack” argument indicates the byte packing between the struct members. I have used a value of 1 for simplicity.

3.5 The “CharSet” argument indicates the character set that the string member is to map to in unmanaged code. Here, we have specified “Ansi” which means that the “m_strString” string member, when marshaled to unmanaged code, will contain Ansi characters. Note that managed strings always contain Unicode characters internally.
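Because TestStruct01 has only one member, the effect of the “Pack” value is not visible in it. The following C++ sketch uses two hypothetical structures (PackedExample and DefaultPackedExample, which are not part of the sample code in this article) to show what 1-byte packing does on the unmanaged side.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical two-member struct declared under 1-byte packing.
#pragma pack(push, 1)
struct PackedExample
{
    char m_ch;      // offset 0
    int  m_value;   // offset 1: no padding is inserted before it
};
#pragma pack(pop)

// The same members under the compiler's default alignment.
struct DefaultPackedExample
{
    char m_ch;      // offset 0
    int  m_value;   // usually aligned to sizeof(int), so padding precedes it
};

static_assert(sizeof(PackedExample) == sizeof(char) + sizeof(int),
              "1-byte packing leaves no padding between members");
static_assert(offsetof(PackedExample, m_value) == 1,
              "m_value immediately follows m_ch");
```

The managed “Pack” value and the C++ #pragma pack value must agree, otherwise the two sides will disagree on field offsets.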

4. Representing a String as an Inline NULL-Terminated Character Array.

4.1 A string member of a managed structure, when to be marshaled across as an inline NULL-terminated character array, must be declared with the following MarshalAsAttribute :

[MarshalAs(UnmanagedType.ByValTStr, SizeConst = <size of character array>)]

The “UnmanagedType.ByValTStr” argument indicates that the string member will be marshaled as a “by value array” of characters the size of which is indicated by the “SizeConst” argument. The “SizeConst” value includes the terminating NULL-character for the string.

Note well the description “by value array” (contrast this to “by reference array”). This means that the string will be marshaled as an inline array of characters which are embedded within the containing structure itself.

4.2 The following is an example structure :

[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Ansi)]
public struct TestStruct01
{
  [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 21)]
  public string m_strString;
};

The “SizeConst” argument is set to a value of 21 as an example. Hence the “m_strString” string member must contain at most 20 characters. The 21st character will be the terminating NULL.

4.3 The corresponding C++ equivalent of such a structure is :

#pragma pack (1)

struct TestStruct01
{
  char m_szString[21];
};

4.4 An example C++-based unmanaged API that takes such a structure as an “in” value parameter is listed below :

void __stdcall DisplayTestStruct01(/*[in]*/ TestStruct01 test_struct_01)
{
  printf("test_struct_01.m_szString : [%s].\n", test_struct_01.m_szString);
}

The function displays the string contents of the “m_szString” member of the C++-based TestStruct01 structure.

4.5 The following is how such an API is declared in C# :

[DllImport("TestDLL01.dll", CallingConvention = CallingConvention.StdCall)]
public static extern void DisplayTestStruct01([In] TestStruct01 test_struct_01);

4.6 And a sample C# method of using the TestStruct01 structure with a call to the DisplayTestStruct01() API :

static void DisplayTestStruct01()
{
  TestStruct01 test_struct_01;

  test_struct_01.m_strString = "Hello World";

  DisplayTestStruct01(test_struct_01);
}

Of course, “test_struct_01.m_strString” should be set to a string value that does not exceed 20 characters in length. If it does, the string will be truncated to make room for the final terminating NULL character.
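The copy-and-truncate behavior described above can be imitated in plain C++. The MakeTestStruct01() helper below is a hypothetical stand-in written for this sketch. It is not the routine the interop marshaler actually runs; it only reproduces the visible result for a 21-character inline array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#pragma pack(push, 1)
struct TestStruct01
{
    char m_szString[21];
};
#pragma pack(pop)

// Hypothetical helper: fills the inline array, truncating the source if needed.
inline TestStruct01 MakeTestStruct01(const char* pszSource)
{
    TestStruct01 result = {};   // zero-filled, so the array is always terminated

    // 20 usable characters; the 21st slot is reserved for the terminating NULL.
    const std::size_t capacity = sizeof(result.m_szString) - 1;

    std::size_t len = std::strlen(pszSource);
    if (len > capacity)
        len = capacity;         // truncate to leave room for the terminator

    std::memcpy(result.m_szString, pszSource, len);
    return result;
}
```

A source string of 26 characters therefore arrives as its first 20 characters followed by the terminator, while “Hello World” arrives intact.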

4.7 At a low level, where does the unmanaged “m_szString” struct member reside ? Since the DisplayTestStruct01() API takes an “in” (read-only) TestStruct01 structure, the entire unmanaged structure (which includes the m_szString inline character array) can simply be pushed onto the call stack.

4.8 Hence the interop marshaler uses the stack space to create the unmanaged TestStruct01 structure and copies the contents of the “m_szString” member of the managed “test_struct_01” into the corresponding member of the unmanaged struct on the stack.

4.9 The use of the stack to store the entire unmanaged structure is most ideal in this situation because the structure is read-only (hence the interop marshaler need not concern itself about changes) and because when the API returns, the stack space is automatically recovered, saving the interop marshaler the trouble of any memory release.

5. Representing a String as a Pointer to an Unmanaged NULL-Terminated Character Array.

5.1 A string member of a managed structure, when to be marshaled across as a pointer to an unmanaged NULL-terminated character array, must be declared with the following MarshalAsAttribute :

[MarshalAs(UnmanagedType.LPStr)]

This indicates that the string member is to be marshaled across as a pointer to a NULL-terminated ANSI character array.

5.2 The following is an example structure :

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct TestStruct02
{
  [MarshalAs(UnmanagedType.LPStr)]
  public string m_strString;
};

Notice that TestStruct02 is very similar to TestStruct01 except for the MarshalAsAttribute applied to the “m_strString” member.

5.3 The corresponding C++ equivalent of such a structure is :

#pragma pack (1)

struct TestStruct02
{
  LPCSTR m_pszString;
};

5.4 An example C++-based unmanaged API that takes such a structure as an “in” value parameter is listed below :

void __stdcall DisplayTestStruct02(/*[in]*/ TestStruct02 test_struct_02)
{
  printf("test_struct_02.m_pszString : [%s].\n", test_struct_02.m_pszString);
}

The function displays the string contents of the buffer pointed to by the “m_pszString” member of the C++-based TestStruct02 structure.

5.5 The following is how such an API is declared in C# :

[DllImport("TestDLL01.dll", CallingConvention = CallingConvention.StdCall)]
public static extern void DisplayTestStruct02([In] TestStruct02 test_struct_02);

5.6 And a sample C# method of using the TestStruct02 structure with a call to the DisplayTestStruct02() API :

static void DisplayTestStruct02()
{
  TestStruct02 test_struct_02;

  test_struct_02.m_strString = "Hello World";

  DisplayTestStruct02(test_struct_02);
}

5.7 This time, the string assignable to “m_strString” is not limited to any fixed length (unlike the case of the TestStruct01 structure).

5.8 Under the covers, the “test_struct_02” itself will be pushed onto the stack just like the previous case with the TestStruct01 structure and the call to the DisplayTestStruct01() API. The “m_pszString” member, however, is a pointer to a NULL-terminated C-style string. It will not be an inline member of this structure. What will happen is that the Interop Marshaler will allocate a memory buffer that will be used to store the unmanaged string which is based on the value in “test_struct_02.m_strString”. It is a pointer to this buffer that will be passed as the unmanaged “test_struct_02.m_pszString” that we saw in DisplayTestStruct02() of point 5.4. Then when the API returns, the interop marshaler will free this buffer.

5.9 The memory allocation is done via the CoTaskMemAlloc() API. The interop marshaler may call this API directly or it may use the static Marshal.AllocCoTaskMem() method which will eventually call CoTaskMemAlloc(). The memory freeing is done via CoTaskMemFree(). The interop marshaler may call it directly or use the Marshal.FreeCoTaskMem() static method which will eventually call CoTaskMemFree().
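The allocate, copy, call and free sequence described in points 5.8 and 5.9 can be sketched in portable C++. Note the assumptions: std::malloc() and std::free() stand in for CoTaskMemAlloc() and CoTaskMemFree() so that the sketch builds without the Windows headers, and MeasureTestStruct02() and CallWithMarshaledString() are hypothetical functions invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

struct TestStruct02
{
    const char* m_pszString;   // LPCSTR is a typedef for const char* on Windows
};

// Hypothetical stand-in for the unmanaged API: reports the length it sees.
inline std::size_t MeasureTestStruct02(TestStruct02 test_struct_02)
{
    return std::strlen(test_struct_02.m_pszString);
}

// Hypothetical emulation of what happens around the call.
inline std::size_t CallWithMarshaledString(const char* pszManagedValue)
{
    // 1. Allocate a separate buffer for the string (CoTaskMemAlloc() in reality).
    const std::size_t cb = std::strlen(pszManagedValue) + 1;
    char* pBuffer = static_cast<char*>(std::malloc(cb));
    if (pBuffer == nullptr)
        return 0;

    // 2. Copy the characters, including the terminating NULL.
    std::memcpy(pBuffer, pszManagedValue, cb);

    // 3. The structure itself holds only the pointer and is passed by value.
    TestStruct02 test_struct_02 = { pBuffer };
    const std::size_t result = MeasureTestStruct02(test_struct_02);

    // 4. Free the buffer once the API has returned (CoTaskMemFree() in reality).
    std::free(pBuffer);
    return result;
}
```

The structure passed on the stack is only pointer-sized; the characters live in the separately allocated buffer for the duration of the call.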

5.10 As an alternative to using “UnmanagedType.LPStr”, we can also use “UnmanagedType.LPWStr”. In this case, the C# string would be marshaled across as a wide-character string (i.e. Unicode). The C# declaration for the TestStruct02 structure would be :

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct TestStruct02
{
  [MarshalAs(UnmanagedType.LPWStr)]
  public string m_strString;
};

The C++ declaration for TestStruct02 would be :

#pragma pack (1)

struct TestStruct02
{
  LPCWSTR m_pszString;
};

The C++ API DisplayTestStruct02() would be :

void __stdcall DisplayTestStruct02(/*[in]*/ TestStruct02 test_struct_02)
{
  printf("test_struct_02.m_pszString : [%S].\n", test_struct_02.m_pszString);
}

The C# DisplayTestStruct02() method that calls the DisplayTestStruct02() API is the same.
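To make the difference between the two character widths concrete, here is a small portable sketch. It assumes 16-bit code units (char16_t is used to mirror the 16-bit wchar_t found on Windows) and handles only 7-bit ASCII text. WidenAscii() and BufferBytes() are hypothetical helpers for this illustration and do not reflect how the interop marshaler performs its conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: widens 7-bit ASCII text to 16-bit code units.
inline std::u16string WidenAscii(const std::string& narrow)
{
    std::u16string wide;
    wide.reserve(narrow.size());
    for (unsigned char ch : narrow)
    {
        assert(ch < 0x80);   // this sketch deliberately handles ASCII only
        wide.push_back(static_cast<char16_t>(ch));
    }
    return wide;
}

// Bytes needed for a NULL-terminated buffer holding the narrow string.
inline std::size_t BufferBytes(const std::string& s)
{
    return (s.size() + 1) * sizeof(char);
}

// Bytes needed for a NULL-terminated buffer holding the wide string.
inline std::size_t BufferBytes(const std::u16string& s)
{
    return (s.size() + 1) * sizeof(char16_t);
}
```

For the same text, the wide buffer needs twice as many bytes, and the unmanaged code must read it with wide-character functions.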

6. Representing a String as a BSTR.

6.1 A string member of a managed structure, when to be marshaled across as a BSTR, must be declared with the following MarshalAsAttribute :

[MarshalAs(UnmanagedType.BStr)]

This indicates that the string member is to be marshaled across as a COM BSTR.

6.2 The following is an example structure :

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct TestStruct03
{
  [MarshalAs(UnmanagedType.BStr)]
  public string m_strString;
};

6.3 The corresponding C++ equivalent of such a structure is :

#pragma pack (1)

struct TestStruct03
{
  BSTR m_bstr;
};

6.4 An example C++-based unmanaged API that takes such a structure as an “in” value parameter is listed below :

void __stdcall DisplayTestStruct03(/*[in]*/ TestStruct03 test_struct_03)
{
  printf("test_struct_03.m_bstr : [%S].\n", test_struct_03.m_bstr);
}

This API simply prints out the “m_bstr” member of the input “test_struct_03” struct.

6.5 The following is how such an API is declared in C# :

[DllImport("TestDLL01.dll", CallingConvention = CallingConvention.StdCall)]
public static extern void DisplayTestStruct03([In] TestStruct03 test_struct_03);

6.6 And a sample C# method of using the TestStruct03 structure with a call to the DisplayTestStruct03() API :

static void DisplayTestStruct03()
{
  TestStruct03 test_struct_03;

  test_struct_03.m_strString = "Hello World";

  DisplayTestStruct03(test_struct_03);
}

6.7 Just like the “m_strString” member of the “TestStruct02” structure, the length of the string assignable to the “m_strString” member of the “TestStruct03” structure is not limited to any fixed length.

6.8 Under the covers, the unmanaged version of “test_struct_03” will be pushed onto the stack just like the previous 2 examples. The Interop Marshaler will internally allocate a BSTR (via SysAllocString()) that will be used to store the unmanaged string which is based on the value in “test_struct_03.m_strString”. It is this BSTR that will be passed as the unmanaged “test_struct_03.m_bstr” that we saw in DisplayTestStruct03() of point 6.4. Then when the API returns, the interop marshaler will free this BSTR (via SysFreeString()).
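For readers unfamiliar with BSTRs, the memory layout can be imitated in portable C++. A BSTR is a pointer to the first character of a wide string that is preceded by a 4-byte length (in bytes) and followed by a terminating NULL character. The functions below, prefixed with “Fake”, are hypothetical emulations built on std::malloc() for this sketch; real code must use SysAllocString() and SysFreeString().

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical emulation. Block layout: [4-byte byte count][UTF-16 data][NULL].
inline char16_t* FakeSysAllocString(const std::u16string& text)
{
    const std::uint32_t cbData =
        static_cast<std::uint32_t>(text.size() * sizeof(char16_t));
    unsigned char* pBlock = static_cast<unsigned char*>(
        std::malloc(sizeof(cbData) + cbData + sizeof(char16_t)));
    if (pBlock == nullptr)
        return nullptr;

    std::memcpy(pBlock, &cbData, sizeof(cbData));                  // length prefix
    std::memcpy(pBlock + sizeof(cbData), text.data(), cbData);     // characters
    std::memset(pBlock + sizeof(cbData) + cbData, 0, sizeof(char16_t)); // terminator

    // The returned pointer addresses the data, not the start of the block.
    return reinterpret_cast<char16_t*>(pBlock + sizeof(cbData));
}

// Reads the byte count stored just before the character data.
inline std::uint32_t FakeSysStringByteLen(const char16_t* bstr)
{
    std::uint32_t cb = 0;
    std::memcpy(&cb, reinterpret_cast<const unsigned char*>(bstr) - sizeof(cb), sizeof(cb));
    return cb;
}

// Frees the whole block by stepping back over the length prefix.
inline void FakeSysFreeString(char16_t* bstr)
{
    if (bstr != nullptr)
        std::free(reinterpret_cast<unsigned char*>(bstr) - sizeof(std::uint32_t));
}
```

Because the length travels with the string, a BSTR must always be released with its matching free function and never with a general-purpose deallocator.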

7. Special Handling for Structures Passed by Pointers.

7.1 If a structure (containing a string field member) is to be passed to an unmanaged API as a pointer, special processing (similar to manual marshaling) needs to be done.

7.2 Let’s say we need to pass a pointer to TestStruct01 as a parameter to an API :

void __stdcall DisplayTestStruct01_ViaPointer(/*[in]*/ TestStruct01* ptest_struct_01)
{
  printf("ptest_struct_01 -> m_szString : [%s].\n", ptest_struct_01 -> m_szString);
}

7.3 The above API will need to be declared in C# as :

[DllImport(@"TestDLL01.dll", CallingConvention = CallingConvention.StdCall)]
public static extern void DisplayTestStruct01_ViaPointer([In] IntPtr ptest_struct_01);

7.4 This time, the structure will need to be passed as an IntPtr. Now, because no MarshalAsAttribute is attached to this IntPtr parameter, the interop marshaler does not know how to perform automatic marshaling of the structure. Hence all marshaling will need to be performed manually. The following will need to be done :

  • A managed TestStruct01 structure will need to be allocated and its string field assigned as usual.
  • The size of TestStruct01 in unmanaged memory will need to be calculated.
  • A block of unmanaged memory the size of the unmanaged version of TestStruct01 will have to be allocated.
  • The field values of the managed TestStruct01 will have to be transferred (i.e. marshaled) to the unmanaged memory block which now serves as the unmanaged representation of TestStruct01.
  • After this, the unmanaged API is invoked with a pointer to the unmanaged TestStruct01.
  • After the API call, the unmanaged structure will need to be freed. This is where careful attention must be taken. There are 2 steps to be taken in sequence.
  • First, the field values of the structure which are references to memory (e.g. a LPSTR or a BSTR) must have their referenced memory freed.
  • Next, the unmanaged structure itself must be freed.
  • Because the parameter was passed as an “in” parameter, it was passed as a read-only parameter. The unmanaged API is not allowed to modify its memory. The managed application is the rightful owner of the memory of the unmanaged representation of the TestStruct01 structure and so the deallocation of the memory for the unmanaged TestStruct01 must be done within the managed application code.

7.5 The C# code below demonstrates this in detail :

static void DisplayTestStruct01_ViaPointer()
{
  // Create TestStruct01 and assign value to its field as usual.
  TestStruct01 test_struct_01 = new TestStruct01();

  test_struct_01.m_strString = "Hello World";

  // Determine the size of the TestStruct01 for marshaling.
  int iSizeOfTestStruct01 = Marshal.SizeOf(typeof(TestStruct01));
  // Allocate in unmanaged memory a block of memory the size
  // of an unmanaged TestStruct01 structure.
  IntPtr ptest_struct_01 = Marshal.AllocHGlobal(iSizeOfTestStruct01);
  // Transfer the contents of the managed TestStruct01
  // (i.e. test_struct_01) to the unmanaged memory
  // which now serves as the unmanaged representation
  // of test_struct_01.
  Marshal.StructureToPtr(test_struct_01, ptest_struct_01, false);
  // Call the API using a pointer to the unmanaged test_struct_01.
  DisplayTestStruct01_ViaPointer(ptest_struct_01);
  // We must remember to destroy the test_struct_01 structure.
  // Doing this will free any fields which are references
  // to memory.
  Marshal.DestroyStructure(ptest_struct_01, typeof(TestStruct01));
  // Finally, the block of memory allocated for the
  // unmanaged test_struct_01 must itself be freed.
  Marshal.FreeHGlobal(ptest_struct_01);
  ptest_struct_01 = IntPtr.Zero;
}

7.6 The following are the pertinent points concerning the code above :

  • The Marshal.SizeOf() method is used to calculate the byte size of the unmanaged TestStruct01 structure. This is done with the help of the StructLayoutAttribute and the MarshalAsAttributes decorated on the structure.
  • The Marshal.AllocHGlobal() method is used to perform the actual memory allocation for the unmanaged structure. Note that Marshal.AllocCoTaskMem() may also be used.
  • Next comes the important step of transferring the contents of the managed TestStruct01 to the unmanaged memory which now serves as the unmanaged representation of test_struct_01. This is done by using Marshal.StructureToPtr().
  • The above step is crucial because recall that we want to pass TestStruct01 to the unmanaged API to have its field contents displayed. Hence the unmanaged structure must contain values equivalent to those of the managed structure.
  • The API DisplayTestStruct01_ViaPointer() is then called with the pointer to the unmanaged structure passed as parameter.
  • After the API is called, nothing further needs to be done to the managed TestStruct01 structure. It has done its part.
  • However the unmanaged structure still remains in memory and we must remember to destroy it. In this destruction process, we must also free any fields which are references to memory. This is done using Marshal.DestroyStructure().
  • Now the unmanaged TestStruct01 structure contains an inline embedded character array which is not a reference to memory. Calling Marshal.DestroyStructure() will not destroy this inline array. The call is harmless nevertheless.
  • Next we must free the unmanaged TestStruct01 structure itself. This is done by calling Marshal.FreeHGlobal on the IntPtr.

7.7 Note my remark that after the call to DisplayTestStruct01_ViaPointer(), nothing further needs to be done to the managed TestStruct01 structure. This is because the spirit and intent of the DisplayTestStruct01_ViaPointer() API is to take a pointer to the unmanaged TestStruct01 structure as an “in” parameter.

This means that we do not expect any changes to be done to the structure, i.e. changes that will need to be reflected in the managed version of the same structure. Later, when we study the passing of the structure as an “out” or an “in and out” parameter, extra work will need to be done to ensure that changes made to the unmanaged structure are transferred to the managed version. More on this in part 2 and 3.

7.8 The very useful function used in the code in point 7.5 is Marshal.DestroyStructure(). This method will thoroughly free fields which are references to memory (e.g. a string).

In order to use Marshal.DestroyStructure() effectively, the structure must be properly decorated with MarshalAsAttributes for fields which are reference types. In the above example, Marshal.DestroyStructure() is used to destroy a pointer to an unmanaged TestStruct01 structure.

7.9 Now the unmanaged TestStruct01 contains an embedded character array. It will not be touched by Marshal.DestroyStructure(). However, genuine reference type fields, e.g. those declared to be marshaled as a pointer to a NULL-terminated character array (as is the case for TestStruct02) :

[MarshalAs(UnmanagedType.LPStr)]
public string m_strString;

or a BSTR (as is the case for TestStruct03) :

[MarshalAs(UnmanagedType.BStr)]
public string m_strString;

will be handled correctly by Marshal.DestroyStructure().

7.10 In fact, the code in 7.5 can be modified to handle TestStruct02 and TestStruct03. Suppose we were working with TestStruct02 (the string field of which is marshaled as a pointer to a NULL-terminated C-style string). When the m_strString field is marshaled across to its unmanaged counterpart, the interop marshaler will allocate memory for the NULL-terminated character array and then transfer the string value in the m_strString field to this array via Marshal.StringToCoTaskMemAnsi(). This is done during the call to Marshal.StructureToPtr().

Then, when Marshal.DestroyStructure() is called, Marshal.FreeCoTaskMem() (which eventually calls ::CoTaskMemFree()) is called to free the memory occupied by the NULL-terminated character array.

7.11 If it was TestStruct03 that was passed as a pointer to the unmanaged API, then ::SysAllocStringLen() will be used to allocate the BSTR field when Marshal.StructureToPtr() is called. And ::SysFreeString() will be used to free the BSTR when Marshal.DestroyStructure() is called.
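The two-step cleanup (first the memory referenced by the fields, then the structure block itself) can be imitated in portable C++. Everything here is an emulation under stated assumptions: std::malloc() and std::free() stand in for the real allocators, and FakeStructureToPtr(), FakeDestroyStructure() and the allocation counter are hypothetical names used only to make the ownership visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

struct TestStruct02
{
    char* m_pszString;   // non-const in this sketch so that the buffer can be freed
};

// Counts live allocations so the sketch can show that nothing leaks.
static int g_liveAllocations = 0;

inline void* TrackedAlloc(std::size_t cb)
{
    void* p = std::malloc(cb);
    if (p != nullptr) ++g_liveAllocations;
    return p;
}

inline void TrackedFree(void* p)
{
    if (p != nullptr) { --g_liveAllocations; std::free(p); }
}

// Emulates Marshal.AllocHGlobal() followed by Marshal.StructureToPtr():
// one block for the structure and one block for the string it points to.
inline TestStruct02* FakeStructureToPtr(const char* pszValue)
{
    TestStruct02* p = static_cast<TestStruct02*>(TrackedAlloc(sizeof(TestStruct02)));
    if (p == nullptr) return nullptr;

    const std::size_t cb = std::strlen(pszValue) + 1;
    p->m_pszString = static_cast<char*>(TrackedAlloc(cb));
    if (p->m_pszString != nullptr) std::memcpy(p->m_pszString, pszValue, cb);
    return p;
}

// Emulates Marshal.DestroyStructure(): frees what the fields reference,
// but leaves the structure block itself allocated.
inline void FakeDestroyStructure(TestStruct02* p)
{
    TrackedFree(p->m_pszString);
    p->m_pszString = nullptr;
}
```

After FakeDestroyStructure() one allocation (the structure block) is still live; it is released by the final TrackedFree(), which plays the role of Marshal.FreeHGlobal(). Reversing the order would lose the only pointer to the string buffer.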

7.12 I have created the following set of code constructs for the reader to test with TestStruct02 and TestStruct03 :

void __stdcall DisplayTestStruct01_ViaPointer(/*[in]*/ TestStruct01* ptest_struct_01)
{
  printf("ptest_struct_01 -> m_szString : [%s].\n", ptest_struct_01 -> m_szString);
}

void __stdcall DisplayTestStruct02_ViaPointer(/*[in]*/ TestStruct02* ptest_struct_02)
{
  printf("ptest_struct_02 -> m_pszString : [%s].\n", ptest_struct_02 -> m_pszString);
}

void __stdcall DisplayTestStruct03_ViaPointer(/*[in]*/ TestStruct03* ptest_struct_03)
{
  printf("ptest_struct_03 -> m_bstr : [%S].\n", ptest_struct_03 -> m_bstr);
}

Listed above are the C++ definitions for the 3 display APIs each of which takes a pointer to one of the versions of the test structs.

Listed below are various C# code constructs :

[DllImport("TestDLL01.dll", CallingConvention = CallingConvention.StdCall)]
public static extern void DisplayTestStruct01_ViaPointer([In] IntPtr ptest_struct_01);

[DllImport("TestDLL01.dll", CallingConvention = CallingConvention.StdCall)]
public static extern void DisplayTestStruct02_ViaPointer([In] IntPtr ptest_struct_02);

[DllImport("TestDLL01.dll", CallingConvention = CallingConvention.StdCall)]
public static extern void DisplayTestStruct03_ViaPointer([In] IntPtr ptest_struct_03);

static void InitializeStruct(ref TestStruct01 test_struct_01, string strInitial)
{
  test_struct_01.m_strString = strInitial;
}

static void InitializeStruct(ref TestStruct02 test_struct_02, string strInitial)
{
  test_struct_02.m_strString = strInitial;
}

static void InitializeStruct(ref TestStruct03 test_struct_03, string strInitial)
{
  test_struct_03.m_strString = strInitial;
}

delegate void Delegate_InitializeStruct<T>(ref T t, string strInitial);

delegate void Delegate_DisplayTestStruct0x_ViaPointer(IntPtr ptest_struct_0x);

static void DisplayTestStruct0x_ViaPointer<T>
    (
      Delegate_InitializeStruct<T> pInitializer,
      Delegate_DisplayTestStruct0x_ViaPointer pDisplayFunction
    ) where T : new()
{
    // Create structure T and assign value to its field as usual.
    T test_struct_0x = new T();

    pInitializer(ref test_struct_0x, "Hello World");

    // Determine the size of the T structure for marshaling.
    int iSizeOfTestStruct0x = Marshal.SizeOf(typeof(T));
    // Allocate in unmanaged memory a block of memory the size
    // of an unmanaged T structure.
    IntPtr ptest_struct_0x = Marshal.AllocHGlobal(iSizeOfTestStruct0x);
    // Transfer the contents of the managed T structure
    // (i.e. test_struct_0x) to the unmanaged memory
    // which now serves as the unmanaged representation
    // of test_struct_0x.
    Marshal.StructureToPtr(test_struct_0x, ptest_struct_0x, false);
    // Call the API using a pointer to the unmanaged test_struct_0x.
    pDisplayFunction(ptest_struct_0x);
    // We must remember to destroy the test_struct_0x structure.
    // Doing this will free any fields which are references
    // to memory.
    Marshal.DestroyStructure(ptest_struct_0x, typeof(T));
    // Finally, the block of memory allocated for the
    // unmanaged test_struct_0x must itself be freed.
    Marshal.FreeHGlobal(ptest_struct_0x);
    ptest_struct_0x = IntPtr.Zero;
}

static void Main(string[] args)
{
  // The type argument selects the matching InitializeStruct() overload.
  Delegate_InitializeStruct<TestStruct01> pInitializerFunction = InitializeStruct;
  Delegate_DisplayTestStruct0x_ViaPointer pDisplayFunction = DisplayTestStruct01_ViaPointer;

  DisplayTestStruct0x_ViaPointer<TestStruct01>(pInitializerFunction, pDisplayFunction);
}

The above code provides the following :

  • C# declarations for the DisplayTestStruct01_ViaPointer(), DisplayTestStruct02_ViaPointer() and DisplayTestStruct03_ViaPointer() APIs.
  • 3 overloaded initializer functions (InitializeStruct()), one each for the structs TestStruct01, TestStruct02 and TestStruct03.
  • Delegate Delegate_InitializeStruct<T> which is meant to generically point to one of the initializer functions.
  • Delegate_DisplayTestStruct0x_ViaPointer which is meant to point to one of the versions of the DisplayTestStructXX_ViaPointer APIs.
  • DisplayTestStruct0x_ViaPointer<T>() which generically performs the equivalent of the DisplayTestStruct01_ViaPointer() function (listed in point 7.5) but for all the test structs.
  • The Main() function shows how DisplayTestStruct0x_ViaPointer<T>() may be called for TestStruct01. By simple substitutions, we can call DisplayTestStruct0x_ViaPointer<T>() for TestStruct02 and TestStruct03.

7.13 For more details on passing a pointer to a structure from managed to unmanaged code, please refer to :

Passing a Pointer to a Structure from C# to C++ Part 1.

Passing a Pointer to a Structure from C# to C++ Part 2.

Passing a Pointer to a Structure from C# to C++ Part 3.

8. In Conclusion.

8.1 In this part 1, we have explored how to express a string member of a managed struct in unmanaged code. We have passed such a structure as an “in” (by-value) parameter.

8.2 In part 2, we will look into how to pass such a structure as an “out” (return) parameter.