How to print accented characters (such as á é í ó ú) in ANSI C

Problem description:

I tried to use printf with some accented characters such as á é í ó ú:

printf("my name is Seán ");

The text editor in the Dev-C++ IDE displays them fine, i.e. the source code looks fine. I guess I need some library other than stdio.h, and maybe some variant of the normal printf.

I'm using the Bloodshed Dev-C++ IDE running on Windows XP.

Perhaps the best approach is to use Unicode.

Here's how...

First, manually set your console font to "Consolas", "Lucida Console", or any other TrueType Unicode font available to you ("Raster Fonts" may not work; they are not Unicode fonts, although they may include the characters you're interested in).

Next, set the console code page to 65001 (UTF-8) with SetConsoleOutputCP(CP_UTF8).

Then convert your text to UTF-8 (if it's not yet in UTF-8) using WideCharToMultiByte(CP_UTF8, ...).

Finally, call WriteConsoleA() to output the UTF-8 text.

Here's a little function that does all of these things for you. It's an "improved" variant of wprintf():

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <windows.h>

int _wprintf(const wchar_t* format, ...)
{
  int r;
  static int utf8ModeSet = 0;
  static wchar_t* bufWchar = NULL;
  static size_t bufWcharCount = 256;
  static char* bufMchar = NULL;
  static size_t bufMcharCount = 256;
  va_list vl;
  int mcharCount = 0;

  /* First call only: try to switch the console output code page to UTF-8. */
  if (utf8ModeSet == 0)
  {
    if (!SetConsoleOutputCP(CP_UTF8))
    {
      DWORD err = GetLastError();
      fprintf(stderr, "SetConsoleOutputCP(CP_UTF8) failed with error 0x%lX\n", (unsigned long)err);
      utf8ModeSet = -1;
    }
    else
    {
      utf8ModeSet = 1;
    }
  }

  /* UTF-8 mode is unavailable: fall back to the standard vwprintf(). */
  if (utf8ModeSet != 1)
  {
    va_start(vl, format);
    r = vwprintf(format, vl);
    va_end(vl);
    return r;
  }

  if (bufWchar == NULL)
  {
    if ((bufWchar = malloc(bufWcharCount * sizeof(wchar_t))) == NULL)
    {
      return -1;
    }
  }

  for (;;)
  {
    va_start(vl, format);
    r = vswprintf(bufWchar, bufWcharCount, format, vl);
    va_end(vl);

    if (r < 0)
    {
      break;
    }

    if (r + 2 <= bufWcharCount)
    {
      break;
    }

    free(bufWchar);
    if ((bufWchar = malloc(bufWcharCount * sizeof(wchar_t) * 2)) == NULL)
    {
      return -1;
    }
    bufWcharCount *= 2;
  }

  if (r > 0)
  {
    if (bufMchar == NULL)
    {
      if ((bufMchar = malloc(bufMcharCount)) == NULL)
      {
        return -1;
      }
    }

    /* Convert UTF-16 to UTF-8, doubling the output buffer when it is too small. */
    for (;;)
    {
      mcharCount = WideCharToMultiByte(CP_UTF8,
                                       0,
                                       bufWchar,
                                       -1,
                                       bufMchar,
                                       bufMcharCount,
                                       NULL,
                                       NULL);
      if (mcharCount > 0)
      {
        break;
      }

      if (GetLastError() != ERROR_INSUFFICIENT_BUFFER)
      {
        return -1;
      }

      free(bufMchar);
      if ((bufMchar = malloc(bufMcharCount * 2)) == NULL)
      {
        return -1;
      }
      bufMcharCount *= 2;
    }
  }

  /* mcharCount includes the terminating NUL, so a value above 1 means there is text to write. */
  if (mcharCount > 1)
  {
    DWORD numberOfCharsWritten, consoleMode;

    if (GetConsoleMode(GetStdHandle(STD_OUTPUT_HANDLE), &consoleMode))
    {
      fflush(stdout);
      if (!WriteConsoleA(GetStdHandle(STD_OUTPUT_HANDLE),
                         bufMchar,
                         mcharCount - 1,
                         &numberOfCharsWritten,
                         NULL))
      {
        return -1;
      }
    }
    else
    {
      if (fputs(bufMchar, stdout) == EOF)
      {
        return -1;
      }
    }
  }

  return r;
}

The following tests this function:

_wprintf(L"\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7"
         L"\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF"
         L"\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7"
         L"\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF"
         L"\n"
         L"\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7"
         L"\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF"
         L"\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD7"
         L"\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xDF"
         L"\n"
         L"\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7"
         L"\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF"
         L"\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7"
         L"\xF8\xF9\xFA\xFB\xFC\xFD\xFE\xFF"
         L"\n");

_wprintf(L"\x391\x392\x393\x394\x395\x396\x397"
         L"\x398\x399\x39A\x39B\x39C\x39D\x39E\x39F"
         L"\x3A0\x3A1\x3A2\x3A3\x3A4\x3A5\x3A6\x3A7"
         L"\x3A8\x3A9\x3AA\x3AB\x3AC\x3AD\x3AE\x3AF\x3B0"
         L"\n"
         L"\x3B1\x3B2\x3B3\x3B4\x3B5\x3B6\x3B7"
         L"\x3B8\x3B9\x3BA\x3BB\x3BC\x3BD\x3BE\x3BF"
         L"\x3C0\x3C1\x3C2\x3C3\x3C4\x3C5\x3C6\x3C7"
         L"\x3C8\x3C9\x3CA\x3CB\x3CC\x3CD\x3CE"
         L"\n");

_wprintf(L"\x410\x411\x412\x413\x414\x415\x401\x416\x417"
         L"\x418\x419\x41A\x41B\x41C\x41D\x41E\x41F"
         L"\x420\x421\x422\x423\x424\x425\x426\x427"
         L"\x428\x429\x42A\x42B\x42C\x42D\x42E\x42F"
         L"\n"
         L"\x430\x431\x432\x433\x434\x435\x451\x436\x437"
         L"\x438\x439\x43A\x43B\x43C\x43D\x43E\x43F"
         L"\x440\x441\x442\x443\x444\x445\x446\x447"
         L"\x448\x449\x44A\x44B\x44C\x44D\x44E\x44F"
         L"\n");

It should produce the following text in the console:

 ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿
ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß
àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡ΢ΣΤΥΦΧΨΩΪΫάέήίΰ
αβγδεζηθικλμνξοπρςστυφχψωϊϋόύώ
АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
абвгдеёжзийклмнопрстуфхцчшщъыьэюя

I do not know which encoding your IDE uses to store non-ASCII characters in .c/.cpp files, and I do not know what your compiler does when it encounters non-ASCII characters. You will have to figure this part out yourself.

As long as you pass properly encoded UTF-16 text to _wprintf(), or call WriteConsoleA() with properly encoded UTF-8 text, things should work.

P.S. Some gory details about console fonts can be found here.