How to fix double-encoded UTF-8 characters (in a utf-8 table)

Problem description:

A previous LOAD DATA INFILE was run under the assumption that the CSV file was latin1-encoded, when it was actually UTF-8. During this import, each multibyte character was interpreted as two (or more) single-byte latin1 characters, and each of those was then encoded as utf-8 again.

This double encoding created anomalies such as Ã± instead of ñ.
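
To make the mechanism concrete, here is a minimal Python sketch of what presumably happened during the import. The function name `double_encode` is hypothetical, and Python's `latin-1` codec (ISO-8859-1) only approximates MySQL's `latin1`, which is based on cp1252, so results can differ for bytes in the range 0x80 to 0x9F.

```python
def double_encode(text: str) -> str:
    """Simulate the faulty import: UTF-8 bytes misread as latin1 characters."""
    utf8_bytes = text.encode("utf-8")    # the bytes as they sit in the CSV file
    return utf8_bytes.decode("latin-1")  # each byte is taken as one character


garbled = double_encode("ñ")
print(garbled)                  # Ã±
print(garbled.encode("utf-8"))  # b'\xc3\x83\xc2\xb1' (four bytes stored instead of two)
```

The two UTF-8 bytes of ñ (0xC3 0xB1) become the two characters Ã and ±, which are then stored as UTF-8 themselves.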

How can these strings be corrected?

The following MySQL expression returns the correct utf8 string from a double-encoded value:

CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8)

It can be used with an UPDATE statement to correct the fields:

UPDATE tablename SET
    field = CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8);
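
The conversion used above can be mirrored step by step in Python, which may help when checking a few sample values before running the UPDATE. This is only a sketch: `repair` is a hypothetical helper, the fallback for strings that are not double-encoded is an addition that the SQL expression does not have, and Python's `latin-1` codec is ISO-8859-1 rather than MySQL's cp1252-based `latin1`.

```python
def repair(text: str) -> str:
    """Undo one round of latin1/UTF-8 double encoding, if present."""
    try:
        # CONVERT(field USING latin1) and CAST(... AS BINARY): back to raw bytes
        raw = text.encode("latin-1")
        # CONVERT(... USING utf8): reinterpret those bytes as UTF-8
        return raw.decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not double-encoded, or not representable in latin1: leave unchanged
        return text


print(repair("Ã±"))  # ñ
print(repair("ñ"))   # ñ (already correct, returned unchanged)
```

Because the UPDATE shown above has no WHERE clause, it rewrites every row, so it is worth verifying on a copy of the table that rows which were never double-encoded survive the conversion.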