PHP: Characters incorrectly encoded in CSV when using utf8_encode

Problem description:

I am facing a strange issue when extracting data from a MySQL database and writing it to a CSV file. In the database, the field value is the following:

K Secure Connection 1 año 1 PC

When I echo it before writing it to the CSV file, I get the same as the above in my terminal.

I use the following code to write content to the CSV file:

fwrite($this->fileHandle, utf8_encode($lineContent . PHP_EOL));

Yet, when I open the CSV with LibreOffice Calc (and specify UTF-8 as the encoding format), the following is displayed:

K Secure Connection 1 aÃ±o 1 PC

I have no idea why this happens. Can someone explain how to solve this?

REM:

SELECT @@character_set_database;

returns

latin1 

REM 2:

`var_dump($lineContent, bin2hex($lineContent))`

gives

string(39) "Kaspersky Secure Connection 1 año 1 PC"
string(78) "4b6173706572736b792053656375726520436f6e6e656374696f6e20312061c3b16f2031205043"


The var_dump output shows that the string is already encoded in UTF-8: the bytes c3b1 are the UTF-8 encoding of "ñ". Calling utf8_encode on it garbles it, because the function assumes its input is Latin-1 and converts every byte to UTF-8 a second time. You are therefore writing "aÃ±o", encoded in UTF-8, into your file, which LibreOffice then "correctly" displays.

Simply don't utf8_encode.
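To make the double encoding visible, here is a minimal sketch that rebuilds the string from the bin2hex output in the question and compares the bytes before and after conversion. It assumes the mbstring extension is available; mb_convert_encoding from ISO-8859-1 stands in for utf8_encode (which is deprecated as of PHP 8.2), and toUtf8 is a hypothetical helper name, not something from the original code.

```php
<?php
// Bytes copied from the question's bin2hex() output: "ñ" is already c3 b1 (UTF-8).
$lineContent = hex2bin('4b6173706572736b792053656375726520436f6e6e656374696f6e20312061c3b16f2031205043');

// Same conversion utf8_encode() performs: treat every byte as Latin-1 and re-encode it.
$doubleEncoded = mb_convert_encoding($lineContent, 'UTF-8', 'ISO-8859-1');

echo bin2hex($lineContent), PHP_EOL;   // contains 61c3b16f     ("año")
echo bin2hex($doubleEncoded), PHP_EOL; // contains 61c383c2b16f ("aÃ±o")

// Hypothetical helper: convert only when the input is not already valid UTF-8.
function toUtf8(string $s): string
{
    return mb_check_encoding($s, 'UTF-8')
        ? $s
        : mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

echo bin2hex(toUtf8($lineContent)), PHP_EOL; // unchanged, still 61c3b16f
```

With a guard like this, the write from the question would become fwrite($this->fileHandle, toUtf8($lineContent . PHP_EOL)). If the data always arrives as UTF-8, dropping the conversion entirely is simpler.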

I would try opening the CSV file with another editor, just to make sure the problem is not with LibreOffice.

You may be double encoding the content if it is already in UTF-8 format.

I also prefer to always work with UTF-8, so I fetch the data from the database already in UTF-8 and no further conversion is needed. To do that, I run this query right after opening the SQL connection:

"set names 'utf8'"
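As an alternative to issuing that query by hand, the connection charset can be set when the connection is opened. The sketch below builds a PDO DSN with a charset parameter. The helper name buildUtf8Dsn, the host and the database name are hypothetical, and utf8mb4 is my assumption (MySQL's full UTF-8 charset) rather than the utf8 used above.

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that fixes the connection charset.
function buildUtf8Dsn(string $host, string $dbName): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbName);
}

// Placeholder host and database name, for illustration only.
$dsn = buildUtf8Dsn('localhost', 'shop');
echo $dsn, PHP_EOL;

// With real credentials, the connection would then be opened like this:
// $pdo = new PDO($dsn, $user, $password);
```

With mysqli, the equivalent call is $mysqli->set_charset('utf8mb4') right after connecting. Either way the strings come back as UTF-8 and can be written to the CSV without any further conversion.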