PHP: When should unescaped UTF-8 be saved to a JSON file?

Problem description:

Is there any benefit to saving UTF-8 characters unescaped in a JSON file if one only accesses them through PHP?

Here is what I tested:

fwrite(fopen('fileA.json','w'), json_encode('аккредитовать'));  

then the content of fileA.json is given by

"\u0430\u043a\u043a\u0440\u0435\u0434\u0438\u0442\u043e\u0432\u0430\u0442\u044c"

However, when I store it with

fwrite(fopen('fileB.json','w'), json_encode('аккредитовать', JSON_UNESCAPED_UNICODE));

the content of fileB.json is given by

"аккредитовать"
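The same comparison can be sketched in memory, without touching the filesystem. This assumes the PHP source file itself is saved as UTF-8, so the string literal holds UTF-8 bytes; the variable names are only illustrative.

```php
<?php
// Assumption: this source file is saved as UTF-8, so $word holds UTF-8 bytes.
$word = 'аккредитовать';

// Default: every non-ASCII character becomes a \uXXXX escape.
$escaped = json_encode($word);

// With JSON_UNESCAPED_UNICODE the UTF-8 bytes are written as-is.
$unescaped = json_encode($word, JSON_UNESCAPED_UNICODE);

echo $escaped, PHP_EOL;   // "\u0430\u043a\u043a\u0440\u0435\u0434\u0438\u0442\u043e\u0432\u0430\u0442\u044c"
echo $unescaped, PHP_EOL; // "аккредитовать"
```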

To my surprise each of the following calls

echo json_decode(file_get_contents('fileA.json'));
echo json_decode(file_get_contents('fileB.json'));
echo json_decode(file_get_contents('fileA.json'), false, 512, JSON_UNESCAPED_UNICODE);
echo json_decode(file_get_contents('fileB.json'), false, 512, JSON_UNESCAPED_UNICODE);

gives the same output:

'аккредитовать'

So I would conclude that I only need to save UTF-8 characters unescaped in a JSON file if I want to open and read the file directly in an editor. If I only plan to show/save the content of the JSON file with PHP, then I don't need to save the content unescaped and I can use

fwrite(fopen('fileA.json','w'), json_encode('аккредитовать'));
echo json_decode(file_get_contents('fileA.json'));
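For reference, a slightly more idiomatic sketch of the same round trip, assuming PHP 7.3+ for JSON_THROW_ON_ERROR. It uses file_put_contents/file_get_contents, which open and close the file in one call (the fopen() handle in the question is never closed), and writes to the system temp directory; the paths are illustrative.

```php
<?php
// Requires PHP 7.3+ for JSON_THROW_ON_ERROR.
$word  = 'аккредитовать';
$fileA = sys_get_temp_dir() . '/fileA.json'; // illustrative paths
$fileB = sys_get_temp_dir() . '/fileB.json';

// file_put_contents opens, writes and closes the file in one call.
file_put_contents($fileA, json_encode($word, JSON_THROW_ON_ERROR));
file_put_contents($fileB, json_encode($word, JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR));

$rawA = file_get_contents($fileA); // 80 bytes, pure ASCII
$rawB = file_get_contents($fileB); // 28 bytes, raw UTF-8

// json_decode turns the \uXXXX escapes back into the same UTF-8 bytes.
$fromA = json_decode($rawA, false, 512, JSON_THROW_ON_ERROR);
$fromB = json_decode($rawB, false, 512, JSON_THROW_ON_ERROR);

var_dump($fromA === $fromB); // bool(true)

// Clean up the temporary files.
unlink($fileA);
unlink($fileB);
```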

Is that correct, or did I miss anything important?


With JSON_UNESCAPED_UNICODE the JSON is now:

  1. more human readable
  2. not ASCII-safe

That's the only tradeoff you're making. Once your JSON contains non-ASCII characters, it has to be handled in a binary-safe manner: e.g. you cannot simply send it over a channel that expects only ASCII data, and if a channel is encoding-aware (e.g. storing it in a database), you need to care about the specific encoding. None of this is a concern when you simply write the data to a file and read it back, as long as the reader treats the encoding correctly (which PHP does here: its strings are plain byte sequences, so it doesn't care about the encoding).
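To make the "ASCII-safe" point concrete, here is a small sketch that checks whether each encoded string contains only bytes in the 0x00-0x7F range. The helper is_ascii() is hypothetical (not a PHP built-in); it uses a byte-wise PCRE pattern so it does not depend on the mbstring extension.

```php
<?php
// Hypothetical helper: true if every byte of $s is in the 7-bit ASCII range.
function is_ascii(string $s): bool
{
    // No /u modifier, so the pattern is matched byte by byte.
    return preg_match('/^[\x00-\x7F]*$/', $s) === 1;
}

$word      = 'аккредитовать';
$escaped   = json_encode($word);
$unescaped = json_encode($word, JSON_UNESCAPED_UNICODE);

var_dump(is_ascii($escaped));   // bool(true): safe for ASCII-only channels
var_dump(is_ascii($unescaped)); // bool(false): contains raw UTF-8 bytes
```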

The JSON format itself doesn't care either way: "а" and "\u0430" represent the exact same character.
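A quick check of that equivalence, assuming the source file is saved as UTF-8: decoding the escaped spelling and the literal spelling of the Cyrillic letter yields the identical PHP string. The single-quoted PHP literal keeps the backslash, so it is json_decode (not PHP's string parser) that interprets the escape.

```php
<?php
// Single quotes: PHP leaves "\u0430" alone, so json_decode sees the JSON escape.
$fromEscape  = json_decode('"\u0430"');
$fromLiteral = json_decode('"а"');

var_dump($fromEscape === $fromLiteral); // bool(true)
var_dump(bin2hex($fromEscape));         // string(4) "d0b0" (UTF-8 bytes of U+0430)
```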

It should be noted that escaped Unicode takes up more storage than UTF-8 encoded text (6 or 12 bytes per non-ASCII character escaped vs. 2 to 4 bytes in UTF-8). But that hardly matters in the majority of cases.
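To put numbers on that, a sketch comparing encoded lengths in bytes (strlen counts bytes, not characters). The emoji is written with PHP 7's "\u{...}" escape in a double-quoted string; it lies outside the Basic Multilingual Plane, so JSON escapes it as a 12-byte surrogate pair.

```php
<?php
$samples = [
    'cyrillic' => 'аккредитовать', // 13 letters, 2 bytes each in UTF-8
    'emoji'    => "\u{1F600}",     // 1 character, 4 bytes in UTF-8
];

foreach ($samples as $name => $text) {
    $escaped   = json_encode($text);
    $unescaped = json_encode($text, JSON_UNESCAPED_UNICODE);
    // Lengths include the two surrounding double quotes.
    printf("%-8s escaped: %2d bytes, unescaped: %2d bytes\n",
        $name, strlen($escaped), strlen($unescaped));
}
// cyrillic escaped: 80 bytes, unescaped: 28 bytes
// emoji    escaped: 14 bytes, unescaped:  6 bytes
```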

Note also: JSON_UNESCAPED_UNICODE is not a valid flag for json_decode; it's simply superfluous there.
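A sketch of the decode side, assuming PHP 7.3+ for JSON_THROW_ON_ERROR: passing the encode-only flag changes nothing in the result, whereas a flag json_decode does act on, such as JSON_THROW_ON_ERROR, changes how errors are reported. The variable names are illustrative.

```php
<?php
$json = '"\u0430\u043a\u043a\u0440\u0435\u0434\u0438\u0442\u043e\u0432\u0430\u0442\u044c"';

// JSON_UNESCAPED_UNICODE is an encoding flag; json_decode does not act on it.
$withFlag    = json_decode($json, false, 512, JSON_UNESCAPED_UNICODE);
$withoutFlag = json_decode($json);
var_dump($withFlag === $withoutFlag); // bool(true)

// A decode flag that does something: throw instead of silently returning null.
$caught = false;
try {
    json_decode('{"broken": ', false, 512, JSON_THROW_ON_ERROR);
} catch (JsonException $e) {
    $caught = true;
    echo 'Invalid JSON: ', $e->getMessage(), PHP_EOL;
}
```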