

I'm trying to parse an XML file (http://jstryczek.blox.pl/rss2) that says its character set is ISO-8859-2. My database is in UTF-8, so I want to convert it to UTF-8.


To do that I run the following on the string:

$content = iconv('ISO-8859-2', 'UTF-8//TRANSLIT', $content);


For some reason I'm getting back an odd encoding, so that the first string below comes out as the second:

Gdzie są różnice


Gdzie sÄ róşnice

Is there an explanation for why the Polish characters aren't coming through? Does UTF-8 not support them?
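
UTF-8 does support every Polish letter, so the problem is unlikely to be the target encoding. One possible cause, stated here as an assumption because the feed-fetching code is not shown: the bytes in $content may already be UTF-8 by the time iconv runs (PHP's libxml-based parsers such as SimpleXML and DOM return UTF-8 regardless of the declared charset). Converting "from ISO-8859-2" a second time would then produce this kind of mojibake, for example "sÄ" for "są". A minimal sketch of a guard follows; toUtf8 is a hypothetical helper name, and mb_check_encoding requires the mbstring extension:

```php
<?php
// Hypothetical helper: convert to UTF-8 only if the input is not already valid UTF-8.
function toUtf8(string $content, string $declared = 'ISO-8859-2'): string
{
    // Already valid UTF-8 (this includes plain ASCII): leave it alone.
    if (mb_check_encoding($content, 'UTF-8')) {
        return $content;
    }

    $converted = iconv($declared, 'UTF-8', $content);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $declared to UTF-8");
    }
    return $converted;
}

// "Gdzie są różnice" written out as raw ISO-8859-2 bytes.
$latin2 = "Gdzie s\xB1 r\xF3\xBFnice";
$utf8   = toUtf8($latin2);

echo $utf8, "\n";          // converted once
echo toUtf8($utf8), "\n";  // already UTF-8, so returned unchanged
```

The validity check is a heuristic: a few ISO-8859-2 byte sequences happen to be valid UTF-8 as well, so where the real encoding is known for certain it is safer to rely on that than to guess.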


I fixed this by JSON-encoding the string and then replacing the \uXXXX escape of each Polish special character with its HTML code. My result is below:

    // \uXXXX escapes as produced by json_encode()
    $specialChars = [
        '\u0105', # ą
        '\u0107', # ć
        '\u0119', # ę
        '\u0142', # ł
        '\u0144', # ń
        '\u00f3', # ó
        '\u015b', # ś
        '\u017a', # ź
        '\u017c', # ż
        '\u0104', # Ą
        '\u0106', # Ć
        '\u0118', # Ę
        '\u0141', # Ł
        '\u0143', # Ń
        '\u00d3', # Ó
        '\u015a', # Ś
        '\u0179', # Ź
        '\u017b', # Ż
    ];

    // Replacements, in the same order as $specialChars
    $polishHtmlCodes = [
        'ą', # ą
        'ć', # ć
        'ę', # ę
        'ł', # ł
        'ń', # ń
        'ó', # ó
        'ś', # ś
        'ź', # ź
        'ż', # ż
        'Ą', # Ą
        'Ć', # Ć
        'Ę', # Ę
        'Ł', # Ł
        'Ń', # Ń
        'Ó', # Ó
        'Ś', # Ś
        'Ź', # Ź
        'Ż', # Ż
    ];

    $result = str_replace($specialChars, $polishHtmlCodes, json_encode($string));

    echo $result;
    // prints, e.g.: "Różowe okulary" (json_encode adds the surrounding double quotes)
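
If the goal is only to keep the Polish letters readable in the encoded output, the lookup tables may not be necessary: json_encode accepts the JSON_UNESCAPED_UNICODE flag (available since PHP 5.4), which leaves multibyte characters as literal UTF-8 instead of \uXXXX escapes. A small sketch, assuming $string is already valid UTF-8 (json_encode fails on input that is not):

```php
<?php
$string = 'Różowe okulary'; // assumed to be valid UTF-8

// Default behaviour: non-ASCII characters become \uXXXX escapes.
echo json_encode($string), "\n";
// "R\u00f3\u017cowe okulary"

// With the flag, the characters are written out as-is.
echo json_encode($string, JSON_UNESCAPED_UNICODE), "\n";
// "Różowe okulary"
```

This covers every character rather than only the eighteen listed, but it yields literal UTF-8 characters, not HTML codes, so it only fits if the output is stored or served as UTF-8.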