imap_mail_move() does not handle special characters (äüö...)

Problem description:

I am using imap_mail_move() to move emails from one folder to another. This works pretty well, but not when the folder name contains special characters. I am sure I need to encode the name, but none of my attempts were successful.

Does anybody have a good idea? Thanks in advance.

class EmailReader {
    [...]

    function doMoveEmail($uid, $targetFolder) {
        $targetFolder = imap_utf8_to_mutf7($targetFolder);
        $return = imap_mail_move($this->conn, $uid, $targetFolder, CP_UID);
        if (!$return) {
            $this->printValue(imap_errors());
            die("stop");
        }
        return $return;
    }

    [...]
}

Calling the function in the script:

[...]
$uid = 1234;

$folderTarget1 = "INBOX.00_Korrespondenz";
$this->doMoveEmail($uid, $folderTarget1);

$folderTarget2 = "INBOX.01_Anmeldevorgang.011_Bestätigungslink";
$this->doMoveEmail($uid, $folderTarget2);
[...]

The first call (folderTarget1) works fine.

The second call (folderTarget2) fails with this error:

[TRYCREATE] Mailbox doesn't exist: INBOX.01_Anmeldevorgang.011_Bestätigungslink (0.001 + 0.000 secs).
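The name that doMoveEmail() sends is the output of imap_utf8_to_mutf7(), i.e. IMAP's "modified UTF-7" mailbox encoding (RFC 3501, section 5.1.3): printable ASCII is kept as-is, & becomes &-, and every other run of characters is written as UTF-16BE, base64-encoded with , instead of / and without padding, between & and -. Below is a minimal Python sketch of those rules; mutf7_encode is a hypothetical helper, not a PHP or library function. It shows that the two possible spellings of ä produce two different mailbox names, so a server that compares names literally will not find the folder.

```python
import base64


def mutf7_encode(name: str) -> str:
    """Encode a mailbox name to IMAP modified UTF-7 (RFC 3501, section 5.1.3)."""
    out = []
    pending = []  # run of non-ASCII characters waiting to be base64-encoded

    def flush() -> None:
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            # Modified base64: no "=" padding, "," instead of "/"
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=").replace("/", ",")
            out.append("&" + b64 + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:  # printable US-ASCII is written directly
            flush()
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)


precomposed = "Best\u00e4tigungslink"  # ä as the single code point U+00E4
decomposed = "Besta\u0308tigungslink"  # a followed by U+0308 COMBINING DIAERESIS
print(mutf7_encode(precomposed))  # Best&AOQ-tigungslink
print(mutf7_encode(decomposed))   # Besta&Awg-tigungslink
```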

Remark 1:

If I call imap_list(), the folder name is shown as

"INBOX.01_Anmeldevorgang.011_Besta&Awg-tigungslink" (=$val)

Using:
$new = mb_convert_encoding($val, 'UTF-8', 'UTF7-IMAP');
echo $new; // gives --> "INBOX.01_Anmeldevorgang.011_Bestätigungslink"

but:
$new2 = mb_convert_encoding($new, 'UTF7-IMAP', 'UTF-8');
echo $new2; // gives --> "INBOX.01_Anmeldevorgang.011_Best&AOQ-tigungslink"
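The decoding direction can be sketched the same way in Python. Again, mutf7_decode is a hypothetical helper and assumes well-formed input: each &...- section has its / and base64 padding restored and is decoded as UTF-16BE. Decoding the name returned by imap_list() yields a plain a followed by U+0308, whereas decoding the re-encoded name yields the single code point U+00E4. Both render as ä, but they are different strings.

```python
import base64
import re


def mutf7_decode(name: str) -> str:
    """Decode an IMAP modified UTF-7 mailbox name (assumes well-formed input)."""

    def decode_section(match) -> str:
        chunk = match.group(1)
        if not chunk:  # "&-" stands for a literal "&"
            return "&"
        b64 = chunk.replace(",", "/")
        b64 += "=" * (-len(b64) % 4)  # restore the padding that mUTF-7 omits
        return base64.b64decode(b64).decode("utf-16-be")

    return re.sub(r"&([A-Za-z0-9+,]*)-", decode_section, name)


from_server = mutf7_decode("Besta&Awg-tigungslink")  # as listed by imap_list()
re_encoded = mutf7_decode("Best&AOQ-tigungslink")    # as produced by re-encoding
print(len(from_server), len(re_encoded))  # 17 16
print(from_server == re_encoded)          # False
```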

Remark 2:

I checked every available encoding with the following script, but none of them matches the value returned by imap_list().

// looking for "INBOX.01_Anmeldevorgang.011_Besta&Awg-tigungslink" given by imap_list().

$targetFolder = "INBOX.01_Anmeldevorgang.011_Bestätigungslink";

foreach(mb_list_encodings() as $chr){
  echo mb_convert_encoding($targetFolder, $chr, 'UTF-8')." : ".$chr."<br>";
}
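This is likely why no entry from mb_list_encodings() can match: a character-set conversion changes how code points are stored as bytes, but the server's name contains a different sequence of code points (a + U+0308 instead of U+00E4). The two spellings are canonically equivalent under Unicode normalization, so the difference is removed by normalizing, not by switching charsets. A minimal Python illustration with the standard unicodedata module, assuming the name in the PHP source is precomposed (as most editors produce):

```python
import unicodedata

server_style = "Besta\u0308tigungslink"  # decomposed, as the server stores it
typed = "Best\u00e4tigungslink"          # precomposed, as typically typed

print(server_style == typed)                                # False
print(unicodedata.normalize("NFC", server_style) == typed)  # True
print(unicodedata.normalize("NFD", typed) == server_style)  # True

# The decomposed form cannot even be represented in ISO-8859-1,
# because U+0308 has no Latin-1 code point.
try:
    server_style.encode("latin-1")
except UnicodeEncodeError as exc:
    print("latin-1:", exc.reason)
```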

I created a workaround that lets me work with UTF-8 values and map them back to the original (raw) IMAP folder names.

    function getFolderList() {
        $folders = imap_list($this->conn, "{".$this->server."}", "*");
        if (is_array($folders)) {

            // Remove Server details of each element of array
            $folders = array_map(function($val) { return str_replace("{".$this->server."}","",$val); }, $folders);

            // Sort array
            asort($folders);

            // Renumber the list
            $folders = array_values($folders);

            // Add a UTF-8 version of each name next to the original (raw) one.
            // The raw name cannot reliably be rebuilt from the UTF-8 name on the fly,
            // so this extra value is used to map a UTF-8 name back to the raw name.
            // The raw name is still needed for operations such as:
            //  - imap_mail_move()
            //  - imap_reopen()
            // ==> the trick is to normalize the decoded name with normalizer_normalize()
            $return = array();
            foreach ($folders as $key => $folder) {
                $return[$key]['original'] = $folder;
                $return[$key]['utf8']     = normalizer_normalize(mb_convert_encoding($folder,'UTF-8','UTF7-IMAP'));
            }


            return $return;
        } else {
            die("IMAP_Folder-List failed: " . imap_last_error() . "\n");
        }
    }
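For comparison, the same workaround can be sketched in Python: decode each raw name, normalize the decoded name to NFC, and keep a dictionary from the normalized display name back to the raw name, which is what gets passed to the server. The folder list is hypothetical sample data shaped like the imap_list() output in the question, and mutf7_decode, build_folder_map and raw_name_for are illustrative helpers, not part of any IMAP library.

```python
import base64
import re
import unicodedata


def mutf7_decode(name: str) -> str:
    """Decode IMAP modified UTF-7 (assumes well-formed input)."""

    def section(m):
        chunk = m.group(1)
        if not chunk:  # "&-" is a literal "&"
            return "&"
        b64 = chunk.replace(",", "/") + "=" * (-len(chunk) % 4)
        return base64.b64decode(b64).decode("utf-16-be")

    return re.sub(r"&([A-Za-z0-9+,]*)-", section, name)


def build_folder_map(raw_names):
    """Map NFC-normalized display names to the raw names stored on the server."""
    return {unicodedata.normalize("NFC", mutf7_decode(raw)): raw for raw in raw_names}


def raw_name_for(display_name, folder_map):
    """Look up the raw mailbox name for a UTF-8 name, whichever form it uses."""
    return folder_map[unicodedata.normalize("NFC", display_name)]


# Hypothetical folder list (server prefix already stripped).
raw_folders = [
    "INBOX.00_Korrespondenz",
    "INBOX.01_Anmeldevorgang.011_Besta&Awg-tigungslink",
]
folder_map = build_folder_map(raw_folders)
print(raw_name_for("INBOX.01_Anmeldevorgang.011_Best\u00e4tigungslink", folder_map))
# INBOX.01_Anmeldevorgang.011_Besta&Awg-tigungslink
```

Normalizing both the dictionary keys and the lookup value means either spelling of ä finds the folder, while the server only ever sees the raw name.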

Your folder name as stored on the server, Besta&Awg-tigungslink, is not encoded in the usual precomposed form:

&Awg- decodes to the combining diaeresis character. A bit of Python to look it up:

import base64
import unicodedata

x = base64.b64decode('Awg=').decode('utf-16be')  # '=' added to satisfy base64 padding requirements
unicodedata.name(x)
# Returns 'COMBINING DIAERESIS'

This combines with the preceding a to display as ä.

Your encoder returns the more common precomposed form:

x = base64.b64decode('AOQ=').decode('utf-16be')
unicodedata.name(x)
# Returns: 'LATIN SMALL LETTER A WITH DIAERESIS'

This represents ä directly, as a single code point.

Normally, when you work with IMAP folders, you pass the raw name around and convert it only for display. As you can see, there is not necessarily a one-to-one mapping between the glyphs you see and their Unicode encoding.

It does surprise me that PHP seems to perform a normalization step when encoding; I would expect a round trip of the same data to return the same result.
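As a quick check of that expectation (a Python illustration, not a test of PHP's behaviour): a codec round trip by itself preserves the decomposed form. Python's standard library has no modified-UTF-7 codec, so this sketch uses plain UTF-7 (RFC 2152) as a stand-in; the only point is that encoding and decoding do not normalize.

```python
import unicodedata

decomposed = "Besta\u0308tigungslink"  # a + U+0308, as stored on the server

# Plain UTF-7 is not IMAP's modified UTF-7, but like any codec it converts
# code points to bytes and back without normalizing them.
round_trip = decomposed.encode("utf-7").decode("utf-7")
print(round_trip == decomposed)                                # True
print(round_trip == unicodedata.normalize("NFC", decomposed))  # False
```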