IMAP folder path encoding (IMAP UTF-7) for .NET?

Problem description:

The IMAP specification (RFC 2060, 5.1.3. Mailbox International Naming Convention) describes how to handle non-ASCII characters in folder names. It defines a modified UTF-7 encoding:

By convention, international mailbox names are specified using a modified version of the UTF-7 encoding described in [UTF-7]. The purpose of these modifications is to correct the following problems with UTF-7:

  1. UTF-7 uses the "+" character for shifting; this conflicts with the common use of "+" in mailbox names, in particular USENET newsgroup names.

  2. UTF-7's encoding is BASE64 which uses the "/" character; this conflicts with the use of "/" as a popular hierarchy delimiter.

  3. UTF-7 prohibits the unencoded usage of "\"; this conflicts with the use of "\" as a popular hierarchy delimiter.

  4. UTF-7 prohibits the unencoded usage of "~"; this conflicts with the use of "~" in some servers as a home directory indicator.

  5. UTF-7 permits multiple alternate forms to represent the same string; in particular, printable US-ASCII characters can be represented in encoded form.

In modified UTF-7, printable US-ASCII characters except for "&" represent themselves; that is, characters with octet values 0x20-0x25 and 0x27-0x7e. The character "&" (0x26) is represented by the two-octet sequence "&-".

All other characters (octet values 0x00-0x1f, 0x7f-0xff, and all Unicode 16-bit octets) are represented in modified BASE64, with a further modification from [UTF-7] that "," is used instead of "/".
Modified BASE64 MUST NOT be used to represent any printing US-ASCII character which can represent itself.

"&" is used to shift to modified BASE64 and "-" to shift back to US-ASCII. All names start in US-ASCII, and MUST end in US-ASCII (that is, a name that ends with a Unicode 16-bit octet MUST end with a "-").

Before I start implementing it, my question is: is there some .NET code or library out there (or even in the framework) that does the job? I couldn't find any .NET resources (only implementations for other languages and frameworks).

Thanks!

This is too specialized to be in the framework. There might be something on CodePlex, though many of the incomplete "implementations" I've seen don't bother with the conversion at all and will happily pass all non-US-ASCII characters on to the IMAP server.

However, I've implemented this in the past and it is really just 30 lines of code. You go through all the characters in the string and output them as-is if they fall in the range 0x20 to 0x7e (don't forget to append "-" after each "&"). Otherwise, collect the run of characters outside that range and encode it as base64 over its UTF-16 big-endian bytes, which is what UTF-7 does (not UTF-8 + base64), dropping the "=" padding and replacing "/" with ",". Additionally, you need to maintain a "shifted state", i.e. whether you're currently encoding non-US-ASCII or outputting US-ASCII, and emit the transition tokens "&" and "-" on each state change.
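
For illustration, the loop described above could be sketched in C# roughly as follows. This is a minimal sketch, not a tested library: the class name `ImapUtf7` and its two methods are hypothetical, and it assumes the shifted runs are UTF-16 big-endian code units in unpadded base64 (my reading of "Unicode 16-bit octets" in the RFC text quoted in the question). A matching decoder is included so the result can be round-tripped.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of the .NET framework.
public static class ImapUtf7
{
    // Encodes a mailbox name into IMAP modified UTF-7.
    public static string Encode(string name)
    {
        var result = new StringBuilder();
        var pending = new StringBuilder(); // current run of characters that must be encoded

        foreach (char c in name)
        {
            if (c >= 0x20 && c <= 0x7e)
            {
                FlushPending(pending, result); // shift back to US-ASCII if we were encoding
                result.Append(c == '&' ? "&-" : c.ToString());
            }
            else
            {
                pending.Append(c); // enter or stay in the shifted state
            }
        }

        FlushPending(pending, result); // a name must end in US-ASCII
        return result.ToString();
    }

    // Writes "&" + modified base64 + "-" for the pending run, if there is one.
    private static void FlushPending(StringBuilder pending, StringBuilder result)
    {
        if (pending.Length == 0) return;
        byte[] utf16 = Encoding.BigEndianUnicode.GetBytes(pending.ToString());
        string b64 = Convert.ToBase64String(utf16).TrimEnd('=').Replace('/', ',');
        result.Append('&').Append(b64).Append('-');
        pending.Clear();
    }

    // Decodes IMAP modified UTF-7 back into a .NET string.
    public static string Decode(string encoded)
    {
        var result = new StringBuilder();
        int i = 0;
        while (i < encoded.Length)
        {
            char c = encoded[i];
            if (c != '&') { result.Append(c); i++; continue; }

            int end = encoded.IndexOf('-', i + 1);
            if (end < 0) throw new FormatException("Unterminated shift sequence.");

            if (end == i + 1)
            {
                result.Append('&'); // "&-" is a literal ampersand
            }
            else
            {
                string b64 = encoded.Substring(i + 1, end - i - 1).Replace(',', '/');
                b64 = b64.PadRight((b64.Length + 3) / 4 * 4, '='); // restore the stripped padding
                result.Append(Encoding.BigEndianUnicode.GetString(Convert.FromBase64String(b64)));
            }
            i = end + 1;
        }
        return result.ToString();
    }
}
```

With this sketch, `ImapUtf7.Encode("Tom & Jerry")` gives `Tom &- Jerry`, and `ImapUtf7.Encode("~peter/mail/\u65E5\u672C\u8A9E/\u53F0\u5317")` gives `~peter/mail/&ZeVnLIqe-/&U,BTFw-`. The decoder does not reject non-canonical input (for example, printable US-ASCII hidden inside a base64 run), which a stricter implementation would probably want to do.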