Finding which encodings in .NET are ASCII-compatible
Is there actually any simple method of finding which encodings in .NET are ASCII-compatible?
(Based on Nyerguds's comments.)
We will assume the standard definition of ASCII, which is limited to 128 characters (namely, byte values whose most significant bit is 0). Unicode was designed such that its first 128 code points correspond to their ASCII equivalents. Since the numeric value of the char structure in .NET corresponds to its Unicode code point (except for surrogates), we can define a utility method like so:
using System;
using System.Linq;
using System.Text;

// The 128 ASCII byte values (0x00-0x7F) and the string of matching code points.
private static readonly byte[] asciiValues =
    Enumerable.Range(0, 128).Select(b => (byte)b).ToArray();
private static readonly string asciiChars =
    new string(asciiValues.Select(b => (char)b).ToArray());

public static bool IsAsciiCompatible(Encoding encoding)
{
    try
    {
        // Compatible only if decoding and encoding both round-trip all 128 values unchanged.
        return encoding.GetString(asciiValues).Equals(asciiChars, StringComparison.Ordinal)
            && encoding.GetBytes(asciiChars).SequenceEqual(asciiValues);
    }
    catch (ArgumentException)
    {
        // Encoding.GetString may throw DecoderFallbackException if a fallback occurred
        // and DecoderFallback is set to DecoderExceptionFallback.
        // Encoding.GetBytes may throw EncoderFallbackException if a fallback occurred
        // and EncoderFallback is set to EncoderExceptionFallback.
        // Both of these derive from ArgumentException.
        return false;
    }
}
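As a quick sanity check of the helper above, the following minimal sketch applies the same test to four encodings that ship with every .NET runtime. It is written as a standalone top-level program (an assumption; this requires C# 9 or later), so the helper is repeated as a local function rather than a class member.

```csharp
using System;
using System.Linq;
using System.Text;

// The 128 ASCII byte values and the string made of the matching code points.
byte[] asciiValues = Enumerable.Range(0, 128).Select(b => (byte)b).ToArray();
string asciiChars = new string(asciiValues.Select(b => (char)b).ToArray());

// Local copy of the check, so that this snippet runs on its own.
bool IsAsciiCompatible(Encoding encoding)
{
    try
    {
        return encoding.GetString(asciiValues).Equals(asciiChars, StringComparison.Ordinal)
            && encoding.GetBytes(asciiChars).SequenceEqual(asciiValues);
    }
    catch (ArgumentException)
    {
        return false;
    }
}

bool utf8 = IsAsciiCompatible(Encoding.UTF8);
bool latin1 = IsAsciiCompatible(Encoding.GetEncoding("iso-8859-1"));
bool utf16 = IsAsciiCompatible(Encoding.Unicode);
bool utf32 = IsAsciiCompatible(Encoding.UTF32);

Console.WriteLine($"UTF-8:      {utf8}");   // True
Console.WriteLine($"ISO-8859-1: {latin1}"); // True
Console.WriteLine($"UTF-16:     {utf16}");  // False
Console.WriteLine($"UTF-32:     {utf32}");  // False
```

UTF-16 and UTF-32 fail because they consume two or four bytes per code unit, so the 128 input bytes do not decode to 128 characters.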
We could then enumerate all .NET encodings like so:
var encodings = Encoding.GetEncodings().Select(e => e.GetEncoding()).ToList();
var asciiCompatible = encodings.Where(e => IsAsciiCompatible(e)).ToList();
var nonAsciiCompatible = encodings.Except(asciiCompatible).ToList();

Console.WriteLine("ASCII compatible: ");
foreach (var encodingName in asciiCompatible.Select(e => e.EncodingName).OrderBy(n => n))
    Console.WriteLine("* " + encodingName);

Console.WriteLine();
Console.WriteLine("Non-ASCII compatible: ");
foreach (var encodingName in nonAsciiCompatible.Select(e => e.EncodingName).OrderBy(n => n))
    Console.WriteLine("* " + encodingName);
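One caveat if you run the enumeration on .NET Core or .NET 5+ rather than .NET Framework: only a handful of encodings are built in there, and the code-page encodings become available only after registering CodePagesEncodingProvider. The sketch below assumes .NET Core 3.0 or later, where that provider is part of the shared framework. To the best of my knowledge Encoding.GetEncodings() includes provider-supplied encodings only from .NET 5 onward, so treat the printed count as environment-dependent.

```csharp
using System;
using System.Text;

// Make the legacy code-page encodings (Windows-125x, EBCDIC, ...) available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// These lookups throw on .NET Core / .NET 5+ if the provider is not registered.
Encoding windows1252 = Encoding.GetEncoding(1252);
Encoding ebcdic = Encoding.GetEncoding(37);

Console.WriteLine($"Encodings reported: {Encoding.GetEncodings().Length}");
Console.WriteLine($"{windows1252.CodePage}: {windows1252.EncodingName}");
Console.WriteLine($"{ebcdic.CodePage}: {ebcdic.EncodingName}");
```

Registering the provider once at startup, before the enumeration, is enough.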
Note that this method is not entirely safe. If there exists a multi-byte encoding that does fancy mappings of consecutive bytes or characters, such as decoding 0x61 to 'a' and 0x62 to 'b' (like in ASCII) but 0x6261 to some other, unrelated character, then this test would give incorrect results.
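One way to narrow this gap is to also round-trip every two-byte combination of ASCII values. The following is only a sketch of that idea, and IsAsciiCompatiblePairwise is a hypothetical name, not part of the original answer. It still cannot rule out special handling of sequences longer than two bytes, so it reduces the risk rather than removing it.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical stricter check: every pair of ASCII bytes must decode to the
// two matching characters, and those characters must encode back to the pair.
bool IsAsciiCompatiblePairwise(Encoding encoding)
{
    var bytes = new byte[2];
    try
    {
        for (int first = 0; first < 128; first++)
        {
            for (int second = 0; second < 128; second++)
            {
                bytes[0] = (byte)first;
                bytes[1] = (byte)second;
                string expected = new string(new[] { (char)first, (char)second });

                if (!encoding.GetString(bytes).Equals(expected, StringComparison.Ordinal))
                    return false;
                if (!encoding.GetBytes(expected).SequenceEqual(bytes))
                    return false;
            }
        }
        return true;
    }
    catch (ArgumentException)
    {
        // Covers DecoderFallbackException and EncoderFallbackException.
        return false;
    }
}

bool utf8Pairwise = IsAsciiCompatiblePairwise(Encoding.UTF8);
bool utf16Pairwise = IsAsciiCompatiblePairwise(Encoding.Unicode);

Console.WriteLine($"UTF-8:  {utf8Pairwise}");  // True
Console.WriteLine($"UTF-16: {utf16Pairwise}"); // False
```

The loop performs 16,384 small round trips per encoding, which is cheap enough to run over the full list of encodings.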
Running this on .NET Fiddle (snippet) gives the following results:
ASCII compatible:
- Arabic (864)
- Arabic (ASMO 708)
- Arabic (DOS)
- Arabic (ISO)
- Arabic (Mac)
- Arabic (Windows)
- Baltic (DOS)
- Baltic (ISO)
- Baltic (Windows)
- Central European (DOS)
- Central European (ISO)
- Central European (Mac)
- Central European (Windows)
- Chinese Simplified (EUC)
- Chinese Simplified (GB18030)
- Chinese Simplified (GB2312)
- Chinese Simplified (GB2312-80)
- Chinese Simplified (ISO-2022)
- Chinese Simplified (Mac)
- Chinese Traditional (Big5)
- Chinese Traditional (CNS)
- Chinese Traditional (Eten)
- Chinese Traditional (Mac)
- Croatian (Mac)
- Cyrillic (DOS)
- Cyrillic (ISO)
- Cyrillic (KOI8-R)
- Cyrillic (KOI8-U)
- Cyrillic (Mac)
- Cyrillic (Windows)
- Estonian (ISO)
- French Canadian (DOS)
- Greek (DOS)
- Greek (ISO)
- Greek (Mac)
- Greek (Windows)
- Greek, Modern (DOS)
- Hebrew (DOS)
- Hebrew (ISO-Logical)
- Hebrew (ISO-Visual)
- Hebrew (Mac)
- Hebrew (Windows)
- IBM5550 Taiwan
- Icelandic (DOS)
- Icelandic (Mac)
- ISCII Assamese
- ISCII Bengali
- ISCII Devanagari
- ISCII Gujarati
- ISCII Kannada
- ISCII Malayalam
- ISCII Oriya
- ISCII Punjabi
- ISCII Tamil
- ISCII Telugu
- Japanese (EUC)
- Japanese (JIS 0208-1990 and 0212-1990)
- Japanese (Mac)
- Japanese (Shift-JIS)
- Korean
- Korean (EUC)
- Korean (Johab)
- Korean (Mac)
- Korean Wansung
- Latin 3 (ISO)
- Latin 9 (ISO)
- Nordic (DOS)
- OEM Cyrillic
- OEM Multilingual Latin I
- OEM United States
- Portuguese (DOS)
- Romanian (Mac)
- TCA Taiwan
- TeleText Taiwan
- Thai (Windows)
- Turkish (DOS)
- Turkish (ISO)
- Turkish (Mac)
- Turkish (Windows)
- Ukrainian (Mac)
- Unicode (UTF-8)
- US-ASCII
- Vietnamese (Windows)
- Wang Taiwan
- Western European (DOS)
- Western European (ISO)
- Western European (Mac)
- Western European (Windows)
Non-ASCII compatible:
- Chinese Simplified (HZ)
- Europa
- German (IA5)
- IBM EBCDIC (Arabic)
- IBM EBCDIC (Cyrillic Russian)
- IBM EBCDIC (Cyrillic Serbian-Bulgarian)
- IBM EBCDIC (Denmark-Norway)
- IBM EBCDIC (Denmark-Norway-Euro)
- IBM EBCDIC (Finland-Sweden)
- IBM EBCDIC (Finland-Sweden-Euro)
- IBM EBCDIC (France)
- IBM EBCDIC (France-Euro)
- IBM EBCDIC (Germany)
- IBM EBCDIC (Germany-Euro)
- IBM EBCDIC (Greek Modern)
- IBM EBCDIC (Greek)
- IBM EBCDIC (Hebrew)
- IBM EBCDIC (Icelandic)
- IBM EBCDIC (Icelandic-Euro)
- IBM EBCDIC (International)
- IBM EBCDIC (International-Euro)
- IBM EBCDIC (Italy)
- IBM EBCDIC (Italy-Euro)
- IBM EBCDIC (Japanese katakana)
- IBM EBCDIC (Korean Extended)
- IBM EBCDIC (Multilingual Latin-2)
- IBM EBCDIC (Spain)
- IBM EBCDIC (Spain-Euro)
- IBM EBCDIC (Thai)
- IBM EBCDIC (Turkish Latin-5)
- IBM EBCDIC (Turkish)
- IBM EBCDIC (UK)
- IBM EBCDIC (UK-Euro)
- IBM EBCDIC (US-Canada)
- IBM EBCDIC (US-Canada-Euro)
- IBM Latin-1
- IBM Latin-1
- ISO-6937
- Japanese (JIS)
- Japanese (JIS-Allow 1 byte Kana - SO/SI)
- Japanese (JIS-Allow 1 byte Kana)
- Korean (ISO)
- Norwegian (IA5)
- Swedish (IA5)
- T.61
- Thai (Mac)
- Unicode (UTF-16)
- Unicode (Big-Endian)
- Unicode (UTF-32 Big-Endian)
- Unicode (UTF-32)
- Unicode (UTF-7)
- Western European (IA5)