How do I convert a string from windows-1252 to utf-8 in Ruby?

Question:

I'm migrating some data from MS Access 2003 to MySQL 5.0 using Ruby 1.8.6 on Windows XP (writing a Rake task to do this).

It turns out the Windows string data is encoded as windows-1252, while Rails and MySQL both assume utf-8 input, so some characters, such as apostrophes, are getting mangled. They wind up as "a"s with an accent over them and stuff like that.

Does anyone know of a tool, library, system, methodology, ritual, spell, or incantation to convert a windows-1252 string to utf-8?

For Ruby 1.8.6, it appears you can use Ruby Iconv, part of the standard library:

Iconv documentation

According to this helpful article, it appears you can at least purge unwanted win-1252 characters (that is, bytes that are not valid UTF-8) from your string like so:

require 'iconv'
# UTF-8 to UTF-8 with //IGNORE drops byte sequences that are not valid UTF-8.
ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
valid_string = ic.iconv(untrusted_string + ' ')[0..-2]
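
As a point of comparison, and assuming a newer interpreter than the 1.8.6 in the question: Iconv was dropped from the standard library in Ruby 2.0, and Ruby 2.1 and later offer String#scrub for the same kind of purge. A minimal sketch, where the helper name purge_invalid_utf8 is made up for illustration:

```ruby
# Sketch for Ruby 2.1+ only; String#scrub does not exist in Ruby 1.8.6.
# purge_invalid_utf8 is a hypothetical helper name, not part of any library.
def purge_invalid_utf8(untrusted_string)
  # Treat the bytes as UTF-8 and delete any sequence that is not valid UTF-8.
  untrusted_string.dup.force_encoding('UTF-8').scrub('')
end

# "\x92" is the windows-1252 curly apostrophe; on its own it is not valid UTF-8.
cleaned = purge_invalid_utf8("It\x92s here")
puts cleaned
```

Note that this throws the apostrophe away rather than converting it, so it only helps when losing those characters is acceptable.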

A full conversion from windows-1252 to utf-8 looks like this:

require 'iconv'
# Arguments are (to, from): transcode windows-1252 bytes to UTF-8.
ic = Iconv.new('UTF-8', 'WINDOWS-1252')
valid_string = ic.iconv(untrusted_string + ' ')[0..-2]
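
For anyone landing here on Ruby 1.9 or later (an assumption, since the question is about 1.8.6), the same full conversion can probably be sketched without Iconv by using the built-in String#force_encoding and String#encode. The helper name cp1252_to_utf8 is hypothetical:

```ruby
# Sketch for Ruby 1.9+ only; String#encode does not exist in Ruby 1.8.6.
# cp1252_to_utf8 is a hypothetical helper name, not part of any library.
def cp1252_to_utf8(raw)
  # Label the raw bytes as windows-1252, then transcode them to UTF-8.
  # Bytes that windows-1252 leaves undefined are replaced instead of raising.
  raw.dup.force_encoding('Windows-1252')
     .encode('UTF-8', :invalid => :replace, :undef => :replace)
end

# "\x92" is the windows-1252 curly apostrophe (U+2019 in Unicode).
converted = cp1252_to_utf8("It\x92s here")
puts converted
```

Unlike the purge approach, this keeps the apostrophe and turns it into its proper UTF-8 form.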