Converting Unicode digits to integers in Ruby
I have some numbers coming in as strings using non-ASCII digits unfortunately. I need to convert them to regular Ruby numbers to do some math on them. So for example if the number-as-a-string "۱۹" comes in, which is 19 but as the characters "extended arabic indic digit one" followed by "extended arabic indic digit nine", I need a way to convert that to the Ruby integer Fixnum 19.
The problem is, according to this, there are 55 groups of 0-9 of these extended digits, i.e. 550 total codepoints I need to handle.
I already know that for a given group, the codepoints for consecutive digits are contiguous, so for example extended arabic indic digit 0 is U+06F0 and extended arabic indic digit 9 is U+06F9, so I can test each digit to see which range it's in and then subtract the zero codepoint as an integer from the codepoint of the character I'm looking at, to give me the regular Ruby integer. For example, 6F9 - 6F0 = 9 (in rough terms, once they're converted to their integer code points).
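To make the subtraction concrete, here is a minimal sketch for a single range only (Extended Arabic-Indic). The constant `ZERO` and the method name `extended_arabic_indic_to_i` are made up for illustration, and the sketch assumes every character in the input belongs to that one range.

```ruby
# Code point of EXTENDED ARABIC-INDIC DIGIT ZERO (U+06F0).
ZERO = 0x06F0

# Hypothetical helper: assumes every character of str lies in U+06F0..U+06F9.
def extended_arabic_indic_to_i(str)
  # A digit's value is its offset from the range's zero;
  # fold the digits left to right into one integer.
  str.each_char.reduce(0) { |acc, ch| acc * 10 + (ch.ord - ZERO) }
end

# U+06F1 followed by U+06F9, i.e. the string from the example above.
extended_arabic_indic_to_i("\u06F1\u06F9") # => 19
```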
But to do this, I need to create a giant lookup hash for these 55 ranges and that's a lot of typing. I suppose I could translate the HTML table at the link above into a ruby map, but that feels hacky.
I already know that
"۱۹" =~ /[[:digit:]]+/
will be a match, but the question is "How to turn those Unicode digits back into regular Ruby integers?"
There has to be a better way! Any ideas?
Thanks!
This is relatively painless.
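class DecimalToIntegerConverter
  # Code point of the zero digit of each digit set (here U+06F0 and U+FF10).
  altzeros = [0x06f0, 0xff10] # ... need all zeroes here
  # The ten digits of every set, concatenated into one "from" string for String#tr.
  @@digits = altzeros.flat_map { |z| (z..z + 9).map { |cp| cp.chr(Encoding::UTF_8) } }.join
  # "0123456789" repeated once per set, so each position lines up with @@digits.
  @@replacements = "0123456789" * altzeros.size

  def self.convert(str)
    # Map every alternative digit to its ASCII counterpart, then parse.
    str.tr(@@digits, @@replacements).to_i
  end
end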
str = "۱۹ and 25?"
str.scan(/[[:digit:]]+/).map do |s|
  DecimalToIntegerConverter.convert(s)
end
# => [19, 25]
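If typing out every zero is the sticking point, one possible way to avoid it is to let the regex engine enumerate the decimal digits. The sketch below is not part of the class above: `UnicodeDigits`, `TO_ASCII` and `to_i` are hypothetical names. It assumes that every character matched by `\p{Nd}` belongs to a contiguous run of ten code points starting at that set's zero, and it needs Ruby 2.7+ for `filter_map`. Which digit sets it finds depends on the Unicode version your Ruby was built with.

```ruby
# Sketch only: derive the digit table from \p{Nd} instead of typing the zeros.
module UnicodeDigits
  # Every character in the Unicode "decimal digit" category, in code point order.
  nd_chars = (0..0x10FFFF).filter_map do |cp|
    next if (0xD800..0xDFFF).cover?(cp) # surrogates cannot be encoded in UTF-8
    ch = cp.chr(Encoding::UTF_8)
    ch if ch.match?(/\p{Nd}/)
  end

  # Assumption: digits come in contiguous runs of ten, zero first, so the
  # position inside each slice of ten is the digit's value.
  TO_ASCII = nd_chars.each_slice(10).flat_map do |set|
    set.each_with_index.map { |ch, i| [ch, i.to_s] }
  end.to_h.freeze

  # Replace each decimal digit with its ASCII equivalent, then parse.
  def self.to_i(str)
    str.gsub(/\p{Nd}/, TO_ASCII).to_i
  end
end

UnicodeDigits.to_i("\u06F1\u06F9") # => 19
```

Building the table scans the whole code point space once when the module is loaded, so it costs a little at startup; after that each conversion is a single `gsub`.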