Matching Roman numerals

Question:

I have this regular expression:

(IX|IV|V?I{0,3}|M{1,4}|CM|CD|D?C{1,3}|XC|XL|L?X{1,3})

I use it to detect whether a text contains any Roman numeral.

eregi("( IX|IV|V?I{0,3}[\.]| M{1,4}[\.]| CM|CD|D?C{1,3}[\.]| XC|XL|L?X{1,3}[\.])", $title, $regs)

But the Roman numeral always appears in this format: " IV." In the eregi example I added a space before the number and a "." after the number, but I still get the same result. If the text is something like "somethinvianyyhing", the result is vi (from the middle of the word).

What am I doing wrong?

Answer:

Your pattern has no space before the alternative that matches VI. A space belongs only to the alternative it is written in front of, not to all alternatives. The same goes for the \.: it belongs only to the alternative in which it is written.
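The precedence point above can be illustrated with a minimal sketch. It uses Python's re module as a stand-in for the PHP call, which is an assumption made only so the example is runnable; the names ungrouped and grouped, the shortened patterns, and the sample strings are hypothetical.

```python
import re

# Shortened version of the question's pattern: the space sits only in front
# of IX and the "\." only behind V?I{0,3}, so each applies to one alternative.
ungrouped = re.compile(r" IX|IV|V?I{0,3}\.", re.IGNORECASE)

# Wrapping the alternatives in a group makes the leading space and the
# trailing dot apply to every alternative.
grouped = re.compile(r" (IX|IV|V?I{0,3})\.", re.IGNORECASE)

inside_word = "A civil matter"
real_numeral = "Chapter IV. Intro"

# The bare "IV" alternative fires on the letters inside "civil".
print(ungrouped.search(inside_word))

# The grouped pattern needs " IV." as a whole, so it ignores "civil".
print(grouped.search(inside_word))
print(grouped.search(real_numeral))
```

With the group in place, the numeral itself is available as capture group 1, without the surrounding space and dot.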

Try this:

" (IX|IV|V?I{0,3}|M{1,4}|CM|CD|D?C{1,3}|XC|XL|L?X{1,3})\."

See it on Regexr

This will match

I.
II.
III.
IV.
V.
VI.
VII.
VIII.
IX.
X.

but not

XI. MMI. MMXI.
somethinvianyyhing
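The match and non-match lists can be checked with a short sketch. It again uses Python's re module in place of eregi (an assumption), compiled case-insensitively because eregi is case-insensitive; the helper name find_numeral is hypothetical.

```python
import re

# The pattern suggested in the answer: one leading space, a group holding
# all alternatives, and one trailing dot.
PATTERN = re.compile(
    r" (IX|IV|V?I{0,3}|M{1,4}|CM|CD|D?C{1,3}|XC|XL|L?X{1,3})\.",
    re.IGNORECASE,
)


def find_numeral(text):
    """Return the captured numeral of the first match, or None."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


should_match = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"]
for numeral in should_match:
    # Each numeral is wrapped as " <numeral>." like in the question.
    print(numeral, find_numeral(" " + numeral + "."))

for text in [" XI. MMI. MMXI.", "somethinvianyyhing"]:
    print(repr(text), find_numeral(text))

# Caveat: V?I{0,3} can match nothing at all, so a bare " ." also matches
# and the captured numeral is then the empty string.
print(repr(find_numeral("end .")))
```

If the empty match is a problem, changing I{0,3} so that at least one character is required would be one way to avoid it, at the cost of a slightly longer alternative for V on its own.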

Your approach to matching Roman numerals is far from correct. A more correct approach, for numbers up to 50 (L), is this:

^(?:XL|L|L?(?:IX|X{1,3}|X{0,3}(?:IX|IV|V|V?I{1,3})))$

See it on Regexr
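A quick way to probe this expression is to run it over a handful of candidates. The sketch below does that with Python's re module (an assumption, since the question uses PHP); the name UP_TO_L and the candidate list are hypothetical.

```python
import re

# The expression from the answer, unchanged. It is anchored with ^ and $,
# so it validates a whole string instead of searching inside a text.
UP_TO_L = re.compile(r"^(?:XL|L|L?(?:IX|X{1,3}|X{0,3}(?:IX|IV|V|V?I{1,3})))$")


def accepts(candidate):
    """Return True if the whole candidate string matches the expression."""
    return UP_TO_L.match(candidate) is not None


for candidate in ["IV", "IX", "XIV", "XXXIX", "XL", "L", "IIII", "VX", "XLIV"]:
    print(candidate, accepts(candidate))
```

In this probe IIII and VX are rejected as intended, but XLIV is rejected too: XL is only accepted on its own, so 41 to 49 are not covered by the expression as written.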

I tested this only superficially, but you can see that it gets really complex, and C, D and M are still missing from this expression.

Not to mention special cases, for example 4 = IV = IIII, and there are more of them.
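For completeness, here is a sketch of a widely used pattern for standard subtractive notation from 1 to 3999, including C, D and M. It is not from the answer above; the names ROMAN and is_roman are hypothetical, and Python's re module is used as an assumption in place of PHP. It deliberately rejects the additive special case IIII.

```python
import re

# One optional part per decimal digit, from thousands down to units.
ROMAN = re.compile(
    r"(?=[MDCLXVI])"        # lookahead: require at least one numeral letter
    r"M{0,3}"               # thousands: 0 to 3000
    r"(?:CM|CD|D?C{0,3})"   # hundreds: 900, 400, or 0 to 800
    r"(?:XC|XL|L?X{0,3})"   # tens: 90, 40, or 0 to 80
    r"(?:IX|IV|V?I{0,3})"   # units: 9, 4, or 0 to 8
)


def is_roman(text):
    """Return True if text is, as a whole, a standard Roman numeral."""
    return ROMAN.fullmatch(text) is not None


for candidate in ["XLIV", "MMXI", "MCMXCIV", "IIII", "IC", ""]:
    print(repr(candidate), is_roman(candidate))
```

To find numerals inside running text in the " IV." format from the question, the same body could be placed between a leading space and a trailing \. instead of being matched against the whole string.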

Wikipedia on Roman numerals
