Restricting a list of words in an XML schema

Question:

I'm writing an XML schema and need to prevent the text of an element from matching certain values (e.g., the variableName element must not match 'int', 'byte', 'string', etc.).

I have tried using a restriction with a pattern element similar to "^(int|byte|string)", but without success.

Is there a way to write the regular expression so that this works, or any other way to achieve it?

After triple-checking that XML Schema (XSD) regexes really don't support any of the features that would make this task easy (particularly lookaheads and anchors), I've come up with an approach that seems to work. I've laid it out below in free-spacing style to make it easier to read, but that is another feature the XSD flavor doesn't support, so the spaces and line breaks have to be removed before the pattern goes into the schema.

  [^ibs].* |
  i(.{0,1} | [^n].* | n[^t].* | nt.+) |
  b(.{0,2} | [^y].* | y[^t].* | yt[^e].* | yte.+) |
  s(.{0,4} | [^t].* | t[^r].* | tr[^i].* | tri[^n].* | trin[^g].* | tring.+)
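Because XSD would treat every space and newline in the layout above as a literal character, the pattern has to be collapsed into one line before use. A minimal sketch in Python (used here only as a convenient scripting language) that does this and wraps the result in an xs:pattern facet; the type name VariableNameType is hypothetical, and the xs prefix is assumed to be bound to the XML Schema namespace.

```python
import re

# The free-spacing layout from the answer; whitespace is for readability only.
FREE_SPACING = r"""
  [^ibs].* |
  i(.{0,1} | [^n].* | n[^t].* | nt.+) |
  b(.{0,2} | [^y].* | y[^t].* | yt[^e].* | yte.+) |
  s(.{0,4} | [^t].* | t[^r].* | tr[^i].* | tri[^n].* | trin[^g].* | tring.+)
"""

# XSD has no free-spacing flag, so strip all whitespace to get the one-line form.
# This is safe only because none of the keywords contain whitespace themselves.
COMPACT = re.sub(r"\s+", "", FREE_SPACING)

# Hypothetical simple type; adapt the name and base type to your schema.
SIMPLE_TYPE = f"""<xs:simpleType name="VariableNameType">
  <xs:restriction base="xs:string">
    <xs:pattern value="{COMPACT}"/>
  </xs:restriction>
</xs:simpleType>"""

print(SIMPLE_TYPE)
```

The variableName element would then be declared with type="VariableNameType". The pattern contains no &, < or " characters, so it can go into the attribute value without XML escaping.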

The first alternative matches anything that doesn't start with the first letter of any of the keywords. Each of the other top-level alternatives matches a string that starts with the same letter as one of the keywords but:

  • is shorter than the keyword,
  • differs from it at the second letter, the third letter, and so on, or
  • is longer than the keyword.
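One way to spot-check those three cases is a small Python sketch that uses re.fullmatch as a stand-in for an XSD validator (an assumption for illustration; a real schema processor is the final authority). Python's re agrees with XSD on the constructs used here, with one small difference: in XSD the dot excludes both \n and \r, while in Python it excludes only \n.

```python
import re

# One-line form of the answer's pattern (free-spacing whitespace removed).
PATTERN = (
    r"[^ibs].*"
    r"|i(.{0,1}|[^n].*|n[^t].*|nt.+)"
    r"|b(.{0,2}|[^y].*|y[^t].*|yt[^e].*|yte.+)"
    r"|s(.{0,4}|[^t].*|t[^r].*|tr[^i].*|tri[^n].*|trin[^g].*|tring.+)"
)


def is_valid(name):
    # fullmatch must consume the whole string, like XSD's implicit anchoring.
    return re.fullmatch(PATTERN, name) is not None


cases = {
    "count": True,    # first letter is not i, b or s
    "in": True,       # shorter than "int"
    "str": True,      # shorter than "string"
    "bite": True,     # differs from "byte" at the second letter
    "strung": True,   # differs from "string" at the fourth letter
    "integer": True,  # longer than "int"
    "bytes": True,    # longer than "byte"
    "int": False,
    "byte": False,
    "string": False,
    "": False,        # no alternative matches the empty value either
}
for name, expected in cases.items():
    assert is_valid(name) == expected, name
```

The last case shows a side effect worth knowing: the pattern also rejects an empty value, which is probably desirable for a variable name.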

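Note that XSD regexes don't support explicit anchors (i.e., ^, $, \A, \z); outside a character class, ^ and $ are just ordinary characters, which is also why a pattern like "^(int|byte|string)" fails: it only matches values that begin with a literal ^. Instead, all matches are implicitly anchored at both ends, so the pattern must match the entire value.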
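The implicit anchoring is what makes the approach work at all. A short Python sketch of the difference, using re.search for an unanchored match and re.fullmatch to mimic XSD's behaviour:

```python
import re

# One-line form of the answer's pattern.
PATTERN = (
    r"[^ibs].*"
    r"|i(.{0,1}|[^n].*|n[^t].*|nt.+)"
    r"|b(.{0,2}|[^y].*|y[^t].*|yt[^e].*|yte.+)"
    r"|s(.{0,4}|[^t].*|t[^r].*|tr[^i].*|tri[^n].*|trin[^g].*|tring.+)"
)

# Unanchored search: the branch i(.{0,1}) matches "in" inside "int",
# so a forbidden keyword would slip through.
print(re.search(PATTERN, "int").group())  # prints: in

# Anchored at both ends, as XSD does implicitly: "int" is rejected.
print(re.fullmatch(PATTERN, "int"))  # prints: None
```

The "shorter than the keyword" branches are only safe because the match cannot stop partway through the value.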

One potential problem I can see: if the list of keywords is long, you might run up against a limit on the sheer length of the regex.
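Writing the alternatives by hand also gets tedious as the list grows, and the hand-written layout assumes each keyword starts with a different letter (keywords such as 'in' and 'int' would need their branches merged). As a rough sketch, assuming the schema can be generated by a script, the hypothetical build_exclusion_pattern below builds a trie of the keywords and emits an equivalent pattern using nested optional groups instead of the flattened .{0,n} branches. It does not remove the length concern: the output still grows with the size of the keyword list.

```python
import re

END = ""  # trie key meaning "a keyword ends here"; never a real character


def _branches(node):
    """Alternatives for what may follow this trie node without spelling a keyword."""
    chars = sorted(k for k in node if k != END)
    # The next character leaves the trie entirely: anything may follow it.
    branches = ["[^" + "".join(chars) + "].*"]
    # The next character stays on a keyword's path: recurse into that child.
    branches += [ch + _suffix(node[ch]) for ch in chars]
    return branches


def _suffix(node):
    """Regex for the rest of a value whose prefix has reached this trie node."""
    if len(node) == 1 and END in node:
        # Leaf: the prefix is a complete keyword, so at least one more char is needed.
        return ".+"
    group = "(" + "|".join(_branches(node)) + ")"
    # Stopping at this node is fine unless the prefix is itself a keyword.
    return group if END in node else group + "?"


def build_exclusion_pattern(keywords):
    """Hypothetical helper: XSD-style pattern for any non-empty value except the keywords.

    Assumes keywords use only ASCII letters, digits and '_', so nothing needs
    escaping inside the generated pattern or its character classes.
    """
    if not keywords:
        raise ValueError("need at least one keyword")
    trie = {}
    for word in keywords:
        if not re.fullmatch(r"[A-Za-z0-9_]+", word):
            raise ValueError(f"unsupported keyword: {word!r}")
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    # Top level has no "?" wrapper, so the empty value is rejected, as in the answer.
    return "|".join(_branches(trie))


pattern = build_exclusion_pattern(["int", "byte", "string"])
print(pattern)
```

Checked with re.fullmatch, the generated pattern rejects int, byte and string and accepts near misses such as in, bytes and strung, and a list like ["in", "int", "if"] works without any manual merging.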