


I'm just wondering if it's possible to use one regular expression to match another, that is some sort of:



Is this sort of thing possible with regex at all? I'm working in Python, so any advice specific to the re module's implementation would help, but I'll take anything I can get concerning regex.


Ok, some clarification is obviously in order! I definitely know that normal matching syntax would look something like this:

import re
expr = re.compile(r'[a-z]*')
string = "some words"
expr.match(string)  # returns a match object

But I'm wondering whether regular expressions can match other, less specific expressions, as in the syntactically invalid version I tried to explain above: any letter from b-x would always be a subset (match) of any letter from a-z. I know from trying that you can't do this by simply calling the match of one compiled expression on another compiled expression, but the question remains: is this possible at all?
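
To make the b-x versus a-z case concrete, here is a rough sketch that only covers patterns matching exactly one character. It simply tries every character of a small alphabet against each pattern and compares the resulting sets. The helper name `matched_chars` and the choice of lowercase ASCII as the alphabet are assumptions made for illustration, and this does not generalize to longer patterns.

```python
import re
import string

def matched_chars(pattern, alphabet=string.ascii_lowercase):
    """Return the characters of `alphabet` that `pattern` fully matches."""
    rx = re.compile(pattern)
    return {c for c in alphabet if rx.fullmatch(c)}

# [b-x] accepts a subset of what [a-z] accepts, but not the other way round.
print(matched_chars('[b-x]') <= matched_chars('[a-z]'))  # True
print(matched_chars('[a-z]') <= matched_chars('[b-x]'))  # False
```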


Let me know if this still isn't clear.

I think that, in theory, to tell whether regexp A matches a subset of what regexp B matches, an algorithm could:

  1. Compute the minimal Deterministic Finite Automaton of B and also of the "union" A|B.
  2. Check if the two DFAs are identical. This is true if and only if A matches a subset of what B matches.
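
The comparison of B with A|B can be sketched in Python. This is only a rough sketch under several assumptions: instead of building and minimizing DFAs, it tests whether A|B and B accept the same language using Brzozowski derivatives (a different technique that explores the same kind of states lazily). It works on a hand-built mini AST rather than parsing `re` syntax, and the helper names (`chars`, `cat`, `alt`, `star`, `equivalent`, `is_subset`) and the lowercase-only alphabet are made up for illustration.

```python
import string

EMPTY = ('empty',)  # matches nothing
EPS = ('eps',)      # matches only the empty string

def chars(cs):
    """Character class: any single character in cs."""
    return ('chars', frozenset(cs))

def cat(r, s):
    """Concatenation, simplified where one side is EMPTY or EPS."""
    if EMPTY in (r, s):
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ('cat', r, s)

def alt(r, s):
    """Union, flattened into a frozenset so equal unions compare equal."""
    items = set()
    for x in (r, s):
        if x[0] == 'alt':
            items |= x[1]
        elif x != EMPTY:
            items.add(x)
    if not items:
        return EMPTY
    if len(items) == 1:
        return next(iter(items))
    return ('alt', frozenset(items))

def star(r):
    """Kleene star."""
    if r in (EMPTY, EPS):
        return EPS
    return r if r[0] == 'star' else ('star', r)

def nullable(r):
    """True if r matches the empty string."""
    tag = r[0]
    if tag in ('eps', 'star'):
        return True
    if tag == 'cat':
        return nullable(r[1]) and nullable(r[2])
    if tag == 'alt':
        return any(nullable(x) for x in r[1])
    return False  # 'empty' and 'chars'

def deriv(r, c):
    """Regex for what may follow after r has consumed the character c."""
    tag = r[0]
    if tag == 'chars':
        return EPS if c in r[1] else EMPTY
    if tag == 'cat':
        first = cat(deriv(r[1], c), r[2])
        return alt(first, deriv(r[2], c)) if nullable(r[1]) else first
    if tag == 'alt':
        out = EMPTY
        for x in r[1]:
            out = alt(out, deriv(x, c))
        return out
    if tag == 'star':
        return cat(deriv(r[1], c), r)
    return EMPTY  # 'empty' and 'eps'

def equivalent(r, s, alphabet):
    """Walk pairs of derivatives; a nullability mismatch separates r and s."""
    seen, todo = set(), [(r, s)]
    while todo:
        pair = todo.pop()
        if pair in seen:
            continue
        seen.add(pair)
        if nullable(pair[0]) != nullable(pair[1]):
            return False
        todo.extend((deriv(pair[0], c), deriv(pair[1], c)) for c in alphabet)
    return True

def is_subset(a, b, alphabet=string.ascii_lowercase):
    """A matches a subset of B exactly when A|B and B are equivalent."""
    return equivalent(alt(a, b), b, alphabet)

b_to_x = star(chars(string.ascii_lowercase[1:24]))  # stands in for [b-x]*
a_to_z = star(chars(string.ascii_lowercase))        # stands in for [a-z]*
print(is_subset(b_to_x, a_to_z))  # True
print(is_subset(a_to_z, b_to_x))  # False
```

Each distinct derivative plays the role of a DFA state, so the search stops once no new pairs appear. Handling real `re` patterns would still require a parser for Python's syntax, which this sketch does not attempt.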

However, doing this in practice would likely be a major project. There are explanations such as "Constructing a minimum-state DFA from a Regular Expression", but they tend to consider only mathematically pure regexps. You would also have to handle the extensions that Python adds for convenience. Moreover, if any of the extensions make the language non-regular (I am not sure whether this is the case), you might not be able to handle those.
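
For what it's worth, backreferences look like one such extension. The small example below uses `(a*)b\1`, which matches some number of a's, a b, and then the same number of a's again. That language is a standard example of a non-regular one, so no finite automaton can represent it. The variable name `mirror` is just for illustration.

```python
import re

# \1 must repeat exactly what group 1 captured, so both runs of a's
# need the same length.
mirror = re.compile(r'(a*)b\1')

print(bool(mirror.fullmatch('aabaa')))  # True
print(bool(mirror.fullmatch('aaba')))   # False
```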


But what are you trying to do? Perhaps there's an easier approach...?