Regex to match all characters between <h1> tags

Question:

I'm using the Sublime Text 2 editor. I would like to use a regex to match all characters between all h1 tags.

At the moment I'm using this:

<h1>.+</h1>

It works fine if the h1 tag doesn't contain line breaks.

That is, for

<h1>Hello this is a header</h1>

it works fine.

But it doesn't work if the tag looks like this:

<h1>
   Hello this is a header
</h1>

Can someone help me with the syntax?

By default, . matches every character except the newline character.
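As a rough illustration of that default, here is a minimal sketch using Python's re module. Python is an assumption made only for demonstration; Sublime Text's own regex engine may differ in details, and the variable names are hypothetical.

```python
import re

# Sample inputs modelled on the question (typo in "header" corrected).
single_line = "<h1>Hello this is a header</h1>"
multi_line = "<h1>\n   Hello this is a header\n</h1>"

# The original pattern: '.' does not match '\n' unless DOTALL is enabled.
pattern = re.compile(r"<h1>.+</h1>")

single_match = pattern.search(single_line)
multi_match = pattern.search(multi_line)

print(single_match is not None)  # the one-line heading matches
print(multi_match is None)       # the multi-line heading does not
```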

In this case, you need the DOTALL option, which makes . match any character, including the newline character. The DOTALL option can be specified inline as (?s). For example:

(?s)<h1>.+</h1>
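To see the effect of DOTALL outside the editor, the following sketch (again assuming Python's re module purely for illustration) applies the inline (?s) form and the equivalent re.DOTALL flag to the multi-line heading from the question.

```python
import re

text = "<h1>\n   Hello this is a header\n</h1>"

# Inline form: (?s) at the start of the pattern enables DOTALL.
inline_match = re.search(r"(?s)<h1>.+</h1>", text)

# Equivalent form in Python: pass the flag instead of writing it inline.
flag_match = re.search(r"<h1>.+</h1>", text, flags=re.DOTALL)

print(inline_match.group(0) == text)
print(flag_match.group(0) == text)
```

In an editor's search box only the inline (?s) form is usually available, which is presumably why it is the form shown here.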

However, this alone is not enough when the text contains more than one h1 element, because quantifiers (here, +) are greedy by default: they consume as many characters as possible, so the match runs from the first <h1> to the last </h1>. Make the quantifier lazy (consume as few characters as possible) by adding an extra ? after it, giving +?:

(?s)<h1>.+?</h1>
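The difference between the greedy and lazy forms shows up once there are two headings in the same text. A small sketch, assuming Python's re module and a made-up sample document:

```python
import re

# Hypothetical document with two multi-line h1 elements.
text = "<h1>\nFirst\n</h1>\n<p>body</p>\n<h1>\nSecond\n</h1>"

# Greedy: .+ runs to the LAST </h1>, swallowing everything in between.
greedy = re.findall(r"(?s)<h1>.+</h1>", text)

# Lazy: .+? stops at the FIRST </h1> after each <h1>.
lazy = re.findall(r"(?s)<h1>.+?</h1>", text)

print(len(greedy))  # one oversized match
print(len(lazy))    # one match per heading
```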

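Alternatively, the regex can be <h1>[^<>]*</h1>. In this case, you don't need to specify any option, because a negated character class such as [^<>] also matches newline characters. Note that this version will not match an h1 that contains other tags.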
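The negated character class alternative can be checked the same way. This is a sketch under the same assumption (Python's re module, hypothetical sample text); the capture group in the second pattern is an addition for illustration, used to pull out only the text between the tags.

```python
import re

# The first heading spans several lines; the second contains a nested tag.
text = "<h1>\n   Hello this is a header\n</h1>\n<h1>Second <em>one</em></h1>"

# No flags needed: [^<>] matches any character except '<' and '>',
# and that includes newlines.
matches = re.findall(r"<h1>[^<>]*</h1>", text)

# With a capture group, findall returns only the text between the tags.
inner = [s.strip() for s in re.findall(r"<h1>([^<>]*)</h1>", text)]

print(matches)
print(inner)
```

The second heading is skipped because its content contains < and >, which is the trade-off of this variant compared with the lazy DOTALL pattern.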