Match a quoted string in JavaScript using a regular expression
I need a regex for JavaScript that matches
"{any group of chars}" <-- where that last " is not preceded by a \
Examples:
... foo "bar" ... => "bar"
... foo"bar\"" ... => "bar\""
... foo "bar" ... goo"o"ooogle "t\"e\"st"[] => ["bar", "o", "t\"e\"st"]
The actual strings will be longer and may contain multiple matches, which could also include whitespace or regex special characters.
I started by trying to break down the syntax, but not being strong with regex myself I got stuck pretty fast. I did get as far as matching everything except the case where the match contains \" (I think):
https://regex101.com/r/sj4HXw/1
Update:
More about my situation ...
This regex is to be used to "syntax highlight" strings in code blocks embedded in my blog posts, so a real-world example might look something like this ...
<pre id="test" class="code" data-code="csharp">
if (ConfigurationManager.AppSettings["LogSql"] == "true")
</pre>
And I am using the following JavaScript to achieve the highlight ...
var result = $("#test").text().replace(/"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g, "<span class=\"string\">$1</span>");
$("#test").html(result);
For some reason, even when the suggested answers (so far at least) are used in this context, I'm getting odd results.
This works, but it inserts the literal text $1 instead of the actual match for some reason.
Simple scenario (as in OP)
The most efficient regex you may use here (written in accordance with the unroll-the-loop principle) is
"[^"\\]*(?:\\[\s\S][^"\\]*)*"
See the regex demo.
Details:
- " - matches the opening "
- [^"\\]* - 0+ chars other than " and \
- (?:\\[\s\S][^"\\]*)* - zero or more occurrences of:
  - \\[\s\S] - any char ([\s\S]) with a \ in front
  - [^"\\]* - 0+ chars other than " and \
- " - matches the closing "
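To make the breakdown concrete, here is a minimal sketch that runs the unrolled pattern against the examples from the question and compares it with the plainer alternation form `"(?:[^"\\]|\\[\s\S])*"`. The alternation form and the variable names are assumptions added for illustration only; the two patterns should match the same strings, with the unrolled one avoiding an alternation step per character.

```javascript
// Unrolled form from the answer above.
const unrolled = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;
// Alternation-based form, added here only for comparison (not from the original post).
const alternation = /"(?:[^"\\]|\\[\s\S])*"/g;

// Inputs taken from the question; backslashes are doubled inside JS string literals.
const samples = [
  ' ... foo "bar" ... ',
  ' ... foo"bar\\"" ... ',
  ' ... foo "bar" ... goo"o"ooogle "t\\"e\\"st"[]',
];

for (const s of samples) {
  const a = s.match(unrolled);
  const b = s.match(alternation);
  // Print the matches and whether both patterns agree on this input.
  console.log(a, JSON.stringify(a) === JSON.stringify(b));
}
```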
Usage:
// MATCHING
var rx = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;
var s = ' ... foo "bar" ... goo"o"ooogle "t\\"e\\"st"[]';
var res = s.match(rx);
console.log(res);
// REPLACING
console.log(s.replace(rx, '<span>$&</span>'));
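Applied to the highlighting case from the question, the literal $1 is most likely explained by the pattern having no capturing group: JavaScript leaves $1 untouched in the replacement string when group 1 does not exist, whereas $& always stands for the whole match. A minimal sketch, assuming a plain string instead of the jQuery calls (the helper name highlightStrings is hypothetical):

```javascript
// Hypothetical helper; the name is not from the original post.
function highlightStrings(code) {
  const rx = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;
  // $& inserts the whole match, so no capturing group is needed.
  return code.replace(rx, '<span class="string">$&</span>');
}

// The C# line from the question's <pre> block.
const line = 'if (ConfigurationManager.AppSettings["LogSql"] == "true")';
console.log(highlightStrings(line));

// For contrast: with no group 1 in the pattern, $1 is kept as literal text.
const withDollarOne = line.replace(
  /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g,
  '<span class="string">$1</span>'
);
console.log(withDollarOne);
```

Wrapping the whole pattern in parentheses and keeping $1 would work as well; $& simply avoids the extra group.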
If there is an escaped " before a valid match, or there are \s before a ", the approach above won't work. You will need to match those \s and capture the substring you need:
/(?:^|[^\\])(?:\\{2})*("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g
 ^^^^^^^^^^^^^^^^^^^^^^                               ^
See another regex demo.
Usage:
// MATCHING
var rx = /(?:^|[^\\])(?:\\{2})*("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g;
var s = ' ... \\"foo "bar" ... goo"o"ooogle "t\\"e\\"st"[]';
var m, res = [];
while (m = rx.exec(s)) {
  res.push(m[1]);
}
console.log(res);
// REPLACING
console.log(s.replace(/((?:^|[^\\])(?:\\{2})*)("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g, '$1<span>$2</span>'));
The main pattern is wrapped with capturing parentheses, and this is added at the start:
- (?:^|[^\\]) - either start of string or any char but \
- (?:\\{2})* - 0+ occurrences of a double backslash.
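The effect of that prefix can be seen by running both patterns on an input where an escaped " precedes the real string. This is a minimal sketch; the variable names and the second input are illustrative assumptions, and matchAll requires an ES2020-capable runtime.

```javascript
const simple = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;
const prefixed = /(?:^|[^\\])(?:\\{2})*("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g;

// As the regex sees it:  ... \"foo "bar" ...
const text = ' ... \\"foo "bar" ... ';

// The simple pattern starts at the escaped quote and grabs the wrong span.
const simpleMatches = text.match(simple);
// The prefixed pattern skips the escaped quote; the string itself is in group 1.
const prefixedMatches = Array.from(text.matchAll(prefixed), (m) => m[1]);

console.log(simpleMatches);   // [ '"foo "' ]
console.log(prefixedMatches); // [ '"bar"' ]

// An even number of backslashes before the quote leaves it unescaped.
// As the regex sees it: \\"x"
const evenBackslashes = Array.from('\\\\"x"'.matchAll(prefixed), (m) => m[1]);
console.log(evenBackslashes); // [ '"x"' ]
```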