RegEx that matches only if the string contains a word from each list
I'm developing software that has to check whether a text contains a word from one specified list and a word from another specified list.
Example:
list 1: dog, cat
list 2: house, tree
The following texts must match:
the dog is in the house -> contains dog and house
my house is full of dogs -> contains dog and house
the cat is on the tree -> contains cat and tree
The following texts must NOT match:
the frog is in the house -> there is no word from the first list
Boby is the name of my dog -> there is no word from the second list
Outside my house there is a tree -> there is no word from the first list
To solve the problem quickly, I've made a list of patterns like:
dog.*house, house.*dog, cat.*house, ...
but I'm pretty sure there is a smarter way...
You can use an alternation (|) for each of the sets of alternatives, and a wrapper alternation to cover both orders. So:
(?:(?:dog|cat).*(?:house|tree))|(?:(?:house|tree).*(?:dog|cat))
JavaScript example (non-capturing groups and alternations work the same in Java and JavaScript):
var tests = [
{match: true, text: "the dog is in the house -> contains dog and house"},
{match: true, text: "my house is full of dogs -> contains dog and house"},
{match: true, text: "the cat is on the tree -> contains cat and tree"},
{match: false, text: "the frog is in the house -> there is no word from the first list"},
{match: false, text: "Boby is the name of my dog -> there is no word from the second list"},
{match: false, text: "Outside my house there is a tree -> there is no word from the first list"}
];
var rex = /(?:(?:dog|cat).*(?:house|tree))|(?:(?:house|tree).*(?:dog|cat))/;
tests.forEach(function(test) {
var result = rex.test(test.text);
if (!!result == !!test.match) {
console.log('GOOD: "' + test.text + '": ' + result);
} else {
console.log('BAD: "' + test.text + '": ' + result + ' (expected ' + test.match + ')');
}
});
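Since the question starts from two lists, the same pattern can also be generated rather than typed by hand. The following is only a sketch: the helper names escapeRegExp and buildEitherOrderPattern are made up for illustration and are not part of any library, and the escaping step assumes the list entries should be treated as literal text.

```javascript
// Hypothetical helper: escape characters that have special meaning in a regex,
// so a list entry such as "c++" is matched literally.
function escapeRegExp(word) {
    return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build "(A.*B)|(B.*A)" where A and B are alternations
// over the entries of the two lists.
function buildEitherOrderPattern(listA, listB) {
    var a = "(?:" + listA.map(escapeRegExp).join("|") + ")";
    var b = "(?:" + listB.map(escapeRegExp).join("|") + ")";
    return new RegExp("(?:" + a + ".*" + b + ")|(?:" + b + ".*" + a + ")");
}

var rexFromLists = buildEitherOrderPattern(["dog", "cat"], ["house", "tree"]);

console.log(rexFromLists.source);
console.log(rexFromLists.test("the dog is in the house"));   // true
console.log(rexFromLists.test("the frog is in the house"));  // false
```

For the two example lists this produces the same pattern as the literal above, so the lists can be changed without rewriting the regex.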
Note that the pattern above doesn't check for words, just sequences of letters. If you want it to match actual words, you'll need to add word-boundary assertions (\b) or similar. Left as an exercise for the reader...
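One possible way to do that exercise, as a sketch rather than the definitive answer, is to wrap each alternation in \b. Be aware that this changes the result for "my house is full of dogs", which the question expects to match: with strict word boundaries "dogs" is no longer "dog". The second pattern below assumes that allowing an optional trailing "s" is an acceptable way to handle plurals. The variable names wordRex and pluralRex are illustrative only.

```javascript
// \b asserts a word boundary, so "dog" no longer matches inside "dogs" or "hotdog".
var wordRex = /(?:\b(?:dog|cat)\b.*\b(?:house|tree)\b)|(?:\b(?:house|tree)\b.*\b(?:dog|cat)\b)/;

console.log(wordRex.test("the dog is in the house"));     // true
console.log(wordRex.test("my house is full of dogs"));    // false: "dogs" is not the word "dog"
console.log(wordRex.test("the hotdog is in the house"));  // false: "dog" is inside another word

// Assumed variant: also accept a simple plural by allowing an optional "s".
var pluralRex = /(?:\b(?:dog|cat)s?\b.*\b(?:house|tree)s?\b)|(?:\b(?:house|tree)s?\b.*\b(?:dog|cat)s?\b)/;

console.log(pluralRex.test("my house is full of dogs"));  // true
```

Whether plurals should count as a match depends on the requirements, so treat the s? part as optional.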