Regex to split a string if it contains wordA or wordB [closed]


Problem description:

Edited again to try to make it clearer.

Which PHP regex pattern will give me a match array that always contains 2 values, namely the 2 parts of a string split by "wordA" or "wordB"? If the string does not contain either word, simply return the whole string as the first value and null as the second.

Example:

preg_match("pattern","foo wordA bar",$match), $match will contain array['foo', 'bar']
preg_match("pattern","foo wordB bar",$match), $match will contain array['foo', 'bar']
preg_match("pattern","foo bar test",$match), $match will contain array['foo bar test', null]

I know that the first value of $match is always the full matched string, so I have left it out above.

OLD question:

I need to split a one-line address into parts. I can't find a way to capture the street part without including the word APP or APT when it is present and, when it is present, to also capture the words after it.

For example:

"5847A, rue Principal APP A" should match: (5847, A, rue Principal, A)

"5847A, rue Prince Arthur APT 22" should match: (5847, A, rue Prince Arthur, 22)

"1111, Sherwood street" should match: (1111, , Sherwood street, )

I'm using PHP.

What I have so far is: /^(\d+)(.*), (.*)(?:APP|APT)(?:\s*(.*))?$/i which works with examples 1 and 2. If I try to make the alternation (APP|APT) optional by adding a ? after it, then the third group includes the word APP or APT...

Any idea how to exclude the optional APP or APT word from the match?

Thank you

EDIT:

I can simplify the problem: how can I write a regex so that the match returns the same string minus the word APP or APT when it is present in the middle of the string?

As @MadaraUchiha pointed out, it's a bad idea to run a regex on an address since they can be in any format.

If you know you have consistent addresses, then I guess you can use the regex:

^([0-9]+)([A-Z])?,\s(?:(.*?)\s(?:APP|APT)\s(.*)|(.*))$

And the replacement:

$1,$2,$3$5,$4

Here's how it's performing.

It's pretty similar to yours (I changed a few things); I added an alternation (|) to handle the second type of address, the one without APP or APT.
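The same pattern can also be used with preg_match instead of a replacement. The following is only a sketch: parse_address and the array keys are hypothetical names, and it assumes PHP 7.2 or later so that the PREG_UNMATCHED_AS_NULL flag is available. Because only one side of the alternation matches, the street ends up in group 3 or in group 5, and the code folds the two together.

```php
<?php
// Hypothetical helper, not part of the original answer.
function parse_address(string $address): ?array
{
    // Pattern from the answer above, wrapped in delimiters.
    $pattern = '/^([0-9]+)([A-Z])?,\s(?:(.*?)\s(?:APP|APT)\s(.*)|(.*))$/';

    // PREG_UNMATCHED_AS_NULL reports groups that did not take part as null.
    if (!preg_match($pattern, $address, $m, PREG_UNMATCHED_AS_NULL)) {
        return null; // the address does not have the expected shape
    }

    return [
        'number' => $m[1],
        'suffix' => $m[2] ?? '',
        'street' => $m[3] ?? $m[5], // group 3 with APP/APT, group 5 without
        'unit'   => $m[4] ?? '',
    ];
}

var_dump(parse_address("5847A, rue Prince Arthur APT 22"));
var_dump(parse_address("1111, Sherwood street"));
```

The pattern is case-sensitive as written; adding the i flag, as in the question, would also accept "app" and "apt".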

If you want consistent number of matches, maybe this?

^([0-9]*)([A-Z]?),((?:(?!\sAPP|\sAPT).)*)(?:\sAPP|\sAPT)?(.*)$

Regex101 example.
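As a rough sketch of how this second pattern behaves in PHP (split_address is a hypothetical name): every group takes part in the match, so $m always has the same shape, but the captured street and unit keep their leading space and need a trim.

```php
<?php
// Hypothetical helper, not part of the original answer.
function split_address(string $address): ?array
{
    // Pattern from the answer above; (?:(?!\sAPP|\sAPT).)* consumes one
    // character at a time and stops right before " APP" or " APT".
    $pattern = '/^([0-9]*)([A-Z]?),((?:(?!\sAPP|\sAPT).)*)(?:\sAPP|\sAPT)?(.*)$/';

    if (!preg_match($pattern, $address, $m)) {
        return null;
    }

    // Drop $m[0] (the full match) and trim the whitespace around each part.
    return array_map('trim', array_slice($m, 1));
}

print_r(split_address("5847A, rue Principal APP A"));
print_r(split_address("1111, Sherwood street"));
```

One caveat with this pattern: there is no word boundary after APP or APT, so a street name containing a word that merely starts with "APP" would also be split at that point.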

For the "easy" version,

 var_dump(preg_replace("/\s(?:apt|app)\b/i", "", "5847A, rue Prince Arthur APT 22"));

covers it.

That outputs

5847A, rue Prince Arthur 22

For the harder version, you would need more context, such as why the commas appear where they do.

The hard version:

([0-9]*)([a-z]?),(((?!app|apt).)*)(?:app|apt)?(.*)

This seems to work on your test cases.

I think this should work:

$pattern = "/\b(?:APP|APT)\b/i"; // group the alternation so \b applies to both words
$subject = "1111, Sherwood street";
echo preg_replace($pattern, "", $subject);
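For the re-edited question (always get two values, with null when neither word is present), one option is to use preg_split rather than preg_match. This is only a sketch under that assumption: split_on_word is a hypothetical name, and "wordA" and "wordB" are the placeholders from the question.

```php
<?php
// Hypothetical helper, not part of any answer above.
function split_on_word(string $subject): array
{
    // A limit of 2 means at most two pieces: only the first separator splits.
    // The \s* on each side swallows the spaces around the separator word.
    $parts = preg_split('/\s*\b(?:wordA|wordB)\b\s*/', $subject, 2);

    // Pad with null so the result always has exactly two entries.
    return array_pad($parts, 2, null);
}

var_dump(split_on_word("foo wordA bar"));
var_dump(split_on_word("foo bar test"));
```

Unlike preg_match, this returns the parts directly, so there is no leading full-match entry to skip.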