Parsing plain text in such a way that a custom if statement is recognised

Problem description:

I have the following string:

$string = "The man has {NUM_DOGS} dogs.";

I'm parsing this by running it through the following function:

function parse_text($string) {
    global $num_dogs;
    $string = str_replace('{NUM_DOGS}', $num_dogs, $string);
    return $string;
}

parse_text($string);

Where $num_dogs is a preset variable. Depending on $num_dogs, this could return any of the following strings:

  • The man has 1 dogs.
  • The man has 2 dogs.
  • The man has 500 dogs.

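The problem is that in the case where the man has 1 dog, "dog" is pluralised, which is undesired. I know this could be solved simply by not using the parse_text function and instead doing something like this: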

if ($num_dogs == 1) {
    $string = "The man has 1 dog.";
} else {
    $string = "The man has $num_dogs dogs.";
}

But in my application I'm parsing more than just {NUM_DOGS} and it'd take a lot of lines to write all the conditions.

I need a shorthand way which I can write into the initial $string which I can run through a parser, which ideally wouldn't limit me to just two true/false possibilities.

For example, let

Is it clear what's happened at the end? I've attempted to initiate the creation of an array using the part inside the square brackets that's after the vertical bar, then compare the key of the new array with the parsed value of {NUM_DOGS} (which by now will be the $num_dogs variable at the left of the vertical bar), and return the value of the array entry with that key.

If that's not totally confusing, is it possible using the preg_* functions?

Solution

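The premise of your question is that you want to match a specific pattern and then replace it after performing additional processing on the matched text.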

Seems like an ideal candidate for preg_replace_callback
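
As a minimal sketch of that idea, the callback below receives each match and returns its replacement. The $values array and the [A-Z_]+ token pattern are assumptions made for illustration; they are not part of the question.

```php
<?php
// Hypothetical lookup table standing in for the question's global variables.
$values = array('NUM_DOGS' => 2);

$result = preg_replace_callback(
    '/{([A-Z_]+)}/',
    function ($match) use ($values) {
        // $match[0] is the whole token, $match[1] the name between the braces.
        $name = $match[1];
        // Leave unknown tokens untouched rather than guessing a value.
        return isset($values[$name]) ? (string) $values[$name] : $match[0];
    },
    'The man has {NUM_DOGS} dogs.'
);

echo $result, "\n";
```

The callback is the place where any extra per-token logic, such as choosing between singular and plural, can live.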

The regular expressions for capturing matched parenthesis, quotes, braces etc. can become quite complicated, and to do it all with a regular expression is in fact quite inefficient. In fact you'd need to write a proper parser if that's what you require.

For this question I'm going to assume a limited level of complexity, and tackle it with a two stage parse using regex.

First of all, the simplest regex I can think of for capturing tokens between curly braces:

/{([^}]+)}/

Let's break that down.

{        # A literal opening brace
(        # Begin capture
  [^}]+  # Everything that's not a closing brace (one or more times)
)        # End capture
}        # Literal closing brace

When applied to a string with preg_match_all the results look something like:

array (
  0 => array (
    0 => '{TOK_ONE}',
    1 => '{TOK_TWO|0=>"no", 1=>"one", 2=>"two"}',
  ),
  1 => array (
    0 => 'TOK_ONE',
    1 => 'TOK_TWO|0=>"no", 1=>"one", 2=>"two"',
  ),
)
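
A minimal sketch of the call itself, assuming a sample input string of my own choosing. Index 1 of the result holds the captured text between the braces.

```php
<?php
// Sample input, assumed for illustration.
$subject = 'A string {TOK_ONE} with {TOK_TWO|0=>"no", 1=>"one", 2=>"two"}';

// $matches[1] collects the first capture group of every match.
preg_match_all('/{([^}]+)}/', $subject, $matches);

var_export($matches);
```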

Looks good so far.

Please note that if you have nested braces in your strings, e.g. {TOK_TWO|0=>"hi {x} y"}, this regex will not work. If this won't be a problem, skip down to the next section.

It is possible to do top-level matching, but the only way I have ever been able to do it is via recursion. Most regex veterans will tell you that as soon as you add recursion to a regex, it stops being a regex.

This is where the additional processing complexity kicks in, and with long complicated strings it's very easy to run out of stack space and crash your program. Use it carefully if you need to use it at all.

The recursive regex, taken from one of my other answers and modified a little:

`/{((?:[^{}]*|(?R))*)}/`

Broken down:

{                   # literal brace
(                   # begin capture
    (?:             # don't create another capture set
        [^{}]*      # everything not a brace
        |(?R)       # OR recurse
    )*              # none or more times
)                   # end capture
}                   # literal brace

And this time the output only matches top-level braces:

array (
  0 => array (
    0 => '{TOK_ONE|0=>"a {nested} brace"}',
  ),
  1 => array (
    0 => 'TOK_ONE|0=>"a {nested} brace"',
  ),
)
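
A short sketch of the recursive pattern in use. The input string is assumed for illustration, and the example relies on a PCRE build that supports (?R).

```php
<?php
// Sample input with a nested brace, assumed for illustration.
$subject = 'Start {TOK_ONE|0=>"a {nested} brace"} end';

// (?R) recurses into the whole pattern, so an inner {...} pair is consumed
// as part of the outer match instead of ending it early.
preg_match_all('/{((?:[^{}]*|(?R))*)}/', $subject, $matches);

var_export($matches[1]);
```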

Again, don't use the recursive regex unless you have to. (Your system may not even support them if it has an old PCRE library)

With that out of the way we need to work out if the token has options associated with it. Instead of having two fragments to be matched as per your question, I'd recommend keeping the options with the token as per my examples. {TOKEN|0=>"option"}

Let's assume $match contains a matched token. If we check for a pipe | and take the substring of everything after it, we'll be left with your list of options, which we can again parse out with regex. (Don't worry, I'll bring everything together at the end.)
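
The split at the pipe can be sketched like this. The variable names $name and $optionList are placeholders chosen for the example.

```php
<?php
// A matched token as captured by the brace regex (sample value).
$match = 'TOK_TWO|0=>"no", 1=>"one", 2=>"two"';

$pipe = strpos($match, '|');
if ($pipe !== false) {
    // Everything after the first pipe is the option list.
    $optionList = substr($match, $pipe + 1);
    $name = substr($match, 0, $pipe);
} else {
    // No pipe: the token is a plain name with no options.
    $optionList = '';
    $name = $match;
}

echo $name, "\n", $optionList, "\n";
```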

/(\d+)\s*=>\s*"([^"]*)",?/

Broken down:

(\d+)    # Capture one or more decimal digits
\s*      # Any amount of whitespace (allows you to do 0    =>    "")
=>       # Literal pointy arrow
\s*      # Any amount of whitespace
"        # Literal quote
([^"]*)  # Capture anything that isn't a quote
"        # Literal quote
,?       # Maybe followed by a comma

And an example match:

array (
  0 => array (
    0 => '0=>"no",',
    1 => '1 => "one",',
    2 => '2=>"two"',
  ),
  1 => array (
    0 => '0',
    1 => '1',
    2 => '2',
  ),
  2 => array (
    0 => 'no',
    1 => 'one',
    2 => 'two',
  ),
)
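
A sketch of turning those matches into a lookup table with array_combine. The input string is assumed, and this sketch writes the key group as (\d+) so that multi-digit keys are captured whole.

```php
<?php
// Sample option list, assumed for illustration.
$optionList = '0=>"no",1 => "one",2=>"two"';

// $pairs[1] receives the numeric keys, $pairs[2] the quoted values.
preg_match_all('/(\d+)\s*=>\s*"([^"]*)",?/', $optionList, $pairs);

// Pair each key with its value; numeric string keys become integers.
$lookup = array_combine($pairs[1], $pairs[2]);

var_export($lookup);
```

Looking up the token's value in $lookup then yields the text to substitute.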

If you want to use quotes inside your quotes, you'll have to make your own recursive regex for it.
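
One alternative that avoids recursion, offered here as an assumption rather than something the approach above prescribes, is to allow backslash-escaped quotes inside the value and strip the backslashes afterwards. A rough sketch:

```php
<?php
// Sample option list; the \" sequences are literal backslash-quote pairs.
$optionList = '0=>"none",1=>"one \"special\" dog"';

// The value group accepts either a character that is neither a quote nor
// a backslash, or a backslash followed by any single character.
$pattern = '/(\d+)\s*=>\s*"((?:[^"\\\\]|\\\\.)*)",?/';
preg_match_all($pattern, $optionList, $pairs);

// Remove the escaping backslashes from the captured values.
$lookup = array_combine($pairs[1], array_map('stripslashes', $pairs[2]));

var_export($lookup);
```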

Wrapping up, here's a working example.

Some initialisation code.

$options = array(
    'WERE' => 1,
    'TYPE' => 'cat',
    'PLURAL' => 1,
    'NAME' => 2
);

$string = 'There {WERE|0=>"was a",1=>"were"} ' .
    '{TYPE}{PLURAL|1=>"s"} named bob' . 
    '{NAME|1=>" and bib",2=>" and alice"}';

And everything brought together.

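$string = preg_replace_callback('/{([^}]+)}/', function($match) use ($options) {
    // $match[1] is the text between the braces, e.g. WERE|0=>"was a",1=>"were"
    $match = $match[1];

    // Split the token name from its option list at the first pipe, if any.
    if (false !== $pipe = strpos($match, '|')) {
        $tokens = substr($match, $pipe + 1);
        $match = substr($match, 0, $pipe);
    } else {
        $tokens = '';
    }

    if (isset($options[$match])) {
        if ($tokens) {
            // (\d+) keeps multi-digit keys intact.
            preg_match_all('/(\d+)\s*=>\s*"([^"]*)",?/', $tokens, $pairs);

            // Map each numeric key to its quoted value.
            $tokens = array_combine($pairs[1], $pairs[2]);

            return $tokens[$options[$match]];
        }
        // No option list: substitute the value itself.
        return $options[$match];
    }
    // Unknown tokens are replaced with an empty string.
    return '';
}, $string);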

Please note the error checking is minimal; there will be unexpected results if you pick options that don't exist.
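
If that matters, a slightly more defensive variant might look like the sketch below. The function name render_tokens and the choice to fall back to an empty string are assumptions for illustration.

```php
<?php
// Hypothetical helper: same idea as above, but missing tokens and
// missing options both fall back to an empty string.
function render_tokens($string, array $options)
{
    return preg_replace_callback('/{([^}]+)}/', function ($match) use ($options) {
        // Split "NAME|options" into at most two parts.
        $parts = explode('|', $match[1], 2);
        $name = $parts[0];

        if (!isset($options[$name])) {
            return ''; // unknown token
        }
        if (!isset($parts[1])) {
            return (string) $options[$name]; // plain substitution
        }

        preg_match_all('/(\d+)\s*=>\s*"([^"]*)",?/', $parts[1], $pairs);
        $lookup = $pairs[1] ? array_combine($pairs[1], $pairs[2]) : array();
        $key = $options[$name];

        // Only return an option that was actually listed for this value.
        return isset($lookup[$key]) ? $lookup[$key] : '';
    }, $string);
}

echo render_tokens('cat{PLURAL|1=>"s"}', array('PLURAL' => 1)), "\n";
```

With PLURAL set to 0, which has no listed option, the token simply disappears instead of raising an undefined index notice.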

There's probably a much simpler way to do all of this, but I just took the idea and ran with it.