Finding all instances of chemical formulas using regex in PHP
I have the following string: "AZS40G is Alumina Zircon Silicate material with ZrO2 content of 39% minimum, which serves as a great substitute in applications for production of sintered AZS refractories and where the Fused Zircon mullite is required. C1R5".
I would like to use regex to find all digits in chemical formulas in the text (instances of numbers preceded by letters), excluding the designation abbreviation (i.e. "AZS40G" in this instance), and wrap them in a <sub></sub> tag.
I am doing this all in PHP and, since I do not know where to start with regex, I have provided the following pseudocode/PHP example:
$text = "AZS40G is Alumina Zircon Silicate material with ZrO2 content of 39% minimum, which serves as a great substitute in applications for production of sintered AZS refractories and where the Fused Zircon mullite is required. Zr5O2, M20R2, C1R5";
preg_replace('/(AZS40G!)(?<=[A-Z])\d+/', '<sub>${1}</sub>', $text);
The expected result would be all instances wrapped as follows:
"AZS40G is Alumina Zircon Silicate material with ZrO<sub>2</sub> content of 39% minimum, which serves as a great substitute in applications for production of sintered AZS refractories and where the Fused Zircon mullite is required. C<sub>1</sub>R<sub>5</sub>".
Use the (*SKIP)(*FAIL) verbs to move past the abbreviations.
\b(?:AZS40G|BZS40G|CZS40G)\b(*SKIP)(*FAIL)|(?<=[A-Z])(\d+)
https://regex101.com/r/VglQ3K/1
Expanded (free-spacing form, which needs the x modifier):
 \b
 (?: AZS40G | BZS40G | CZS40G )   # exclude the designation abbreviations
 \b
 (*SKIP) (*FAIL)                  # will move the current position past this,
                                  # then fail the match
 |                                # or,
 (?<= [A-Z] )
 ( \d+ )                          # (1)
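For reference, here is a minimal sketch of how the pattern above could be applied with preg_replace in PHP. The $skip list and the shortened sample text are assumptions for illustration only. Note that the lookbehind as written accepts only an uppercase letter, so a digit following a lowercase letter (as in Zr5) is left alone unless the class is widened to [A-Za-z].

```php
<?php
// Shortened version of the sample text from the question (assumption).
$text = 'AZS40G is a material with ZrO2 content of 39% minimum. C1R5';

// Designations to leave untouched (hypothetical list; extend as needed).
$skip = ['AZS40G', 'BZS40G', 'CZS40G'];

// Escape each designation so it is matched literally inside the pattern.
$alternatives = implode('|', array_map(fn($s) => preg_quote($s, '/'), $skip));

// Skip whole designations, otherwise match digits preceded by an uppercase letter.
$pattern = '/\b(?:' . $alternatives . ')\b(*SKIP)(*FAIL)|(?<=[A-Z])\d+/';

// $0 is the whole match, i.e. the run of digits.
$result = preg_replace($pattern, '<sub>$0</sub>', $text);
echo $result, PHP_EOL;
```

With this input, the digits in AZS40G and in 39% should stay as they are, while those in ZrO2 and C1R5 are wrapped.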
You could use this replacement:
// Extract first word from text, as it must be excluded from the replacement
list($name, $def) = explode(" ", $text, 2);
// Make replacement in the rest
$result = $name . " " . preg_replace("/([A-Z][a-z]?)(\d+)/", "$1<sub>$2</sub>", $def);
Note that element names can end with a lowercase letter, hence the [a-z]?.
I assume that the first word of the text represents the name that should be excluded from the replacement action. It might be easiest to just pull it from the text, do the replacement, and add it again.
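As a sketch of that idea, the first-word exclusion could be packaged in a small helper. The function name subscriptFormulas and the shortened sample text are hypothetical; the guard for a text without any space is an addition, since list() on a one-element array would otherwise raise an undefined-key warning.

```php
<?php
// Hypothetical helper: wraps digits that follow an element-like symbol
// (an uppercase letter plus an optional lowercase letter) in <sub> tags,
// leaving the first word of the text untouched.
function subscriptFormulas(string $text): string
{
    // Split off the first word; there is only one part when no space exists.
    $parts = explode(' ', $text, 2);
    if (count($parts) < 2) {
        return $text; // only the name, nothing to replace
    }
    [$name, $rest] = $parts;

    // $1 is the element symbol, $2 the digits to subscript.
    return $name . ' ' . preg_replace('/([A-Z][a-z]?)(\d+)/', '$1<sub>$2</sub>', $rest);
}

echo subscriptFormulas('AZS40G is a material with ZrO2. Zr5O2, M20R2, C1R5'), PHP_EOL;
```

This only protects a name that appears as the very first word; later occurrences of the same name in the text would still be rewritten.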