


I'm using Text::Ngrams to determine the word combinations in a string. However, I need to keep words that have digits in them. I've determined that $o->{tokenrex} is what I need to modify, but I can't determine the proper regex for it.

The original is qr/([a-zA-Z]+|(\d+(\.\d+)?|\d*\.\d+)([eE][-+]?\d+)?)/; but I'm thinking I need something more along the lines of this:



If I'm reading the regex right, that should match any number of alpha characters, or a "number" that has a word character before and after it, or a bare "number". Instead, it splits my "word" into separate tokens. The example word I'm working with is "A1X".



Y'all are making this way too complicated. The original regex matches either words made of letters only, or numbers (integers and floating point, including exponential notation).

If you need to match words made of letters and digits, then the regex for that is [a-zA-Z\d]+. Per the module docs, you'll also want to specify what to skip, and the regex for that is [^a-zA-Z\d]+.

$self->{tokenrex} = qr/([a-z\d]+)/i;
$self->{skiprex}  = qr/([^a-z\d]+)/i;
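
To see the difference on the "A1X" example from the question, here is a minimal sketch in plain Perl. It does not call Text::Ngrams at all: the scan_tokens helper is a hypothetical stand-in that simply pulls successive matches out of a string, so the module's own tokenizer may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Ngrams): return each successive
# match of $rex in $text, using the outermost capture group ($1).
sub scan_tokens {
    my ($rex, $text) = @_;
    my @out;
    while ($text =~ /$rex/g) {
        push @out, $1;
    }
    return @out;
}

# The original tokenrex quoted in the question: letters-only words OR numbers.
my $original = qr/([a-zA-Z]+|(\d+(\.\d+)?|\d*\.\d+)([eE][-+]?\d+)?)/;

# The relaxed tokenrex from this answer: any run of letters and digits.
my $relaxed = qr/([a-z\d]+)/i;

my @old = scan_tokens($original, 'A1X');
my @new = scan_tokens($relaxed,  'A1X');

print "@old\n";   # A 1 X
print "@new\n";   # A1X
```

The original pattern can never match a letter and a digit in the same token, which is why "A1X" comes out as three pieces.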


If you need to recognize numbers as the module documentation shows in its example, then please let me know, and I'll be happy to add that back in for you. From your description, that doesn't sound like what you need.
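
For reference, one possible way to keep the number forms while still treating mixed tokens as single words is sketched below. This is an assumption for illustration, not something taken from the module docs: the $mixed pattern and the scan_mixed helper are hypothetical names, and the pattern has only been reasoned through against a plain string scan, not against Text::Ngrams itself. The idea is to try the number alternative first, but only accept it when no letter or digit follows, and otherwise fall back to the plain letters-and-digits run.

```perl
use strict;
use warnings;

# Hypothetical pattern: a number (integer, decimal, optional exponent) that is
# not immediately followed by a letter or digit, or else a letters/digits run.
my $mixed = qr/((?:\d+(?:\.\d+)?|\d*\.\d+)(?:[eE][-+]?\d+)?(?![a-z\d])|[a-z\d]+)/i;

# Hypothetical helper: collect successive matches of $rex in $text.
sub scan_mixed {
    my ($rex, $text) = @_;
    my @out;
    while ($text =~ /$rex/g) {
        push @out, $1;
    }
    return @out;
}

my @mixed_tokens = scan_mixed($mixed, 'A1X 3.5 1e5 R2D2');
print join(' ', @mixed_tokens), "\n";   # A1X 3.5 1e5 R2D2
```

Note that a matching skiprex would also need to stop treating "." as something to skip in every position, which is one reason the simpler two-line version above is probably the better fit for the question as asked.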