Counting the words in a string (not only Latin script)

Question:

If I am not wrong, Chinese (and other languages) doesn't use the space ' ' as a word delimiter.

So what would be a good algorithm that works internationally?

Answer:

The technique I've seen used a lot is to simply count the number of characters and divide it by the average number of characters per word in Chinese. A figure often used for this is 1.5.

If your Chinese text has 1500 characters, it's approximately 1000 words long.
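As a rough sketch of that rule of thumb in Python: it counts the characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF) and divides by 1.5. Treating that block as "Chinese characters" and also counting whitespace-separated tokens elsewhere in the text, so mixed Chinese/Latin strings get a total, are added assumptions and not part of the technique described above. Scripts that also lack spaces, such as Thai or Japanese kana, are not covered.

```python
import re

# Assumption: "Chinese characters" means the CJK Unified Ideographs block
# (U+4E00 to U+9FFF). Extension blocks, Japanese kana, Thai, etc. are not
# handled by this sketch.
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")

# Average characters per Chinese word, the rule-of-thumb figure from the answer.
CHARS_PER_WORD = 1.5


def estimate_word_count(text: str) -> float:
    """Rough word count: Chinese characters / CHARS_PER_WORD, plus the
    whitespace-delimited tokens found in the rest of the text."""
    cjk_chars = len(CJK_CHAR.findall(text))

    # Blank out the Chinese characters so the remaining (e.g. Latin) text
    # can be split on whitespace as usual.
    remainder = CJK_CHAR.sub(" ", text)

    # Count a token only if it contains a word character, so stray
    # punctuation such as "。" or "," is not counted as a word.
    other_words = sum(1 for tok in remainder.split() if re.search(r"\w", tok))

    return cjk_chars / CHARS_PER_WORD + other_words


if __name__ == "__main__":
    print(estimate_word_count("一" * 1500))        # 1000.0
    print(round(estimate_word_count("I love 北京")))  # 3
```

The result is a float because it is only an estimate; round it if you need an integer.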

I am not aware of a more accurate way of counting words, except by interpreting the text itself. That would mean actually understanding the context of the words used, since a Chinese character can sometimes be a word by itself, but can also be a component of a compound word.
