Counting the number of words in a string (not just Latin)

Problem description:

If I am not mistaken, Chinese (like some other languages) does not use the space character ' ' as a word delimiter.

So what would be a good algorithm that works internationally?


The technique I've seen used a lot is to simply count the number of characters and divide that by the average number of characters per word in Chinese. A figure that is often used for this is 1.5.

If your Chinese text has 1500 characters, it's approximately 1000 words long.
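The rule of thumb above can be sketched in a few lines of Python. This is only a rough illustration, not a definitive implementation: the function name `estimate_chinese_word_count` is made up here, and treating "a Chinese character" as anything in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is an assumption that ignores extension blocks and punctuation.

```python
import re

# Assumption: a "Chinese character" is approximated by the basic CJK Unified
# Ideographs block (U+4E00..U+9FFF). Extension blocks, punctuation, digits
# and Latin letters are not counted.
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")

# Rule-of-thumb average from the answer above, not a measured constant.
AVG_CHARS_PER_WORD = 1.5


def estimate_chinese_word_count(text: str) -> int:
    """Estimate the word count as (number of CJK characters) / 1.5."""
    char_count = len(CJK_CHAR.findall(text))
    return round(char_count / AVG_CHARS_PER_WORD)


if __name__ == "__main__":
    sample = "中" * 1500  # stand-in for a 1500-character text
    print(estimate_chinese_word_count(sample))
```

The result is only an estimate: the true ratio depends on the text, so the constant may need tuning for a particular kind of content.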

I am not aware of a more accurate way of counting words, except for interpreting the text itself. This would mean actually understanding the context of the words used, since a Chinese character can sometimes be used as a word by itself, but also as a component in a composite word.
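To illustrate why a character can count as a word on its own or as part of a composite word, here is a toy sketch of greedy longest-match segmentation against a word list. This is not what the answer above proposes, and it does not understand context. The function name `segment_longest_match` and the tiny dictionary are hypothetical, and a real word list would be far larger.

```python
from typing import List, Set


def segment_longest_match(text: str, dictionary: Set[str]) -> List[str]:
    """Split text greedily, always taking the longest dictionary word."""
    # Longest entry in the dictionary; never less than one character.
    max_len = max([len(word) for word in dictionary] + [1])
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


if __name__ == "__main__":
    # Hypothetical toy dictionary, for illustration only.
    toy_dictionary = {"北京", "大学", "北京大学"}
    words = segment_longest_match("我爱北京大学", toy_dictionary)
    print(words, len(words))
```

Here 我 and 爱 each end up as single-character words, while 北京大学 is kept as one composite word even though 北京 and 大学 are also words. A greedy match like this can still pick the wrong split in ambiguous sentences, which is why accurate counting ultimately requires interpreting the text.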