Counting the words in a string (not only Latin scripts)
If I am not mistaken, Chinese (like some other languages) does not use the space character ' '
as a word delimiter.
So what would be a good algorithm that works internationally?
The technique I've seen used a lot is to simply count the number of characters and divide it by the average number of characters per word in Chinese. A figure that is often used for this is 1.5.
If your Chinese text has 1500 characters, it's approximately 1000 words long (1500 / 1.5).
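As a rough illustration of that heuristic, here is a minimal Python sketch. The function name `estimate_word_count`, the restriction to the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and the choice to count any remaining space-delimited text with `\w+` are all my own assumptions, not part of any standard; only the 1.5 divisor comes from the text above.

```python
import re

# Assumption: 1.5 characters per word, the commonly quoted average for Chinese.
AVG_CHARS_PER_WORD = 1.5

# Assumption: only the basic CJK Unified Ideographs block is treated as Chinese;
# extension blocks and other scripts without spaces are not handled here.
HAN_CHAR = re.compile(r"[\u4e00-\u9fff]")

# Runs of word characters, used for whatever space-delimited text remains.
SPACED_WORD = re.compile(r"\w+")


def estimate_word_count(text, avg_chars_per_word=AVG_CHARS_PER_WORD):
    """Estimate the word count of text that may mix Chinese and spaced scripts."""
    # Chinese part: number of Han characters divided by the average word length.
    han_chars = len(HAN_CHAR.findall(text))
    han_words = round(han_chars / avg_chars_per_word)

    # Everything else: blank out the Han characters, then count word-like runs.
    remainder = HAN_CHAR.sub(" ", text)
    spaced_words = len(SPACED_WORD.findall(remainder))

    return han_words + spaced_words


if __name__ == "__main__":
    print(estimate_word_count("汉" * 1500))
    print(estimate_word_count("Hello world"))
```

This is an estimate only: the result for the Chinese part depends entirely on how well the 1.5 average fits the text at hand.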
I am not aware of a more accurate way of counting words, short of interpreting the text itself. That would mean actually understanding the context in which the characters are used, since a Chinese character can sometimes be a word by itself, but can also be one component of a compound word.