Removing non-English words from a sentence in Python

Question:

I have written code that sends queries to Google and returns the results. I extract the snippets (summaries) from these results for further processing. However, the snippets sometimes contain non-English words that I don't want. For example:

/\u02b0w\u025bn w\u025bn unstressed \u02b0w\u0259n w\u0259n/ 

I only want the word "unstressed" from this sentence. How can I do that? Thanks.

PyEnchant might be a simple option for you. I do not know about its speed, but you can do things like:

>>> import enchant
>>> d = enchant.Dict("en_US")
>>> d.check("Hello")
True
>>> d.check("Helo")
False
>>>
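
To apply the same check to a whole snippet rather than a single word, one option is to split the text into runs of letters and keep only the tokens the dictionary accepts. The following is a minimal sketch, not a definitive implementation: the names `keep_english_words` and `enchant_checker` are made up for illustration, and the demo uses a tiny stand-in word set so that it runs even without PyEnchant installed.

```python
import re


def keep_english_words(text, is_english):
    """Return the tokens of `text` that the `is_english` callable accepts."""
    # [^\W\d_]+ matches runs of Unicode letters, so IPA symbols stay attached
    # to their word and the whole token is checked (and rejected) as a unit.
    tokens = re.findall(r"[^\W\d_]+", text)
    return [tok for tok in tokens if is_english(tok)]


def enchant_checker(tag="en_US"):
    """Hypothetical helper: build a word checker backed by PyEnchant.

    Assumes PyEnchant and a dictionary for `tag` are installed.
    """
    import enchant  # imported lazily so the rest of the sketch runs without it
    return enchant.Dict(tag).check


if __name__ == "__main__":
    snippet = "/\u02b0w\u025bn w\u025bn unstressed \u02b0w\u0259n w\u0259n/"
    # Stand-in dictionary for the demo; swap in enchant_checker() for real use.
    stand_in = {"unstressed"}.__contains__
    print(keep_english_words(snippet, stand_in))  # ['unstressed']
```

With PyEnchant available, `keep_english_words(snippet, enchant_checker())` would run every token through `d.check` as in the session above. What it keeps depends on the installed dictionary.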

A tutorial is available here. PyEnchant also has options to return suggestions, which you can use again for another query or something similar. In addition, you can check whether your result is in latin-1 (is_utf8() exists; I do not know whether is_latin-1() does too). Alternatively, maybe use something like Enca, which detects the encoding of text files based on knowledge of their language.
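
The latin-1 idea can be tried with nothing but the standard library: attempt to encode each word and treat a failure as "not plain Latin text". This is only a rough sketch. The helper name `fits_encoding` is invented here, and the check is a coarse filter rather than a language test, since non-English words written in Latin script (and accented words such as "café") still pass.

```python
def fits_encoding(word, encoding="latin-1"):
    """Return True if every character of `word` can be encoded in `encoding`."""
    try:
        word.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    snippet = "/\u02b0w\u025bn w\u025bn unstressed \u02b0w\u0259n w\u0259n/"
    # Drop the surrounding slashes, then split on whitespace.
    words = snippet.strip("/").split()
    print([w for w in words if fits_encoding(w)])  # ['unstressed']
```

Passing `encoding="ascii"` makes the filter stricter. Combining it with a dictionary check would catch Latin-script words that are not English.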