Using Arabic WordNet for synonyms in Python?
I am trying to get the synonyms of Arabic words in a sentence.
If the word is in English, it works perfectly and the results are displayed in Arabic. I was wondering whether it is possible to get the synonyms of an Arabic word directly, without writing it in English first.
I tried that, but it didn't work. I would also prefer to use the word without tashkeel (diacritics): انتظار instead of اِنْتِظار
from nltk.corpus import wordnet as omw
jan = omw.synsets('انتظار ')[0]
print(jan)
print(jan.lemma_names(lang='arb'))
The WordNet used in NLTK doesn't support Arabic. If you are looking for Arabic WordNet (AWN), that is a completely different thing.
For Arabic WordNet, download:
Then run it:
$ python AWNDatabaseManagement.py -i upc_db.xml
Now, to get something like wn.synset('إنتظار'): Arabic WordNet has a function wn.get_synsets_from_word(word), but it returns synset offsets. It also accepts words only as they are vocalized in the database. For example, you should use جَمِيل rather than جميل:
>>> wn.get_synsets_from_word(u"جَمِيل")
[(u'a', u'300218842')]
300218842 is the offset of the synset for جميل.
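Because the lookup only matches the vocalized form, one possible workaround is to keep a small index that maps each diacritic-free spelling to the vocalized spellings stored in the database. The following is a minimal sketch, not part of AWN itself: strip_tashkeel and build_index are hypothetical helper names, and the word list is a hand-written sample standing in for whatever vocalized words you extract from the AWN database (how to extract them is not shown here).

```python
import unicodedata
from collections import defaultdict


def strip_tashkeel(text):
    """Drop Unicode combining marks, which covers the Arabic harakat
    (fatha, kasra, damma, sukun, shadda, tanwin). Precomposed letters
    such as the hamza forms are left untouched because no normalization
    is applied first."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


def build_index(vocalized_words):
    """Map each unvocalized spelling to the vocalized spellings that
    reduce to it. Several vocalized words can share one bare spelling,
    so the values are lists."""
    index = defaultdict(list)
    for word in vocalized_words:
        index[strip_tashkeel(word)].append(word)
    return index


# Hypothetical sample; in practice this list would come from the AWN data.
vocalized_words = ["جَمِيل", "اِنْتِظار"]
index = build_index(vocalized_words)

# Candidate vocalized forms for an unvocalized query word.
print(index[strip_tashkeel("جميل")])
```

Each candidate returned by the index could then be passed to wn.get_synsets_from_word. Note that a bare spelling may be ambiguous, so more than one vocalized form (and therefore more than one synset) can come back for a single query.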
I checked for the word إنتظار, and it does not seem to exist in AWN.
More details about using AWN to get synonyms here.