How do I print only the words themselves from a WordNet synset using Python NLTK?
Is there a way in Python 2.7 using NLTK to get just the word, without the extra formatting that includes "Synset", the parentheses, the "n.01", and so on?
For example, if I run
wn.synsets('dog')
I get the following result:
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
How can I instead get a list like this?
dog
frump
cad
frank
pawl
andiron
chase
Is there a way to do this using NLTK, or do I have to use regular expressions? Can I use regular expressions within a Python script?
Try this:
from nltk.corpus import wordnet as wn
for synset in wn.synsets('dog'):
    # In NLTK 3.x these are methods: synset.lemmas()[0].name()
    print synset.lemmas[0].name
You want to iterate over each synset for dog and print the headword of each synset. Keep in mind that multiple words can attach to the same synset, so if you want all the words associated with all the synsets for dog, you could do:
for synset in wn.synsets('dog'):
    for lemma in synset.lemmas:
        print lemma.name
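The list in the question has no repeated words, whereas several synsets for dog can share the same headword. Here is a rough sketch of one way to drop the repeats, assuming NLTK 3.x, where `lemmas()` and `name()` are methods rather than attributes. The helper names `unique_headwords` and `print_headwords` are made up for illustration and are not part of NLTK.

```python
def unique_headwords(synsets):
    """Return the first lemma name of each synset, skipping repeats, in order."""
    seen = set()
    words = []
    for synset in synsets:
        # Assumes the NLTK 3.x API: lemmas() and name() are method calls.
        word = synset.lemmas()[0].name()
        if word not in seen:
            seen.add(word)
            words.append(word)
    return words


def print_headwords(word):
    """Print one headword per line for every synset of `word`."""
    # Imported here so the helper above can be used without WordNet installed.
    # Requires the WordNet corpus, e.g. via nltk.download('wordnet').
    from nltk.corpus import wordnet as wn
    for headword in unique_headwords(wn.synsets(word)):
        print(headword)
```

Calling `print_headwords('dog')` should then print each headword once, without the `Synset(...)` wrapper, so no regular expressions are needed.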