How do I print only the words themselves from WordNet synsets using Python NLTK?

Question:

Is there a way in Python 2.7, using NLTK, to get just the word, without the extra formatting that includes "Synset", the parentheses, the "n.01", and so on?

For example, if I do this:

        wn.synsets('dog')

I get the following result:

[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]

How can I instead get a list like this?

dog
frump
cad
frank
pawl
andiron
chase

Is there a way to do this using NLTK, or do I have to use regular expressions? Can I use regular expressions within a Python script?

Try this:

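from nltk.corpus import wordnet as wn
for synset in wn.synsets('dog'):
    # NLTK 3+: lemmas() and name() are methods; NLTK 2.x used attributes (synset.lemmas[0].name)
    print(synset.lemmas()[0].name())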

You want to iterate over each synset for dog and print the headword of each one (its first lemma). Keep in mind that multiple words can belong to the same synset, so if you want all the words associated with all the synsets for dog, you could do:

for synset in wn.synsets('dog'):
    # lemmas() and name() are method calls in NLTK 3+ (plain attributes in NLTK 2.x)
    for lemma in synset.lemmas():
        print(lemma.name())
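
On the regular expression part of the question: yes, Python ships with the `re` module, although you probably don't need it here. Below is a rough sketch, not anything NLTK provides, that gets from the synset names to the exact list in the question, including dropping the second "dog". The helper names `headwords` and `headwords_from_repr` are made up for illustration, and the synset names are hardcoded from the output above so the snippet runs without the WordNet data. It also assumes the names contain no single quotes, which holds for this example.

```python
import re

# Synset names copied from the question's output, so this runs without WordNet data.
# With NLTK 3 the same list could be built with [s.name() for s in wn.synsets('dog')].
SYNSET_NAMES = ['dog.n.01', 'frump.n.01', 'dog.n.03', 'cad.n.01',
                'frank.n.02', 'pawl.n.01', 'andiron.n.01', 'chase.v.01']


def headwords(names):
    """Hypothetical helper: strip the '.pos.nn' suffix and drop repeats, keeping order."""
    seen = set()
    result = []
    for name in names:
        # Split on the last two dots only: 'dog.n.01' -> 'dog'
        word = name.rsplit('.', 2)[0]
        if word not in seen:
            seen.add(word)
            result.append(word)
    return result


def headwords_from_repr(text):
    """Hypothetical helper: same result, but parsed from the printed list with a regex."""
    # Capture whatever sits between Synset(' and ') in the printed output.
    names = re.findall(r"Synset\('([^']+)'\)", text)
    return headwords(names)


for word in headwords(SYNSET_NAMES):
    print(word)
```

This prints dog, frump, cad, frank, pawl, andiron and chase, one per line. Splitting on the dots is simpler than the regex and is usually enough. The regex version is only worth it if all you have is the printed text rather than the synset objects.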