Finding synonyms with WordNet
nlp
wordnet

I am looking for a way to find all the synonyms of a particular word using WordNet. I am using JAWS.

For example:

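love(v): admire, adulate, be attached to, be captivated by, be crazy about, be enamored of, be enchanted by, be fascinated with, deify, delight in, dote on, esteem, exalt, fall for, fancy, glorify, go for, gone on...

love(n): Synonyms: adulation, affection, allegiance, amity, amorousness, amour, appreciation, ardency, ardor, attachment, case*, cherishing, crush, delight, devotedness, devotion, emotion, enchantment, enjoyment, fervor, fidelity, flame, fondness, friendship, hankering, idolatry, inclination, infatuation, involvement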

In a related question, user Ram pointed to some code, but it is not sufficient because it gives an altogether different output:

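love, passion: any object of warm affection or devotion
beloved, dear, dearest, honey, love: a beloved person; used as terms of endearment
love, sexual love, erotic love: a deep feeling of sexual desire and attraction
love: a score of zero in tennis or squash
sexual love, lovemaking, making love, love, love life: sexual activities (often including sexual intercourse) between two people
love: have a great affection or liking for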

So how do I achieve this? Is WordNet a good fit for what I want to do?

Source:
Stack Overflow
2 answers

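First you have to ask: what is a synonym, and can synonyms be queried from a surface word or root word?

In WordNet, similar words that represent the same concept are grouped under a unit called a "synset", rather than at the surface-word level.

To get synonyms with the coverage of your example, you need more than WordNet, possibly some semantic similarity method to extract other words.

I can't give you a JAWS version of what I mean, but here is a Python version using the NLTK interface to WordNet. You will see that WordNet alone is insufficient to give the coverage you want.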

from nltk.corpus import wordnet as wn
for ss in wn.synsets('love'):  # Each synset represents a different concept.
    print(ss.definition())   # gloss of the concept (a method call in NLTK 3)
    print(ss.lemma_names())  # the words that share this concept
    print()

The code above outputs:

a strong positive emotion of regard and affection
['love']

any object of warm affection or devotion
['love', 'passion']

a beloved person; used as terms of endearment
['beloved', 'dear', 'dearest', 'honey', 'love']

a deep feeling of sexual desire and attraction
['love', 'sexual_love', 'erotic_love']

a score of zero in tennis or squash
['love']

sexual activities (often including sexual intercourse) between two people
['sexual_love', 'lovemaking', 'making_love', 'love', 'love_life']

have a great affection or liking for
['love']

get pleasure from
['love', 'enjoy']

be enamored or in love with
['love']

have sexual intercourse with
['sleep_together', 'roll_in_the_hay', 'love', 'make_out', 'make_love', 'sleep_with', 'get_laid', 'have_sex', 'know', 'do_it', 'be_intimate', 'have_intercourse', 'have_it_away', 'have_it_off', 'screw', 'fuck', 'jazz', 'eff', 'hump', 'lie_with', 'bed', 'have_a_go_at_it', 'bang', 'get_it_on', 'bonk']
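
To get closer to the flat love(n) / love(v) lists in the question, the per-synset lemma names above can be merged by part of speech. The following is only a minimal sketch: the helper name `synonyms_by_pos` is hypothetical, and it assumes each synset object offers `pos()` and `lemma_names()` the way NLTK 3 synsets do.

```python
from collections import defaultdict


def synonyms_by_pos(word, synsets):
    """Merge the lemma names of `synsets` into one sorted list per part of speech.

    `synsets` is assumed to be an iterable of objects with pos() and
    lemma_names() methods, such as the result of wn.synsets(word) in NLTK 3.
    The query word itself is left out of the result.
    """
    grouped = defaultdict(set)
    for ss in synsets:
        for name in ss.lemma_names():
            if name.lower() != word.lower():
                # WordNet joins multi-word lemmas with underscores.
                grouped[ss.pos()].add(name.replace('_', ' '))
    return {pos: sorted(names) for pos, names in grouped.items()}
```

With NLTK and its WordNet data installed, it could be called as `synonyms_by_pos('love', wn.synsets('love'))`. The result still mixes every sense of the word, so the tennis score and the emotion end up in the same noun list, and the coverage stays well below the lists in the question.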

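Using only WordNet, you could try semantic similarity to decide whether two words (synsets) are similar enough to count as synonyms. Here is a short example, adapted from another answer of mine about semantic similarity with WordNet.

It does have its problems:

  • Antonyms are mixed in with the synonyms
  • It is slow! (because it has to check all ~117k synsets)

Nevertheless, it yields more synonyms than using lemma_names alone, so I am leaving it here in case it is useful (possibly in combination with something else).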
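
Both problems can be reduced somewhat. The sketch below is a hypothetical variant of the `syn` generator from this answer, not part of it: it skips synsets that hold a direct antonym of the query synset (via `Lemma.antonyms()`), and it scans only synsets with the same part of speech (via `all_synsets(pos=...)`). The name `syn_filtered` and the `wordnet` parameter are assumptions made for illustration; the parameter is expected to behave like `nltk.corpus.wordnet`.

```python
def antonym_synsets(synset):
    """Return the synsets that contain an antonym of any lemma in `synset`."""
    return {ant.synset() for lemma in synset.lemmas() for ant in lemma.antonyms()}


def syn_filtered(word, wordnet, lch_threshold=2.26):
    """Yield (query synset, candidate synset, LCH score) above the threshold.

    `wordnet` is assumed to offer synsets() and all_synsets(pos=...) like
    nltk.corpus.wordnet. Only nouns and verbs are handled, on the assumption
    that LCH similarity needs a hypernym taxonomy.
    """
    for net1 in wordnet.synsets(word):
        if net1.pos() not in ('n', 'v'):
            continue
        blocked = antonym_synsets(net1)
        # Scanning one part of speech avoids most of the wasted comparisons.
        for net2 in wordnet.all_synsets(pos=net1.pos()):
            if net2 in blocked:
                continue
            lch = net1.lch_similarity(net2)
            # The score may be None when no path connects the two synsets.
            if lch is not None and lch >= lch_threshold:
                yield (net1, net2, lch)
```

With NLTK installed this would be called as `syn_filtered('love', wn)`. Only direct antonyms are removed, so contrasting sibling concepts such as anger or fear would presumably still appear, and the scan over all synsets of one part of speech is still slow.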

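>>> from nltk.corpus import wordnet as wn
>>> def syn(word, lch_threshold=2.26):
...     for net1 in wn.synsets(word):
...         for net2 in wn.all_synsets():
...             try:
...                 lch = net1.lch_similarity(net2)
...             except Exception:
...                 # e.g. the two synsets have different parts of speech
...                 continue
...             # The value to compare the LCH to was found empirically.
...             # (The value is very application dependent. Experiment!)
...             # lch can be None when no path connects the two synsets.
...             if lch is not None and lch >= lch_threshold:
...                 yield (net1, net2, lch)
...
>>> for x in syn('love'):
...     print(x)
...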

Calling syn('love') as above outputs:

(Synset('love.n.01'), Synset('feeling.n.01'), 2.538973871058276)
(Synset('love.n.01'), Synset('conditioned_emotional_response.n.01'), 2.538973871058276)
(Synset('love.n.01'), Synset('emotion.n.01'), 2.9444389791664407)
(Synset('love.n.01'), Synset('worship.n.02'), 2.9444389791664407)
(Synset('love.n.01'), Synset('anger.n.01'), 2.538973871058276)
(Synset('love.n.01'), Synset('fear.n.01'), 2.538973871058276)
(Synset('love.n.01'), Synset('fear.n.03'), 2.538973871058276)
(Synset('love.n.01'), Synset('anxiety.n.02'), 2.538973871058276)
(Synset('love.n.01'), Synset('joy.n.01'), 2.538973871058276)
(Synset('love.n.01'), Synset('love.n.01'), 3.6375861597263857)
(Synset('love.n.01'), Synset('agape.n.02'), 2.9444389791664407)
(Synset('love.n.01'), Synset('agape.n.01'), 2.9444389791664407)
(Synset('love.n.01'), Synset('filial_love.n.01'), 2.9444389791664407)
(Synset('love.n.01'), Synset('ardor.n.02'), 2.9444389791664407)
(Synset('love.n.01'), Synset('amorousness.n.01'), 2.9444389791664407)
(Synset('love.n.01'), Synset('puppy_love.n.01'), 2.9444389791664407)
(Synset('love.n.01'), Synset('devotion.n.01'), 2.9444389791664407)
(Synset('love.n.01'), Synset('benevolence.n.01'), 2.9444389791664407)
(Synset('love.n.01'), Synset('beneficence.n.01'), 2.538973871058276)
(Synset('love.n.01'), Synset('heartstrings.n.01'), 2.9444389791664407)
(Synset('love.n.01'), Synset('lovingness.n.01'), 2.9444389791664407)
(Synset('love.n.01'), Synset('warmheartedness.n.01'), 2.538973871058276)
(Synset('love.n.01'), Synset('loyalty.n.02'), 2.9444389791664407)
(Synset('love.n.01'), Synset('hate.n.01'), 2.538973871058276)
(Synset('love.n.01'), Synset('emotional_state.n.01'), 2.538973871058276)
(Synset('love.n.02'), Synset('content.n.05'), 2.538973871058276)
(Synset('love.n.02'), Synset('object.n.04'), 2.9444389791664407)
(Synset('love.n.02'), Synset('antipathy.n.02'), 2.538973871058276)
(Synset('love.n.02'), Synset('bugbear.n.02'), 2.538973871058276)
(Synset('love.n.02'), Synset('execration.n.03'), 2.538973871058276)
(Synset('love.n.02'), Synset('center.n.06'), 2.538973871058276)
(Synset('love.n.02'), Synset('hallucination.n.03'), 2.538973871058276)
(Synset('love.n.02'), Synset('infatuation.n.03'), 2.538973871058276)
(Synset('love.n.02'), Synset('love.n.02'), 3.6375861597263857)
(Synset('beloved.n.01'), Synset('person.n.01'), 2.538973871058276)
(Synset('beloved.n.01'), Synset('lover.n.01'), 2.9444389791664407)
(Synset('beloved.n.01'), Synset('admirer.n.03'), 2.538973871058276)
(Synset('beloved.n.01'), Synset('beloved.n.01'), 3.6375861597263857)
(Synset('beloved.n.01'), Synset('betrothed.n.01'), 2.538973871058276)
(Synset('beloved.n.01'), Synset('boyfriend.n.01'), 2.538973871058276)
(Synset('beloved.n.01'), Synset('darling.n.01'), 2.538973871058276)
(Synset('beloved.n.01'), Synset('girlfriend.n.02'), 2.538973871058276)
(Synset('beloved.n.01'), Synset('idolizer.n.01'), 2.538973871058276)
(Synset('beloved.n.01'), Synset('inamorata.n.01'), 2.538973871058276)
(Synset('beloved.n.01'), Synset('inamorato.n.01'), 2.538973871058276)
(Synset('beloved.n.01'), Synset('kisser.n.01'), 2.538973871058276)
(Synset('beloved.n.01'), Synset('necker.n.01'), 2.538973871058276)
(Synset('beloved.n.01'), Synset('petter.n.01'), 2.538973871058276)
(Synset('beloved.n.01'), Synset('romeo.n.01'), 2.538973871058276)
(Synset('beloved.n.01'), Synset('soul_mate.n.01'), 2.538973871058276)
(Synset('beloved.n.01'), Synset('squeeze.n.04'), 2.538973871058276)
(Synset('beloved.n.01'), Synset('sweetheart.n.01'), 2.538973871058276)
(Synset('love.n.04'), Synset('desire.n.01'), 2.538973871058276)
(Synset('love.n.04'), Synset('sexual_desire.n.01'), 2.9444389791664407)
(Synset('love.n.04'), Synset('love.n.04'), 3.6375861597263857)
(Synset('love.n.04'), Synset('aphrodisia.n.01'), 2.538973871058276)
(Synset('love.n.04'), Synset('anaphrodisia.n.01'), 2.538973871058276)
(Synset('love.n.04'), Synset('passion.n.05'), 2.538973871058276)
(Synset('love.n.04'), Synset('sensuality.n.01'), 2.538973871058276)
(Synset('love.n.04'), Synset('amorousness.n.02'), 2.538973871058276)
(Synset('love.n.04'), Synset('fetish.n.01'), 2.538973871058276)
(Synset('love.n.04'), Synset('libido.n.01'), 2.538973871058276)
(Synset('love.n.04'), Synset('lecherousness.n.01'), 2.538973871058276)
(Synset('love.n.04'), Synset('nymphomania.n.01'), 2.538973871058276)
(Synset('love.n.04'), Synset('satyriasis.n.01'), 2.538973871058276)
(Synset('love.n.04'), Synset('the_hots.n.01'), 2.538973871058276)
(Synset('love.n.05'), Synset('bowling_score.n.01'), 2.538973871058276)
(Synset('love.n.05'), Synset('football_score.n.01'), 2.538973871058276)
(Synset('love.n.05'), Synset('baseball_score.n.01'), 2.538973871058276)
(Synset('love.n.05'), Synset('basketball_score.n.01'), 2.538973871058276)
(Synset('love.n.05'), Synset('number.n.02'), 2.538973871058276)
(Synset('love.n.05'), Synset('score.n.03'), 2.9444389791664407)
(Synset('love.n.05'), Synset('stroke.n.06'), 2.538973871058276)
(Synset('love.n.05'), Synset('birdie.n.01'), 2.538973871058276)
(Synset('love.n.05'), Synset('bogey.n.02'), 2.538973871058276)
(Synset('love.n.05'), Synset('deficit.n.03'), 2.538973871058276)
(Synset('love.n.05'), Synset('double-bogey.n.01'), 2.538973871058276)
(Synset('love.n.05'), Synset('duck.n.02'), 2.538973871058276)
(Synset('love.n.05'), Synset('eagle.n.02'), 2.538973871058276)
(Synset('love.n.05'), Synset('double_eagle.n.01'), 2.538973871058276)
(Synset('love.n.05'), Synset('game.n.06'), 2.538973871058276)
(Synset('love.n.05'), Synset('lead.n.07'), 2.538973871058276)
(Synset('love.n.05'), Synset('love.n.05'), 3.6375861597263857)
(Synset('love.n.05'), Synset('match.n.05'), 2.538973871058276)
(Synset('love.n.05'), Synset('par.n.01'), 2.538973871058276)
(Synset('sexual_love.n.02'), Synset('bondage.n.03'), 2.538973871058276)
(Synset('sexual_love.n.02'), Synset('outercourse.n.01'), 2.538973871058276)
(Synset('sexual_love.n.02'), Synset('safe_sex.n.01'), 2.538973871058276)
(Synset('sexual_love.n.02'), Synset('sexual_activity.n.01'), 2.9444389791664407)
(Synset('sexual_love.n.02'), Synset('conception.n.02'), 2.538973871058276)
(Synset('sexual_love.n.02'), Synset('sexual_intercourse.n.01'), 2.538973871058276)
(Synset('sexual_love.n.02'), Synset('pleasure.n.05'), 2.538973871058276)
(Synset('sexual_love.n.02'), Synset('sexual_love.n.02'), 3.6375861597263857)
(Synset('sexual_love.n.02'), Synset('carnal_abuse.n.01'), 2.538973871058276)
(Synset('sexual_love.n.02'), Synset('coupling.n.03'), 2.538973871058276)
(Synset('sexual_love.n.02'), Synset('reproduction.n.05'), 2.538973871058276)
(Synset('sexual_love.n.02'), Synset('foreplay.n.01'), 2.538973871058276)
(Synset('sexual_love.n.02'), Synset('perversion.n.02'), 2.538973871058276)
(Synset('sexual_love.n.02'), Synset('autoeroticism.n.01'), 2.538973871058276)
(Synset('sexual_love.n.02'), Synset('promiscuity.n.01'), 2.538973871058276)
(Synset('sexual_love.n.02'), Synset('lechery.n.01'), 2.538973871058276)
(Synset('sexual_love.n.02'), Synset('homosexuality.n.01'), 2.538973871058276)
(Synset('sexual_love.n.02'), Synset('bisexuality.n.02'), 2.538973871058276)
(Synset('sexual_love.n.02'), Synset('heterosexuality.n.01'), 2.538973871058276)
(Synset('sexual_love.n.02'), Synset('bestiality.n.02'), 2.538973871058276)
# ...
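
Only three distinct scores appear in this output, which suggests what the threshold really selects. Leacock-Chodorow similarity is defined as -log(p / (2 * d)), where p is the shortest path length between two synsets counted in nodes and d is the depth of the taxonomy. The sketch below assumes a noun taxonomy depth of 19, a value inferred from the self-similarity score 3.6375861597263857 in the output (log(38)) rather than taken from any documentation; the function names are hypothetical.

```python
import math

# Assumption: depth inferred from the self-similarity score in the output,
# since 3.6375861597263857 is log(2 * 19). It applies to noun synsets only.
NOUN_TAXONOMY_DEPTH = 19


def lch_score(edge_distance, depth=NOUN_TAXONOMY_DEPTH):
    """Leacock-Chodorow score for two synsets `edge_distance` edges apart."""
    path_length = edge_distance + 1  # path length counted in nodes
    return -math.log(path_length / (2.0 * depth))


def max_distance_for(threshold, depth=NOUN_TAXONOMY_DEPTH):
    """Largest edge distance whose score still reaches `threshold`, or None."""
    if lch_score(0, depth) < threshold:
        return None  # not even an identical synset would pass
    distance = 0
    while lch_score(distance + 1, depth) >= threshold:
        distance += 1
    return distance


if __name__ == "__main__":
    for d in range(4):
        print(d, lch_score(d))
    print("max distance for 2.26:", max_distance_for(2.26))
```

Under that assumption, distances of 0, 1 and 2 edges correspond to the three scores seen in the noun rows, and a threshold of 2.26 sits just above the score for 3 edges. For nouns the threshold therefore amounts to "at most two edges away in the hypernym graph", which is why sibling concepts, antonyms included, pass the filter. Verb synsets would need a different depth.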