Convert words between verb/noun/adjective forms
nlp
nltk
python
wordnet

I'd like a python library function that translates/converts words across different parts of speech. Sometimes it should output multiple words (e.g. "coder" and "code" are both nouns from the verb "to code", one being the subject and the other the object).

# :: String => List of String
print verbify('writer') # => ['write']
print nounize('written') # => ['writer']
print adjectivate('write') # => ['written']

I mostly care about verb <=> noun, for a note-taking program I want to write. i.e. I could write "caffeine antagonizes A1" or "caffeine is an A1 antagonist", and with some NLP it could figure out that they mean the same thing. (I know that's not easy, and that it will take NLP that parses rather than just tags, but I want to hack up a prototype.)

Similar question... Converting adjectives and adverbs to their noun forms (that answer only stems down to the root POS; I want to go between POSes.)

p.s. in linguistics this is called conversion: http://en.wikipedia.org/wiki/Conversion_%28linguistics%29

Source: Stack Overflow

4 Answers

Here is a heuristic approach to the problem. I have just coded it, so pardon the style. It uses derivationally_related_forms() from WordNet. I have implemented nounify; I guess verbify works analogously. From what I tested, it works pretty well:

from nltk.corpus import wordnet as wn

def nounify(verb_word):
    """ Transform a verb to the closest noun: die -> death """
    verb_synsets = wn.synsets(verb_word, pos="v")

    # Word not found
    if not verb_synsets:
        return []

    # Get all verb lemmas of the word
    verb_lemmas = [l for s in verb_synsets \
                   for l in s.lemmas if s.name.split('.')[1] == 'v']

    # Get related forms
    derivationally_related_forms = [(l, l.derivationally_related_forms()) \
                                    for l in verb_lemmas]

    # filter only the nouns
    related_noun_lemmas = [l for drf in derivationally_related_forms \
                           for l in drf[1] if l.synset.name.split('.')[1] == 'n']

    # Extract the words from the lemmas
    words = [l.name for l in related_noun_lemmas]
    len_words = len(words)

    # Build the result in the form of a list containing tuples (word, probability)
    result = [(w, float(words.count(w))/len_words) for w in set(words)]
    result.sort(key=lambda w: -w[1])

    # return all the possibilities sorted by probability
    return result
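
For example, a quick sanity check (a sketch only; this follows the older NLTK API used above, where lemmas and name are attributes, and the exact candidates and scores depend on your installed WordNet data):

print(nounify('die'))
# expect pairs such as ('death', <score>) near the top of the ranked list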

One approach might be to use a dictionary of words with their POS tags together with a mapping of word forms. If you get or create such a dictionary (which is quite possible if you have access to any conventional dictionary's data, since all dictionaries list a word's POS tags as well as the base forms of all derived forms), you could use something like the following:

def is_verb(word):
    if word:
        tags = pos_tags(word)
        return ('VB' in tags or 'VBP' in tags or 'VBZ' in tags
                or 'VBD' in tags or 'VBN' in tags)

def verbify(word):
    if is_verb(word):
        return word
    else:
        forms = []
        for tag in pos_tags(word):
            base = word_form(word, tag[:2])
            if is_verb(base):
                forms.append(base)
        return forms
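
Here pos_tags and word_form are assumed to come from such a dictionary and are not defined in the answer. As a rough stand-in, pos_tags could be approximated with NLTK's tagger (a sketch only; tagging a single word in isolation yields one guess, whereas a real POS dictionary would list every tag the word can take):

import nltk  # assumes NLTK's default tagger model has been downloaded

def pos_tags(word):
    # Rough stand-in for a dictionary lookup: tag the word in isolation.
    return [tag for _, tag in nltk.pos_tag([word])]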

Here is a function that is, in theory, able to convert words between noun/verb/adjective/adverb forms. I updated it from the answer above to be compliant with NLTK 3.2.5, now that synset.lemmas and synset.name are functions.

from nltk.corpus import wordnet as wn

# Just to make it a bit more readable
WN_NOUN = 'n'
WN_VERB = 'v'
WN_ADJECTIVE = 'a'
WN_ADJECTIVE_SATELLITE = 's'
WN_ADVERB = 'r'


def convert(word, from_pos, to_pos):    
    """ Transform words given from/to POS tags """

    synsets = wn.synsets(word, pos=from_pos)

    # Word not found
    if not synsets:
        return []

    # Get all lemmas of the word (consider 'a' and 's' equivalent)
    lemmas = []
    for s in synsets:
        for l in s.lemmas():
            if (s.name().split('.')[1] == from_pos
                    or from_pos in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE)
                    and s.name().split('.')[1] in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE)):
                lemmas += [l]

    # Get related forms
    derivationally_related_forms = [(l, l.derivationally_related_forms()) for l in lemmas]

    # filter only the desired pos (consider 'a' and 's' equivalent)
    related_noun_lemmas = []

    for drf in derivationally_related_forms:
        for l in drf[1]:
            if (l.synset().name().split('.')[1] == to_pos
                    or to_pos in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE)
                    and l.synset().name().split('.')[1] in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE)):
                related_noun_lemmas += [l]

    # Extract the words from the lemmas
    words = [l.name() for l in related_noun_lemmas]
    len_words = len(words)

    # Build the result in the form of a list containing tuples (word, probability)
    result = [(w, float(words.count(w)) / len_words) for w in set(words)]
    result.sort(key=lambda w:-w[1])

    # return all the possibilities sorted by probability
    return result


convert('direct', 'a', 'r')
convert('direct', 'a', 'n')
convert('quick', 'a', 'r')
convert('quickly', 'r', 'a')
convert('hunger', 'n', 'v')
convert('run', 'v', 'a')
convert('tired', 'a', 'r')
convert('tired', 'a', 'v')
convert('tired', 'a', 'n')
convert('tired', 'a', 's')
convert('wonder', 'v', 'n')
convert('wonder', 'n', 'a')

As you can see below, it doesn't work all that well. It is unable to switch between adjective and adverb forms (my particular goal), but it does give some interesting results in other cases.

>>> convert('direct', 'a', 'r')
[]
>>> convert('direct', 'a', 'n')
[('directness', 0.6666666666666666), ('line', 0.3333333333333333)]
>>> convert('quick', 'a', 'r')
[]
>>> convert('quickly', 'r', 'a')
[]
>>> convert('hunger', 'n', 'v')
[('hunger', 0.75), ('thirst', 0.25)]
>>> convert('run', 'v', 'a')
[('persistent', 0.16666666666666666), ('executive', 0.16666666666666666), ('operative', 0.16666666666666666), ('prevalent', 0.16666666666666666), ('meltable', 0.16666666666666666), ('operant', 0.16666666666666666)]
>>> convert('tired', 'a', 'r')
[]
>>> convert('tired', 'a', 'v')
[]
>>> convert('tired', 'a', 'n')
[('triteness', 0.25), ('banality', 0.25), ('tiredness', 0.25), ('commonplace', 0.25)]
>>> convert('tired', 'a', 's')
[]
>>> convert('wonder', 'v', 'n')
[('wonder', 0.3333333333333333), ('wonderer', 0.2222222222222222), ('marveller', 0.1111111111111111), ('marvel', 0.1111111111111111), ('wonderment', 0.1111111111111111), ('question', 0.1111111111111111)]
>>> convert('wonder', 'n', 'a')
[('curious', 0.4), ('wondrous', 0.2), ('marvelous', 0.2), ('marvellous', 0.2)]

Hopefully this will save someone some trouble.


I realize this doesn't answer your whole question, but it does answer a large part of it. I would check out http://nodebox.net/code/index.php/Linguistics#verb_conjugation . This python library is able to conjugate verbs and recognize whether a word is a verb, noun, or adjective.

Example code:

print en.verb.present("gave")
print en.verb.present("gave", person=3, negate=False)
>>> give
>>> gives

It can also categorize words:

print en.is_noun("banana")
>>> True

The download is at the top of the linked page.
