Python: How to find the accuracy of an SVM text classifier for multilabel classes
machine-learning
python
scikit-learn
svm

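I am using the code below, and I need to check the accuracy on X_train and X_test.

The following code works for me on my multilabel classification problem: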

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier

X_train = np.array(["new york is a hell of a town",
                    "new york was originally dutch",
                    "the big apple is great",
                    "new york is also called the big apple",
                    "nyc is nice",
                    "people abbreviate new york city as nyc",
                    "the capital of great britain is london",
                    "london is in the uk",
                    "london is in england",
                    "london is in great britain",
                    "it rains a lot in london",
                    "london hosts the british museum",
                    "new york is great and so is london",
                    "i like london better than new york"])
y_train = [[0],[0],[0],[0]
            ,[0],[0],[1],[1]
            ,[1],[1],[1],[1]
            ,[2],[2]]
X_test = np.array(['nice day in nyc',
                   'the capital of great britain is london',
                   'i like london better than new york',
                   ])   
target_names = ['Class 1', 'Class 2','Class 3']

classifier = Pipeline([
    ('vectorizer', CountVectorizer(min_df=1,max_df=2)),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
# print each test document next to the names of its predicted labels
for item, labels in zip(X_test, predicted):
    print('%s => %s' % (item, ', '.join(target_names[x] for x in labels)))

Output:

nice day in nyc => Class 1
the capital of great britain is london => Class 2
i like london better than new york => Class 3

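I want to check the accuracy on the training and test datasets. The score function does not work for me; it raises an error saying that multilabel values are not accepted: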

>>> classifier.score(X_train, X_test)

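NotImplementedError: score is not supported for multilabel classifiers

Please help me get accuracy results for the training and test data, and choose an algorithm for our classification case.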

Source: Stack Overflow
1 answer

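If you want an accuracy score for the test set, you need to create an answer key, which you can call y_test. You cannot know whether your predictions are correct unless you know the right answers.

Once you have an answer key, you can compute the accuracy. The function you want is sklearn.metrics.accuracy_score.

I have written it out below: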

from sklearn.metrics import accuracy_score

# ... everything else the same ...

# create an answer key: the true label for each test document
# (0-based indices, matching the encoding used in y_train)
y_test = [[0], [1], [2]]

# same as yours...
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)

# get the accuracy (fraction of test documents whose labels match exactly)
print(accuracy_score(y_test, predicted))
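
As far as I know, current scikit-learn releases expect multilabel targets as a binary indicator matrix rather than a list of label lists, so here is a rough sketch of the same idea for a recent install. It is not the asker's exact setup: the use of MultiLabelBinarizer, the default CountVectorizer settings, random_state=0, and the y_test answer key are all my assumptions.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import accuracy_score

X_train = ["new york is a hell of a town",
           "new york was originally dutch",
           "the big apple is great",
           "new york is also called the big apple",
           "nyc is nice",
           "people abbreviate new york city as nyc",
           "the capital of great britain is london",
           "london is in the uk",
           "london is in england",
           "london is in great britain",
           "it rains a lot in london",
           "london hosts the british museum",
           "new york is great and so is london",
           "i like london better than new york"]
y_train = [[0]] * 6 + [[1]] * 6 + [[2]] * 2
X_test = ["nice day in nyc",
          "the capital of great britain is london",
          "i like london better than new york"]
y_test = [[0], [1], [2]]  # assumed answer key, one label list per test document
target_names = ['Class 1', 'Class 2', 'Class 3']

# Turn the label lists into a 0/1 indicator matrix with one column per class.
mlb = MultiLabelBinarizer(classes=[0, 1, 2])
Y_train = mlb.fit_transform(y_train)
Y_test = mlb.transform(y_test)

classifier = Pipeline([
    ('vectorizer', CountVectorizer()),   # default settings (assumption)
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC(random_state=0)))])
classifier.fit(X_train, Y_train)

predicted_train = classifier.predict(X_train)
predicted_test = classifier.predict(X_test)

# For multilabel input, accuracy_score is the subset accuracy: a document
# counts as correct only if its whole set of labels matches the answer key.
train_accuracy = accuracy_score(Y_train, predicted_train)
test_accuracy = accuracy_score(Y_test, predicted_test)
print('train accuracy: %.3f' % train_accuracy)
print('test accuracy:  %.3f' % test_accuracy)

# Map each predicted indicator row back to label names for display.
for item, labels in zip(X_test, mlb.inverse_transform(predicted_test)):
    print('%s => %s' % (item, ', '.join(target_names[x] for x in labels)))
```

Accuracy on the training set mostly tells you how well the model memorized the data it has already seen, so the test number is the one to pay attention to. With only three test documents it will be very coarse either way.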

sklearn also has several other metrics besides accuracy. Take a look at them here: sklearn.metrics
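
To give a feel for a few of those other metrics, here is a small sketch on hand-made indicator arrays. The arrays are invented for illustration and are not output from the classifier above; which metrics matter for your case is a judgment call.

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss, f1_score

# Hypothetical answer key and predictions: 3 documents, 3 classes.
# The third document is predicted as class index 1 instead of 2.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 1, 0]])

# Subset accuracy: share of documents whose full label row is correct.
subset_accuracy = accuracy_score(y_true, y_pred)

# Hamming loss: share of individual label cells that are wrong.
h_loss = hamming_loss(y_true, y_pred)

# Micro-averaged F1: pools true/false positives over all classes.
micro_f1 = f1_score(y_true, y_pred, average='micro')

print(subset_accuracy, h_loss, micro_f1)
```

Here two of the three rows match exactly, two of the nine cells are wrong, and precision and recall are both 2/3, so the three numbers come out to about 0.667, 0.222, and 0.667.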
