Common sklearn imports

1. Clustering models

from sklearn.cluster import KMeans
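A minimal sketch of how the clustering import might be used. The toy points and the choices `n_clusters=2`, `n_init=10`, and `random_state=0` are assumptions made for illustration, not part of the original notes.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups of made-up 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# n_init and random_state are fixed so repeated runs give the same result.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels)                   # cluster index assigned to each sample
print(kmeans.cluster_centers_)  # one centroid per cluster
```

Which group receives index 0 and which receives index 1 is arbitrary; only the grouping itself is meaningful.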

2. Datasets

from sklearn.datasets import load_iris

Standard sklearn data layout

data = [[feature1, feature2, feature3], ...]  (one row per sample, n_samples rows in total)

target = [0, 2, 1, 2, 1, 2, 0, ...]  (one class label per sample)
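The data/target layout described above can be checked on the iris dataset. This is a small sketch; the variable names `X` and `y` are just a common convention.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (150, 4): 150 samples, 4 features each
print(y.shape)             # (150,): one class label per sample
print(iris.feature_names)  # names of the feature columns
print(iris.target_names)   # names of the classes encoded as 0, 1, 2
```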

3. Feature selection (keeps only the most informative features)

from sklearn.feature_selection import SelectKBest

from sklearn.feature_selection import chi2

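fs = SelectKBest(chi2, k=10)  # keep the 10 features with the highest chi-squared scores; chi2 requires non-negative feature values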
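A short sketch of the selector in use. It assumes the iris dataset, which has only 4 features, so `k=2` is used here instead of `k=10`.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Score each feature against the labels with the chi-squared test
# and keep the 2 best-scoring columns.
selector = SelectKBest(chi2, k=2)
X_new = selector.fit_transform(X, y)

print(X_new.shape)             # (150, 2)
print(selector.scores_)        # chi-squared score per original feature
print(selector.get_support())  # boolean mask of the columns that were kept
```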

4. Preprocessing

from sklearn.preprocessing import LabelEncoder, LabelBinarizer

Binarizer (also in sklearn.preprocessing) maps numeric features to 0 or 1 by comparing them to a threshold.
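The three preprocessing classes do different jobs, which a small sketch may make clearer. The label strings and the 0.5 threshold are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, LabelBinarizer, LabelEncoder

labels = ["cat", "dog", "bird", "dog"]

# LabelEncoder: string labels -> integer codes (classes sorted: bird=0, cat=1, dog=2).
encoded = LabelEncoder().fit_transform(labels)
print(encoded)  # [1 2 0 2]

# LabelBinarizer: labels -> one indicator column per class.
one_hot = LabelBinarizer().fit_transform(labels)
print(one_hot)  # rows: [0 1 0], [0 0 1], [1 0 0], [0 0 1]

# Binarizer: numeric features -> 1 if strictly greater than the threshold, else 0.
features = np.array([[0.2, 0.7], [0.5, 0.9]])
thresholded = Binarizer(threshold=0.5).fit_transform(features)
print(thresholded)  # [[0. 1.] [0. 1.]]
```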

4. Model selection and evaluation

from sklearn.model_selection import KFold
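A minimal sketch of what KFold produces, assuming a toy array of 6 samples and 3 folds without shuffling.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(12).reshape(6, 2)  # 6 made-up samples, 2 features each

# Without shuffling, each fold holds out a consecutive block of samples.
kf = KFold(n_splits=3)
splits = list(kf.split(X))

for fold, (train_idx, test_idx) in enumerate(splits):
    print(fold, "train:", train_idx, "test:", test_idx)
# 0 train: [2 3 4 5] test: [0 1]
# 1 train: [0 1 4 5] test: [2 3]
# 2 train: [0 1 2 3] test: [4 5]
```

Each sample appears in exactly one test fold, so every sample is used for validation once.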

5. Model evaluation metrics

from sklearn import metrics

y_pred = [0,2,1,3]

y_true = [0,1,2,3]

metrics.accuracy_score(y_true, y_pred)

0.5

metrics.accuracy_score(y_true, y_pred, normalize=False)  # returns the number of correct predictions (2 here) instead of the fraction

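roc_auc_score: ROC stands for Receiver Operating Characteristic. The ROC curve shows how the True Positive Rate and the False Positive Rate change as the decision threshold varies.

AUC is the area under that curve. The higher the value, the better the classifier: 0.5 corresponds to random guessing and 1.0 to a perfect ranking.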

https://zhuanlan.zhihu.com/p/100059009

https://www.zhihu.com/question/39840928
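The ROC/AUC description above can be sketched with a tiny made-up example. The labels and scores below are assumptions for illustration only.

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1]             # binary ground-truth labels
y_score = [0.1, 0.4, 0.35, 0.8]   # predicted scores for the positive class

# roc_curve sweeps the decision threshold and reports FPR/TPR at each step.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(fpr, tpr)

# AUC equals the fraction of (positive, negative) pairs ranked correctly:
# 3 of the 4 pairs here, so 0.75.
auc = roc_auc_score(y_true, y_score)
print(auc)  # 0.75
```

Note that roc_auc_score takes scores or probabilities, not hard class predictions, because the curve is built by varying the threshold.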

6. Naive Bayes

sklearn.naive_bayes

7. Nearest neighbors

sklearn.neighbors 

8. sklearn.svm: support vector machines

9. sklearn.tree: decision trees
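The four classifier modules listed last (sklearn.naive_bayes, sklearn.neighbors, sklearn.svm, sklearn.tree) share the same fit/score interface, so one sketch can cover them together. The specific estimators, the hyperparameters, and the 70/30 split below are illustrative assumptions, not recommendations from the original notes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# One representative estimator per module.
models = {
    "naive_bayes": GaussianNB(),
    "neighbors": KNeighborsClassifier(n_neighbors=5),
    "svm": SVC(kernel="rbf"),
    "tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # train on the training split
    scores[name] = model.score(X_test, y_test)   # accuracy on the held-out split
    print(name, scores[name])
```

Because every estimator exposes fit, predict, and score, swapping one model for another usually means changing only the constructor line.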