scikit-learn (an introduction to models used relatively often in engineering): 1.4. Support Vector Machines

Reference: http://scikit-learn.org/stable/modules/svm.html

In real projects we rarely use the simple models such as LR, kNN and NB. They are classics, but in engineering practice they are not very useful.

Today we focus on SVMs, which are used relatively often in engineering.

SVMs can do quite a lot: Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outliers detection.

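Advantages: effective in high-dimensional spaces; still effective when the number of dimensions is greater than the number of samples; memory efficient, because the decision function uses only a subset of the training points (called support vectors); and versatile, since different kernel functions can be specified.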

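Disadvantages: the flip side of the high-dimensional advantage is that when the number of features is much greater than the number of samples, results can be very poor unless the kernel function and regularization term are chosen carefully to avoid over-fitting. SVMs also do not directly provide probability estimates; these are computed using an expensive five-fold cross-validation (see the note on scores and probabilities in the Classification section below).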

(SVMs support both dense and sparse sample vectors, but to make predictions on sparse data the model must also have been fitted on sparse data. For optimal performance, use C-ordered numpy.ndarray (dense) or scipy.sparse.csr_matrix (sparse) with dtype=float64.)
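A minimal sketch of the two recommended input layouts, using a tiny made-up dataset (`X_dense`, `y` and the query points are hypothetical). It shows a dense C-ordered float64 array, and a CSR float64 matrix used for both fitting and prediction, which matches the rule above.

```python
import numpy as np
from scipy import sparse
from sklearn import svm

# Hypothetical toy data: two well-separated groups.
X_dense = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]])
y = np.array([0, 0, 1, 1])

# Dense path: C-ordered float64 avoids an internal copy.
X_c = np.ascontiguousarray(X_dense, dtype=np.float64)
dense_clf = svm.SVC(kernel="linear").fit(X_c, y)
dense_pred = dense_clf.predict(np.array([[0.0, 0.5], [3.0, 3.5]]))

# Sparse path: CSR with float64, used for training AND prediction.
X_csr = sparse.csr_matrix(X_dense, dtype=np.float64)
sparse_clf = svm.SVC(kernel="linear").fit(X_csr, y)
sparse_pred = sparse_clf.predict(sparse.csr_matrix([[0.0, 0.5], [3.0, 3.5]]))
```

Going the other way (fitting on dense data and then predicting on a sparse matrix) is what the note above warns against; scikit-learn rejects it with an error.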

1. Classification

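SVC, NuSVC and LinearSVC are three classes capable of performing multi-class classification. The essential difference between them is that they have different mathematical formulations; see the formulas at the end of this article.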

Like other classifiers, SVC, NuSVC and LinearSVC are used through the fit and predict methods:

>>> from sklearn import svm
>>> X = [[0, 0], [1, 1]]
>>> y = [0, 1]
>>> clf = svm.SVC()
>>> clf.fit(X, y)
SVC()

After being fitted, the model can then be used to predict new values:

>>> clf.predict([[2., 2.]])
array([1])

The support-vector attributes of a fitted SVM are available as support_vectors_, support_ and n_support_:

>>> # get support vectors
>>> clf.support_vectors_
array([[ 0.,  0.],
       [ 1.,  1.]])
>>> # get indices of support vectors
>>> clf.support_ 
array([0, 1]...)
>>> # get number of support vectors for each class
>>> clf.n_support_ 
array([1, 1]...)

For multi-class classification:

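SVC and NuSVC use the "one-against-one" approach (training n_class * (n_class - 1) / 2 models), while LinearSVC uses the "one-vs-the-rest" strategy (training n_class models). In practice one-vs-rest is the more common and preferable choice: the results are usually about the same, but it saves a lot of time.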

>>> X = [[0], [1], [2], [3]]
>>> Y = [0, 1, 2, 3]
>>> # decision_function_shape='ovo' exposes the raw one-vs-one scores;
>>> # the current default ('ovr') returns one column per class instead
>>> clf = svm.SVC(decision_function_shape='ovo')
>>> clf.fit(X, Y)
SVC(decision_function_shape='ovo')
>>> dec = clf.decision_function([[1]])
>>> dec.shape[1] # 4 classes: 4*3/2 = 6
6
>>> lin_clf = svm.LinearSVC()
>>> lin_clf.fit(X, Y)
LinearSVC()
>>> dec = lin_clf.decision_function([[1]])
>>> dec.shape[1]
4
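If you want true one-vs-rest training with a kernel SVC (rather than LinearSVC), one option is to wrap it in scikit-learn's OneVsRestClassifier. The sketch below reuses the toy data above and counts the binary models each strategy builds. It is an illustration of the two strategies, not a recommendation for any particular dataset.

```python
from sklearn import svm
from sklearn.multiclass import OneVsRestClassifier

X = [[0], [1], [2], [3]]
Y = [0, 1, 2, 3]

# One-vs-one: SVC trains n_class * (n_class - 1) / 2 binary models internally.
ovo = svm.SVC(decision_function_shape="ovo").fit(X, Y)
ovo_cols = ovo.decision_function([[1]]).shape[1]

# One-vs-rest with a kernel SVC: the wrapper trains one binary SVC per class.
ovr = OneVsRestClassifier(svm.SVC()).fit(X, Y)
n_ovr_models = len(ovr.estimators_)
ovr_cols = ovr.decision_function([[1]]).shape[1]
```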

On confidence in class membership: the SVC method decision_function gives per-class scores for each sample. There is also the probability option, but if confidence scores are required and they do not have to be probabilities, it is advisable to set probability=False and use decision_function instead of predict_proba (mainly because the probability calibration, Platt scaling, has known theoretical issues).
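A small sketch comparing the two routes on the iris dataset (chosen here only as a convenient example). decision_function works on a default model. predict_proba requires probability=True, which triggers the extra internal cross-validation at fit time.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Default model: confidence via decision_function (one column per class).
clf = svm.SVC().fit(X, y)
scores = clf.decision_function(X[:5])

# probability=True: slower fit because of the internal cross-validated calibration.
prob_clf = svm.SVC(probability=True, random_state=0).fit(X, y)
proba = prob_clf.predict_proba(X[:5])
```

The argmax of predict_proba is not guaranteed to agree with predict, which is another reason to prefer decision_function when plain scores are enough.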

When classes or individual samples should carry different importance, use the class_weight and sample_weight keywords:

Class weights: SVC (but not NuSVC) implements a class_weight keyword. It is a dictionary of the form {class_label : value}, where value is a floating point number > 0 that sets the parameter C of class class_label to C * value. In current scikit-learn it is passed to the constructor, e.g. SVC(class_weight={1: 10}), not to fit.

Sample weights: SVC, NuSVC, SVR, NuSVR and OneClassSVM also implement weights for individual samples through the sample_weight keyword of the fit method. Similar to class_weight, these set the parameter C for the i-th example to C * sample_weight[i].
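A sketch of both keywords on a small, randomly generated imbalanced dataset (the data and the weight 10 are arbitrary choices for illustration). Both settings multiply C by 10 for the minority class: one does it per class, the other per sample.

```python
import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)
# Hypothetical imbalanced data: 50 points of class 0, 5 points of class 1.
X = np.r_[rng.randn(50, 2), rng.randn(5, 2) + [2.0, 2.0]]
y = np.array([0] * 50 + [1] * 5)

# class_weight (constructor): C for class 1 becomes C * 10.
clf_cw = svm.SVC(kernel="linear", class_weight={1: 10}).fit(X, y)

# sample_weight (fit): C_i = C * sample_weight[i], here 10 for every class-1 sample.
w = np.ones(len(y))
w[y == 1] = 10.0
clf_sw = svm.SVC(kernel="linear").fit(X, y, sample_weight=w)
```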

A few examples:

Plot different SVM classifiers in the iris dataset
SVM: Maximum margin separating hyperplane
SVM: Separating hyperplane for unbalanced classes
SVM-Anova: SVM with univariate feature selection
Non-linear SVM
SVM: Weighted samples

2. Regression

Support Vector Regression.

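See if this sentence makes sense: analogously to support vector classification, the model produced by Support Vector Regression depends only on a subset of the training data, because the cost function for building the model ignores any training data close to the model prediction. In other words, training points whose prediction error is smaller than epsilon add nothing to the loss, so they do not become support vectors.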
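The "ignores training data close to the prediction" part comes from the epsilon-insensitive loss used by SVR, max(0, |y - f(x)| - epsilon). Here is a minimal NumPy sketch of that loss (the function name is ours, not a scikit-learn API):

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample epsilon-insensitive loss: max(0, |y - f(x)| - epsilon).

    Residuals smaller than epsilon cost nothing, which is why points
    lying inside the epsilon-tube do not become support vectors.
    """
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)

# Residuals 0.05, 0.5 and 0.2 with epsilon = 0.1.
loss = epsilon_insensitive_loss([1.0, 1.0, 1.0], [1.05, 1.5, 0.8], epsilon=0.1)
```

The first point sits inside the tube and contributes zero. The other two are penalized only by how far they stick out beyond epsilon.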

Again there are three models: SVR, NuSVR and LinearSVR.

>>> from sklearn import svm
>>> X = [[0, 0], [2, 2]]
>>> y = [0.5, 2.5]
>>> clf = svm.SVR()
>>> clf.fit(X, y)
SVR()
>>> clf.predict([[1, 1]])
array([1.5])
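All three regressors share the same fit/predict interface. The sketch below fits each one on a made-up noisy line, y ≈ 2x + 1. The data, nu=0.5 and max_iter=10000 are illustrative choices, not tuned values.

```python
import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)
# Hypothetical data: a noisy straight line y = 2x + 1.
X = np.sort(5 * rng.rand(40, 1), axis=0)
y = 2.0 * X.ravel() + 1.0 + 0.1 * rng.randn(40)

models = {
    "SVR": svm.SVR(kernel="linear"),           # libsvm, epsilon-SVR
    "NuSVR": svm.NuSVR(kernel="linear", nu=0.5),  # libsvm, nu controls the support-vector fraction
    "LinearSVR": svm.LinearSVR(max_iter=10000),   # liblinear, linear kernel only
}
# Fit each model and predict at x = 2.5.
preds = {name: m.fit(X, y).predict([[2.5]]) for name, m in models.items()}
```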

An example:

Support Vector Regression (SVR) using linear and non-linear kernels

3. Density estimation, novelty detection

First, here is how Wikipedia describes novelty detection: Novelty detection is the identification of new or unknown data that a machine learning system has not been trained with and was not previously aware of, with the help of either statistical or machine learning based approaches.

OneClassSVM is used for novelty detection: given a set of samples, it detects the soft boundary of that set so as to classify new points as belonging to that set or not. The process is unsupervised, so the only input is X.
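A minimal sketch with synthetic "normal" data clustered around the origin (the data, nu=0.1 and the query points are assumptions for illustration). Note that fit receives only X. predict returns +1 for points inside the learned boundary and -1 for novelties.

```python
import numpy as np
from sklearn import svm

rng = np.random.RandomState(42)
# Hypothetical "normal" training data around the origin.
X_train = 0.3 * rng.randn(100, 2)

# Unsupervised: fit() only receives X, no labels.
oc = svm.OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_train)

# +1 = inside the learned boundary, -1 = novelty.
labels = oc.predict(np.array([[0.0, 0.0], [4.0, 4.0]]))
train_labels = oc.predict(X_train)
```

nu roughly bounds the fraction of training points allowed to fall outside the boundary, so it is the main knob to tune.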

For detailed usage, see the section Novelty and Outlier Detection.

Two examples:

One-class SVM with non-linear kernel (RBF)
Species distribution modeling

4. Complexity

The QP (quadratic programming problem) solver used by this libsvm-based implementation scales between $O(n_{features} \times n_{samples}^2)$ and $O(n_{features} \times n_{samples}^3)$ depending on how efficiently the libsvm cache is used in practice (dataset dependent).
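A back-of-envelope reading of these bounds. It ignores constants, sparsity and cache effects, so it is an estimate rather than a benchmark. With $p = n_{features}$ held fixed, doubling the number of samples $n$ multiplies training time by roughly 4 to 8:

```latex
\frac{T(2n)}{T(n)} \in \left[ \frac{p\,(2n)^2}{p\,n^2},\ \frac{p\,(2n)^3}{p\,n^3} \right] = [4,\ 8],
\qquad p = n_{features},\quad n = n_{samples}
```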

5. Practical tips

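Avoiding data copy: for SVC, SVR, NuSVC and NuSVR, if the data passed to certain methods is not C-ordered, contiguous and double precision, it will be copied before the underlying C implementation is called. Kernel cache size: for SVC, SVR, NuSVC and NuSVR, the size of the kernel cache (cache_size, in MB, default 200) has a strong impact on run time for larger problems; if you have enough RAM, set it higher, e.g. 500 or 1000.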
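A sketch of checking for both issues. The random float32, Fortran-ordered array is deliberately the "wrong" layout, and cache_size=500 is just an example value.

```python
import numpy as np
from sklearn import svm

# Hypothetical input in a layout that would trigger an internal copy.
X = np.asfortranarray(np.random.RandomState(0).rand(20, 3).astype(np.float32))
y = np.array([0, 1] * 10)

# Copy needed unless the array is C-contiguous float64.
needs_copy = not (X.flags["C_CONTIGUOUS"] and X.dtype == np.float64)

# Convert once, up front, instead of on every call.
X_ready = np.ascontiguousarray(X, dtype=np.float64)

# cache_size is in MB (default 200); raise it on large problems if RAM allows.
clf = svm.SVC(cache_size=500).fit(X_ready, y)
```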

Setting C: C is 1 by default. If the data contains a lot of noisy observations, you should decrease it; decreasing C corresponds to more regularization.
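Rather than guessing, C is usually picked by cross-validation. Here is a sketch using GridSearchCV on synthetic noisy data (flip_y=0.2 randomly relabels about 20% of samples). The grid values are illustrative.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV

# Hypothetical noisy data: flip_y randomly reassigns ~20% of the labels.
X, y = make_classification(n_samples=200, n_features=5, flip_y=0.2, random_state=0)

# Smaller C = stronger regularization; let 5-fold CV choose.
search = GridSearchCV(
    svm.SVC(kernel="rbf"),
    param_grid={"C": [0.01, 0.1, 1, 10, 100]},
    cv=5,
)
search.fit(X, y)
best_C = search.best_params_["C"]
```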

Scaling: SVM algorithms are not scale invariant, so it is highly recommended to scale your data. For example, scale each attribute of the input vector X to [0,1] or [-1,+1], or standardize it to have mean 0 and variance 1. Note that the same scaling must be applied to the test vector to obtain meaningful results.
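A convenient way to guarantee that "the same scaling" is applied to test data is a Pipeline. The scaler learns its mean and standard deviation on the training split only and reuses them at prediction time. Iris and the 70/30 split are just example choices.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Standardize (mean 0, variance 1) using training statistics, then fit the SVC.
model = make_pipeline(StandardScaler(), svm.SVC())
model.fit(X_train, y_train)
test_acc = model.score(X_test, y_test)  # test data is scaled with the training mean/std
```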

In SVC, if the data is unbalanced, set class_weight='balanced' (older releases called this option 'auto') and/or try different penalty parameters C.
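What 'balanced' actually does: each class gets the weight n_samples / (n_classes * count_of_class), which then multiplies C. The sketch checks that formula against scikit-learn's compute_class_weight helper and the fitted SVC's class_weight_ attribute on a made-up 90/10 split.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical 90/10 imbalanced labels and matching features.
y = np.array([0] * 90 + [1] * 10)
rng = np.random.RandomState(0)
X = rng.randn(100, 2)
X[y == 1] += 2.0

classes = np.unique(y)
# 'balanced' weight per class: n_samples / (n_classes * count_of_class).
manual = len(y) / (len(classes) * np.bincount(y))
library = compute_class_weight(class_weight="balanced", classes=classes, y=y)

clf = SVC(class_weight="balanced").fit(X, y)
```

Here the minority class gets weight 5.0 and the majority class about 0.56, so errors on the rare class cost roughly nine times more.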

6. Kernel functions

The kernel is chosen with the kernel parameter, e.g. svm.SVC(kernel='linear'). Common kernels are:

linear: $\langle x, x' \rangle$.
polynomial: $(\gamma \langle x, x' \rangle + r)^d$. $d$ is specified by keyword degree, $r$ by coef0.
rbf: $\exp(-\gamma \|x - x'\|^2)$. $\gamma$ is specified by keyword gamma, must be greater than 0.
sigmoid: $\tanh(\gamma \langle x, x' \rangle + r)$, where $r$ is specified by coef0.
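To make the four kernels concrete, the sketch computes linear ⟨x, x'⟩, polynomial (γ⟨x, x'⟩ + r)^d, rbf exp(−γ‖x − x'‖²) and sigmoid tanh(γ⟨x, x'⟩ + r) by hand with NumPy, then compares them with scikit-learn's pairwise kernel functions. The values γ=0.5, r=1, d=3 and the random inputs are arbitrary.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel,
)

rng = np.random.RandomState(0)
X = rng.rand(4, 3)
Y = rng.rand(5, 3)
gamma, r, d = 0.5, 1.0, 3  # gamma, coef0, degree

dot = X @ Y.T  # <x, x'> for every pair
sq_dist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)  # ||x - x'||^2

manual = {
    "linear": dot,
    "poly": (gamma * dot + r) ** d,
    "rbf": np.exp(-gamma * sq_dist),
    "sigmoid": np.tanh(gamma * dot + r),
}
library = {
    "linear": linear_kernel(X, Y),
    "poly": polynomial_kernel(X, Y, degree=d, gamma=gamma, coef0=r),
    "rbf": rbf_kernel(X, Y, gamma=gamma),
    "sigmoid": sigmoid_kernel(X, Y, gamma=gamma, coef0=r),
}
```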

You can also define a custom kernel, for example:

>>> import numpy as np
>>> from sklearn import svm
>>> def my_kernel(x, y):
...     return np.dot(x, y.T)
...
>>> clf = svm.SVC(kernel=my_kernel)

Example: SVM with custom kernel.
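A usage sketch for the custom kernel above, on hypothetical toy data. It also shows the alternative kernel='precomputed', where you pass Gram matrices yourself: shape (n_train, n_train) for fit and (n_test, n_train) for predict. Both routes solve the same problem here, so they should give the same predictions.

```python
import numpy as np
from sklearn import svm

def my_kernel(x, y):
    # Linear kernel from the text: Gram matrix of inner products.
    return np.dot(x, y.T)

# Hypothetical toy data.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.5, 0.5], [2.5, 3.0]])

# Option 1: pass the callable; the SVC evaluates it on the data.
clf_callable = svm.SVC(kernel=my_kernel).fit(X_train, y_train)
pred_callable = clf_callable.predict(X_test)

# Option 2: precompute the Gram matrices yourself.
clf_pre = svm.SVC(kernel="precomputed").fit(my_kernel(X_train, X_train), y_train)
pred_pre = clf_pre.predict(my_kernel(X_test, X_train))
```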

7. Mathematical formulation

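(1) SVC:

Given training vectors $x_i \in \mathbb{R}^p$, $i = 1, \dots, n$, in two classes, and a label vector $y \in \{1, -1\}^n$, SVC solves the primal problem

$$\min_{w, b, \zeta} \frac{1}{2} w^T w + C \sum_{i=1}^{n} \zeta_i \quad \text{subject to } y_i (w^T \phi(x_i) + b) \ge 1 - \zeta_i,\ \zeta_i \ge 0,\ i = 1, \dots, n$$

Its dual is

$$\min_{\alpha} \frac{1}{2} \alpha^T Q \alpha - e^T \alpha \quad \text{subject to } y^T \alpha = 0,\ 0 \le \alpha_i \le C,\ i = 1, \dots, n$$

where $e$ is the vector of all ones and $Q_{ij} = y_i y_j K(x_i, x_j)$ with $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$. The decision function is

$$\operatorname{sign}\left( \sum_{i \in SV} y_i \alpha_i K(x_i, x) + b \right)$$

(2) SVR:

Given training vectors $x_i \in \mathbb{R}^p$, $i = 1, \dots, n$, and a target vector $y \in \mathbb{R}^n$, $\varepsilon$-SVR solves the primal problem

$$\min_{w, b, \zeta, \zeta^*} \frac{1}{2} w^T w + C \sum_{i=1}^{n} (\zeta_i + \zeta_i^*) \quad \text{subject to } y_i - w^T \phi(x_i) - b \le \varepsilon + \zeta_i,\ w^T \phi(x_i) + b - y_i \le \varepsilon + \zeta_i^*,\ \zeta_i, \zeta_i^* \ge 0$$

Its dual is

$$\min_{\alpha, \alpha^*} \frac{1}{2} (\alpha - \alpha^*)^T Q (\alpha - \alpha^*) + \varepsilon e^T (\alpha + \alpha^*) - y^T (\alpha - \alpha^*) \quad \text{subject to } e^T (\alpha - \alpha^*) = 0,\ 0 \le \alpha_i, \alpha_i^* \le C$$

where $Q_{ij} = K(x_i, x_j)$. The prediction is

$$\sum_{i \in SV} (\alpha_i - \alpha_i^*) K(x_i, x) + b$$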
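The SVC decision function Σ y_i α_i K(x_i, x) + b can be checked against a fitted model. In scikit-learn, dual_coef_ holds the products y_i α_i for the support vectors, and intercept_ holds b. The sketch below, on synthetic blobs with an explicit gamma=0.5 so the kernel can be recomputed, rebuilds decision_function by hand. It also checks the box constraint |y_i α_i| ≤ C.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel

# Hypothetical two-class data.
X, y = make_blobs(n_samples=60, centers=2, random_state=0)

C, gamma = 1.0, 0.5
clf = svm.SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)

# K(x, x_i) for every sample x and support vector x_i: shape (n_samples, n_SV).
K = rbf_kernel(X, clf.support_vectors_, gamma=gamma)
# sum_i (y_i * alpha_i) * K(x_i, x) + b
manual = K @ clf.dual_coef_.ravel() + clf.intercept_[0]
```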
