Grid search cross-validation in sklearn

Question:

Can grid search cross-validation be used to extract the best parameters for a Decision Tree classifier? http://scikit-learn.org/stable/tutorial/statistical_inference/model_selection.html

Why not?

I suggest you take a look at GridSearchCV.

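import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Candidate tree depths to try: 3, 4, ..., 9
param_grid = {'max_depth': np.arange(3, 10)}

# Cross-validated search over the grid; xtrain/ytrain and xtest/ytest are your own train/test split
tree = GridSearchCV(DecisionTreeClassifier(), param_grid)

tree.fit(xtrain, ytrain)
# predict_proba uses the best estimator, refit on the whole training set
tree_preds = tree.predict_proba(xtest)[:, 1]
tree_performance = roc_auc_score(ytest, tree_preds)

print('DecisionTree: Area under the ROC curve = {}'.format(tree_performance))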

Then extract the best parameters:

tree.best_params_
Out[1]: {'max_depth': 5}
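
If you want something you can run end to end, here is a minimal sketch of the same idea. The synthetic data from make_classification, the extra min_samples_leaf entry in the grid, and scoring='roc_auc' are my own assumptions for illustration; they are not part of the snippet above. Replace the generated split with your own xtrain/ytrain/xtest/ytest.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (assumption: stands in for your real data).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
xtrain, xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Grid over two tree parameters; every combination is evaluated (7 * 3 = 21).
param_grid = {
    'max_depth': list(range(3, 10)),
    'min_samples_leaf': [1, 5, 10],
}

# Select by cross-validated ROC AUC instead of the default accuracy.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring='roc_auc',
    cv=5,
)
search.fit(xtrain, ytrain)

best_params = search.best_params_   # best parameter combination found
best_cv_auc = search.best_score_    # mean cross-validated AUC of that combination

# The best estimator is refit on all of xtrain, so it can score held-out data directly.
test_auc = roc_auc_score(ytest, search.predict_proba(xtest)[:, 1])

print('Best parameters: {}'.format(best_params))
print('Cross-validated AUC: {:.3f}'.format(best_cv_auc))
print('Held-out AUC: {:.3f}'.format(test_auc))
```

search.cv_results_ holds the scores for every combination in the grid, which is useful if you want to see how close the runner-up settings were.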