Confused about how GridSearchCV works

Question:

GridSearchCV implements a fit method in which it performs n-fold cross-validation to determine the best parameters. After this, we can apply the best estimator directly to the test data using predict(), as in this example: http://scikit-learn.org/stable/auto_examples/grid_search_digits.html

It says there: "The model is trained on the full development set."

However, we have only applied n-fold cross-validation here. Is the classifier somehow also trained on the entire data set? Or does predict just use the best estimator, with the best parameters, chosen from among the models trained on the n folds?

If you want to use predict, you'll need to set 'refit' to True. From the documentation:

refit : boolean
    Refit the best estimator with the entire dataset. 
    If "False", it is impossible to make predictions using 
    this GridSearchCV instance after fitting.

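refit is True by default, so in the example, once cross-validation has picked the best parameters, a new estimator with those parameters is fit on the whole training (development) set, and predict uses that refit estimator rather than any of the per-fold models.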
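A minimal sketch of that behavior, assuming a recent scikit-learn where GridSearchCV lives in sklearn.model_selection (the linked example predates that module). The parameter grid, the 50/50 split, and the variable names are illustrative choices, not taken from the example. It checks that the refit estimator predicts the same labels as an SVC fit by hand on the full training set with the winning parameters, and that predict is unavailable when refit=False.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Illustrative grid (an assumption, not the grid from the linked example).
param_grid = {"C": [1, 10], "gamma": [1e-3, 1e-4]}

# refit=True is the default; it is spelled out here for clarity.
search = GridSearchCV(SVC(), param_grid, cv=3, refit=True)
search.fit(X_train, y_train)

# Fit an SVC by hand on the full training set with the winning parameters.
manual = SVC(**search.best_params_).fit(X_train, y_train)

# The refit estimator and the hand-fit one should agree on every test sample.
same_predictions = np.array_equal(search.predict(X_test), manual.predict(X_test))
print("best params:", search.best_params_)
print("refit model matches manual full-set fit:", same_predictions)

# With refit=False the search still finds best_params_, but there is no
# refit model, so predict cannot be used on the search object.
no_refit = GridSearchCV(SVC(), param_grid, cv=3, refit=False)
no_refit.fit(X_train, y_train)
try:
    no_refit.predict(X_test)
    predict_available = True
except AttributeError:
    # Raised because no estimator was refit on the full training set.
    predict_available = False
print("predict available with refit=False:", predict_available)
```

With refit=False you can still read best_params_ and fit your own estimator on whatever data you like, which is sometimes useful when the final fit is expensive or needs different data.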