How do I use cross-validation with a custom estimator in sklearn?

Problem description:

I have written a custom estimator class with fit and transform methods. I can create a model, train it, and make predictions with it.

However, when I run cross-validation, I get the following error: TypeError: cannot deepcopy this pattern object

The custom estimator (DefaultEstimator) looks like this:

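import scipy.sparse as sp
from sklearn.base import BaseEstimator, TransformerMixin


class DefaultEstimator(BaseEstimator, TransformerMixin):
    def __init__(self, preprocessor, pipelines):
        # Store constructor arguments as-is (scikit-learn convention)
        self.preprocessor = preprocessor
        self.pipelines = pipelines

    def fit(self, X, y=None):
        # Fit each sub-pipeline on the preprocessed input
        for each_pipeline in self.pipelines:
            each_pipeline.fit(self.preprocessor.apply(X), y)
        return self

    def transform(self, X):
        # Transform with each sub-pipeline and stack the sparse outputs column-wise
        transformed_data = []
        for each_pipeline in self.pipelines:
            transformed_data.append(each_pipeline.transform(self.preprocessor.apply(X)))
        return sp.hstack(transformed_data)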

Does anyone have an idea of how to approach this issue?

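As suggested in a few comments, this error occurs because self.preprocessor cannot be deep-copied. During cross-validation, scikit-learn clones the estimator for each fold, and sklearn.base.clone falls back to copy.deepcopy for constructor parameters that are not themselves estimators. The preprocessor evidently holds a compiled regular expression (the "pattern object" in the error message), which that copy step cannot handle.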
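A minimal sketch of that mechanism, assuming a recent scikit-learn. UncopyablePreprocessor is a hypothetical stand-in whose __deepcopy__ raises the same TypeError, mimicking how copy.deepcopy behaved on compiled regular expressions before Python 3.7 (newer Python versions can deep-copy compiled patterns). DefaultEstimator is trimmed to its constructor here, because clone only needs get_params.

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone


class UncopyablePreprocessor:
    """Hypothetical stand-in for the question's preprocessor.

    Its __deepcopy__ raises the same TypeError that copy.deepcopy raised for
    compiled regular expressions on Python versions before 3.7.
    """

    def apply(self, X):
        return X

    def __deepcopy__(self, memo):
        raise TypeError("cannot deepcopy this pattern object")


class DefaultEstimator(BaseEstimator, TransformerMixin):
    # Trimmed to the constructor: clone() only needs get_params().
    def __init__(self, preprocessor=None, pipelines=None):
        self.preprocessor = preprocessor
        self.pipelines = pipelines


# A parameter without get_params() is copied with copy.deepcopy inside clone().
try:
    clone(DefaultEstimator(preprocessor=UncopyablePreprocessor()))
    error = None
except TypeError as exc:
    error = exc
print("clone failed:", error)  # the TypeError propagates out of clone()

# With a deep-copyable preprocessor (here None), cloning succeeds.
cloned = clone(DefaultEstimator(preprocessor=None))
```

cross_val_score performs the same clone call internally for every fold, which is why fitting the model directly works while cross-validation fails.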

So the workaround is to remove the preprocessing step from this class and either run it as an independent preprocessing step or move it inside the pipeline itself.
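One way to move the preprocessing inside the pipeline, sketched under stated assumptions: preprocess is a hypothetical function standing in for preprocessor.apply and is wrapped in a FunctionTransformer. Because DefaultEstimator fits several pipelines on the same input and stacks their outputs, a FeatureUnion can take over that role. The vectorizers, the LogisticRegression classifier, and the toy data are illustrative only.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical preprocessing: lowercase and replace non-letters via a compiled regex.
_NON_LETTERS = re.compile(r"[^a-z ]+")


def preprocess(texts):
    return [_NON_LETTERS.sub(" ", text.lower()) for text in texts]


model = Pipeline([
    # Preprocessing is now an ordinary pipeline step.
    ("preprocess", FunctionTransformer(preprocess)),
    # FeatureUnion fits each sub-pipeline on the same input and hstacks the outputs.
    ("features", FeatureUnion([
        ("words", CountVectorizer()),
        ("chars", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3))),
    ])),
    ("clf", LogisticRegression()),
])

# Toy data: 4 positive and 4 negative examples.
texts = ["Good movie!", "Great film.", "Loved it!!", "Really enjoyable",
         "Bad movie...", "Terrible film", "Hated it", "Really boring"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Each fold clones the whole pipeline; this no longer touches the regex object.
scores = cross_val_score(model, texts, labels, cv=2)
print(scores)
```

Cloning works here because FunctionTransformer stores only a reference to a plain function, and copy.deepcopy treats functions as atomic, so the module-level compiled regex is never copied.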