Using a Pipeline with a custom transformer in scikit-learn
I tried to transform column 'X' using the values in column 'y' before the data is fitted by the final linear regression estimator (this is a toy example, just to show y being used in a transformation). Why is df['y'] not passed to MyTransformer?
import numpy as np
import pandas as pd
from sklearn.base import TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

class MyTransformer(TransformerMixin):
    def __init__(self):
        pass

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        print(y)  # prints None when called through the pipeline
        return X + np.sum(y)

df = pd.DataFrame(np.array([[2, 3], [1, 5], [1, 1], [5, 6], [1, 2]]), columns=['X', 'y'])
pip = Pipeline([('my_transformer', MyTransformer()),
                ('sqrt', FunctionTransformer(np.sqrt, validate=False)),
                ('lr', LinearRegression())])
pip.fit(df[['X']], df['y'])
Running this script raises an error at the line return X + np.sum(y); it looks like y is None.
As stated previously, the fit_transform method doesn't pass y on to transform. What I've done before is implement my own fit_transform. It's not your code, but here's an example I wrote recently:
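from sklearn.preprocessing import LabelEncoder

# Assumption: StandardLabelEncoder is an alias for sklearn's LabelEncoder
StandardLabelEncoder = LabelEncoder

class MultiColumnLabelEncoder:
    def __init__(self, *args, **kwargs):
        self.encoder = StandardLabelEncoder(*args, **kwargs)

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Encode each column of a 2-D array independently
        data = X.copy()
        for i in range(data.shape[1]):
            data[:, i] = LabelEncoder().fit_transform(data[:, i])
        return data

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)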
There are other ways. You could store y as a class parameter and access it in the transform method.
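A minimal sketch of the class-parameter idea, assuming y is handed to the constructor. The class name AddTargetSum is hypothetical, and the data is the toy frame from the question:

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class AddTargetSum(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that receives y through its constructor."""

    def __init__(self, y=None):
        self.y = y

    def fit(self, X, y=None):
        # Nothing to learn; y is already stored on the instance
        return self

    def transform(self, X):
        # Use the stored y instead of expecting it as an argument
        return X + np.sum(self.y)


df = pd.DataFrame(np.array([[2, 3], [1, 5], [1, 1], [5, 6], [1, 2]]), columns=["X", "y"])
shifted = AddTargetSum(y=df["y"]).fit_transform(df[["X"]])
```

The trade-off is that y has to be known when the transformer is constructed, so the object is tied to one particular dataset.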
I should note that you can also pass y on to your own version of transform:
def fit_transform(self, X, y=None):
    return self.fit(X, y).transform(X, y)
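Applied to the question's MyTransformer, this override might look like the sketch below. One thing to keep in mind: Pipeline.predict calls transform(X) without y, so this version also stores the training-time sum in fit and falls back to it. The attribute name y_sum_ and that fallback are assumptions added for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


class MyTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        # Remember the sum seen during fitting (assumed fallback for predict)
        self.y_sum_ = 0 if y is None else np.sum(y)
        return self

    def transform(self, X, y=None):
        # Use y when it is forwarded, otherwise the value stored by fit
        offset = self.y_sum_ if y is None else np.sum(y)
        return X + offset

    def fit_transform(self, X, y=None, **fit_params):
        # Unlike the TransformerMixin default, forward y to transform
        return self.fit(X, y).transform(X, y)


df = pd.DataFrame(np.array([[2, 3], [1, 5], [1, 1], [5, 6], [1, 2]]), columns=["X", "y"])
pip = Pipeline([("my_transformer", MyTransformer()),
                ("sqrt", FunctionTransformer(np.sqrt, validate=False)),
                ("lr", LinearRegression())])
pip.fit(df[["X"]], df["y"])
preds = pip.predict(df[["X"]])  # transform is called without y here
```

Storing what was learned from y in fit keeps fitting and prediction consistent, since no target is available at prediction time.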