Training a model with multiple features whose values are conceptually the same

Problem description:

For example, say I am trying to train a binary classifier that takes sample inputs of the form

x = {d=(type of desk), p1=(type of pen on desk), p2=(type of *another* pen on desk)}

Say I then train a model on the samples:

x1 = {wood, ballpoint, gel},      y1 = {0}

x2 = {wood, ballpoint, ink-well}, y2 = {1}.

and try to predict on the new sample x3 = {wood, gel, ballpoint}. The response I am hoping for in this case is y3 = {0}, since conceptually it should not matter (i.e., I don't want it to matter) which pen is designated as p1 or p2.
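To make the intended invariance concrete, here is a small hypothetical Java sketch. The `Sample` record and the `sameIgnoringPenOrder` helper are illustrative names, not part of H2O. It compares two samples by desk and by the pens as an unordered pair, so under this view x3 is the same input as x1:

```java
import java.util.Arrays;

public class PenOrderCheck {
  // Hypothetical sample type mirroring x = {d, p1, p2} from the question.
  record Sample(String desk, String pen1, String pen2) {}

  // Two samples count as "conceptually the same" if the desk matches and the
  // pens match as an unordered pair, i.e. which pen is p1 vs p2 is ignored.
  static boolean sameIgnoringPenOrder(Sample a, Sample b) {
    String[] pa = {a.pen1(), a.pen2()};
    String[] pb = {b.pen1(), b.pen2()};
    Arrays.sort(pa); // sorting puts both pairs into the same order
    Arrays.sort(pb);
    return a.desk().equals(b.desk()) && Arrays.equals(pa, pb);
  }

  public static void main(String[] args) {
    Sample x1 = new Sample("wood", "ballpoint", "gel");
    Sample x2 = new Sample("wood", "ballpoint", "ink-well");
    Sample x3 = new Sample("wood", "gel", "ballpoint");
    System.out.println(sameIgnoringPenOrder(x1, x3)); // true
    System.out.println(sameIgnoringPenOrder(x2, x3)); // false
  }
}
```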

When I try to run this model (in my case, an h2o.ai-generated model), I get an error saying that the category enum for p2 is not valid, because the model never saw 'ballpoint' in p2 during training. In H2O the error is `hex.genmodel.easy.exception.PredictUnknownCategoricalLevelException`.

My first idea was to generate permutations of the pen features for each sample and train the model on those. Is there a better way to handle this situation? Specifically, I'm looking for a solution in the h2o.ai Flow UI, since that is what I am using to build the model. Thanks.
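The permutation idea can be sketched as a preprocessing step run before importing the data into Flow. This is a minimal, hypothetical Java sketch (the `Row` record, `permutations`, and `augment` are illustrative names, and the CSV column names `d,p1,p2,y` are assumed from the question): every training row is expanded into one row per distinct ordering of its pens, all sharing the same desk and label.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PenPermutations {
  // Hypothetical training row: one desk, any number of interchangeable pen columns, and a label.
  record Row(String desk, List<String> pens, int label) {}

  // Returns every distinct ordering of the pens. The LinkedHashSet drops duplicate
  // orderings (e.g. two pens of the same type) and keeps insertion order.
  static Set<List<String>> permutations(List<String> pens) {
    Set<List<String>> out = new LinkedHashSet<>();
    permute(new ArrayList<>(pens), 0, out);
    return out;
  }

  // Classic swap-based permutation: fix position k, recurse on the rest.
  private static void permute(List<String> pens, int k, Set<List<String>> out) {
    if (k == pens.size()) {
      out.add(List.copyOf(pens));
      return;
    }
    for (int i = k; i < pens.size(); i++) {
      Collections.swap(pens, k, i);
      permute(pens, k + 1, out);
      Collections.swap(pens, k, i); // undo the swap before trying the next choice
    }
  }

  // Expands each row into one row per pen ordering, keeping desk and label.
  static List<Row> augment(List<Row> rows) {
    List<Row> out = new ArrayList<>();
    for (Row r : rows) {
      for (List<String> p : permutations(r.pens())) {
        out.add(new Row(r.desk(), p, r.label()));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Row> training = List.of(
        new Row("wood", List.of("ballpoint", "gel"), 0),
        new Row("wood", List.of("ballpoint", "ink-well"), 1));
    // Print as CSV; the header assumes exactly two pen columns, as in the question.
    System.out.println("d,p1,p2,y");
    for (Row r : augment(training)) {
      System.out.println(r.desk() + "," + String.join(",", r.pens()) + "," + r.label());
    }
  }
}
```

For the two samples in the question this yields four rows, including `wood,gel,ballpoint,0`, so 'ballpoint' is now a seen level for p2. With n interchangeable pen columns each row expands to up to n! rows, so this stays practical only while n is small.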

H2O binary models (models running in the H2O cluster) handle unseen categorical levels automatically; however, when you generate predictions using the pure Java POJO model method (as in your case), this is a configurable option. In the `EasyPredictModelWrapper`, the default behavior is that unknown categorical levels throw `PredictUnknownCategoricalLevelException`, which is why you are seeing that error.
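To illustrate what the option changes, here is a toy Java model of the lookup. It is a hypothetical sketch, not H2O's implementation: `encode` and `UnknownLevelException` are illustrative names. A categorical value is mapped to its index in the column's training-time domain, and an unseen value either throws or becomes NaN (missing) depending on a flag. Using the question's data, p2's training domain is {gel, ink-well}, so 'ballpoint' in p2 is unseen:

```java
import java.util.List;

public class UnknownLevelDemo {
  // Hypothetical stand-in for H2O's exception; NOT the H2O class itself.
  static class UnknownLevelException extends RuntimeException {
    UnknownLevelException(String column, String level) {
      super(column + ": unseen level '" + level + "'");
    }
  }

  // Toy encoder: maps a level to its index in the column's training-time domain.
  // An unseen level either throws or becomes NaN, depending on the flag.
  static double encode(String column, List<String> domain, String level, boolean convertUnknownToNa) {
    int idx = domain.indexOf(level);
    if (idx >= 0) return idx;
    if (convertUnknownToNa) return Double.NaN;
    throw new UnknownLevelException(column, level);
  }

  public static void main(String[] args) {
    // Levels seen for p2 in the training data (x1, x2).
    List<String> p2Domain = List.of("gel", "ink-well");
    // x3 puts "ballpoint" in p2, which was never seen there during training.
    System.out.println(encode("p2", p2Domain, "ballpoint", true)); // NaN
    try {
      encode("p2", p2Domain, "ballpoint", false);
    } catch (UnknownLevelException e) {
      System.out.println(e.getMessage()); // p2: unseen level 'ballpoint'
    }
  }
}
```

With the flag on, the unseen value is treated as missing rather than as 'ballpoint', so this avoids the exception but does not by itself make the model indifferent to which pen is p1 or p2; how a missing value affects the prediction depends on the algorithm.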

There is more information about this in the `EasyPredictModelWrapper` Javadocs. Here is an excerpt:

> The easy prediction API for generated POJO and MOJO models. Use as follows:
>
> 1. Instantiate an EasyPredictModelWrapper
> 2. Create a new row of data
> 3. Call one of the predict methods
>
> Here is an example:

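```java
import hex.genmodel.GenModel;
import hex.genmodel.easy.EasyPredictModelWrapper;
import hex.genmodel.easy.RowData;
import hex.genmodel.easy.prediction.BinomialModelPrediction;

public class Main {
  public static void main(String[] args) throws Exception {
    // Step 1. Load the POJO class and wrap it; unknown categorical levels are converted to NA instead of throwing.
    String modelClassName = "your_pojo_model_downloaded_from_h2o";
    GenModel rawModel = (GenModel) Class.forName(modelClassName).getDeclaredConstructor().newInstance();

    EasyPredictModelWrapper model = new EasyPredictModelWrapper(
        new EasyPredictModelWrapper.Config()
            .setModel(rawModel)
            .setConvertUnknownCategoricalLevelsToNa(true));

    // Step 2. Build one row of input; keys are the column names used in training.
    RowData row = new RowData();
    row.put("CategoricalColumnName", "LevelName");
    row.put("NumericColumnName1", "42.0"); // numeric values may be given as strings...
    row.put("NumericColumnName2", 42.0);   // ...or as doubles

    // Step 3. Score the row.
    BinomialModelPrediction p = model.predictBinomial(row);
    System.out.println(p.label);
  }
}
```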