How to save and load an MLlib model in Apache Spark

Problem description:

I trained a classification model in Apache Spark (using pyspark). I stored the model in an object, LogisticRegressionModel. Now I want to make predictions on new data. I would like to store the model and read it back into a new program in order to make the predictions. Any idea how to store the model? I'm thinking of maybe pickle, but I'm a newbie to both Python and Spark, so I'd like to hear what the community thinks.

UPDATE: I also needed a decision tree classifier. To read it back, I needed to import DecisionTreeModel:

from pyspark.mllib.tree import DecisionTree, DecisionTreeModel

You can save your model by using the save method of MLlib models.

# Let lrm be a trained LogisticRegressionModel
lrm.save(sc, "lrm_model.model")

After storing it, you can load it in another application.

from pyspark.mllib.classification import LogisticRegressionModel
sameModel = LogisticRegressionModel.load(sc, "lrm_model.model")

As @zero323 mentioned earlier, another way to achieve this is to use the Predictive Model Markup Language (PMML).

PMML is an XML-based file format developed by the Data Mining Group to provide a way for applications to describe and exchange models produced by data mining and machine learning algorithms.
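Because PMML is plain XML, any program with an XML parser can read an exported model. The sketch below parses a small hand-written PMML fragment with Python's standard library and pulls out the intercept and coefficients. The fragment is not output from Spark: the field names and numbers are invented, and the element layout follows the PMML 4.2 RegressionModel schema as I understand it, so treat it as an illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace used by PMML 4.2 documents.
NS = {"pmml": "http://www.dmg.org/PMML-4_2"}

# Hand-written example; field names and values are made up for illustration.
SAMPLE_PMML = """
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="hand-written logistic regression sketch"/>
  <DataDictionary numberOfFields="3">
    <DataField name="field_0" optype="continuous" dataType="double"/>
    <DataField name="field_1" optype="continuous" dataType="double"/>
    <DataField name="target" optype="categorical" dataType="string"/>
  </DataDictionary>
  <RegressionModel functionName="classification" normalizationMethod="logit">
    <MiningSchema>
      <MiningField name="field_0" usageType="active"/>
      <MiningField name="field_1" usageType="active"/>
      <MiningField name="target" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="-0.5" targetCategory="1">
      <NumericPredictor name="field_0" coefficient="1.25"/>
      <NumericPredictor name="field_1" coefficient="-2.0"/>
    </RegressionTable>
    <RegressionTable intercept="0.0" targetCategory="0"/>
  </RegressionModel>
</PMML>
"""


def read_coefficients(pmml_text):
    """Return (intercept, {feature: coefficient}) from the first RegressionTable."""
    root = ET.fromstring(pmml_text.strip())
    # find() returns the first matching RegressionTable in document order.
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


intercept, coefficients = read_coefficients(SAMPLE_PMML)
```

The point of the format is that the consumer does not need Spark at all: the model is described by named fields and numeric parameters that any PMML-aware scoring tool can interpret.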