I have been using JPMML to run algorithms on distributed platforms such as Storm and Spark, but I haven't been able to work out how to use JPMML to produce my own PMML model files for data sets other than the ones in the jpmml-rattle project (audit, iris and ozone).
Let's say I have an entirely different data set of the following form:
"Plane","XCoordinate","YCoordinate"
0.0,0.7800144346305873,1.6512542456242612
1.0,3.3192955924982677,4.664828345688715
0.0,-0.9059493298933676,-0.42207747354389447
1.0,3.1776956110847916,1.1393123509452483
0.0,-0.5246202787832872,1.0246845701853746
and so on. How can I generate a PMML model that runs a Naive Bayes classifier on this data set?
I think I am missing something. Can somebody give me some pointers on this?
Also, does anyone know which tools other than Augustus support generating PMML model files?
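To make the question concrete, here is a minimal sketch of the kind of output I mean, written in plain Python with only the standard library. It fits a Gaussian Naive Bayes model (per-class mean and sample variance of each coordinate) on the five rows above and writes the parameters into a PMML-style NaiveBayesModel document. The function names `fit_gaussian_nb` and `to_pmml` are made up for illustration, treating "Plane" as a categorical string field is an assumption, and the element layout follows my reading of the PMML 4.2 NaiveBayesModel schema. It has not been validated against the XSD or loaded into JPMML.

```python
import csv
import io
import statistics
import xml.etree.ElementTree as ET
from collections import defaultdict

# The five sample rows from the post; a real run would read the whole file.
SAMPLE = """\
"Plane","XCoordinate","YCoordinate"
0.0,0.7800144346305873,1.6512542456242612
1.0,3.3192955924982677,4.664828345688715
0.0,-0.9059493298933676,-0.42207747354389447
1.0,3.1776956110847916,1.1393123509452483
0.0,-0.5246202787832872,1.0246845701853746
"""


def fit_gaussian_nb(rows, target, features):
    """Per class: row count plus (mean, sample variance) of every feature."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)
    stats = {}
    for label, members in sorted(by_class.items()):
        params = {}
        for f in features:
            values = [float(m[f]) for m in members]
            # statistics.variance needs at least two rows per class.
            params[f] = (statistics.mean(values), statistics.variance(values))
        stats[label] = {"count": len(members), "params": params}
    return stats


def to_pmml(stats, target, features):
    """Serialize the fitted parameters as a NaiveBayesModel (unvalidated sketch)."""
    root = ET.Element("PMML", {"version": "4.2",
                               "xmlns": "http://www.dmg.org/PMML-4_2"})
    ET.SubElement(root, "Header")
    dd = ET.SubElement(root, "DataDictionary",
                       {"numberOfFields": str(len(features) + 1)})
    # Assumption: the class label is modelled as a categorical string.
    tf = ET.SubElement(dd, "DataField", {"name": target,
                                         "optype": "categorical",
                                         "dataType": "string"})
    for label in stats:
        ET.SubElement(tf, "Value", {"value": label})
    for f in features:
        ET.SubElement(dd, "DataField", {"name": f, "optype": "continuous",
                                        "dataType": "double"})
    model = ET.SubElement(root, "NaiveBayesModel",
                          {"functionName": "classification",
                           "threshold": "0.001"})
    schema = ET.SubElement(model, "MiningSchema")
    ET.SubElement(schema, "MiningField", {"name": target, "usageType": "target"})
    for f in features:
        ET.SubElement(schema, "MiningField", {"name": f})
    inputs = ET.SubElement(model, "BayesInputs")
    for f in features:
        bayes_input = ET.SubElement(inputs, "BayesInput", {"fieldName": f})
        tvs = ET.SubElement(bayes_input, "TargetValueStats")
        for label, s in stats.items():
            mean, var = s["params"][f]
            stat = ET.SubElement(tvs, "TargetValueStat", {"value": label})
            ET.SubElement(stat, "GaussianDistribution",
                          {"mean": str(mean), "variance": str(var)})
    output = ET.SubElement(model, "BayesOutput", {"fieldName": target})
    counts = ET.SubElement(output, "TargetValueCounts")
    for label, s in stats.items():
        ET.SubElement(counts, "TargetValueCount",
                      {"value": label, "count": str(s["count"])})
    return ET.tostring(root, encoding="unicode")


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
features = ["XCoordinate", "YCoordinate"]
stats = fit_gaussian_nb(rows, "Plane", features)
print(to_pmml(stats, "Plane", features))
```

What I am after is the equivalent of this, produced by a proper tool rather than by hand, so that the resulting file can be loaded by JPMML.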
Regards,
Jayati
Thanks so much for the detailed reply. It really helped.
I wanted to know how good an idea it would be to have a library of:
1. Converters from the library-specific, non-PMML models produced by various ML libraries (Mahout, Weka, and Python libraries such as Milk, PyBrain and MLPY) to PMML models.
2. Converters in the other direction, from PMML models to library-specific models, e.g. an XYZ.model file for Mahout.
What would you suggest?
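To illustrate the two directions, here is a rough sketch of how such a library could be organised, with PMML as the interchange format between any two registered libraries. Everything in it is hypothetical: the `Converter` tuple, the `REGISTRY`, and the two toy "libraries" (one storing a linear model as a dict, the other as a tuple) are invented for the example, and the PMML text is only a `RegressionTable` fragment rather than a complete document. It does not use any real Mahout or Weka API.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, NamedTuple


class Converter(NamedTuple):
    to_pmml: Callable[[object], str]    # native model -> PMML text
    from_pmml: Callable[[str], object]  # PMML text -> native model


# Hypothetical registry: one converter pair per supported library.
REGISTRY: Dict[str, Converter] = {}


def convert(model, source: str, target: str):
    """Convert a model between two libraries by going through PMML."""
    pmml = REGISTRY[source].to_pmml(model)
    return REGISTRY[target].from_pmml(pmml)


# Toy library A keeps a linear model as {"intercept": ..., "coef": {...}}.
def dict_to_pmml(model):
    table = ET.Element("RegressionTable", {"intercept": str(model["intercept"])})
    for name, coef in model["coef"].items():
        ET.SubElement(table, "NumericPredictor",
                      {"name": name, "coefficient": str(coef)})
    return ET.tostring(table, encoding="unicode")


def pmml_to_dict(pmml):
    table = ET.fromstring(pmml)
    coef = {p.get("name"): float(p.get("coefficient"))
            for p in table.findall("NumericPredictor")}
    return {"intercept": float(table.get("intercept")), "coef": coef}


# Toy library B keeps the same model as (intercept, [(name, coef), ...]).
def tuple_to_pmml(model):
    intercept, coefs = model
    return dict_to_pmml({"intercept": intercept, "coef": dict(coefs)})


def pmml_to_tuple(pmml):
    d = pmml_to_dict(pmml)
    return (d["intercept"], sorted(d["coef"].items()))


REGISTRY["toy_dict"] = Converter(dict_to_pmml, pmml_to_dict)
REGISTRY["toy_tuple"] = Converter(tuple_to_pmml, pmml_to_tuple)

native = {"intercept": 1.5, "coef": {"x": 2.0}}
print(convert(native, "toy_dict", "toy_tuple"))
```

With this shape, supporting N libraries needs N converter pairs instead of a separate converter for every pair of libraries.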
Regards,
Jayati
Thanks a ton. I really appreciate your prompt replies.
Having PMML producers for Spark and Mahout sounds very interesting. I guess Weka already has PMML support.
I will start researching along these lines. I have done a bit of work with Spark, so I may start with that first.
I will also check whether Jython works with the Python ML libraries; if it does, all of them could be extended with a PMML producer utility.
Thanks again.
Regards,
Jayati