Contribute to compare results between platforms

xavier sumba

Dec 28, 2015, 11:29:13 AM

to OpenML

Hi,

I am a Computer Science student working on my graduation project. I am trying to run data mining experiments and then compare the results (an experiment can be executed on many platforms and with different configurations). For the moment I can run clustering and classification experiments in WEKA and Apache Mahout.

So I started looking for an ontology to describe my results and found Expo and Exposé. That is how I discovered OpenML, and now I think my work is not going to be useful compared with yours. So I would like to know whether there is any way to contribute to your project based on my approach.

Cheers,
Xavier Sumba.

Bernd Bischl

Dec 28, 2015, 12:00:05 PM

to ope...@googlegroups.com

Hi Xavier,

although I am a bit older, I guess I know the feeling ;-)

We are very open w.r.t. cooperation. OpenML already has many results from WEKA and we will very soon be able to compare to stuff from R.

To help us discuss this, can you please describe in more detail what goal you would like to achieve?

Best

Bernd

--
You received this message because you are subscribed to the Google Groups "OpenML" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openml+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Joaquin Vanschoren

Dec 28, 2015, 12:03:33 PM

to Bernd Bischl, ope...@googlegroups.com

Hi Xavier, you are very welcome to cooperate. Please let us know what interests you most.

We have very few results on clustering right now, so we could really use some help there, but it is totally open and up to you.

Cheers,
Joaquin

xavier sumba

Jan 2, 2016, 11:59:29 PM

to OpenML, bernd_...@gmx.net, j.vans...@tue.nl

Hi all,

First of all, sorry for my late answer; I was out of town.

My aim is to organise and share data mining experiments. An experiment can be executed with a variety of algorithms (algorithms can come from different platforms, e.g. WEKA, RapidMiner, etc.). Every algorithm can be executed with different configurations. Finally, an experiment has an input (basically a dataset) and outputs (one output for every configuration).
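
To make the structure described above concrete, here is a minimal Python sketch of that model. The class, field, and method names (Configuration, Output, Experiment, record, missing) are assumptions chosen for illustration and do not come from OpenML or Exposé; the metric value is a placeholder, not a measurement.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Configuration:
    """One algorithm on one platform with fixed parameter values."""
    platform: str                                   # e.g. "WEKA" or "Mahout"
    algorithm: str                                  # e.g. "J48" or "KMeans"
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class Output:
    """The result produced by running one configuration."""
    configuration: Configuration
    metrics: Dict[str, float]


@dataclass
class Experiment:
    """One input dataset, many configurations, one output per configuration."""
    name: str
    dataset: str
    configurations: List[Configuration] = field(default_factory=list)
    outputs: List[Output] = field(default_factory=list)

    def record(self, configuration: Configuration, metrics: Dict[str, float]) -> None:
        # Only configurations that belong to this experiment may report output.
        if configuration not in self.configurations:
            raise ValueError("configuration is not part of this experiment")
        self.outputs.append(Output(configuration, metrics))

    def missing(self) -> List[Configuration]:
        # Configurations that have not produced an output yet.
        done = [output.configuration for output in self.outputs]
        return [c for c in self.configurations if c not in done]


# Hypothetical usage: two platforms, one dataset.
exp = Experiment(name="iris-comparison", dataset="iris.arff")
j48 = Configuration("WEKA", "J48", {"C": "0.25"})
kmeans = Configuration("Mahout", "KMeans", {"k": "3"})
exp.configurations += [j48, kmeans]
exp.record(j48, {"accuracy": 0.5})   # placeholder value, not a real result
print([c.algorithm for c in exp.missing()])
```

Keeping the configuration separate from its output is what allows the same experiment to be compared across platforms later.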

I wanted to describe the experiments with an ontology (like Exposé) and save all this information in an RDBMS or a NoSQL DB. Afterwards I could export that information to a triple store and take advantage of the semantic web and linked data. We could learn from experiments.
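
A rough sketch of the export step mentioned above, assuming each stored run is available as a flat record: every field becomes one N-Triples line. The example.org namespace and vocabulary are hypothetical stand-ins for a real ontology such as Exposé, and the record itself is made up for illustration.

```python
from urllib.parse import quote

BASE = "http://example.org/experiment/"   # hypothetical namespace
VOCAB = "http://example.org/vocab#"       # hypothetical vocabulary


def iri(prefix, local):
    """Build an IRI term, percent-encoding the local part."""
    return "<" + prefix + quote(str(local), safe="") + ">"


def literal(value):
    """Build a plain string literal, escaping backslash, quote, and newlines."""
    text = str(value)
    for raw, escaped in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(raw, escaped)
    return '"' + text + '"'


def record_to_triples(record):
    """Turn one stored run (a flat dict, as a DB row might be) into N-Triples lines."""
    subject = iri(BASE + "run/", record["id"])
    return [
        f"{subject} {iri(VOCAB, key)} {literal(value)} ."
        for key, value in sorted(record.items())
        if key != "id"
    ]


# Made-up record for illustration only.
record = {"id": 1, "platform": "WEKA", "algorithm": "J48", "dataset": "iris"}
triples = record_to_triples(record)
print("\n".join(triples))
```

The resulting lines could then be loaded into any triple store that accepts N-Triples.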

A user can design an experiment and execute it with a single click. I had planned to execute experiments locally, whereas you execute them through an API (that's awesome!).

These days I have been trying to run OpenML [1] and the OpenML website [2]. I am using NetBeans 8.0.2 as IDE and MAMP. I would like to run your platform to interact with your code. So where should I start?

Cheers,

Xavier.

[1] https://github.com/openml/OpenML
[2] https://github.com/openml/website

Joaquin Vanschoren

Jan 3, 2016, 4:10:38 PM

to xavier sumba, OpenML, bernd_...@gmx.net

Hi Xavier,

It depends a bit on what you want to do.

* Run experiments and store them on OpenML: we have APIs for that, see http://www.openml.org/guide > Plugins and Developers. We also have a brand new RapidMiner API, it would be great if you are interested in using it and stress-testing it.

* Export machine learning experiments as linked open data. There are many people interested in that, so we have a separate initiative for this to find a commonly accepted way of doing this, called ML-Schema: https://github.com/ML-Schema/core. We also started a W3C Community group for this: https://www.w3.org/community/ml-schema/

We have a conference call tomorrow if you are interested in joining.

* 'I would like to run your platform to interact with your code'. What do you want to do explicitly? Do you simply want to program against our API? Or do you want to help extend the software? We host all our code on GitHub: https://github.com/openml. Basically, we have a repo for the server backend (e.g. evaluation engine), the frontend (website), and each of the APIs for different programming languages. Also read the wiki for a general impression of how the code is organized: https://github.com/openml/OpenML/wiki It is a bit outdated but most of it should be correct.

Please let me know what you wish to do :)

Happy new year,

Joaquin

Bernd Bischl

Jan 3, 2016, 4:53:11 PM

to ope...@googlegroups.com

Xavier,

thanks for the clarification. IMHO, what you want to do really sounds very similar to what we have / our goal is. Maybe you would like to focus on a certain part in OpenML, to make that better and join the team for this?
Would that be a reasonable idea?

We really could use more people and help as the project has become quite large and time intensive.

Best

Bernd

xavier sumba

Jan 4, 2016, 11:44:22 AM

to OpenML

Hi,

thanks for the clarification. IMHO, what you want to do really sounds very similar to what we have / our goal is. Maybe you would like to focus on a certain part in OpenML, to make that better and join the team for this?
Would that be a reasonable idea?

We really could use more people and help as the project has become quite large and time intensive.

Yes, definitely. I want to contribute; I think this is a really good project.

It depends a bit on what you want to do.

* Run experiments and store them on OpenML: we have APIs for that, see http://www.openml.org/guide > Plugins and Developers. We also have a brand new RapidMiner API, it would be great if you are interested in using it and stress-testing it.

* Export machine learning experiments as linked open data. There are many people interested in that, so we have a separate initiative for this to find a commonly accepted way of doing this, called ML-Schema: https://github.com/ML-Schema/core. We also started a W3C Community group for this: https://www.w3.org/community/ml-schema/

We have a conference call tomorrow if you are interested in joining.

Yes, I would like to attend. What time is it?

* 'I would like to run your platform to interact with your code'. What do you want to do explicitly? Do you simply want to program against our API? Or do you want to help extend the software?

I was planning to do this as my graduation project, so I have to talk with my tutor. In any case, even if the project's aims change, I would like to extend your project and do enough to earn my degree. At the same time, I would like to help extend the software.

We host all our code on GitHub: https://github.com/openml. Basically, we have a repo for the server backend (e.g. evaluation engine), the frontend (website), and each of the APIs for different programming languages. Also read the wiki for a general impression of how the code is organized: https://github.com/openml/OpenML/wiki It is a bit outdated but most of it should be correct.

Please let me know what you wish to do :)

Thanks. I am reading the wiki, but I am having some trouble with the configuration file BASE_CONFIG.php. Could you explain what the configuration details for OpenAPI and the ElasticSearch server are?

Cheers,

XS

Joaquin Vanschoren

Jan 14, 2016, 6:00:32 AM

to xavier sumba, OpenML

Hi Xavier,

Sorry for the slow reply.

The ML-Schema call had already taken place when you replied, but the next one will be on Jan 18 at 13:30: https://github.com/ML-Schema/core/wiki

Any reply from your tutor on which aspects you want to or can work on?

Regarding the local setup: this is ONLY necessary when you want to work on the server or website. The config file is well documented:

https://github.com/openml/website/blob/master/openml_OS/config/BASE_CONFIG-BLANK.php

I'm not sure what you mean by OpenAPI, but you probably mean API_USERNAME etc.? These are your login details for OpenML. You chose them yourself when you registered, and the API key can be found under your profile.

For the ES server, you should set up your own ES server and enter the details. You could use the OpenML ES, but ideally not.

Cheers,

Joaquin

xavier sumba

Jan 14, 2016, 4:14:23 PM

to OpenML, cuent...@gmail.com, j.vans...@tue.nl

Hi Joaquin,

Sorry for the slow reply.

The ML-Schema call had already taken place when you replied, but the next one will be on Jan 18 at 13:30: https://github.com/ML-Schema/core/wiki
Any reply from your tutor on which aspects you want to or can work on?

I am still working on local execution, but I want to start contributing. So tell me what I can do. I would rather work in Java, because I only know the basics of PHP.

Regarding the local setup: this is ONLY necessary when you want to work on the server or website. The config file is well documented:
https://github.com/openml/website/blob/master/openml_OS/config/BASE_CONFIG-BLANK.php

I'm not sure what you mean by OpenAPI, but you probably mean API_USERNAME etc.? These are your login details for OpenML. You chose them yourself when you registered, and the API key can be found under your profile.

For the ES server, you should set up your own ES server and enter the details. You could use the OpenML ES, but ideally not.

Basically, my config file was not the problem. I asked about it because I couldn't find my error and thought I had misconfigured the file.

So I will post my fix here in case anyone else is facing the same error:

The system folder contains all the CodeIgniter files. I would recommend upgrading CodeIgniter rather than modifying it, but changing one line worked for me.

I only know the basics of PHP, but the problem was a return statement that returned an assignment expression. I replaced this line:

return $_config[0] =& $config;

https://github.com/openml/website/blob/master/system/core/Common.php#L257

with this:

$_config[0] =& $config;
return $_config[0];

I have another question. I have the website running locally, but when I click on People, for example (it also happens in other sections), the query returns all people from the users table. I have only one user in my users table, yet it returned 794 users. So it is querying some external database. I would like to know which line does this, or why it happens. For the record, I also get some errors when I have no Internet connection or the database is not available (a little bit strange).

Cheers.

Joaquin Vanschoren

Jan 14, 2016, 4:48:47 PM

to OpenML, xavier sumba, Jan van Rijn

Interesting, I'm CC'ing Jan, maybe he can explain why this PHP thing happens.

We definitely need more people working in Java. There is a lot of backend work, where the evaluations are done, meta-data is calculated, etc. And this is all written in Java. Moreover, we need help with the RapidMiner and KNIME integrations; it would be awesome if we could collaborate on this. Would that interest you? It would allow a much wider comparison of algorithms and workflows than currently possible.

The website uses the openml elasticsearch index by default, but you can point it to your own local version as well, see openml_OS > libraries > ElasticSearch.php. I should change that so it loads the ES server from the config file.
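
One way the planned change could look, sketched in Python rather than the website's PHP: read the search host from the configuration and fall back to a shared default only when nothing is set. The key ES_HOST and the default host name are hypothetical and are not taken from the OpenML code base.

```python
# Hypothetical stand-in for a shared, public search index.
DEFAULT_ES_HOST = "es.example.org:9200"


def resolve_es_host(config):
    """Return the configured search host, or the shared default if none is set."""
    host = config.get("ES_HOST", "").strip()
    return host or DEFAULT_ES_HOST


print(resolve_es_host({}))                              # falls back to the default
print(resolve_es_host({"ES_HOST": "localhost:9200"}))   # uses the local server
```

With a fallback like this, an install that never sets the host silently queries the shared index, which would explain a People page listing far more users than the local table holds.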

Cheers,
Joaquin

Joaquin Vanschoren

Jan 14, 2016, 5:40:58 PM

to OpenML, xavier sumba, bernd....@stat.uni-muenchen.de, Jan van Rijn

Somewhere along the line Bernd got dropped from the thread, so I'm CCing him back in. We discussed mostly technical stuff, though :).

Cheers,

Joaquin

xavier sumba

Jan 18, 2016, 10:18:44 AM

to OpenML, cuent...@gmail.com, janva...@gmail.com, j.vans...@tue.nl

Hi all,

Interesting, I'm CC'ing Jan, maybe he can explain why this PHP thing happens.

We definitely need more people working in Java. There is a lot of backend work, where the evaluations are done, meta-data is calculated, etc. And this is all written in Java. Moreover, we need help with the RapidMiner and KNIME integrations; it would be awesome if we could collaborate on this. Would that interest you? It would allow a much wider comparison of algorithms and workflows than currently possible.

Awesome!! Yes, that interests me. That comparison is exactly what I am trying to accomplish. So how do we get started?

The website uses the openml elasticsearch index by default, but you can point it to your own local version as well, see openml_OS > libraries > ElasticSearch.php. I should change that so it loads the ES server from the config file.

Thanks, I will try that!!

Cheers,

XS.

Joaquin Vanschoren

Jan 18, 2016, 10:41:28 AM

to xavier sumba, OpenML, janva...@gmail.com

Hi Jan,

Xavier is interested in working on the KNIME and RapidMiner integrations.

Could you give a status update?

What is still missing from the RM integration? Could you give a quickstart tutorial to play around with the current integration? Or does that not make sense before it is in the Marketplace? What is blocking us?

What is the status of the KNIME code? Would this be something Xavier could take over?

@Xavier, we have code that was developed by the RapidMiner and KNIME people a while ago, and Jan later finished a basic RM integration. The code is all on GitHub. That should be a good starting point for you, but it is very poorly documented right now.

Cheers,

Joaquin

Jan van Rijn

Jan 20, 2016, 10:09:35 AM

to Joaquin Vanschoren, xavier sumba, OpenML

Hi Xavier,

Today I did a huge update on the RapidMiner plugin. It's on the OpenML website,
but should soon be in the RapidMiner Marketplace. For now, you can manually
download and install it by copying it to your RapidMiner plugin folder.

It features three operators: a download task operator, an execute task operator and
an upload results operator.

The execute task operator is the most interesting; within this operator the magic
happens.
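
The three-operator chain can be pictured as three small functions called in order. This is only a toy Python sketch with in-memory stand-ins; the function names, the task layout, and the majority-class learner are assumptions for illustration and are not the plugin's actual operators or the OpenML API.

```python
def download_task(task_id, repository):
    """Stand-in for the download step: look the task up in a local dict."""
    return repository[task_id]


def execute_task(task, learner):
    """Stand-in for the execute step: fit on the training rows, predict the test rows."""
    model = learner(task["train"])
    return [model(features) for features, _ in task["test"]]


def upload_results(task_id, predictions, store):
    """Stand-in for the upload step: keep the predictions under the task id."""
    store[task_id] = list(predictions)
    return len(predictions)


def majority_class(train):
    """Toy learner that always predicts the most frequent training label."""
    labels = [label for _, label in train]
    winner = max(sorted(set(labels)), key=labels.count)
    return lambda features: winner


# Made-up task for illustration only.
repository = {
    "t1": {
        "train": [((1,), "a"), ((2,), "a"), ((3,), "b")],
        "test": [((4,), "a"), ((5,), "b")],
    }
}
store = {}

task = download_task("t1", repository)
predictions = execute_task(task, majority_class)
uploaded = upload_results("t1", predictions, store)
print(uploaded, store)
```

The point of the split is that only the middle step depends on the learner, so different algorithms can be swapped in without touching download or upload.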

Feel free to try it out and let me know if there are any problems :)

The status of the KNIME plugin is less advanced. I think the code is pretty
outdated and never really worked. If you want to export KNIME experiments
to OpenML, it would probably be best to start from scratch. The Java Apiconnector
can help a great deal, as it automatically takes care of all API calls and does
most of the work. I myself have no experience in developing for KNIME, but
if you want to develop this plugin I can help with some parts.

Cheers,

Jan

xavier sumba

Jan 20, 2016, 12:15:22 PM

to OpenML, j.vans...@tue.nl, cuent...@gmail.com

Hi Jan,

Today I did a huge update on the RapidMiner plugin. It's on the OpenML website,
but should soon be in the RapidMiner Marketplace. For now, you can manually
download and install it by copying it to your RapidMiner plugin folder.

It features three operators: a download task operator, an execute task operator and
upload results operator.
The execute task operator is the most interesting, within this operator the magic
happens.
Feel free to try it out and let me know if there are any problems :)

Awesome, I am going to try it.

The status of the KNIME plugin is less advanced. I think the code is pretty
outdated and never really worked. If you want to export KNIME experiments
to OpenML, it would probably be best to start from scratch. The Java Apiconnector
can help a great deal, as it automatically takes care of all API calls and does
most of the work. I myself have no experience in developing for KNIME, but
if you want to develop this plugin I can help with some parts.

I was checking the WEKA plugin. How can I test it? I am using JDK 1.7 and I think it runs with JDK 1.6. For KNIME, should I start from this code [1], or start everything from scratch?

[1] https://github.com/openml/knime.git

Cheers,

XS.

Jan van Rijn

Jan 21, 2016, 5:22:34 AM

to xavier sumba, OpenML, Joaquin Vanschoren

Concerning KNIME:

Personally, I would start from scratch. Development was started in a very early
stage of OpenML, and many things have changed since.

By making use of the OpenML Java Connector, development should go pretty
smoothly.

You can probably even reuse much of the logic as done in RapidMiner:
https://github.com/openml/rapidminer/tree/master/OpenmlConnector/src/main/java/org/openml/rapidminer

Let me know if I can help!

Cheers,

Jan
