DM Ontology for Workflow Optimization through Meta-Learning


Melanie HILARIO

Mar 18, 2010, 6:37:03 AM
to meta-l...@googlegroups.com
The Data Mining Ontology for Workflow Optimization through Meta-Learning (DMOP)
http://www.e-lico.eu/?q=DMOP
might be of interest to members of this group. I look forward to your comments
and suggestions, and hopefully expressions of interest in collaborating in
DMOP's development.

Best regards,
Melanie

--
Melanie Hilario                    
CUI (Battelle A) - University of Geneva          
7 route de Drize, 1227 Carouge, Switzerland       
Tel: +41 22 379 0222
http://cui.unige.ch/AI-group


Christophe Giraud-Carrier

Mar 18, 2010, 9:47:03 PM
to Meta-learning
Melanie (and all),

Thanks for pointing out your work on this. I downloaded the ECML 2009
WS paper. This is a very nice and rather ambitious project, with the
potential to significantly extend what has been done so far in
metalearning.
The idea of looking at the characteristics of algorithms is not novel
in itself (if I remember correctly, Alexandros did some preliminary
work on that a while ago), but this is a much broader, much more
systematic and much cleaner framework. It is not yet completely clear
to me how many of the specific algorithm characteristics will
generalize across algorithms, but I suspect that building the DMO will
help us answer that question.
I highly recommend the paper (and the web site, of course) to everyone.
As we have recently spent time looking at algorithm behavior and its
impact on relative performance for algorithm selection, we are
certainly interested in collaborating with you.
Kind regards, and thanks again for bringing this to the group's attention.
Christophe

Jun won Lee

Mar 19, 2010, 2:19:59 PM
to meta-l...@googlegroups.com, Christophe Giraud-Carrier
Hi, Melanie and community.

My name is Jun; I am a BYU graduate student doing research on meta-learning.
My advisor is Christophe.

I read "Data Mining Ontology for Algorithm Selection and Meta-Mining".
I think it is a creative, solid paper and I see a lot of potential in it.
I would like to collaborate with you.

I have some questions regarding the paper; I hope they help us think further.

Q1)
On your extended Rice framework: I haven't fully understood how
algorithm selection works.

According to the diagram of the extended Rice framework, a new dataset x
is expressed through its data characteristics and algorithm
characteristics. However, according to your new framework in Fig. 2, the
algorithm characteristics are independent of the input dataset x. This
means that, in a meta-dataset, the algorithm-characteristic columns are
identical across datasets. Would this be helpful to a meta-learner? The
meta-learner will never pick up these columns, since they are irrelevant
to the target class (the label of the best algorithm).

In other words, does this extended framework work for predicting the best
algorithm for a given new task, which is the motivating task in the
original Rice framework?

Q2)
I know that your ontology is still in progress. Your papers show the
characteristics of probability-based algorithms. Do you also have
characteristics of non-probability-based algorithms?

Related to this, how would one build a meta-dataset (for the
meta-learner) based on the ontology, assuming the ontology is complete
and ready? Do you have some ideas in mind?


Thanks a lot.


Ricardo

Mar 24, 2010, 1:13:44 PM
to Meta-learning
Dear Melanie,

I found your project and the corresponding paper very interesting.
I have to agree with Christophe on the ambitious nature of the project;
it has been clear over the past years that meta-learning is not an easy
problem.

I believe the novelty of your approach lies in the characterization of
learning mechanisms, which I agree remains mostly unexplored territory.
My suggestion would be to carry out this characterization in close
conjunction with its counterpart, data characterization. The two interact
strongly, and trying to study them independently is not a good idea. My
view of this area is that we need an abstract understanding of learning
strategies that goes beyond a listing of current algorithms. For example,
how does partitioning the feature space using many models of complexity
C1 (e.g., decision trees) compare to a partition using a single model of
complexity C2 (e.g., SVMs), where C2 > C1? Once general strategies are
better understood, we can determine which existing algorithms belong to
each strategy.
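
As an illustrative aside, the C1-versus-C2 comparison can be made concrete
on a toy problem. The sketch below uses scikit-learn; the dataset, the
parameter settings, and the accuracy comparison are arbitrary choices for
illustration, not results from any study cited in this thread.

    # Toy comparison of two partitioning strategies: many axis-parallel
    # splits (decision tree, complexity C1) versus a single more complex
    # separating surface (RBF-kernel SVM, complexity C2).
    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_moons(n_samples=400, noise=0.25, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # strategy C1
    svm = SVC(kernel="rbf", gamma=2.0).fit(X_tr, y_tr)             # strategy C2

    print("tree accuracy:", tree.score(X_te, y_te))
    print("svm accuracy: ", svm.score(X_te, y_te))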

This area is very interesting to us; my research lab would be very
happy to collaborate with you.

Ricardo.

Ricardo Prudencio

Mar 24, 2010, 2:32:49 PM
to meta-l...@googlegroups.com
Dear Melanie,

My name is Ricardo Prudêncio, from Brazil. I read the paper and also
found very interesting ideas there. I would like to contribute to this
development if I can.

Best Regards,

Ricardo Prudencio.

Center of Informatics
Federal University of Pernambuco



Melanie HILARIO

Mar 27, 2010, 11:34:54 PM
to meta-l...@googlegroups.com
Dear Christophe, Jun, Ricardo and Ricardo,

Thanks very much for your encouraging feedback, and my apologies for the delay in replying. Last week was the annual review of the e-LICO project; with that done, we can look forward to another year of earnest work on developing the Data Mining Ontology for Workflow Optimization through Meta-Learning (DMOP) (http://www.e-lico.eu/?q=DMOP). My apologies too for this collective reply, which finds its justification in the common issue we now face: how do we go about the collaborative development of a data mining ontology?

At the outset, we had in mind some wiki-like mechanism whereby the ontology could be browsed and modified by the open community using a collaborative ontology editor. However, this vision turned out to be utopian, even on the microcosmic scale of our tiny research group. Spontaneous contributions from enthusiastic young members resulted in an unwieldy, if not outright inconsistent, ontology. We realized that we need stringent quality monitoring of the kind enforced in large-scale ontology development efforts such as the OBO Foundry (http://www.obofoundry.org/). We thus sketched a roadmap comprising 5 steps:
1. set up a web site where users and potential contributors can download DMOP or browse it online
2. gather comments and suggestions on the initial ontology
3. draw up a methodology for soliciting and validating contributions
4. set up a quality committee composed of both data mining specialists and ontology engineers
5. design and implement a web-based mechanism for collecting contributions to or comments/reviews on (parts of) the DMOP

We have finished step 1 and are actively eliciting feedback on the initial ontology (step 2) with respect to both the ontological and the data mining aspects. We are in contact with ontology engineers to ensure adherence to ontological best practices in DMOP development, while my previous post in this group was aimed at gathering expert views on how to better adapt the current ontology to the needs of meta-learning.

There is one particular issue that I would like to submit to the collective brainstorming of specialists in this group. As described in our SoKD-09 paper, we go beyond data-characterization based meta-learning approaches and attempt to open the black box of inductive algorithms. Hence the ontology characterizes learning (more precisely, classification) algorithms in terms of
- their underlying assumptions
- the characteristics of the models they build (e.g. model structure or functional form, model parameters)
- the objective function, constraints, and optimization strategy they use to learn the appropriate model parameters
- the geometry of the resulting decision boundaries, etc. (a toy encoding of such a characterization is sketched below)
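
As an illustrative aside, such a characterization could be flattened into a simple record along the following lines; every field name and value here is invented for illustration and is not DMOP's actual vocabulary.

    # All field names and values are invented; they are not DMOP terms.
    svc_characteristics = {
        "assumptions":           ["classes separable under some kernel"],
        "model_structure":       "kernel expansion over support vectors",
        "model_parameters":      ["support-vector weights", "bias"],
        "objective_function":    "hinge loss",
        "constraints":           "margin constraints on training points",
        "optimization_strategy": "convex quadratic programming",
        "decision_boundary":     "linear in feature space, nonlinear in input space",
    }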

This is where your ideas will be of great help in advancing the DM ontology: can you suggest meta-features, other than those already in DMOP, that might help identify the areas of expertise of a given learning algorithm? Since the effectiveness of meta-learning will depend on the richness of this set of meta-features, we would like to ensure that we are not ignoring essential dimensions of analysis of learning algorithms. Once we have a fairly comprehensive and stable set of algorithm characteristics, we can turn to the task of describing and classifying known algorithms (and algorithm families) based on those characteristics.

I look forward to your feedback on the initial version while our project team sets up the mechanisms for more systematic collaboration.

Jun, I will answer your questions in a separate post.

Best regards,
Melanie

Melanie HILARIO

Mar 28, 2010, 1:20:07 AM
to meta-l...@googlegroups.com
On 19-03-10, Jun won Lee <p3m...@gmail.com> wrote:

> Q1)
> On your extended Rice framework: I haven't fully understood how
> algorithm selection works.

Our proposal to extend the Rice framework does not prescribe a specific algorithm selection method. It only suggests extending the set of meta-learning features to include characteristics of algorithms as well as datasets.

> According to the diagram of the extended Rice framework, a new dataset x
> is expressed through its data characteristics and algorithm
> characteristics. However, according to your new framework in Fig. 2, the
> algorithm characteristics are independent of the input dataset x. This
> means that, in a meta-dataset, the algorithm-characteristic columns are
> identical across datasets. Would this be helpful to a meta-learner? The
> meta-learner will never pick up these columns, since they are irrelevant
> to the target class (the label of the best algorithm).
>
> In other words, does this extended framework work for predicting the best
> algorithm for a given new task, which is the motivating task in the
> original Rice framework?
>

In the dataset-characteristics-only approach to meta-learning for algorithm selection, each instance is composed of data features, while the target feature typically concerns the candidate algorithms (e.g., the error rate of an algorithm, or the name of the best algorithm). A dataset's characteristics are identical across different algorithms, and yet you (or rather the meta-learner) manage to generalize. For instance, you can examine all the datasets on which algorithm A1 performs best and use dataset characteristics to define the class of datasets D1 on which A1 is likely to outperform all the other algorithms you tested.

However, you will never be able to generalize over algorithms. If several other algorithms, say A2 ... An, perform as well as A1 on the class of datasets D1, you will be obliged to simply enumerate A1, ..., An as potentially best algorithms for that dataset class. Now, if you add algorithm characteristics, you gain in representational and learning power: you can extract the characteristics that are common to algorithms A1, ..., An and that differentiate them from all other algorithms that performed significantly worse, and come up with a general definition of the class of algorithms that would perform best on the class of datasets D1.
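
As an illustrative aside, the gain can be sketched with a toy meta-dataset in which each meta-example is one (dataset, algorithm) pair, so the algorithm-characteristic columns vary across rows instead of being constant per dataset. All numbers and feature names below are invented for illustration.

    # Invented toy meta-data: each row is one (dataset, algorithm) pair.
    from sklearn.tree import DecisionTreeClassifier

    # columns: [n_features, class_skew, algo_regularized, algo_margin_based]
    X_meta = [[10,  0.1, 1, 1],
              [10,  0.1, 0, 0],
              [200, 0.4, 1, 1],
              [200, 0.4, 0, 0]]
    y_meta = [1, 0, 1, 0]  # 1 = the algorithm was among the best on that dataset

    meta_learner = DecisionTreeClassifier().fit(X_meta, y_meta)
    # The induced rules can now refer to algorithm properties (e.g.,
    # "margin-based algorithms win here"), generalizing over A1, ..., An.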

> Q2)
> I know that your ontology is still in progress. Your papers show the
> characteristics of probability-based algorithms. Do you also have
> characteristics of non-probability-based algorithms?
>

Certainly. Browse the DMOP (http://www.e-LICO.eu/?q=DMOPBrowser) and see for example the class SVCAlgorithm (and its different subclasses) and all other algorithms under DiscriminantFunctionAlgorithm. (But you might be better off viewing it in Protege -- I just realized the browser does not display all inherited properties).

> Related to this, how would one build a meta-dataset (for the
> meta-learner) based on the ontology, assuming the ontology is complete
> and ready? Do you have some ideas in mind?
>

First, you have to describe your algorithms and datasets using the concepts and properties defined in the ontology. Then for each experiment, you encode the characteristics of the dataset used, the algorithm applied and its parameter settings, the model produced (together with its model parameters), the performance measures, etc. The result is a relational problem of course, so you will either have to use a relational learning method or propositionalize your meta-data.
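
As an illustrative aside, a minimal sketch of what such a propositionalization might look like follows; all names and values are invented, not taken from DMOP.

    # Invented relational meta-data: experiments link datasets, algorithms
    # (with parameter settings), and a performance measure.
    experiments = [
        {"dataset": "d1", "algorithm": "C4.5", "params": {"pruning": 1},
         "error": 0.06},
        {"dataset": "d1", "algorithm": "SVC", "params": {"C": 1.0},
         "error": 0.04},
    ]
    dataset_chars = {"d1": {"n_instances": 150, "n_features": 4}}
    algorithm_chars = {"C4.5": {"margin_based": 0}, "SVC": {"margin_based": 1}}

    # Join dataset, algorithm, and experiment descriptions into one flat
    # (propositional) row per experiment.
    rows = [{**dataset_chars[e["dataset"]],
             **algorithm_chars[e["algorithm"]],
             **{"param_" + k: v for k, v in e["params"].items()},
             "error": e["error"]}
            for e in experiments]
    # A real propositionalization would also have to align the parameter
    # columns across algorithms (here "param_pruning" vs "param_C").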

HTH,
Melanie


Jun won Lee

Mar 29, 2010, 12:04:32 PM
to meta-l...@googlegroups.com
Thanks, Melanie. I will take more time to look at the DMOP site.

Your response has helped me better understand your approach.

But I still haven't fully understood how meta-learning works for
"algorithm selection". Please point out where I go wrong in the
following argument.

Let's consider a simple case: suppose we have a new dataset d1 and want
to find its best algorithm.

Q1. How do you predict the best learning algorithm for d1? More
specifically, what algorithm characteristics would you put in the
meta-data for d1? We have no idea yet which algorithms suit d1.

Or does your meta-learner map dataset characteristics to algorithm
characteristics, with the ontology then telling us, at a leaf, which
algorithm is best? What I mean is: for a given d1, the meta-learner
returns the algorithm characteristics of the best algorithm for d1; we
then follow the ontology down according to the returned characteristics,
and at the end the leaf node in the ontology tells us the best algorithm
for d1.


Q2. Also, I haven't yet seen RBF in your paper. Do you have an ontology
entry for RBF, too? That could be a big help for the research I am
carrying out now.


Thank you very much.
