RE: Learner which can take in both numeric and categorical data?


Eugene Y.E

Apr 30, 2012, 11:27:41 AM
to milk-...@googlegroups.com
Hi all,

I've just started using MILK and am loving it!

I've managed to work through the samples and found them relatively easy to understand.

However, I have a few questions:
1)
I have a dataset whose features look like this:
1,4,2,6, apple
1,1,2,1, banana
1,1,2,6, pineapple

And the corresponding labels are:
True
True
True

My objective is to classify incoming data as either True or False.

Which learner will be the most suitable for the above dataset?

2)
My training set is all True Positives (i.e. no False labels).

That being the case, how do I test my testing dataset so that I can return
False for data that should be classified as False?

That's all for now and thanks!

Best.

Mike Axiak

Apr 30, 2012, 11:50:40 AM
to milk-...@googlegroups.com
For categorical data, you'll probably be better off splitting the category
into one dimension per distinct value, with a "1" if the row has that value
and a "0" otherwise.

For example, for your data set:

1,4,2,6, apple
1,1,2,1, banana
1,1,2,6, pineapple

becomes:

1,4,2,6,1,0,0
1,1,2,1,0,1,0
1,1,2,6,0,0,1
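
The encoding described above can be sketched in plain Python. This is only an illustration: `one_hot_rows` is a hypothetical helper, not part of milk, and it assumes the categorical value is always the last field of a row. Columns are ordered by first appearance, which reproduces the table above.

```python
# Hypothetical helper for illustration; not a milk function.
def one_hot_rows(rows, categories=None):
    """Replace the trailing categorical value of each row with 0/1 columns."""
    if categories is None:
        # Order the indicator columns by first appearance in the data.
        categories = []
        for row in rows:
            if row[-1] not in categories:
                categories.append(row[-1])
    encoded = []
    for row in rows:
        indicators = [1 if row[-1] == c else 0 for c in categories]
        encoded.append(list(row[:-1]) + indicators)
    return encoded, categories


rows = [
    [1, 4, 2, 6, "apple"],
    [1, 1, 2, 1, "banana"],
    [1, 1, 2, 6, "pineapple"],
]
features, categories = one_hot_rows(rows)
for f in features:
    print(f)
```

When encoding test data, pass the `categories` list learned from the training data so the columns line up. With this sketch, a value never seen in training ends up with all indicator columns set to 0.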

Eugene Y.E

May 1, 2012, 3:39:51 AM
to milk-...@googlegroups.com, mca...@gmail.com
Hi, thanks for the reply!

Yup, tried it and it works!

I was wondering, do you have any advice on my second question?

2)
My training set is all True Positives (i.e. no False labels).

That being the case, how do I test my testing dataset so that I can return
False for data that should be classified as False?


Thanks again!


Luis Pedro Coelho

May 1, 2012, 1:43:57 PM
to milk-...@googlegroups.com
On Tuesday, May 01, 2012 12:39:51 AM Eugene Y.E wrote:
> 2)
> My training set are all True Positives ( i.e no False labels ).
>
> That being the case, how do I test my testing dataset such that I can return
> a False for data that might be classified as False ?

There are some approaches for that problem, but milk does not implement any of
them.

Sorry
--
Luis Pedro Coelho | Institute for Molecular Medicine | http://luispedro.org

Eugene Y.E

May 3, 2012, 10:14:58 AM
to milk-...@googlegroups.com
Hi, thanks for the reply.

May I know which libraries you would recommend for my case, where all training examples are True Positives?

Best Regards.

Luis Pedro Coelho

May 3, 2012, 10:27:00 AM
to milk-...@googlegroups.com
On Thursday, May 03, 2012 07:14:58 AM Eugene Y.E wrote:
> Hi, thanks for the reply.
>
> May I know what libraries will you recommend in my case where all training
> datasets are True Positives ?

You can try scikit-learn:


http://scikit-learn.org/stable/modules/outlier_detection.html
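
To make the idea behind that page concrete, here is a minimal, dependency-free sketch of learning from positive examples only. It is not scikit-learn code and not what the page implements. It assumes a simple rule: accept a point if it lies within a radius of the centroid of the training positives. The names `fit_positive_only` and `predict` and the `slack` factor are hypothetical.

```python
import math


def fit_positive_only(rows, slack=1.5):
    """Learn a centroid and a distance threshold from positive examples only."""
    n_features = len(rows[0])
    centroid = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
    # Threshold: largest training distance, widened by an assumed slack factor.
    radius = max(math.dist(r, centroid) for r in rows) * slack
    return centroid, radius


def predict(model, row):
    """Return True if the row looks like the training positives, else False."""
    centroid, radius = model
    return math.dist(row, centroid) <= radius


# Training data: only True examples, already encoded as numbers.
positives = [
    [1, 4, 2, 6, 1, 0, 0],
    [1, 1, 2, 1, 0, 1, 0],
    [1, 1, 2, 6, 0, 0, 1],
]
model = fit_positive_only(positives)
print(predict(model, [1, 2, 2, 5, 1, 0, 0]))      # close to the training data
print(predict(model, [50, 50, 50, 50, 0, 0, 0]))  # far from the training data
```

The estimators on the scikit-learn page (for example its one-class SVM) draw a more flexible boundary than a single radius, but they are used the same way: fit on the positives, then flag anything that falls outside.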

HTH
Luis


--
Luis Pedro Coelho | Institute for Molecular Medicine | http://luispedro.org

LxMLS 2012: Lisbon Machine Learning School
http://lxmls.it.pt

Eugene Y.E

May 3, 2012, 11:14:13 AM
to milk-...@googlegroups.com
Wow, thanks for the reply!

Just one more short question:

This may sound weird, but for certain reasons,
my training and testing data are all True Positives.

I could include True Negatives in my training data if I wanted to.

Assuming I use both True Negatives and True Positives as my training dataset (not sure about the proportion)
but only True Positives for my testing dataset, will http://scikit-learn.org/stable/modules/outlier_detection.html
still be suitable in this case?

Best Regards,
Eugene

Luis Pedro Coelho

May 3, 2012, 11:16:17 AM
to milk-...@googlegroups.com
I am not sure I understand what you are trying to do.

Sorry
Luis