RE: learner which can take in both numeric and categorical data?

Eugene Y.E

Apr 30, 2012, 11:27:41 AM

to milk-...@googlegroups.com

Hi all,

I've just started using MILK and am loving it!

I've managed to work through the samples and found them relatively easy to understand.

However, I have a few questions:

1)

I have a dataset whose features look like this:

1,4,2,6, apple

1,1,2,1, banana

1,1,2,6, pineapple

And the corresponding labels are:

True

My objective is to classify incoming data as either True or False.

Which learner would be the most suitable for the above dataset?

2)

My training set consists entirely of True Positives (i.e. no False labels).

That being the case, how do I evaluate my testing dataset so that I can return a False for data that should be classified as False?

That's all for now and thanks!

Best.

Mike Axiak

Apr 30, 2012, 11:50:40 AM

to milk-...@googlegroups.com

For categorical data, you'll probably be better off splitting the column
into one dimension per category, with a "1" where the keyword matches that
category and a "0" otherwise.

For example, for your data set:

1,4,2,6, apple
1,1,2,1, banana
1,1,2,6, pineapple

becomes:

1,4,2,6,1,0,0
1,1,2,1,0,1,0
1,1,2,6,0,0,1
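
A minimal sketch of that encoding in plain Python, assuming the categorical value is always the last column. The helper name `one_hot_encode` is made up for illustration and is not part of milk:

```python
# Rows from the example above: four numeric features plus one categorical value.
rows = [
    (1, 4, 2, 6, "apple"),
    (1, 1, 2, 1, "banana"),
    (1, 1, 2, 6, "pineapple"),
]

def one_hot_encode(rows, cat_index=-1):
    """Replace the categorical column with one 0/1 column per distinct category.

    Hypothetical helper for illustration only.
    """
    # Sort the categories so the column order is deterministic.
    categories = sorted({row[cat_index] for row in rows})
    encoded = []
    for row in rows:
        position = cat_index % len(row)  # turn -1 into a real column index
        numeric = [v for i, v in enumerate(row) if i != position]
        indicators = [1 if row[position] == c else 0 for c in categories]
        encoded.append(numeric + indicators)
    return categories, encoded

categories, features = one_hot_encode(rows)
for row in features:
    print(row)
```

The category list should be built once from the training data and reused when encoding new rows, so that the indicator columns stay in the same order.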

Eugene Y.E

May 1, 2012, 3:39:51 AM

to milk-...@googlegroups.com, mca...@gmail.com

Hi, thanks for the reply!

Yup, tried it and it works!

I was wondering whether you have any advice on my second question?

2)

My training set consists entirely of True Positives (i.e. no False labels).

That being the case, how do I evaluate my testing dataset so that I can return a False for data that should be classified as False?

Thanks again!

Luis Pedro Coelho

May 1, 2012, 1:43:57 PM

to milk-...@googlegroups.com

On Tuesday, May 01, 2012 12:39:51 AM Eugene Y.E wrote:
> 2)
> My training set are all True Positives ( i.e no False labels ).
>
> That being the case, how do I test my testing dataset such that I can return
> a False for data that might be classified as False ?

There are some approaches for that problem, but milk does not implement any of
them.

Sorry
--
Luis Pedro Coelho | Institute for Molecular Medicine | http://luispedro.org

Eugene Y.E

May 3, 2012, 10:14:58 AM

to milk-...@googlegroups.com

Hi, thanks for the reply.

May I know which libraries you would recommend for my case, where all the training data are True Positives?

Best Regards.

Luis Pedro Coelho

May 3, 2012, 10:27:00 AM

to milk-...@googlegroups.com

On Thursday, May 03, 2012 07:14:58 AM Eugene Y.E wrote:
> Hi, thanks for the reply.
>
> May I know what libraries will you recommend in my case where all training
> datasets are True Positives ?

You can try scikit-learn:

http://scikit-learn.org/stable/modules/outlier_detection.html

HTH
Luis
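
To illustrate the general idea of learning from positives only, here is a toy sketch in plain Python. It is an assumption-laden stand-in, not milk's or scikit-learn's API: it learns a centroid and a distance cutoff from the True rows and returns False for anything farther away. The names `fit_one_class` and `predict`, and the far-away test row, are hypothetical.

```python
import math

def fit_one_class(positives, quantile=1.0):
    """Learn a centroid and a distance cutoff from positive examples only."""
    n_features = len(positives[0])
    centroid = [
        sum(row[j] for row in positives) / len(positives)
        for j in range(n_features)
    ]
    distances = sorted(math.dist(row, centroid) for row in positives)
    # Cutoff at the chosen quantile of training distances; 1.0 accepts every
    # training row, lower values tolerate some outliers in the training set.
    index = min(len(distances) - 1, int(quantile * len(distances)))
    return centroid, distances[index]

def predict(model, row):
    """Return True if `row` lies within the learned cutoff, else False."""
    centroid, cutoff = model
    return math.dist(row, centroid) <= cutoff

# The one-hot encoded rows from earlier in the thread, all labelled True.
positives = [
    [1, 4, 2, 6, 1, 0, 0],
    [1, 1, 2, 1, 0, 1, 0],
    [1, 1, 2, 6, 0, 0, 1],
]
model = fit_one_class(positives)

print(predict(model, [1, 1, 2, 6, 0, 0, 1]))      # a row seen during training
print(predict(model, [50, 50, 50, 50, 1, 0, 0]))  # hypothetical far-away row
```

A plain Euclidean distance treats every column equally, so features on very different scales would need rescaling first. The estimators described on the linked page are the more robust choice for real data.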

--
Luis Pedro Coelho | Institute for Molecular Medicine | http://luispedro.org

LxMLS 2012: Lisbon Machine Learning School
http://lxmls.it.pt

Eugene Y.E

May 3, 2012, 11:14:13 AM

to milk-...@googlegroups.com

Wow, thanks for the reply!

Just one more short question:

This may sound weird, but for certain reasons, my training and testing data are all True Positives.

I could include True Negatives in my training data should I want to.

Assuming I use both True Negatives and True Positives as my training dataset (I am not sure about the proportion) but only True Positives in my testing dataset, will http://scikit-learn.org/stable/modules/outlier_detection.html still be suitable in this case?

Best Regards,

Eugene

Luis Pedro Coelho

May 3, 2012, 11:16:17 AM

to milk-...@googlegroups.com

I am not sure I understand what you are trying to do.

Sorry
Luis
