Container type for Features comprising Strings and Integers

Arbaz Khan

Mar 20, 2015, 3:07:03 PM
to dis...@factorie.cs.umass.edu
Hi All,

Since FeatureVectorVariable can contain only a single type, I was wondering what container type I should use for my feature set if it has both strings and integers.

Or, if it can't be done with a single type, is there a way to combine two feature sets of different types to train a model?

This is in the context of a chain model trained with Adagrad.

Thanks,
Arbaz

Luke Vilnis

Mar 20, 2015, 3:08:29 PM
to dis...@factorie.cs.umass.edu
What do you mean by integer features? Would it be OK to just add a new string feature template called "INTFEATURE=10" or something like this?
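To make the string-template idea concrete, here is a minimal, library-agnostic sketch in Python. It does not use Factorie's API: `FeatureDomain` and `string_template_features` are hypothetical helpers that mimic the common pattern of giving each distinct feature string its own index and setting that feature to 1.0.

```python
class FeatureDomain:
    """Hypothetical helper: maps feature strings to dense integer indices."""

    def __init__(self):
        self.index = {}

    def index_of(self, name):
        # Assign the next free index the first time a string is seen.
        if name not in self.index:
            self.index[name] = len(self.index)
        return self.index[name]


def string_template_features(domain, value):
    """Encode an integer as one binary string feature such as 'INTFEATURE=10'."""
    name = f"INTFEATURE={value}"
    return {domain.index_of(name): 1.0}


domain = FeatureDomain()
print(string_template_features(domain, 10))  # {0: 1.0}
print(string_template_features(domain, 9))   # {1: 1.0}
print(string_template_features(domain, 10))  # {0: 1.0}, same string, same index
```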

--
--
Factorie Discuss group.
To post, email: dis...@factorie.cs.umass.edu
To unsubscribe, email: discuss+u...@factorie.cs.umass.edu

To unsubscribe from this group and stop receiving emails from it, send an email to discuss+u...@factorie.cs.umass.edu.

Arbaz Khan

Mar 20, 2015, 3:33:41 PM
to dis...@factorie.cs.umass.edu
Yeah, I could do that, but wouldn't that stop the learner from leveraging the integer values and, in turn, lose information? Because it would interpret the values as strings, if there are two feature values INTFEATURE=9 and INTFEATURE=10, it would treat them as two independent values. But the integers 9 and 10 are not independent: they are closer to each other than, say, 1 and 10.

Hope that helps to clarify my requirement.

Arbaz
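The concern can be checked with a small Python sketch (hypothetical helpers, not Factorie code): with one binary feature per string, the sparse vectors for INTFEATURE=9 and INTFEATURE=10 share no active key, so their dot product is exactly the same as for 1 and 10.

```python
def one_hot(value):
    # Sparse binary vector keyed directly by the feature string.
    return {f"INTFEATURE={value}": 1.0}


def dot(u, v):
    # Sparse dot product over the keys the two vectors have in common.
    return sum((weight * v[key] for key, weight in u.items() if key in v), 0.0)


print(dot(one_hot(9), one_hot(10)))   # 0.0
print(dot(one_hot(1), one_hot(10)))   # 0.0
print(dot(one_hot(10), one_hot(10)))  # 1.0
```

Because a weight learned for INTFEATURE=9 never touches INTFEATURE=10, nothing learned about one value transfers to its neighbor.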

Luke Vilnis

Mar 20, 2015, 3:36:26 PM
to dis...@factorie.cs.umass.edu
Oh, I see. I think you are conflating two things here: the type of the feature (string) and the type of value the feature can take on. Why not just make a feature called "INTFEATURE" and set it equal to 9 or 10? BinaryFeatureVectorVariable requires features to be binary, but a regular FeatureVectorVariable should have a SparseIndexedTensor by default, and you should be able to assign whatever (double) value you'd like to it.
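A minimal sketch of the real-valued alternative, assuming a linear scoring function and using a Python dict as a stand-in for a sparse tensor (this is not Factorie's SparseIndexedTensor API): one feature named INTFEATURE carries the integer itself as its value, and the weight shown is a hypothetical learned value.

```python
def real_valued_features(value):
    # One feature whose value is the integer itself, stored as a float.
    return {"INTFEATURE": float(value)}


def linear_score(weights, features):
    # Linear model: sum of weight * feature value over the active features.
    return sum((weights.get(name, 0.0) * x for name, x in features.items()), 0.0)


weights = {"INTFEATURE": 0.5}  # hypothetical learned weight
s9 = linear_score(weights, real_valued_features(9))    # 4.5
s10 = linear_score(weights, real_valued_features(10))  # 5.0
print(s10 - s9)  # 0.5, i.e. exactly one weight per unit step
```

With a single weight, each unit increase in the value shifts the score by that weight, so neighboring values get similar scores. The flip side is that, for a given label, this feature's contribution can only grow or shrink linearly with the value.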

--

Luke Vilnis

Mar 20, 2015, 3:38:34 PM
to dis...@factorie.cs.umass.edu
Honestly, though, you might be better off bucketing the int values and still using (possibly overlapping) binary features. For example, if you want them to overlap, you could make a few features called "INTFEATURE>1", "INTFEATURE>3", etc., and turn on every one whose threshold the value exceeds. You could also just split the range into disjoint buckets and have features like "1<INTFEATURE<3", etc., but this would only capture information about neighboring int values when they fell in the same bucket.
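Both bucketing schemes can be sketched in Python as follows. The thresholds and bucket edges are illustrative assumptions, and the disjoint buckets are written as half-open ranges (lo, hi] so that each integer in range lands in exactly one bucket, a slight variation on the strict "1<INTFEATURE<3" form.

```python
# Illustrative thresholds for the overlapping scheme.
THRESHOLDS = [1, 3, 5, 8, 12]


def threshold_features(value):
    # Overlapping: turn on every "INTFEATURE>t" whose threshold the value exceeds.
    return {f"INTFEATURE>{t}" for t in THRESHOLDS if value > t}


# Illustrative disjoint buckets, each a half-open range (lo, hi].
# Values <= 0 or > 12 fall in no bucket in this sketch.
BUCKETS = [(0, 1), (1, 3), (3, 5), (5, 8), (8, 12)]


def bucket_features(value):
    # Disjoint: at most one "lo<INTFEATURE<=hi" feature is on.
    return {f"{lo}<INTFEATURE<={hi}" for lo, hi in BUCKETS if lo < value <= hi}


def shared(a, b):
    # Number of active binary features two values have in common.
    return len(a & b)


print(shared(threshold_features(9), threshold_features(10)))  # 4
print(shared(threshold_features(7), threshold_features(10)))  # 3
print(shared(threshold_features(1), threshold_features(10)))  # 0
print(shared(bucket_features(9), bucket_features(10)))        # 1
print(shared(bucket_features(8), bucket_features(9)))         # 0
```

The overlap counts show the trade-off: threshold features let 9 and 10 share four active features while 1 and 10 share none, whereas disjoint buckets put 8 and 9 in different buckets even though they are adjacent.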


Arbaz Khan

Mar 20, 2015, 3:40:47 PM
to dis...@factorie.cs.umass.edu
Oh yeah, right. Brilliant! I was set on using BinaryFeatureVectorVariable, so thanks for pointing me in the right direction. This should do the job.
Thanks

Arbaz


Henry Ware

Mar 20, 2015, 4:54:18 PM
to dis...@factorie.cs.umass.edu
Is there a reason he couldn't just use two factors, one for the Strings and one for the Ints?
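One possible reading of the two-factor suggestion, sketched in Python with hypothetical names and weights (not Factorie's factor or template classes): each factor scores the label against its own feature set, and in a log-linear model the factor scores add.

```python
def factor_score(weights, label, features):
    # Dot product of this factor's (label, feature) weights with its features.
    return sum((weights.get((label, name), 0.0) * x for name, x in features.items()), 0.0)


def total_score(label, string_feats, int_feats, string_weights, int_weights):
    # Log-score of one label: the sum of the two factors' scores.
    return (factor_score(string_weights, label, string_feats)
            + factor_score(int_weights, label, int_feats))


string_feats = {"WORD=apple": 1.0}  # binary string features (illustrative)
int_feats = {"INTFEATURE": 10.0}    # real-valued integer feature (illustrative)
string_weights = {("NOUN", "WORD=apple"): 2.0}  # hypothetical weights
int_weights = {("NOUN", "INTFEATURE"): 0.25}

print(total_score("NOUN", string_feats, int_feats, string_weights, int_weights))  # 4.5
print(total_score("VERB", string_feats, int_feats, string_weights, int_weights))  # 0.0
```

Summing two dot products over disjoint feature sets gives the same result as one dot product over their concatenation, so this is equivalent to a single factor with both kinds of features in one sparse vector.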