distributional semantics in cleartk

Miller, Timothy

Jan 22, 2015, 5:59:29 PM
to cleartk-d...@googlegroups.com
Hello,
I've just pushed a new tree kernel, the Semantic Syntactic Tree Kernel
[1] that uses a lexical function at leaves instead of requiring exact
matches. For testing I just created some dummy small vectors for a few
words. For real usage bigger vectors will be required. I made the
interface use generic java types, so that all vectors would have to be
created from outside by API users. I was just curious about whether
there are any potentially useful types I missed or any plans to
incorporate them, with word2vec and compositional distributional
approaches becoming popular?

[1] http://disi.unitn.it/moschitti/articles/ECIR2007-Bloehdorn.pdf

Tim
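
To illustrate the difference between exact matching and a lexical function at the leaves, here is a minimal sketch. The class name, the field names, and the 0.8 score for the car/automobile pair are all invented for this example and are not taken from ClearTK or from [1]. It only isolates the leaf comparison step, not the tree kernel recursion.

```java
import java.util.function.ToDoubleBiFunction;

final class LeafMatchSketch {
    // Exact matching: two leaf words score 1.0 only when they are identical.
    static final ToDoubleBiFunction<String, String> EXACT =
        (a, b) -> a.equals(b) ? 1.0 : 0.0;

    // Toy lexical function (hypothetical): identical words score 1.0, and one
    // hand-picked related pair gets partial credit instead of 0.0.
    static final ToDoubleBiFunction<String, String> TOY_LEXICAL = (a, b) -> {
        if (a.equals(b)) {
            return 1.0;
        }
        boolean related = (a.equals("car") && b.equals("automobile"))
            || (a.equals("automobile") && b.equals("car"));
        return related ? 0.8 : 0.0;
    };

    public static void main(String[] args) {
        System.out.println(EXACT.applyAsDouble("car", "automobile"));
        System.out.println(TOY_LEXICAL.applyAsDouble("car", "automobile"));
    }
}
```

In real use the hand-written table would presumably be replaced by a similarity computed from word vectors.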

Steven Bethard

Jan 23, 2015, 1:03:19 PM
to cleartk-d...@googlegroups.com
On Thu, Jan 22, 2015 at 4:59 PM, Miller, Timothy
<Timothy...@childrens.harvard.edu> wrote:
> I've just pushed a new tree kernel, the Semantic Syntactic Tree Kernel
> [1] that uses a lexical function at leaves instead of requiring exact
> matches.

Cool!

One implementation question: why does
ContinuousCosineLexicalSimilarity use Double[] instead of double[]?

> For testing I just created some dummy small vectors for a few
> words. For real usage bigger vectors will be required. I made the
> interface use generic java types, so that all vectors would have to be
> created from outside by API users.

I'm not sure what you mean by "generic java types" here. I didn't see
any generics in the code.

> I was just curious about whether
> there are any potentially useful types I missed or any plans to
> incorporate them, with word2vec and compositional distributional
> approaches becoming popular?

I don't understand this question, but probably it's the same confusion
as the previous one.

Steve

> [1] http://disi.unitn.it/moschitti/articles/ECIR2007-Bloehdorn.pdf

Miller, Timothy

Jan 23, 2015, 1:11:43 PM
to cleartk-d...@googlegroups.com

On 01/23/2015 01:03 PM, Steven Bethard wrote:
> One implementation question: why does
> ContinuousCosineLexicalSimilarity use Double[] instead of double[]?
Oh I think I was under the impression that a map had to use the Object
form of primitive types but maybe that's mistaken. Or maybe not true if
it's an array.

>> For testing I just created some dummy small vectors for a few
>> words. For real usage bigger vectors will be required. I made the
>> interface use generic java types, so that all vectors would have to be
>> created from outside by API users.
> I'm not sure what you mean by "generic java types" here. I didn't see
> any generics in the code.

Poor word choice on my part. I just meant plain java types, Map, String,
double[], as opposed to creating/using a class with some abstraction
like WordVector or something. Nothing to do with generics.

>> I was just curious about whether
>> there are any potentially useful types I missed or any plans to
>> incorporate them, with word2vec and compositional distributional
>> approaches becoming popular?
> I don't understand this question, but probably it's the same confusion
> as the previous one.

So this was asking whether such abstractions may exist somewhere I
couldn't find, or if not whether they might be worth adding.

Tim

Steven Bethard

Jan 23, 2015, 1:51:29 PM
to cleartk-d...@googlegroups.com
On Fri, Jan 23, 2015 at 12:11 PM, Miller, Timothy
<Timothy...@childrens.harvard.edu> wrote:
> On 01/23/2015 01:03 PM, Steven Bethard wrote:
>> One implementation question: why does
>> ContinuousCosineLexicalSimilarity use Double[] instead of double[]?
> Oh I think I was under the impression that a map had to use the Object
> form of primitive types but maybe that's mistaken. Or maybe not true if
> it's an array.

An array is an object, so a double[] can go into a Map with no trouble.
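
A small sketch of this point, with a made-up class name and made-up words and values: only a bare primitive such as double is ruled out as a type argument, while double[] is a reference type and works directly.

```java
import java.util.HashMap;
import java.util.Map;

final class PrimitiveArrayMapSketch {
    // Builds a tiny word-to-vector table; the entries are invented for illustration.
    static Map<String, double[]> toyVectors() {
        Map<String, double[]> vectors = new HashMap<>();
        // double[] is a reference type, so it is a legal type argument.
        vectors.put("dog", new double[] {0.9, 0.1, 0.0});
        vectors.put("cat", new double[] {0.8, 0.2, 0.1});
        // By contrast, Map<String, double> would not compile, because a bare
        // primitive cannot be used as a type argument.
        return vectors;
    }

    public static void main(String[] args) {
        double[] dog = toyVectors().get("dog");
        System.out.println(dog.length + " dimensions, first = " + dog[0]);
    }
}
```

Compared with Double[], a double[] also avoids boxing every element. One caveat: arrays use identity-based equals and hashCode, so they are fine as map values but a poor choice for map keys.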

>>> For testing I just created some dummy small vectors for a few
>>> words. For real usage bigger vectors will be required. I made the
>>> interface use generic java types, so that all vectors would have to be
>>> created from outside by API users.
>> I'm not sure what you mean by "generic java types" here. I didn't see
>> any generics in the code.
> Poor word choice on my part. I just meant plain java types, Map, String,
> double[], as opposed to creating/using a class with some abstraction
> like WordVector or something. Nothing to do with generics.
>
>>> I was just curious about whether
>>> there are any potentially useful types I missed or any plans to
>>> incorporate them, with word2vec and compositional distributional
>>> approaches becoming popular?
>> I don't understand this question, but probably it's the same confusion
>> as the previous one.
> So this was asking whether such abstractions may exist somewhere I
> couldn't find, or if not whether they might be worth adding.

I don't think ClearTK has any such abstraction. But I'd be inclined to
stick with double[] until there's a real demand for something else. A
double[] seems like a pretty good abstraction for a distributional
vector to me. ;-)

Steve
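
For readers following along, a cosine-based lexical similarity over plain double[] vectors could look roughly like the sketch below. This is not the code of ContinuousCosineLexicalSimilarity; the class and method names are hypothetical, and the handling of zero vectors and unknown words (both score 0.0) is an assumption of this example.

```java
import java.util.HashMap;
import java.util.Map;

final class CosineSimilaritySketch {
    // Cosine of the angle between two equal-length vectors.
    // Returns 0.0 if either vector has zero norm (assumption of this sketch).
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Word-level similarity backed by a caller-supplied vector table.
    // Identical words score 1.0; words without a vector score 0.0.
    static double similarity(Map<String, double[]> vectors, String w1, String w2) {
        if (w1.equals(w2)) {
            return 1.0;
        }
        double[] v1 = vectors.get(w1);
        double[] v2 = vectors.get(w2);
        if (v1 == null || v2 == null) {
            return 0.0;
        }
        return cosine(v1, v2);
    }

    public static void main(String[] args) {
        Map<String, double[]> vectors = new HashMap<>();
        vectors.put("dog", new double[] {0.9, 0.1, 0.0});
        vectors.put("cat", new double[] {0.8, 0.2, 0.1});
        System.out.println(similarity(vectors, "dog", "cat"));
    }
}
```

Because the table is just a Map<String, double[]>, vectors loaded from any external source can be passed in without a dedicated wrapper type.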

Miller, Timothy

Jan 23, 2015, 2:40:04 PM
to cleartk-d...@googlegroups.com

On 01/23/2015 01:51 PM, Steven Bethard wrote:
> On Fri, Jan 23, 2015 at 12:11 PM, Miller, Timothy
> <Timothy...@childrens.harvard.edu> wrote:
>> On 01/23/2015 01:03 PM, Steven Bethard wrote:
>>> One implementation question: why does
>>> ContinuousCosineLexicalSimilarity use Double[] instead of double[]?
>> Oh I think I was under the impression that a map had to use the Object
>> form of primitive types but maybe that's mistaken. Or maybe not true if
>> it's an array.
> An array is an object, so a double[] can go into a Map with no trouble.
>
>
OK, I've pushed that change.