Searching for linked items


Kate Ho

Feb 17, 2011, 3:35:59 PM
to Tech Meetup
Hi Folks,

I have a problem, and if anyone can offer some advice, that would be awesome.

I want to be able to search on a generic term and have it return a
list of images that are related to members of that term, rather than
images directly related to that term.

For example: search "Fruit"
Return "apple", "banana", "pineapple", "grapes"

I've thought about different ways of doing this - and have gotten as
far as finding similar items using something like Google Sets or the
Hunch API. But this involves typing in some similar items rather than
just typing in the term that underpins all the items.

If there are any suggestions, they're more than welcome.

Thanks,

Kate

--
Kate Ho
Managing Director, Interface3
Room 6.09, Appleton Tower, 11 Crichton Street, Edinburgh EH8 9LE

Tel: +44 (0)131 651 3079, Mobile: +44 (0)7877 112 430
http://www.interface3.com

Daniel Winterstein

Feb 17, 2011, 7:18:06 PM
to kate...@gmail.com, Tech Meetup

Hi Kate,

I think this may be one of the few actually technical posts on the
techmeetup list. And it's on NLP :)

WordNet is a good first port of call for structured questions like this,
e.g. if you're expecting topics like "Fruit" and want clear examples of fruit.

If you're after something a bit more general-purpose, e.g. where random
internet surfers enter search terms... it gets trickier.

Wikipedia categories cover a lot of ground. If that doesn't cut it, then
you're into statistical NLP.

The first algorithm to try is Latent Semantic Analysis: take the
word/paragraph co-occurrence matrix of a large corpus (such as
Wikipedia), do a singular value decomposition to calculate the top few
hundred eigenvectors, and use the resulting projection matrix to find
related words.*
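As a rough illustration of that recipe, here is a toy, standard-library-only sketch. It is not production LSA: it uses raw counts (real systems usually weight the matrix first, e.g. with tf-idf or log-entropy), a dense matrix, and power iteration on A·A^T instead of a proper sparse SVD routine. The eigenvectors of A·A^T are the left singular vectors of A, so scaling them by the square roots of the eigenvalues gives the same reduced term vectors an SVD would (up to sign). The five-"paragraph" corpus is made up for the example.

```python
import math
import random

def term_paragraph_matrix(paragraphs):
    """Rows are terms, columns are paragraphs, cells are raw counts."""
    vocab = sorted({w for p in paragraphs for w in p.lower().split()})
    row = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(paragraphs) for _ in vocab]
    for j, p in enumerate(paragraphs):
        for w in p.lower().split():
            A[row[w]][j] += 1.0
    return vocab, A

def top_eigenpairs(C, k, iterations=300, seed=0):
    """Top-k eigenpairs of a symmetric positive semi-definite matrix,
    found by power iteration with deflation."""
    rng = random.Random(seed)
    n = len(C)
    C = [r[:] for r in C]  # deflation below modifies the matrix, so copy it
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # nothing left to find
            v = [x / norm for x in w]
        # Rayleigh quotient of the (unit) vector gives its eigenvalue.
        lam = sum(v[i] * C[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the direction just found
            for j in range(n):
                C[i][j] -= lam * v[i] * v[j]
    return pairs

def lsa_term_vectors(paragraphs, k=2):
    vocab, A = term_paragraph_matrix(paragraphs)
    # C = A A^T is the term-term co-occurrence matrix. Its eigenvectors are the
    # left singular vectors of A and its eigenvalues the squared singular values,
    # so sqrt(eigenvalue) * eigenvector reproduces the rows of U_k S_k.
    C = [[sum(a * b for a, b in zip(ri, rj)) for rj in A] for ri in A]
    pairs = top_eigenpairs(C, k)
    return {w: [math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i, w in enumerate(vocab)}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

corpus = [
    "fruit apple banana",
    "fruit banana grapes",
    "apple grapes fruit pineapple",
    "engine car wheel",
    "car wheel",
]
vectors = lsa_term_vectors(corpus, k=2)
print(round(cosine(vectors["apple"], vectors["banana"]), 3))  # close to 1.0
print(round(cosine(vectors["apple"], vectors["engine"]), 3))  # close to 0.0
```

With a real corpus, k would be in the hundreds, as Daniel says, and the matrix would be far too large for this dense approach.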

After that it gets interesting ;)

All the best,
- Daniel


*I rarely get an excuse to write that much maths jargon in one place.
Actually that sentence is less scary than it sounds.




--
--------------------------------------------

Dr Daniel Winterstein
tel: 0772 5172 612 @winterstein
http://winterwell.com http://sodash.com
Registered in Scotland, company no. SC342991

Paola Di Maio

Feb 18, 2011, 3:05:02 AM
to daniel.wi...@gmail.com, kate...@gmail.com, Tech Meetup
Kate

To suggest an alternative view to Daniel's:

Where I come from, the kind of output you describe is achieved with knowledge modelling and representation techniques
(it is a system design option), and it can be implemented in any programming language.

You create a taxonomy of terms (WordNet is a default option, but there are other existing taxonomies that can be imported
as XML files or in a similar notation, or you can make your own). Some of the terms in the taxonomy are 'categories' (fruit).

I think an object model could equally well be used to do what you describe.

I'd say your choice of modelling technique would be determined by how steep your hierarchy
of values is (fruit is a class, apple a subclass, or something like that), plus some other factors.
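The object-model option can be illustrated with plain Python classes: a category is a class and its members are its subclasses, so a steeper hierarchy simply means more intermediate classes. The class names are made up for the example, and hard-coding a taxonomy as classes is only sensible for a small, fixed set of categories.

```python
class Fruit:
    """A category: every subclass of Fruit is one of its members."""

class CitrusFruit(Fruit): ...   # an intermediate category (a "steeper" hierarchy)
class Apple(Fruit): ...
class Banana(Fruit): ...
class Orange(CitrusFruit): ...

def all_members(category):
    """Names of every class below `category`, depth-first, in definition order."""
    names = []
    for sub in category.__subclasses__():
        names.append(sub.__name__)
        names.extend(all_members(sub))
    return names

print(all_members(Fruit))         # ['CitrusFruit', 'Orange', 'Apple', 'Banana']
print(issubclass(Orange, Fruit))  # True
```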


PDM





--
You received this message because you are subscribed to the Google
Groups "Tech Meetup" group.
To post to this group, send email to techm...@googlegroups.com
To unsubscribe from this group, send email to
techmeetup+...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/techmeetup?hl=en?hl=en
--

We're on Twitter: http://twitter.com/techmeetup

Ulises

Feb 18, 2011, 3:50:33 AM
to paolad...@gmail.com, daniel.wi...@gmail.com, kate...@gmail.com, Tech Meetup
Yay! Technical discussion.

There are several ways of modelling these relationships you're after.
If you only care about white-label relationships, i.e. you only care
that they *are* related but not the nature of their relationships,
word co-occurrence will get you quite a long way. As Paola mentioned,
there are many different ways of modelling taxonomies, one of which is
to build statistical models which vary in complexity. A really simple
model is the Hyperspace Analogue to Language (HAL) which models each
word as its relationship with other words:
http://en.wikipedia.org/wiki/Semantic_memory#Hyperspace_Analogue_to_Language_.28HAL.29
(actually that entire wikipedia article is interesting in itself).
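A toy version of HAL, roughly as it is usually described: slide a window over the text and, for each word, add a weight for every word that precedes it inside the window, with nearer words weighted more heavily; a word's vector is its row (what comes before it) joined to its column (what comes after it). The window size, the linear weighting and the three-sentence corpus are illustrative choices; the usual write-ups use a 10-word window over a very large corpus.

```python
import math
from collections import defaultdict

def hal_counts(tokens, window=10):
    """counts[w][c] sums a weight for every time c occurs within `window`
    words before w. Nearer words weigh more: distance 1 scores `window`,
    distance `window` scores 1."""
    counts = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for d in range(1, min(window, i) + 1):
            counts[word][tokens[i - d]] += window - d + 1
    return counts

def hal_vector(counts, word, vocab):
    """Row (left context) followed by column (right context) for `word`."""
    row = [counts.get(word, {}).get(c, 0.0) for c in vocab]
    col = [counts.get(c, {}).get(word, 0.0) for c in vocab]
    return row + col

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

tokens = "i ate a ripe apple . i ate a ripe banana . i drove a fast car .".split()
counts = hal_counts(tokens, window=2)
vocab = sorted(set(tokens))
vec = {w: hal_vector(counts, w, vocab) for w in ("apple", "banana", "car")}
print(round(cosine(vec["apple"], vec["banana"]), 3))  # 1.0 (identical contexts)
print(round(cosine(vec["apple"], vec["car"]), 3))     # 0.527
```

Note that this only says *that* words are related (they share contexts), which matches Ulises's point about white-label relationships: nothing in the vectors says that apple is a kind of fruit.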

U

Ben Werdmuller Von Elgg

Feb 18, 2011, 4:36:10 AM
to ulises....@gmail.com, paolad...@gmail.com, daniel.wi...@gmail.com, kate...@gmail.com, Tech Meetup

The trick is, I'd imagine, that Kate doesn't want to actually have to model the world's taxonomies herself. I actually sent her a link to the Flickr clustering API off-list; it's not as snazzy, and it's not as focused, but it seems to be one of the best existing ways to get pictures of items relating to a generic term.

Most of the semantic engines I can find on the web seem to be dedicated to storing information about documents: Wikipedia articles, web pages, and so on. Does anyone know if there's a global, taxonomical dictionary out there for people to ping with an API? It might not be the worst idea in the world.

Ben

Sent from a mobile tablet; please forgive typos
http://benwerd.com/

Daniel Winterstein

Feb 18, 2011, 4:54:54 AM
to Kate Ho, Tech Meetup
Hi Kate,
Good discussion you've kicked off :)
Can you tell us what the end use is?
- Daniel

NB: Should these discussion threads migrate off tech-meetup to keep
email traffic down? What's the etiquette?



--
--------------------------------------------------
Daniel Winterstein
Edinburgh
http://winterwell.com   http://soda.sh

Chris Fleming

Feb 18, 2011, 4:46:13 AM
to b...@benwerd.com, Ben Werdmuller Von Elgg, ulises....@gmail.com, paolad...@gmail.com, daniel.wi...@gmail.com, kate...@gmail.com, Tech Meetup
On 18/02/2011 09:36, Ben Werdmuller Von Elgg wrote:
> Does anyone know if there's a global, taxonomical dictionary out there for people to ping with an API?

I think that Freebase can provide some of this; for example, they have
cheese, and then a list of types of cheese:
http://www.freebase.com/schema/food/cheese

But at a glance it doesn't really look like it would help solve the
fruit example.

Cheers
Chris



Beatrice Alex

Feb 18, 2011, 5:20:52 AM
to kate...@gmail.com, Tech Meetup
I would go with Daniel's suggestions of using Wikipedia lists and categories, or WordNet. That saves you from having to come up with the taxonomy yourself.

I'm not sure if there is an API for Wikipedia, but you can certainly download a snapshot of it (or some of it).
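For what it's worth, the MediaWiki web API behind Wikipedia does expose category membership (`action=query&list=categorymembers`). Below is a minimal standard-library sketch. The category title "Fruits" is an assumption and may not be the category that actually lists the fruit articles; `cmtype=page` leaves out subcategories, which Wikipedia uses heavily, so a fuller version would probably need to recurse into them.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"

def category_members_url(category, limit=20):
    """Build a MediaWiki API URL listing the pages filed under Category:<category>."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,
        "cmtype": "page",   # articles only; subcategories would need a second pass
        "cmlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)

def member_titles(response):
    """Extract article titles from a decoded categorymembers response."""
    return [m["title"] for m in response.get("query", {}).get("categorymembers", [])]

def fetch_category_members(category, limit=20):
    """Network call. The User-Agent string is a placeholder; Wikimedia asks
    API clients to identify themselves."""
    request = Request(category_members_url(category, limit),
                      headers={"User-Agent": "category-image-sketch/0.1 (placeholder)"})
    with urlopen(request) as response:
        return member_titles(json.load(response))

# Example (needs network access; "Fruits" is an assumed category title):
# print(fetch_category_members("Fruits"))
```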

The other option, WordNet, also makes sense. You can test it out online at:

http://www.wordnet-online.com

E.g. if you search for banana (http://www.wordnet-online.com/banana.shtml), you get 2 senses, one with the semantic relation "is a kind of edible fruit". If you then look at edible fruit (http://www.wordnet-online.com/edible_fruit.shtml), it has 1 sense, with all kinds of other types of fruit listed under the semantic relation "has particulars".

So you could probably exploit this WordNet hierarchy. However, the hierarchy can be deeper than you might want. E.g. "orange" is, amongst other things, a kind of "citrus fruit" (which lists lots of other particular kinds of citrus fruit), which in turn is a kind of edible fruit, which in turn is a kind of fruit.
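One way to cope with that depth is to store the "is a kind of" links as (child, parent) pairs, invert them, and collect only the leaves under a category, so intermediate nodes like "citrus fruit" are walked through rather than returned. The links below are typed in by hand: the first four paraphrase the chain described above, and "apple" is added purely for illustration. `max_depth` is a hypothetical knob for cutting the walk off when a hierarchy gets too deep.

```python
from collections import defaultdict

# (child, parent) "is a kind of" links, typed in by hand. The first four follow
# the WordNet chain described above; "apple" is added only for illustration.
IS_A = [
    ("banana", "edible fruit"),
    ("orange", "citrus fruit"),
    ("citrus fruit", "edible fruit"),
    ("edible fruit", "fruit"),
    ("apple", "edible fruit"),
]

def leaf_members(category, links, max_depth=None):
    """Leaf terms below `category`; intermediate categories are walked through,
    not returned. If max_depth is set, nothing more than max_depth levels down
    is visited, and a sub-category found at that depth is dropped, not expanded."""
    children = defaultdict(list)
    for child, parent in links:
        children[parent].append(child)
    leaves, seen = [], set()
    stack = [(category, 0)]
    while stack:
        node, depth = stack.pop()
        if node in seen:  # WordNet nodes can have several parents; skip repeats
            continue
        seen.add(node)
        kids = children.get(node, [])
        if not kids:
            if node != category:
                leaves.append(node)
        elif max_depth is None or depth < max_depth:
            stack.extend((kid, depth + 1) for kid in kids)
    return sorted(leaves)

print(leaf_members("fruit", IS_A))               # ['apple', 'banana', 'orange']
print(leaf_members("fruit", IS_A, max_depth=2))  # ['apple', 'banana']
```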

Best,

Bea

-----------------------------------------
Dr. Beatrice Alex
Research Fellow and Project Manager at the School of Informatics, University of Edinburgh.

Tel.: +44 131 6502684
http://homepages.inf.ed.ac.uk/balex
http://www.linkedin.com/in/beatricealex
http://twitter.com/bea_alex


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Kate Ho

Feb 18, 2011, 10:33:34 AM
to Tech Meetup
Can I just say, I've had quite a lot of responses to the query and I
*really* appreciate everyone that's chipped in.

The suggestions on/off list have been WordNet, Wikipedia
lists/categories, and statistical modelling. WordNet definitely sounds
like a good possibility, and Wikipedia might be a go-er too.

Just following on from what Ben's saying, yes, if there was a global
taxonomical dictionary with an API then that would be perfect. But I
can't find anything like it, so think it might have to be a bit of a
hack job between the suggestions above.

Just to clarify the problem somewhat: the end use is that I'm looking
for a way to gather similar images for a game aimed at young children.
Basically, I'm currently creating collections of stuff: traffic
signs, fruit, types of sports. At the moment, when I google those
terms, I get images showing collections of those things rather than
individual ones. One way around this is to google individual terms,
but as a computer scientist I would like to solve it as
a generic problem rather than be stuck forever googling
individual terms (or doing this manually next time). I would only need
about 20 items (max) for each category, and the relationships would
just be subsets of those items.
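The "hack job between the suggestions" could be as simple as asking several sources for members of a category in turn and stopping at about 20 items; each item then becomes its own image search, like the manual workaround of googling individual terms. The two source functions here are hypothetical stubs standing in for, say, a WordNet lookup and a Wikipedia category lookup; they just return hard-coded lists.

```python
from typing import Callable, Iterable, List

# A "source" is any function mapping a category name to candidate member names.
Source = Callable[[str], Iterable[str]]

def gather_items(category: str, sources: List[Source], limit: int = 20) -> List[str]:
    """Ask each source in turn, drop case-insensitive duplicates, stop at `limit`."""
    items: List[str] = []
    seen = set()
    for source in sources:
        for item in source(category):
            if len(items) >= limit:
                return items
            key = item.strip().lower()
            if key and key not in seen:
                seen.add(key)
                items.append(item.strip())
    return items

# Hypothetical stand-ins for real lookups (e.g. WordNet, Wikipedia categories).
def wordnet_stub(category: str) -> List[str]:
    return {"fruit": ["apple", "banana"]}.get(category, [])

def wikipedia_stub(category: str) -> List[str]:
    return {"fruit": ["Banana", "Pineapple", "Grapes"]}.get(category, [])

items = gather_items("fruit", [wordnet_stub, wikipedia_stub])
print(items)  # ['apple', 'banana', 'Pineapple', 'Grapes']
```

Ordering the sources from most to least trusted means the curated lookup fills the list first and the noisier one only tops it up.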

Kate

--

milk

Feb 19, 2011, 4:01:18 AM
to ba...@staffmail.ed.ac.uk, kate...@gmail.com, Tech Meetup
On 18 February 2011 10:20, Beatrice Alex <ba...@staffmail.ed.ac.uk> wrote:
> I'm not sure if there is an API for Wikipedia

DBpedia is worth a look for this. I don't know how much one would have to play around with what you get, but it also has links to WordNet. Aye though, some kind of linked data source could be a good solution.
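DBpedia has a public SPARQL endpoint, and it files resources under their Wikipedia categories with the `dct:subject` property, so a category lookup might look like the sketch below. The category name "Tropical fruit" is an unchecked assumption, and the endpoint's accepted `format` values should be confirmed before relying on them; the parsing follows the standard SPARQL 1.1 query-results JSON layout.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SPARQL_ENDPOINT = "https://dbpedia.org/sparql"

def category_query(category, limit=20):
    """SPARQL selecting English labels of resources in a Wikipedia category."""
    name = category.strip().replace(" ", "_")
    return f"""
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?item ?label WHERE {{
  ?item dct:subject <http://dbpedia.org/resource/Category:{name}> ;
        rdfs:label ?label .
  FILTER (lang(?label) = "en")
}}
LIMIT {int(limit)}
"""

def query_url(query):
    """GET URL for the endpoint, asking for SPARQL JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode(
        {"query": query, "format": "application/sparql-results+json"})

def labels(results):
    """Pull ?label values out of SPARQL 1.1 JSON results."""
    return [b["label"]["value"] for b in results["results"]["bindings"]]

def fetch_labels(category, limit=20):
    """Network call to the public endpoint."""
    with urlopen(query_url(category_query(category, limit))) as response:
        return labels(json.load(response))

# Example (needs network access; the category name is an assumption):
# print(fetch_labels("Tropical fruit"))
```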

-milk

--
www.milkmiruku.com

Paola Di Maio

Feb 20, 2011, 12:11:31 PM
to milkm...@gmail.com, ba...@staffmail.ed.ac.uk, kate...@gmail.com, Tech Meetup
Milk and all

There's a linked data hackday in Glasgow next week;
all welcome, of course!


more info about this group



PDM



