In the current APML spec, concepts are just names, with entries such
as:
IMPLICIT
[concept key="soccer" value=0.9 source=Particls]
    ; according to Particls, I like "soccer" a lot
[concept key="Harry Potter" value=0.7 source=Amazon]
    ; Amazon believes I like Harry Potter somewhat
[concept key="George Harrison" value=0.8 source=last.fm]
    ; Last.Fm knows I like George Harrison
[concept key="Céline Dion" value=-0.6 source=last.fm]
    ; and Céline Dion not so much
[source key="www.lemonde.fr" value=0.7 source=FeedDemon]
    ; FeedDemon observed that I spend a lot of time reading articles
    ; from "lemonde.fr"
[author key="Scoble" value=0.6 source=Particls]
[...]
EXPLICIT
[concept key="collective intelligence" value=1.0 source=Particls]
    ; I told Particls this is one of my favorite topics
[...]
So, there is a little more info in the APML file than in a tag-cloud:
it stores the origin of the information (in the "source=" field), and
it also makes an interesting distinction between implicit (deduced)
and explicit (user-provided) data.
Anyway, there is an obvious 1-1 correspondence between the core of our
current APML files (made of concept-value pairs) and tag-clouds: they
are equivalent representations. In other words, there is an obvious
translation that can convert the core of an APML file to the
corresponding tag-cloud and conversely.
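To make that correspondence concrete, here is a minimal sketch in Python. It assumes the core of an APML file has already been read into (key, value, source) tuples, and it treats a tag-cloud as a plain tag-to-weight mapping. The function names and the tuple layout are my own illustration, not part of the APML spec.

```python
# Illustrative only: the tuple layout and function names are assumptions,
# not anything defined by the APML spec.
CONCEPTS = [
    ("soccer", 0.9, "Particls"),
    ("Harry Potter", 0.7, "Amazon"),
    ("George Harrison", 0.8, "last.fm"),
    ("Céline Dion", -0.6, "last.fm"),
]


def to_tag_cloud(concepts):
    """Keep only the core concept-value pairs; the source is dropped."""
    return {key: value for key, value, _source in concepts}


def from_tag_cloud(cloud, source):
    """Rebuild concept entries from a tag-cloud, attributing them to one source."""
    return [(key, value, source) for key, value in cloud.items()]


cloud = to_tag_cloud(CONCEPTS)
# Going to concepts and back gives the same tag-cloud again.
assert to_tag_cloud(from_tag_cloud(cloud, "Particls")) == cloud
```

Note that the round trip only preserves the core pairs: the "source=" field is lost on the way to the tag-cloud, and a key reported by two sources would collapse into a single tag.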
Now a tag cloud is great to give *you*, a human reader, an overall
idea about what's popular in a community, or about what a user
generally likes. It can also be a pretty fair representation of what
an article talks about. But if you were a computer, it wouldn't mean
much to you: it's just a list of terms. And if you (just think you're
a computer for a minute) were trying to use a tag cloud to target
interesting content to a particular user, while filtering out the
irrelevant stuff, you would feel a bit uneasy...
I believe that tag-clouds and the current APML spec (concepts as
names) may be good enough for some real improvement in targeted
advertising, but will be insufficient for good news filtering. The
life of a news filtering engine would be made simpler if the APML file
contained entries such as
[concept key="http://dbpedia.org/resource/George_Harrison" value=0.8
source=last.fm]
rather than
[concept key="George Harrison" value=0.8 source=last.fm]
Some advantages of using URIs instead of names are:
1. to eliminate the ambiguity of reference (it can make clear that the
interest is about George Harrison the singer and not about George
Harrison the senator)
2. to connect the identified resources to other objects (also part of
the same "web of data"). When a resource is identified (not just
named), you can instantly get a lot of info about it, some of which
may be useful to help you solve your problem.
"Concepts as URI" still makes the APML file similar to a tag-cloud,
but now each tag in the cloud is connected to lots of other things.
In the wiki, Paul says: "Personally, I believe that if URIs are
introduced, we will need to continue to keep simple entries available
in the APML, lest introducing additional complexity in actual
interpreting the document."
I don't think the complexity will be in interpreting APML files
containing URIs: if an APML-importing service is happy to work with
just the name, it is trivial to get it from the URI (to go from
http://dbpedia.org/resource/George_W._Bush to the name "George W.
Bush", just look at the HTML Title tag).
In other words, the fact that URIs give your APML-importing program
access to a very complex world of connected sentences doesn't imply
that your software has to be very complex. Navigating in the world of
data (and using the info in it) is likely to be somewhat complex. But
no program will be forced to do this. If you like to keep it simple,
no problem: just ignore this data, and don't attempt to make sense of
it.
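As a rough illustration of how little work "just the name" can be, here is a sketch in Python. Instead of fetching the page and reading its HTML title, it takes a cheaper shortcut and derives the name from the last path segment of the URI. That shortcut is an assumption that happens to hold for DBpedia-style URIs and will not work for arbitrary ones; the function name is made up for the example.

```python
from urllib.parse import urlparse, unquote


def name_from_uri(uri):
    """Guess a human-readable name from a DBpedia-style resource URI.

    Assumption: the last path segment is the name with underscores for
    spaces. This is a shortcut, not the HTML-title lookup described above.
    """
    path = urlparse(uri).path
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    return unquote(last_segment).replace("_", " ")


print(name_from_uri("http://dbpedia.org/resource/George_W._Bush"))
```

An importer that only wants keywords could run every URI through something like this and ignore the rest of the linked data.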
I agree however that finding the right URI to put in the APML file
instead of just a keyword will sometimes be non-trivial.
Paul Jones in the wiki asked "how to best semantically enable entities
represented in the APML.": does APML need a richer representation of
concepts than the one we have now?
Chris Saad
FaradayMedia.com - For Audiences of One
Particls.com - Are You Paying Attention?
Engagd.com - The Open Attention Platform
Media2.0Workgroup.org - Social, Democratic, Distributed
APML.org - The OPML of Attention
But after all you're probably right, simple is beautiful. And I
suppose it can in some cases be better to leave some vagueness in the
APML description of user interests, and let the APML-importing
services do the guessing. I was impressed by the Flickr tag clusters
(e.g. http://flickr.com/photos/tags/turkey/clusters ) mentioned by
Brian.
I also found interesting David's remark that "we'd like to be able to
use large APML datasets to produce statistical trends etc much like
Google does with query logs".
On the other hand, it's likely that somebody will soon come up with a
web service that guesses the URI of a particular concept in a
tag-cloud... That could be pretty nice ;-)
On 1 nov, 23:04, "Chris Saad" <ch...@faradaymedia.com> wrote:
> As someone else mentioned - I tend to think of concepts as datapoints taken
> as a whole rather than each one representing self-contained meaning. In
> other words, if the user has 'cats' and 'Siamese' then they are probably
> more interested in actual cats, specifically Siamese cats, than the musical
> cats.
>
> Again, while allowing for hard linking to URIs, in the end enforcing such a
> link will make things very difficult for a lot of non-specialized developers
> to understand and implement and will break the 'really simple attention'
> philosophy.
>
> So in summary, my feeling is that URIs for concepts should be supported, but
> not required.
>
> Chris
>
> On 11/2/07, Francois Dongier <francois.dong...@gmail.com> wrote:
>
> > Paul Jones in the wiki asked "how to best semantically enable entities
> > represented in the APML.": does APML need a richer representation of
> > concepts than the one we have now?
>
> > In the current APML spec, concepts are just names, with entries such
> > as:
>
> > IMPLICIT
> > [concept key="soccer" value=0.9 source=Particls] ; according to
> > Particls, I like "soccer" a lot
> > [concept key="Harry Potter" value=0.7 source=Amazon] ; Amazon
> > believes I like Harry Potter somewhat
> > [concept key="George Harrison" value=0.8 source=last.fm] ; Last.Fm
> > [concept key="George Harrison" value=0.8 source=last.fm]
>
> > Some advantages of using URIs instead of names are:
> > 1. to eliminate the ambiguity of reference (it can make clear that the
> > interest is about George Harrison the singer and not about George
> > Harrison the senator)
> > 2. to connect the identified resources to other objects (also part of
> > the same "web of data"). When a resource is identified (not just
> > named), you can instantly get a lot of info about it, some of which
> > may be useful to help you solve your problem.
>
> > "Concepts as URI" still makes the APML file similar to a tag-cloud,
> > but now each tag in the cloud is connected to lots of other things.
>
> > In the wiki, Paul says : "Personally, I believe that if URIs are
> > introduced, we will need to continue to keep simple entries available
> > in the APML, lest introducing additional complexity in actual
> > interpreting the document."
>
> > I don't think the complexity will be in interpreting APML files
> > containing URIs: if an APML-importing service is happy to work with
> > just the name, it is trivial to get it from the URI (to go from
> > http://dbpedia.org/resource/George_W._Bush to the name "George W.
> > Bush", just look at the HTML Title tag).
> > In other words, the fact that URIs give your APML-importing program
> > access to a very complex world of connected sentences doesn't imply
> > that your software has to be very complex. Navigating in the world of
> > data (and using the info in it) is likely to be somewhat complex. But
> > no program will be forced to do this. If you like to keep it simple,
> > no problem: just ignore this data, and don't attempt to make sense of
> > it.
>
> > I agree however that finding the right URI to put in the APML file
> > instead of just a keyword will sometimes be non-trivial.
>
> --
--- i think we keep redirecting the real issue. We don't know how to
disambiguate, so we point at a URI, now we need to define what to
expect (or types of things to expect) at that URI. Then that is
another spec to describe concepts, which might use URIs again. See how
you end up following your nose through several layers of formats
before potentially getting a simple answer, and even then that answer
will be in English prose because you can only describe things in
relation to each other.
> But, as GD has proposed (and correct me if I'm wrong, but I think
> we're on the same wavelength here) if the APML agent sees:
>
> <Concept key="wine"
> uri="http://www.w3.org/TR/2003/PR-owl-guide-20031215/wine#Wine"
> value="0.87" from="GatheringTool.com"
> updated="2007-07-13T09:22:00Z" />
> <Concept key="WINE" value="0.27" from="GatheringTool.com"
> updated="2007-02-11T04:21:00Z" />
>
> The agent can in fact be sure that the first concept is indeed wine,
> the drink.
>
> From here, if the APML agent is integrated with something like OpenRDF
> Sesame, the agent would be able to traverse the specified semantics
> document and make even more assertions for Implicit concept entries.
--- but how can it be sure? we would need another structured format
(you propose OWL) for what is at the end of the URI, then parse it,
then get some sort of machine understanding.
OR have an enumerated list of all possible keys and a predefined URI
internally for each app to dereference. Neither of these is really
desirable. The more layers of complexity, the more we lose people.
> So overall, one small optional attribute has the ability to open a
> whole new world of even "smarter" decisions based on semantics.
--- the problem is that how your APML gets built could be devoid of
human intervention. If we look at the last.fm example, it is purely
pulling information from TAGS, so what URI could you use? if i am
listening to a geology podcast and tag it with "rocks" then does last.fm
output the APML with a URI to their definition of "rocks" which might
not be MY definition? you are basically asking each person to go find
a URI that actually describes their interest and then put that in
their APML file manually - this can't really be done while compiling
keys dynamically into APML.
> I think APML is a perfect vehicle for enabling semantic technologies.
--- i would agree, but i don't think adding a URI gives much value
because no one will do it. It is VERY hard to do dynamically because
the converting app only knows about itself, not your original
intentions.
I like the way Flickr tries to solve this disambiguation problem, by
clustering. If i have key="rock" then there is no hope for what i
mean. But if i have key="rock", key="sediment", key="geology" then you
can cluster rock with those.
A way to solve this could be a <cluster> element.
<cluster>
  <Concept key="wine" value="0.87" from="GatheringTool.com"
           updated="2007-07-13T09:22:00Z" />
  <Concept key="vineyard" value="0.5" from="GatheringTool.com"
           updated="2007-07-13T09:22:00Z" />
</cluster>
<cluster>
  <Concept key="WINE" value="0.27" from="GatheringTool.com"
           updated="2007-02-11T04:21:00Z" />
  <Concept key="linux" value="0.1" from="GatheringTool.com"
           updated="2007-02-11T04:21:00Z" />
</cluster>
This would allow the same term to appear and still be put into context
with other terms. I have no idea how this would parse or be
understood, but i think it is much easier to create clusters from
seemingly random data and data sources, without "cross polluting" all
the rock'n'roll geologists' interests.
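On the parsing question, one possible reading of the cluster idea is sketched below in Python, using the standard xml.etree.ElementTree module. It is only a sketch: the <cluster> element is a proposal from this thread, not part of APML, and the wrapping <Concepts> root element and the read_clusters name are assumptions added so the fragment is a well-formed document.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: <cluster> is only a proposal from this thread, and
# the <Concepts> wrapper is assumed so that the text parses as one document.
APML_FRAGMENT = """
<Concepts>
  <cluster>
    <Concept key="wine" value="0.87" from="GatheringTool.com"
             updated="2007-07-13T09:22:00Z" />
    <Concept key="vineyard" value="0.5" from="GatheringTool.com"
             updated="2007-07-13T09:22:00Z" />
  </cluster>
  <cluster>
    <Concept key="WINE" value="0.27" from="GatheringTool.com"
             updated="2007-02-11T04:21:00Z" />
    <Concept key="linux" value="0.1" from="GatheringTool.com"
             updated="2007-02-11T04:21:00Z" />
  </cluster>
</Concepts>
"""


def read_clusters(xml_text):
    """Return one {lowercased key: value} dict per <cluster> element."""
    root = ET.fromstring(xml_text.strip())
    clusters = []
    for cluster in root.iter("cluster"):
        clusters.append(
            {c.get("key").lower(): float(c.get("value"))
             for c in cluster.iter("Concept")}
        )
    return clusters


clusters = read_clusters(APML_FRAGMENT)
```

Each cluster becomes its own small tag-cloud, so "wine" can appear twice with different weights without the two meanings being merged.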
This alleviates the need for additional layers to dereference, and
allows for some sloppy semantics. Slang words, and even learning from
the reader's intentions... not all terms will have URIs (that is why
urbandictionary.com is so interesting), but from a cluster of related
terms, some meaning can be extracted. Let the machines do the
hard-work not the publishers.
-brian
--
brian suda
http://suda.co.uk
--- true, but if both the parser and the reader know that this was
gleaned from context, then both treat it the same, whereas if i give you
a URI (right or wrong) gleaned from context, the APML reader might
assume it is EXACTLY what you mean, when it wasn't. That is the error
between what i intended (clusters) and the explicit (the URI). The
clusters are slightly more robust because it allows for a wider range
of tolerances, whereas the URI is pretty strict with a single meaning.
> What about
> something like:
> <Cluster terms="alcohol,beverage,drink">
> <Concept key="wine" ........../>
> <Concept key="merlot" ......./>
> ....
> </Cluster>
i don't think there would need to be a terms attribute, everything in
the cluster is lumped together. So if i sent this to flickr, i want
pictures (to a certain value) of wine, like the drink and pictures of
wine like the OS. It knows this because it can look at the other terms
and match them to internal clusters in its own DB based on its
millions of users rather than a URI somewhere which might go away or
be 404 or timeout.
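A toy version of that matching step, in Python, might look like the sketch below. Everything in it is assumed for illustration: the KNOWN_CLUSTERS table stands in for whatever a service has mined from its own users, and Jaccard overlap is just one simple similarity measure, not a description of how flickr actually does it.

```python
# Hypothetical internal clusters a service might have mined from its users.
KNOWN_CLUSTERS = {
    "wine (drink)": {"wine", "vineyard", "merlot", "grape"},
    "wine (software)": {"wine", "linux", "windows", "emulator"},
}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b)


def best_match(terms, known=KNOWN_CLUSTERS):
    """Return the label of the known cluster that overlaps most with terms."""
    terms = {t.lower() for t in terms}
    return max(known, key=lambda label: jaccard(terms, known[label]))


print(best_match(["wine", "vineyard"]))
print(best_match(["WINE", "linux"]))
```

The same key "wine" lands in a different internal cluster depending on its neighbours, with no URI to dereference.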
> Thoughts? Great stuff all around.
--- clusters certainly have their problems too, how do you get a term
weight across clusters, etc. I just put that out there as an
alternative to URIs. Clusters are in use today on sites like flickr,
even amazon. All mined from user data, just like your attention data is
mined. To me clusters seem more natural, i was talking to my friends
in the US about football and helmets, and i talk to european friends
about football and pitches and back to my american friends about
pitches and baseball. Do i want to find 6 different URIs or just let
the "smart" attention engine figure out that when i am talking certain
words in certain contexts, they should remain in those contexts.
i'm not 100% sure how to solve this, but an ontology layer seems a bit of
overkill to me. Maybe clusters can't solve the problem either, but at
some point we have to accept that there will be collisions and
ambiguities - that serendipity makes life fun.