APML Suggestions


J. Trent Adams

Jul 23, 2008, 2:51:48 PM
to APML.Public.General

All -

We're rolling out our APML export functionality soon, and it's
relatively painless.

When exploring the import functionality, however, we're running into
significant issues needing some attention before we can consume APML.
In short, they are:

1. Context / Universe of Discourse Identification
2. Concept Key Value Distribution Data
3. Concept Key Confidence Value

Building on the examples by Scott Wilson [1], it seems that the first
issue would be easily solved using RDF to identify the ontologies used
for the Concept Keys.

In a similar vein, it may be possible to solve the second issue using
RDF, too. In this case, though, rather than referencing a
"linguistic" ontology, we could reference a "mathematical" ontology.
This needs to be fleshed out a bit, but the summary is that it'd help
us to understand the distribution and associated equations used to
generate the key values (e.g. is a value of "0.023" high, or low for a
given source distribution).

The third issue is a bit of a diversion, however. While the
"explicit | implicit" distinction is important, there is a wide
gradient in meaning to "implicit", and guidance to the consuming party
on its veracity would help. For example, if the user entered the data
themselves, it could have a high confidence, whereas if it was
inferred from clickstream data the confidence may be lower. In this way
the consuming party can react to the values appropriately. If we're
able to tackle the second issue, this would be a similar "reference
the mathematical ontology" type solution.

Basically, it turns out that APML as it stands right now is of only
marginal utility for our system to consume. While we're happy
exporting into it, we'd need to see a jump in specificity to safely
consume it.

Any comments or suggestions about how we could move into a more highly
specified (e.g. RDF) model?

- Trent

[1] http://groups.google.com/group/apml-public/msg/98d959e55a05e00e

Paul Jones

Jul 23, 2008, 10:25:13 PM
to apml-...@googlegroups.com
Hi Trent,

> We're rolling out out APML export functionality soon, and it's
> relatively painless.

Awesome.
 
> In a similar vein, it may be possible to solve the second issue using
> RDF, too.  In this case, though, rather than referencing a
> "linguistic" ontology, we could reference a "mathematical" ontology.
> This needs to be fleshed out a bit, but the summary is that it'd help
> us to understand the distribution and associated equations used to
> generate the key values (e.g. is a value of "0.023" high, or low for a
> given source distribution).
>
> The third issue, is a bit of a diversion, however.  While the
> "explicit | implicit" distinction is important, there are is a wide
> gradient in meaning to "implicit", and guidance to the consuming party
> on it's veracity would help. For example, if the user entered the data
> themselves, it could be have a high confidence, whereas if it was
> inferred by clickstream data the confidence may be lower.  In this way
> the consuming party can react to the values appropriately.  If we're
> able to tackle the second issue, this would be a similar "reference
> the mathematical ontology" type solution.

The question I'd have with these is exactly what you'd be able to specify? The whole thing about numbers is that they come with a built-in ontology.

I've always seen explicit and implicit as being the line between something the user enters versus something that a service automatically generates from other source data. In terms of the level of confidence in implicit data, I'd prefer that the originating application degrade its values rather than specify that it has "low confidence" in the values that it is providing; otherwise I could see the interpretation of these numbers being quite messy. I'd also be concerned about what we could potentially place at the endpoints of these ontology URLs.
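
To make the "degrade its values" option concrete, here is a minimal Python sketch. The `degrade` function and the confidence numbers are assumptions for illustration only; nothing like this is defined by APML.

```python
def degrade(value, producer_confidence):
    """Fold the producer's confidence into the value it publishes.

    Hypothetical helper: APML defines no such operation. Both inputs are
    assumed to lie in [0, 1].
    """
    if not 0.0 <= producer_confidence <= 1.0:
        raise ValueError("producer_confidence must be between 0 and 1")
    return value * producer_confidence


# The same raw interest score, published by an explicit and an inferred source.
explicit = degrade(0.9, 1.0)   # user typed it in: published unchanged
inferred = degrade(0.9, 0.4)   # inferred from clickstream: published lower
```

The consumer only ever sees a single number, which keeps interpretation simple. The trade-off is that a published 0.36 could equally be a strong interest with low confidence or a weak interest with high confidence.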

Good to hear you're getting close with APML implementation though, and whatever we can do to progress interoperability is always going to be a good thing!

Cheers,
Paul.

gdupont

Jul 24, 2008, 7:14:08 AM
to APML.Public.General


On Jul 23, 8:51 pm, "J. Trent Adams" <jtrentad...@gmail.com> wrote:
> All -
>
> We're rolling out out APML export functionality soon, and it's
> relatively painless.

Great !


>
> When exploring the import functionality, however, we're running into
> significant issues needing some attention before we can consume APML.
> In short, they are:
>
>   1. Context / Universe of Discourse Identification
>   2. Concept Key Value Distribution Data
>   3. Concept Key Confidence Value
>

I can only agree on those 3 points: IMHO those are what limit APML
adoption for now. Your proposal to use RDF is of course the
most promising solution. I fully understand this for concept (i.e. key)
interpretation. About the mathematical ontology, I'm not so at ease
because I've never seen such an ontology. Do you have any pointers? Do
you plan to develop one? What about using the work on fuzzy OWL or
probabilistic OWL? I'm not an expert in those fields, but it sounds
like it may solve these issues.

Finally, about the implicit/explicit distinction and, moreover, about
the reliability of the expressed interests, I would rather see an ontology
used to describe the sources. As far as I understand, the APML source
mostly describes the application which generates the data,
but we could include some features based on the type of algorithm used
and the input of this algorithm in order to state its reliability.
It would enable different systems to interpret the reliability of
sources in different ways.
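
A rough Python sketch of that idea, with every name invented for illustration (`SourceDescription`, the algorithm labels and the weights are all assumptions, not part of APML): the producer only describes how the data was generated, and each consumer applies its own policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDescription:
    """Hypothetical description of how a source produced its values."""
    name: str
    algorithm: str    # e.g. "user-entered", "clickstream"
    input_kind: str   # e.g. "profile form", "page views"


# Two consumers, two policies for the same source descriptions.
STRICT_POLICY = {"user-entered": 1.0, "clickstream": 0.3}
LENIENT_POLICY = {"user-entered": 1.0, "clickstream": 0.7}


def reliability(source, policy, default=0.5):
    """Look up the weight this consumer gives to the source's algorithm."""
    return policy.get(source.algorithm, default)


tracker = SourceDescription("SourceA", "clickstream", "page views")
strict_weight = reliability(tracker, STRICT_POLICY)
lenient_weight = reliability(tracker, LENIENT_POLICY)
```

The producer never has to state a confidence number; it states facts about the algorithm, and the judgment stays with the consuming system.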

gd

J. Trent Adams

Jul 24, 2008, 3:46:42 PM
to APML.Public.General

GD and Paul -

Thanks for the encouragement on our APML export. We're looking
forward to putting it out there to see how it's consumed.

Here's a bit more clarity on what I was proposing as additions to the
mix:

1. Context / Universe of Discourse Identification

I believe this is pretty straightforward, and not terribly
controversial. The basic concept here is that some prior
knowledge inbound will greatly improve our ability to service users
who provide us with their APML. The most obvious solution is to shift
toward using RDF to support ontology identification.
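
As a rough illustration only, here is a Python sketch of what a consumer could do if each Concept carried a reference to its ontology. The fragment is simplified APML-like XML, and the `ontology` attribute and example.org URIs are invented for this sketch; they are not part of APML or of any agreed RDF binding.

```python
import xml.etree.ElementTree as ET

# Simplified APML-like fragment. The `ontology` attribute is hypothetical:
# it stands in for an RDF-style reference to the universe of discourse.
FRAGMENT = """
<Concepts>
  <Concept key="jaguar" value="0.82" from="SourceA"
           ontology="http://example.org/ontology/animals"/>
  <Concept key="jaguar" value="0.40" from="SourceB"
           ontology="http://example.org/ontology/cars"/>
  <Concept key="python" value="0.65" from="SourceA"/>
</Concepts>
"""


def concepts_by_context(xml_text):
    """Group concept keys by the ontology URI that scopes them."""
    grouped = {}
    for concept in ET.fromstring(xml_text).iter("Concept"):
        context = concept.get("ontology", "unspecified")
        grouped.setdefault(context, []).append(concept.get("key"))
    return grouped


contexts = concepts_by_context(FRAGMENT)
```

With the context attached, the two "jaguar" keys land in different groups, while "python" stays ambiguous because nothing says which universe of discourse it belongs to.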

2. Concept Key Value Distribution Data

I know this is a bit more "out there", but it seems that it'd be
incredibly beneficial for consumers of APML. This is basically the
same concept as ontology identification, but pointing to the
"mathematical" universe of discourse rather than the "linguistic"
one. I have no idea if this has been proposed (or is in use)
elsewhere, but my brief discussions with =Drummond, Kingsley, and
Danny about it seem to indicate it's a relatively novel (and useful)
idea.

As it stands now, we have no way of interpreting the range of Values
across Sources. For example, a single APML file may include a set of
Values in the range [0.0023 - 0.0071] for a given Concept Key Source.
We're assuming that each Source will be relatively consistent in their
application of Values (not a great assumption, but there you go),
however we can't assume the same across Sources. For example, another
Source may have Values in the range of [0.15 - 0.87] within the same
APML file. Further, each range could be representative of different
distributions.
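
To show the problem in code, here is a Python sketch of the naive workaround a consumer is left with today: rescaling each Source's Values to [0, 1] using only the range observed in the file. The sample values are made up to match the two ranges above, and `rescale_within_source` is an illustrative name, not an existing API.

```python
def rescale_within_source(values):
    """Min-max rescale one Source's Values using only its observed range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A single distinct value carries no ranking information.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


source_a = [0.0023, 0.0051, 0.0071]   # illustrative Values from Source A
source_b = [0.15, 0.51, 0.87]         # illustrative Values from Source B

scaled_a = rescale_within_source(source_a)
scaled_b = rescale_within_source(source_b)
```

This puts both Sources on one scale, but it quietly assumes the observed minimum and maximum are meaningful and that Values are spread evenly between them. It knows nothing about the shape of the underlying distribution, which is exactly the missing information.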

The goal of this proposal is to identify a method by which an APML
consumer can interpret the Concept Key Value. My suggestion is to
take a page out of the RDF playbook, and define something akin to a
"mathematical equation / statistical distribution" ontology. For
example, one set of Values may be from a Poisson distribution, while
another may be Gaussian. Another view may be that the Values should
be considered using more of a sigmoidal curve described by a
cumulative distribution function. Each of which may be offset by some
additional factor to be used for normalization.

In practice, then, perhaps each Concept Key Value field could be
identified (ala RDF) with a pointer to the associated interpretation.
In this way, APML consumers would be able to more effectively assign
meaning to the Value. For example, we would know that a Value of
0.0051 may represent a typical interest from Source A while a typical
user may be represented by 0.87 in Source B.
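
A hedged Python sketch of how a consumer might use such a pointer. The distribution URIs, the registry and the parameters are all assumptions made up for illustration; no such vocabulary exists yet.

```python
import math


def gaussian_cdf(x, mean, stddev):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (stddev * math.sqrt(2.0))))


def logistic_cdf(x, location, scale):
    """Sigmoidal cumulative distribution function (logistic)."""
    return 1.0 / (1.0 + math.exp(-(x - location) / scale))


# Hypothetical registry: a URI attached to a Value selects its interpretation.
DISTRIBUTIONS = {
    "http://example.org/dist/gaussian": gaussian_cdf,
    "http://example.org/dist/logistic": logistic_cdf,
}


def percentile(value, dist_uri, params):
    """Map a raw Value to the fraction of the Source's population below it."""
    return DISTRIBUTIONS[dist_uri](value, *params)


# Assumed parameters: Source A centres on 0.0051, Source B on 0.87.
p_a = percentile(0.0051, "http://example.org/dist/gaussian", (0.0051, 0.001))
p_b = percentile(0.87, "http://example.org/dist/logistic", (0.87, 0.1))
```

Under these assumed parameters both raw Values map to the middle of their Source's distribution, so the consumer can treat 0.0051 from Source A and 0.87 from Source B as equally "typical".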

In my discussion with =Drummond, it's also possible we may be able to
extend this idea to a machine-readable solution (ala XML Schema). It'd
be really neat to get to this level, but the first step is adoption of
the general concept.

3. Concept Key Confidence Value

In the same vein of "more prior knowledge is good", it would be
helpful for APML consumers to gauge the level of confidence a Source
has in the Concept Key Value. Following as a logical extension of the
proposed distribution guidance, the Confidence Value could also be
tagged with its statistical confidence curve. For example, it's
possible that the confidence in a particular Concept Key Value is
highly dependent upon transient popularity, in which case its value
would decay over time. In this case, the Value should be interpreted
as an exponential decay function of the Date provided by the Source.
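
A minimal Python sketch of that decay interpretation. The half-life parameter is an assumption: it is something the Source would have to publish alongside the Value, and nothing in APML carries it today.

```python
import math
from datetime import datetime, timezone


def decayed_value(value, updated, now, half_life_days):
    """Discount a Value exponentially by its age.

    `half_life_days` is a hypothetical parameter supplied by the Source.
    """
    age_days = (now - updated).total_seconds() / 86400.0
    return value * math.exp(-math.log(2.0) * age_days / half_life_days)


updated = datetime(2008, 7, 1, tzinfo=timezone.utc)   # Date given by the Source
now = datetime(2008, 7, 31, tzinfo=timezone.utc)      # when the consumer reads it
current = decayed_value(0.8, updated, now, half_life_days=30.0)
```

With a 30-day half-life, a Value of 0.8 recorded 30 days earlier is read as 0.4.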

The end goal with the Confidence Value is simply to enable the APML
syntax to allow for it. There are a lot of producers who won't supply
it (or it may be suspect), but in some contexts it'll be highly
valuable.

I hope this detail helps shed light on what I was proposing. Let me
know, though, if it's still too nebulous and I'll see if I can dive a
bit deeper. In the end, though, if we move to an RDF enabled APML
specification, we should be able to support these cases relatively
easily (without breaking anything, if they're ignored).

Any other questions or suggestions?

- Trent

gdupont

Jul 25, 2008, 8:45:22 AM
to APML.Public.General
Good explanation, which confirms most of my interpretation. I must say
that the mathematical ontology to describe values and confidence is, to
my mind, a very good idea that may have much more impact than only for
APML. Again, I have never seen any work along those lines, but I wonder
about the link with the initiatives on probabilistic OWL and so on.

Then some remarks (following your numbering) :

1- not much to say; I think it is obvious that the APML/RDF link
has to be made, at least for term disambiguation

2- I support the idea, but I'm worried about real-world use: do you think
that web developers (not always deep into math) will provide such
information? I know that for some systems, intuitive ideas and
algorithms are great (and give great results), but are then hard to match
with a mathematical background (well, we could use a fallback definition
that matches most cases, or learn a distribution estimate...).
Moreover, the mathematical distribution of results may also evolve within
the same source, thus we must add a version number to the distribution
definition.
But of course this will help a lot when trying to fuse results from
different sources and attempt higher-level inferences. We have the same
kind of difficulties when trying to fuse the results of multiple
information (and concept) extraction modules.

3- Again this is very similar to probabilistic OWL (adding a confidence
to a triple).

So, I'm very interested in all these ideas and will happily follow the
discussion and try to participate by providing links to probabilistic
OWL work.


gd


TSchultz55

Jul 25, 2008, 9:33:18 AM
to APML.Public.General
Trent,

Some great thoughts here. Would certainly like to discuss more in
depth as well. What does your availability look like now with the new
DataPortability Governance model?

When I met with Chris in NYC back in June, we discussed APML for a bit
and agreed we should push for the further advancement of the
specification.

Now, with your suggestions and invaluable insight based on actual
implementation experiences, I think we need to start seriously
thinking about this. Maybe set up some hard dates, milestones, etc.
for both APML and APML-RDF.

We have a lot of good suggestions regarding APML 1.0 just sitting
there that people took the time to write up, and we should seriously
consider them.

Regarding thoughts #2 and #3: what if we considered an optional
APML-RDFx (extension) to the APML-RDF specification? That way, we can
align development between APML and APML-RDF, at the same time
enabling extra "advanced" functionality to APML-RDF without "breaking"
anything.

Rough TODO List:
Finalize APML 1.0 spec.
Consensus on APML-RDF and finalize
Brainstorm APML-RDFx and begin initial development

Thoughts? I can certainly devote some time to these activities, as
APML is currently being considered for various uses where I work as
well. The ability to query a pool of attention profiles via SPARQL
would be huge.

Who's on board?

Cheers,

Tim

J. Trent Adams

Jul 25, 2008, 9:51:03 AM
to APML.Public.General
Tim -

I like your suggestion about implementing something akin to
APML-RDFx. Sign me up.

In fact, the concept dovetails with GD's comments regarding the
utility of the "distribution ontology" concept. Those who know how to
use it will, while those who don't will avoid it. The spec shouldn't
require it, but shouldn't break if it's there. IMO, if it becomes adopted
by the bigger players (i.e. the ones who get the math and the utility
of it being explicitly stated), other APML consumers will gravitate to
using it. The example here is microformats: more sites sprang up
supporting them when Yahoo announced they'd be indexing them.

Anyway, perhaps a kickoff telecon for interested parties would help us
align. If that makes sense, I could leverage the system set up for DP.
On the call we could form a solid milestone schedule along what you
outlined.

- Trent

(PS Regarding the DP governance work, we finally nailed down a
Steering Group, so we can effectively ratify decisions. Big win
there. Now it's a matter of people raising their hands with good
ideas and being able to stand on the foundation we've put in place to
get them done.)

TSchultz55

Jul 26, 2008, 2:01:17 PM
to APML.Public.General
> I like your suggestion about implementing something akin to APML-
> RDFx. Sign me up

Awesome!

> Anyway, perhaps a kickoff telecon for interested parties would help us
> align. If that makes sense, I could leverage the system setup for DP.
> On the call we could form a solid milestone schedule along what you
> outlined.

Sounds great! Let's see if we can get a sound-off from other people
out there who would be interested in getting on this TC and
discussing. Then we can focus on setting a time and agenda for our
kick-off meeting.

The more, the merrier!

If you're interested in discussing APML 1.0, APML-RDF, or APML-RDFx,
please let us know along with your availability - post here, shoot me
an email, @Phocion me on Twitter, send over a carrier pigeon, smoke
signals....whatever.

Looking forward to speaking with you all.

Cheers,

Tim