Designing a Fluidinfo backend for the Annotator project

Kartik Subbarao

Jul 26, 2012, 4:24:10 PM

to FluidDB Discuss

I'm looking into creating a fluidinfo backend-store for the annotator
project, and wanted to get any design recommendations that the
fluidinfo team might have.

http://okfnlabs.org/annotator/

Annotator is an open-source JavaScript library and tool that can be
added to any webpage to make it annotatable. You can highlight any
text on a webpage, then add and store comments and tags.

I was thinking that fluidinfo might fit well as a backend database to
store the annotations for webpages, which could also open the door to
referencing other fluidinfo tags attached to the same webpage object.

The annotation format that annotator uses is JSON, as follows:

https://github.com/okfn/annotator/wiki/Annotation-format

The code needs to provide the following interface to the backend
store:

https://github.com/okfn/annotator/wiki/Storage

I've read up on the fluidinfo data model and played around with the
fluidinfo explorer and some of the blog examples, but I'm still fairly
new to the environment. My general sense is that I want to maximize
the use of the extensive tagging/namespace capabilities, and minimize
the amount of content that I have to manage as opaque text strings.

Would it be possible to request an application account for this
project? (I guess in the meantime I could create an okfn.org/annotator
namespace under my account and add the tags there).

At first glance, I think I need to deal with two types of objects:

1) webpage objects. Here, I'd like to tap into what's already there.
Is there a conventional tag name for the URL of a webpage object,
such as url or uri?

2) annotation objects. I'd be creating these objects and adding tags
to them based on the JSON format above.

I'm thinking that I may want to add a tag to webpage objects that has
a set of all of the annotation object IDs that reference it.

Does this approach make sense? Are there better options? If any of you
could take a few minutes to look at the annotation format and storage
interface, and recommend any particularly fluidinfo-savvy approaches
to store the annotation-related information, I'd greatly appreciate
it!

Thanks,

-Kartik

Jamu Kakar

Jul 31, 2012, 12:01:04 PM

to fluiddb...@googlegroups.com

Hi Kartik,

Sorry for the slow response, there's a lot going on here, but what
you're doing sounds very exciting and like a good fit for Fluidinfo.

On Thu, Jul 26, 2012 at 1:24 PM, Kartik Subbarao
<kartik....@gmail.com> wrote:
> I've read up on the fluidinfo data model and played around with the
> fluidinfo explorer and some of the blog examples, but I'm still fairly
> new to the environment. My general sense is that I want to maximize
> the use of the extensive tagging/namespace capabilities, and minimize
> the amount of content that I have to manage as opaque text strings.
>
> Would it be possible to request an application account for this
> project? (I guess in the meantime I could create an okfn.org/annotator
> namespace under my account and add the tags there).

You can create a domain user using the new user form:

https://fluidinfo.com/accounts/new/

You'll be asked to make a file accessible via your domain. During the
account confirmation process Fluidinfo will attempt to download this
file to verify that you control the domain. If you have any issues,
please let me know. That said, using a namespace for testing should
be workable.

> At first glance, I think I need to deal with two types of objects:
>
> 1) webpage objects. Here, I'd like to tap into what's already there.
> Is there a conventional tag name for the URL of a webpage object (url?
> uri?)

We use the URL in our own applications. In the case of browsers, we
use the 'document.location' value (without any sanitization) as the
about value for the object. I guess you know this, but in FluidDB all
objects have a UUID that uniquely represents them. They also
(optionally) have a fluiddb/about value, which can only be set once
when the object is created. It's a string that uniquely identifies
the object. So we say that the object for a webpage is the one where
fluiddb/about is the URL of that page.

A subtle detail about this approach is that the about values
'http://fluidinfo.com' and 'http://fluidinfo.com/' identify different
objects.
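
To make the trailing-slash point concrete, here is a minimal Python sketch. It assumes an object can be addressed by its about value under an /about/ path on the API host; the host name, the path shape, and the helper name about_url are assumptions for illustration and should be checked against the Fluidinfo API documentation.

```python
from urllib.parse import quote

# Assumed API root, for illustration only.
API_ROOT = "https://fluiddb.fluidinfo.com"


def about_url(about: str) -> str:
    """Build a URL addressing the object whose fluiddb/about equals `about`.

    The about value is percent-encoded as a single path segment and is
    otherwise left untouched: no lowercasing, no trailing-slash trimming.
    """
    return f"{API_ROOT}/about/{quote(about, safe='')}"


if __name__ == "__main__":
    # The two about values differ by one character, so the URLs differ too.
    print(about_url("http://fluidinfo.com"))
    print(about_url("http://fluidinfo.com/"))
```

Because the about value is used as-is, an Annotator backend would want to derive it the same way every time (for example, always from 'document.location') so that annotations for one page do not end up split across two objects.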

> 2) annotation objects. I'd be creating these objects and adding tags
> to them based on the JSON format above.
>
> I'm thinking that I may want to add a tag to webpage objects that has
> a set of all of the annotation object IDs that reference it.

You can do that, but you'll quickly run into race conditions and
possibly lose data (the last writer wins in this kind of update). It's
better to put a tag on the annotation object that refers to the web
page object. For example, you might have an
'okfn.org/annotator/related-url' tag whose value is the about value of
the web page object the annotation is about.
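
The lost-update problem can be illustrated with a small in-memory simulation. This is only a sketch: the dictionaries stand in for tag values stored in Fluidinfo, and the annotation ids are made up.

```python
PAGE = "http://fluidinfo.com"

# Design 1: one tag on the page object holds the set of annotation ids.
page_tag = {PAGE: set()}

# Two clients read the current value at the same time...
seen_by_a = set(page_tag[PAGE])
seen_by_b = set(page_tag[PAGE])

# ...and each writes back its own copy plus one new id.
page_tag[PAGE] = seen_by_a | {"annotation-a"}
page_tag[PAGE] = seen_by_b | {"annotation-b"}  # overwrites A's write

lost_update_result = page_tag[PAGE]

# Design 2: each annotation object carries its own related-url tag.
annotations = {
    "annotation-a": {"okfn.org/annotator/related-url": PAGE},
    "annotation-b": {"okfn.org/annotator/related-url": PAGE},
}

# Finding a page's annotations becomes a lookup over annotation objects.
per_annotation_result = {
    ann_id
    for ann_id, tags in annotations.items()
    if tags["okfn.org/annotator/related-url"] == PAGE
}

if __name__ == "__main__":
    print(sorted(lost_update_result))     # annotation-a is missing
    print(sorted(per_annotation_result))  # both annotations are found
```

With the second layout, each writer only touches its own annotation object, so there is no shared value for a concurrent writer to overwrite.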

You can then fetch all the annotations with a query like:

okfn.org/annotator/related-url = "http://fluidinfo.com"

And you'll get all the annotations for the 'http://fluidinfo.com'
object. Note, this convention is described here:

http://fluidinfo.com/cookbook/#related
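
A minimal sketch of building that query in Python follows. The tag path comes from the example above; the helper names and the `query` URL parameter name are assumptions for illustration. Since the escaping rules of the query language are not covered in this thread, the sketch rejects values containing a double quote rather than guessing.

```python
from urllib.parse import urlencode

RELATED_URL_TAG = "okfn.org/annotator/related-url"


def related_query(page_url: str, tag: str = RELATED_URL_TAG) -> str:
    """Return a query matching annotation objects that point at page_url."""
    if '"' in page_url:
        # Escaping rules are not assumed here, so refuse instead of guessing.
        raise ValueError("page_url must not contain a double quote")
    return f'{tag} = "{page_url}"'


def query_string(page_url: str) -> str:
    """Encode the query as a URL parameter (parameter name is an assumption)."""
    return urlencode({"query": related_query(page_url)})


if __name__ == "__main__":
    print(related_query("http://fluidinfo.com"))
    print(query_string("http://fluidinfo.com"))
```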

> Does this approach make sense? Are there better options? If any of you
> could take a few minutes to look at the annotation format and storage
> interface, and recommend any particularly fluidinfo-savvy approaches
> to store the annotation-related information, I'd greatly appreciate
> it!

It looks like converting the JSON format for annotations into FluidDB
tags is going to be pretty straightforward. You can use namespaces
and tags to achieve basically the same layout and you can write these
with a single request by using the /values API:

http://api.fluidinfo.com/html/api.html#values_PUT
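
One possible mapping from an annotation's JSON to tag paths is sketched below. It is an illustration, not the layout any existing backend uses: nested objects become namespaces, primitives and lists of strings are kept as values, and anything else (such as a list of range objects) falls back to a JSON string. The field names in the sample and the choice of fallback are assumptions.

```python
import json

NAMESPACE = "okfn.org/annotator"  # namespace suggested earlier in the thread


def is_primitive(value) -> bool:
    """True for values that can be stored directly as a tag value."""
    return value is None or isinstance(value, (bool, int, float, str))


def flatten(annotation: dict, prefix: str = NAMESPACE) -> dict:
    """Map a nested annotation dict to {tag_path: value}."""
    tags = {}
    for key, value in annotation.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            # Nested object: recurse, using the key as a sub-namespace.
            tags.update(flatten(value, path))
        elif is_primitive(value):
            tags[path] = value
        elif isinstance(value, list) and all(isinstance(v, str) for v in value):
            # List of strings: keep as a set-like value.
            tags[path] = value
        else:
            # Fallback (assumption): store anything else as JSON text.
            tags[path] = json.dumps(value, sort_keys=True)
    return tags


if __name__ == "__main__":
    sample = {
        "text": "A note",
        "tags": ["review"],
        "permissions": {"read": ["group:__world__"]},
        "ranges": [{"start": "/p[1]", "startOffset": 0}],
    }
    for path, value in sorted(flatten(sample).items()):
        print(path, "=", value)
```

The JSON-string fallback trades queryability for simplicity; fields that need to be searched would be better split into tags of their own.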

I suspect you'll be relying on /values a fair amount, since you can
perform batch operations with it. We've found that network latency
dominates the time it takes to perform most requests, so trying to
keep network calls to a minimum is encouraged.
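
As a rough sketch of batching, the helper below packs several updates into one request body. The payload shape (a "queries" list of [query, {tag: {"value": ...}}] pairs) is my understanding of the /values PUT format and should be verified against the API page linked above; the helper name and the sample about values are hypothetical.

```python
import json


def values_payload(updates) -> dict:
    """Build one request body from (query, {tag_path: value}) pairs.

    Each query selects the objects to update; each tag dict lists the
    values to write on those objects.
    """
    return {
        "queries": [
            [query, {path: {"value": value} for path, value in tags.items()}]
            for query, tags in updates
        ]
    }


if __name__ == "__main__":
    # Hypothetical about values; how annotation objects are selected is
    # left to the caller.
    updates = [
        ('fluiddb/about = "annotation:1"',
         {"okfn.org/annotator/text": "First note"}),
        ('fluiddb/about = "annotation:2"',
         {"okfn.org/annotator/text": "Second note"}),
    ]
    # Both updates travel in a single request body.
    print(json.dumps(values_payload(updates), indent=2))
```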

Another issue is authentication. Are you going to store all
user data using an okfn.org tag? If that's the case, will calls from
a user's browser go through a proxy of some kind (i.e., to hide that
user's credentials)?
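
A proxy of that kind could look roughly like the sketch below, which prepares (but does not send) the upstream request. It assumes HTTP Basic authentication, a JSON body with a "queries" list of [query, tags] pairs, and the API host shown; the function names and the namespace check are illustrative assumptions, not part of any existing backend.

```python
import base64
import json
import urllib.request

API_ROOT = "https://fluiddb.fluidinfo.com"  # assumed API root
ALLOWED_PREFIX = "okfn.org/annotator/"      # application namespace


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_upstream_request(payload: dict, username: str,
                           password: str) -> urllib.request.Request:
    """Attach server-side credentials to a browser-supplied payload.

    The browser never sees the credentials. Tags outside the
    application's namespace are rejected before anything is forwarded.
    """
    for _query, tags in payload["queries"]:
        for path in tags:
            if not path.startswith(ALLOWED_PREFIX):
                raise PermissionError(f"tag outside namespace: {path}")
    return urllib.request.Request(
        f"{API_ROOT}/values",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": basic_auth_header(username, password),
            "Content-Type": "application/json",
        },
        method="PUT",
    )
```

In a real deployment the credentials would come from server configuration, and the proxy would still need its own way of deciding which end users may write which annotations.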

So far we've been talking about storing comments as tag values (along
with the other metadata that goes with the comment text), which is
generally a good approach. We've recently been working on an
application called loveme.do which aggregates comments from social
networks about hashtags and URLs and presents them in a nice uniform
way. For example,

http://loveme.do/about/%23bigdata

Probably most of the comments you see there will be from Twitter, but
they can also come from Disqus, Tumblr and Facebook (and in the
future, other services). It could be interesting for Annotator to be
one of the services that's pushing comments into Fluidinfo. In that
case, we create an object for each comment and we use a custom (so far
internal) API for creating them (which does some special analysis and
linking of the content). If this is interesting we could talk more
about how that API works and you could add your additional metadata to
the comment objects created by that API. A way to think about this is
that objects have two kinds of data: structured tag/value and a
comment stream.

Anyway, I hope this is useful and at least gives you the ability to
ask more questions to get closer to what you need to know. Please ask
questions. Also, if you hop into #fluidinfo on Freenode, we can talk
directly.

Thanks,
J.

Kartik Subbarao

Aug 1, 2012, 11:35:37 AM

to fluiddb...@googlegroups.com, Jamu Kakar

On 07/31/2012 12:01 PM, Jamu Kakar wrote:
> Hi Kartik,
>
> Sorry for the slow response, there's a lot going on here, but what
> you're doing sounds very exciting and like a good fit for Fluidinfo.

[...]

Thanks for the detailed response Jamu! This is exactly the kind of
information/suggestions/validation that I was looking for. Based on
this, I think I have enough for the time being for the first-pass
implementation. As I work on this, I might circle back with some more
questions.

The comment aggregation info sounds interesting. I'll pass that on to
the annotator team and see what the response is like.

Thanks again,

-Kartik

Jamu Kakar

Aug 1, 2012, 12:57:22 PM

to Kartik Subbarao, fluiddb...@googlegroups.com

Hi Kartik,

On Wed, Aug 1, 2012 at 8:35 AM, Kartik Subbarao
<kartik....@gmail.com> wrote:
> On 07/31/2012 12:01 PM, Jamu Kakar wrote:
>> Sorry for the slow response, there's a lot going on here, but what
>> you're doing sounds very exciting and like a good fit for Fluidinfo.
>
> [...]
>
> Thanks for the detailed response Jamu! This is exactly the kind of
> information/suggestions/validation that I was looking for. Based on this, I
> think I have enough for the time being for the first-pass implementation. As
> I work on this, I might circle back with some more questions.

Cool, that sounds great.

> The comment aggregation info sounds interesting. I'll pass that on to the
> annotator team and see what the response is like.

Yeah, the comment aggregation could be interesting. Let me know
when/if you want to know more about it.

Thanks,
J.

Kartik Subbarao

Aug 29, 2012, 2:09:23 PM

to fluiddb...@googlegroups.com

FYI, here's a first pass of the fluidinfo backend for annotator:

https://github.com/kartiksubbarao/annotator-fluidinfo

The next piece I'm looking at is proper end-user authentication:

> Another issue is authentication. Are you going to store all user
> data using an okfn.org tag? If that's the case, will calls from a
> user's browser go through a proxy of some kind (i.e., to hide that
> user's credentials)?

Annotator uses an OAuth-like mechanism for authentication on its backends:

https://github.com/okfn/annotator/wiki/Authentication

The challenge here is that fluidinfo at the moment doesn't appear to
support a delegated authentication mechanism (just username/password).
Are there any OAuth-like authentication mechanisms in the works for
fluidinfo?

Thanks,

-Kartik
