What does a lightweight repository look like?

Ben O'Steen

Aug 5, 2008, 11:07:28 PM
to lightweight-repositories
So, some questions to poke around in this space. I have my own answers
to them, but I'll keep quiet for now ;)

1) Is it important for the repository to actually store all the parts
of the objects it holds, or is it okay for some parts to remain
external?

2) For externally held items, if the authority of persistence is held
by the repository, should an archival copy be made? If so, should the
external or the copied version be given priority when examining the
parts of an object?

3) Mechanisms for 'listening' for changes to an object:
* poll-orientated - "Has it changed? No. Has it changed? etc"
* push-orientated - "queued messages sent via XMPP, ActiveMQ, etc"
* or long poll orientated - "Has it changed? ......{time}...... No."

Or some mix of the above?
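
A rough sketch of how the three styles differ from the client's side,
using a toy in-memory store. Nothing here comes from a real repository
API: ObjectStore and all of its method names are hypothetical, and a
real push channel would go over something like XMPP or ActiveMQ rather
than an in-process callback.

```python
import threading


class ObjectStore:
    """Toy in-memory store; every name here is hypothetical."""

    def __init__(self):
        self._version = 0
        self._listeners = []
        self._changed = threading.Condition()

    def version(self):
        # Poll-orientated: the client calls this repeatedly and compares.
        return self._version

    def subscribe(self, callback):
        # Push-orientated: the store calls the listener on every change.
        self._listeners.append(callback)

    def wait_for_change(self, since, timeout):
        # Long poll: block until the version passes `since` or the timeout expires.
        with self._changed:
            self._changed.wait_for(lambda: self._version > since, timeout=timeout)
            return self._version

    def update(self):
        # Record a change, wake any long-pollers, then notify push listeners.
        with self._changed:
            self._version += 1
            current = self._version
            self._changed.notify_all()
        for callback in self._listeners:
            callback(current)
        return current


store = ObjectStore()
events = []
store.subscribe(events.append)

before = store.version()
store.update()
polled_changed = store.version() != before  # poll: "Has it changed? Yes."

# Long poll with nothing new since version 1: waits, then reports no change.
long_poll_result = store.wait_for_change(since=1, timeout=0.05)
```

A mix is easy to picture from here: push for listeners that are online,
with the version check as a fallback for those that were not.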

4) Is Twitter a (content-specific) repository? Store text, retrieve
it, find simple metadata about the text (time created, author),
subscribe to an author's list of publications?

Does it become a repository if an OAI-PMH shim was added to the public
feed?

5) How can a repository improve on the user experience of content
repositories such as Google Docs, Flickr and Blogspot?

Ed Summers

Aug 12, 2008, 9:48:38 AM
to lightweight-repositories
Would it be wrong and/or pedantic for me to ask what a repository is
at this point? WordNet says a repository is:

"a facility where things can be deposited for storage or
safekeeping"

Which seems as good a definition as any. But it begs the question of
what is deposited, and what safekeeping means. I would argue that
Twitter (assuming it's online and responding to requests) is a
repository of 140-character chunks of text attributed to a person. I
don't think it needs to implement OAI-PMH to be called a microblog
repository.

I'm not sure whether a repository has to store all the parts of the
objects it holds. If we are to use the WordNet definition of a
repository, and the object being deposited has multiple parts, then to
say the object is being kept safe would imply that all the parts are
internally managed, and monitored by the repository. If the repository
simply managed the references to the parts as URIs and one suddenly
404'd, the repository would have to say "sorry Dave, I know you
deposited that object with me, but I can't give part of it to you
anymore". Is that safekeeping? If the thing being kept safe is the
inventory of network resources, perhaps the answer is yes... if the
thing being kept safe is both the inventory and the inventoried
objects, the answer is probably no.
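
A minimal sketch of that distinction, assuming a part is recorded as a
URI plus an optional archival copy. Part, retrieve and the fake `web`
dictionary are all made up for illustration, and preferring the
external version over the copy is only one possible answer to Ben's
second question.

```python
class Part:
    """One part of a deposited object: a URI, plus an optional local copy."""

    def __init__(self, uri, archived=None):
        self.uri = uri
        self.archived = archived  # bytes of an archival copy, or None


def retrieve(part, fetch):
    """Return (content, source). fetch(uri) returns bytes or raises LookupError."""
    try:
        # This sketch gives the external version priority.
        return fetch(part.uri), "external"
    except LookupError:
        if part.archived is not None:
            return part.archived, "archive"
        raise  # reference-only part that has gone away: "sorry Dave"


# A stand-in for the web; a missing key plays the role of a 404.
web = {"http://example.org/a": b"live copy"}


def fetch(uri):
    return web[uri]  # KeyError is a LookupError


still_there = Part("http://example.org/a", archived=b"old copy")
gone_but_copied = Part("http://example.org/b", archived=b"old copy")
gone_for_good = Part("http://example.org/c")

try:
    retrieve(gone_for_good, fetch)
    lost = False
except LookupError:
    lost = True
```

Only the third case is the inventory-only repository failing to keep
the object itself safe.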

So it seems to me that the answers to your questions seem to be
dependent on the use cases (and requirements) for the "repository" in
question. Is this cheating?

//Ed

Benjamin O'Steen

Aug 12, 2008, 9:58:12 AM
to lightweight-...@googlegroups.com

On Tue, 2008-08-12 at 06:48 -0700, Ed Summers wrote:
> Would it be wrong and/or pedantic for me to ask what a repository is
> at this point? WordNet says a repository is:
>
> "a facility where things can be deposited for storage or
> safekeeping"

I think that for my purposes that is an excellent example for what the word means!

> I would argue that
> Twitter (assuming it's online and responding to requests) is a
> repository of 140 character chunks of text attributed to a person.

I have to strongly agree here! :) Twitter is a content-specific (140
char text blocks) store or repository.

>
> So it seems to me that the answers to your questions seem to be
> dependent on the use cases (and requirements) for the "repository" in
> question. Is this cheating?

Nope - I hope that this group may be a place to pull together use cases
to see how users are really trying to use systems, and therefore to see
what a web-based storage layer should 'look' like, what services it must
offer, etc.

One of the first really important services that sprung out at me was
this notion of messaging - the store should be able to maintain its own
history in a machine-readable way and even better, be able to push out
events to listeners.
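
As a sketch of what a machine-readable history could mean in practice:
an append-only log with one JSON document per change, which live
listeners receive as events and which anyone else can replay later.
HistoryLog and the field names seq, object and action are assumptions
made for this example, not an existing format.

```python
import json


class HistoryLog:
    """Append-only change history; all names are hypothetical."""

    def __init__(self):
        self._lines = []      # one JSON document per event, oldest first
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def record(self, object_id, action):
        # Sequence numbers start at 1 and increase by one per event.
        event = {"seq": len(self._lines) + 1, "object": object_id, "action": action}
        self._lines.append(json.dumps(event, sort_keys=True))
        for callback in self._listeners:
            callback(event)   # push the event to live listeners
        return event

    def since(self, seq):
        # Replay every event with a sequence number greater than `seq`.
        return [json.loads(line) for line in self._lines[seq:]]


log = HistoryLog()
log.record("obj:1", "create")

seen = []
log.subscribe(seen.append)      # this listener joins after the first event
log.record("obj:1", "update")

missed = log.since(0)[:1]       # the late listener catches up from the history
```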

>
> //Ed
> >

Etienne Posthumus

Aug 12, 2008, 10:21:41 AM
to lightweight-repositories

On Aug 12, 3:58 pm, Benjamin O'Steen <bost...@gmail.com> wrote:
> Nope - I hope that this group may be a place to pull together use cases
> to see how users are really trying to use systems, and therefore to see
> what a web-based storage layer should 'look' like, what services it must
> offer, etc.

So here is my contribution wrt use cases.
We have (ab)used an SCM system as a repository, storing many small
chunks of XML data.
This has worked very well, especially when combined with various
hooks to provide notifications or conversions.
The Erasmus University in Rotterdam also runs its entire academic
repository in this manner.

> One of the first really important services that sprung out at me was
> this notion of messaging - the store should be able to maintain its own
> history in a machine-readable way and even better, be able to push out
> events to listeners.

SCM systems excel at this.

One other disconnected thought that I have in relation to this post,
is the role that AtomPub plays in lightweight repositories.
But my thoughts have not quite crystallised on that.

EP

scottw

Aug 12, 2008, 10:41:19 AM
to lightweight-repositories


On Aug 12, 2:58 pm, Benjamin O'Steen <bost...@gmail.com> wrote:
> On Tue, 2008-08-12 at 06:48 -0700, Ed Summers wrote:
> > Would it be wrong and/or pedantic for me to ask what a repository is
> > at this point? WordNet says a repository is:
>
> >   "a facility where things can be deposited for storage or
> > safekeeping"
>
> I think that for my purposes that is an excellent example for what the word means!
>
> > I would argue that
> > Twitter (assuming it's online and responding to requests) is a
> > repository of 140 character chunks of text attributed to a person.
>
> I have to strongly agree here! :) Twitter is a content-specific (140
> char text blocks) store or repository.

So if it outsourced all its storage to S3, would it stop being a
repository and be a... something else?

Benjamin O'Steen

Aug 12, 2008, 11:42:55 AM
to lightweight-...@googlegroups.com

On Tue, 2008-08-12 at 07:41 -0700, scottw wrote:
> So if it outsourced all its storage to S3, would it stop being a
> repository and be a... something else?


That's what I am trying to put my finger on :) What happens if the
separation between the repository and its store of data is the web,
rather than a device driver? It's not a new question, but I think that
it is worth re-examining now that we have some new techniques, services
and libraries of software to try and use.

The Fedorazon project went some way to explore this, but it did
highlight some very big issues with S3 as it is now. So the question can
be, what should S3 look like?

Ben

scottw

Aug 12, 2008, 12:07:20 PM
to lightweight-repositories
Hmm, on the basis that people who use the word "repository" have zero
influence on S3, Twitter, or most of the web, perhaps the question
should be more reflective... why do we care, and what do we really
want to achieve? If lightweight repositories are the proposed
solution, then what is the problem they are supposed to address?

S