Re: Updated Shared Records API

Cory Zue

Oct 15, 2007, 5:58:59 PM

to schw...@rii.ricoh.com, Saq Imtiaz, Greg Wolff, Martin Budden, Andreas Kollegger, Jonathan Jackson, shared-...@googlegroups.com

Hi Ed,

Thanks for the feedback. Responses are below. I'm cc'ing the sharedrecords group as well.

On 10/15/07, Ed Schwartz <schw...@rii.ricoh.com> wrote:

1) How does the "Deleted" attribute work?

(The first thing that pops into my mind is the following, which makes
an assumption about the use of Title. If it's easy to just ignore this
and explain "Deleted" from the beginning, that's fine. But maybe this
will make one or more of my wrong assumptions obvious.

  - At time T=1, an "Ed Stuff" tiddler is created. A metadata entry
    with Title "Ed Stuff" and some initial content is sent to the server
    with "Deleted" = False.
  - At time T=2, the "Ed Stuff" tiddler is edited. A metadata entry
    with Title "Ed Stuff" and the new content is sent to the server
    with "Deleted" = False.
  - At time T=3, the "Ed Stuff" tiddler is deleted. A metadata entry
    with Title "Ed Stuff" and no content is sent to the server with
    "Deleted" = True.

  Requests with "title=Ed Stuff" and "mostRecentByTitle=True" would return:
    - nothing for T<1
    - an entry with the initial content for 1<T<2
    - an entry with the edited content for 2<T<3
    - the "deleted" entry for 3<T
)

This is exactly what I had in mind.
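To make the agreed semantics concrete, here is a minimal Python sketch of what a "title=Ed Stuff" plus "mostRecentByTitle=True" lookup would return at each point in the timeline above. It assumes each metadata entry carries a timestamp, a title, content and a deleted flag; the MetadataEntry class and most_recent_by_title function are hypothetical illustrations, not part of the Shared Records API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MetadataEntry:
    # Hypothetical in-memory model of one metadata entry.
    time: float
    title: str
    content: str
    deleted: bool = False


def most_recent_by_title(entries: List[MetadataEntry], title: str,
                         t: float) -> Optional[MetadataEntry]:
    """Return the newest entry with this title created at or before time t."""
    matches = [e for e in entries if e.title == title and e.time <= t]
    if not matches:
        return None  # nothing existed yet under this title
    return max(matches, key=lambda e: e.time)


# The T=1 / T=2 / T=3 timeline from the example above.
log = [
    MetadataEntry(1, "Ed Stuff", "initial content"),
    MetadataEntry(2, "Ed Stuff", "edited content"),
    MetadataEntry(3, "Ed Stuff", "", deleted=True),
]

for t in (0.5, 1.5, 2.5, 3.5):
    print(t, most_recent_by_title(log, "Ed Stuff", t))
```

The point of the design is that a deleted tiddler still yields an entry. A client sees the "Deleted" flag rather than an empty result, so it can tell "deleted" apart from "never existed".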

2) Checkpoints sound good, and if they existed I think it would be
straightforward to modify some of my existing software to use them and
that I'd be motivated to do it. If I think of any reason why I'd
prefer parameters or HTTP headers, I'll let you know.

3) I'd like a way to access a particular metadata entry that is
shorter than the Record ID for the Record, Title, Rolling Checksum and
Sequence (90 characters plus title, not including any information
about the server). For example, it would be nice to have an API that
used just the Rolling Checksum, like this:

http://sra.sharedrecords.org:8080/log/0123456789abcdef0123456789abcdef01234567.log

It would also be nice if the returned metadata entry was in a
format that hashed to (in this example)
0123456789abcdef0123456789abcdef01234567 .

This is a bit tricky and also slightly separate from the annotation issue. Let me think about it and get back to you. Are you in danger of running out of characters in the URI?
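As a rough illustration of Ed's proposal, here is a Python sketch of building the checksum-only URL and checking that a returned entry hashes back to its identifier. The /log/<checksum>.log endpoint is only the proposed form from the example and does not exist yet. Treating the hash as SHA-1 is an assumption based on the identifier being 40 hex digits (160 bits). The entry_url and entry_matches_checksum helpers are hypothetical.

```python
import hashlib
import re

# Proposed (not yet implemented) endpoint from the example URL above.
BASE = "http://sra.sharedrecords.org:8080/log/"

# 40 lowercase hex digits, the length of a SHA-1 digest.
_HEX40 = re.compile(r"^[0-9a-f]{40}$")


def entry_url(checksum: str) -> str:
    """Build the hypothetical checksum-only URL for one metadata entry."""
    if not _HEX40.match(checksum):
        raise ValueError("expected 40 lowercase hex digits")
    return f"{BASE}{checksum}.log"


def entry_matches_checksum(entry_bytes: bytes, checksum: str) -> bool:
    """Check that the returned entry hashes to its identifier (assumes SHA-1)."""
    return hashlib.sha1(entry_bytes).hexdigest() == checksum
```

A 40-character identifier like this would also fit in a fixed-length database column, which is the use case Ed mentions later in the thread.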

4) Will the API support finding out the File Name for the last
metadata entry? Will it return only the File Name and not any entries?

(I haven't found a way to find out what the File Name is for the last
metadata entry for a record with the current Shared Records
implementation. I know how to find the second to last one by
requesting all the metadata entries and looking in the Previous
Metadata Entry for the last one. I haven't actually tried it, but
I've also seen the documentation for how to find out the last metadata
entry on the server after a record is deposited.)

This should be coming back in the record headers during a HEAD or GET of the record itself. See: http://www.sharedrecords.org/trac/sharedrecords/wiki/RestRecordRetrieve

5) With apologies for bringing up something that might be really hard or
unsolvable, another interest I have is how to deal with multiple
servers. Has anyone thought about being able to store metadata
entries on more than one server, for example a local server and the
public shared records server? Has anyone thought about moving metadata
entries from one server to another?

We have long planned to improve synchronization across servers. In general, the Java and C# clients do a reasonable job at this, syncing/pushing data to all servers at the time it is accessed (although there are still issues; for example, we can't guarantee the ordering, and therefore the checksums, of metadata across servers). Unfortunately, the framework for communicating and syncing among servers has never been fully addressed.

-Cory

--
Cory L. Zue
Chief Technology Officer
Dimagi, Inc | One Kendall Square | Bldg. 400, 4th Floor | Cambridge, MA 02139
work: (617) 621 8595 x19 | cell: (617) 416 0544
http://www.dimagi.com/

Ed Schwartz

Oct 15, 2007, 8:17:02 PM

to shared-...@googlegroups.com, lew...@gmail.com, wo...@rii.ricoh.com, mjbu...@gmail.com, and...@kollegger.name, jjac...@dimagi.com, shared-...@googlegroups.com, schw...@rii.ricoh.com

Cory,

Thanks for your quick response!

> > I haven't found a way to find out what the File Name is for the last
> > metadata entry for a record with the current Shared Records
> > implementation.

> This should be coming back in the record headers during a HEAD or GET of the
> record itself. See:
> http://www.sharedrecords.org/trac/sharedrecords/wiki/RestRecordRetrieve

I'd be happy if the record headers had this. Maybe I'm doing
something wrong, or the internal server I use is either old or has a
bug. I normally see Max-Metadata-Sequence-Number but not the full File
Name. Here's an example:

$ curl --verbose --output retrieved.txt http://wort3:8080/SRCDataStore/RESTServlet/4704926776112a8dcfb34f3039dc614d02fdd623.data
* About to connect() to wort3 port 8080
* Trying 192.80.10.223... connected
* Connected to wort3 (192.80.10.223) port 8080
> GET /SRCDataStore/RESTServlet/4704926776112a8dcfb34f3039dc614d02fdd623.data HTTP/1.1
> User-Agent: curl/7.15.5 (i486-pc-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8c zlib/1.2.3 libidn/0.6.5
> Host: wort3:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: Apache-Coyote/1.1
< x-amz-meta-record-file-extension: .txt
< Max-Metadata-Sequence-Number: 302
< Content-Type: text/plain
< Content-Length: 50
< Date: Mon, 15 Oct 2007 22:05:22 GMT
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0    50    0    50    0     0    416      0 --:--:-- --:--:-- --:--:--     0
* Connection #0 to host wort3 left intact
* Closing connection #0

If you can't point out something I'm doing wrong, I can try to
replicate this on test.sharedrecords.org and file a Trac ticket if
you'd like.
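The same header check can be scripted. Below is a minimal Python sketch that sends a HEAD request, so the body isn't downloaded, and reads the Max-Metadata-Sequence-Number header seen in the curl output above. The header that would carry the full File Name isn't named in this thread, so the sketch reads only the sequence number. The head_record and metadata_sequence_number helpers are hypothetical.

```python
import urllib.request
from typing import Mapping, Optional

SEQ_HEADER = "Max-Metadata-Sequence-Number"


def metadata_sequence_number(headers: Mapping[str, str]) -> Optional[int]:
    """Find the sequence-number header (case-insensitively) and parse it."""
    for name, value in headers.items():
        if name.lower() == SEQ_HEADER.lower():
            return int(value.strip())
    return None


def head_record(url: str) -> dict:
    """Send a HEAD request and return the response headers as a dict."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return dict(resp.headers.items())


# Example (needs network access to the server):
# url = ("http://wort3:8080/SRCDataStore/RESTServlet/"
#        "4704926776112a8dcfb34f3039dc614d02fdd623.data")
# print(metadata_sequence_number(head_record(url)))
```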

> > 3) I'd like a way to access a particular metadata entry that is
> > shorter than the Record ID for the Record, Title, Rolling Checksum and
> > Sequence (90 characters plus title, not including any information
> > about the server). For example an API that just used the Rolling
> > Checksum would be nice, like this:
> >
> > http://sra.sharedrecords.org:8080/log/0123456789abcdef0123456789abcdef01234567.log
> >
> > It would also be nice if the returned metadata entry was in a
> > format that hashed to (in this example)
> > 0123456789abcdef0123456789abcdef01234567 .

> This is a bit tricky and also slightly separate from the annotation issue.
> Let me think about it and get back to you. Are you in danger of running out of
> characters in the URI?

I agree that this is separate from annotation. Thanks for considering
it. Shorter identifiers would be more convenient for some research
ideas that are not well formed at the moment. Storing these
identifiers in database fields using limited length strings is one
thing I have in mind.

-- Ed

Cory Zue

Oct 17, 2007, 4:43:33 PM

to Martin Budden, Saq Imtiaz, Greg Wolff, Andreas Kollegger, Jonathan Jackson, Ed Schwartz, shared-...@googlegroups.com

All,

Please find attached the second iteration of the annotation API. Comments/feedback welcome.

best,

Shared Records Annotation API to support filtering (version 2).doc

Martin Budden

Oct 22, 2007, 12:41:51 PM

to Cory Zue, Saq Imtiaz, Greg Wolff, Andreas Kollegger, Jonathan Jackson, Ed Schwartz, shared-...@googlegroups.com

Cory,

your document seems to have captured everything we agreed on in the conference call.

I disagree with one thing, though: I think the requests should be of the form:

<Server Base Address>/<Record ID>/annotations/<Annotation Title>?<Parameters and Values>

rather than

<Server Base Address>/annotations/<Record ID>/<Annotation Title>?<Parameters and Values>

since the annotations are part of the record. This does have the
disadvantage of making the syntax for the additionalRecordID a little
less natural, but I think this is outweighed by the fact that all
other URIs are more natural.
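To compare the two layouts side by side, here is a small Python sketch that builds each form of annotation request URI. The base address, record ID and parameter values in the example are made up. The title is percent-encoded, including any "/", so it stays a single path segment. The function names are hypothetical and not part of the annotation API document.

```python
from typing import Mapping
from urllib.parse import quote, urlencode


def _with_query(path: str, params: Mapping[str, str]) -> str:
    """Append ?<Parameters and Values> only when there are parameters."""
    query = urlencode(params)
    return f"{path}?{query}" if query else path


def annotation_uri_record_first(base: str, record_id: str, title: str,
                                params: Mapping[str, str]) -> str:
    # Martin's proposal: <base>/<Record ID>/annotations/<Annotation Title>?...
    return _with_query(
        f"{base}/{record_id}/annotations/{quote(title, safe='')}", params)


def annotation_uri_annotations_first(base: str, record_id: str, title: str,
                                     params: Mapping[str, str]) -> str:
    # Layout in the version 2 document: <base>/annotations/<Record ID>/<Annotation Title>?...
    return _with_query(
        f"{base}/annotations/{record_id}/{quote(title, safe='')}", params)


# Hypothetical example values:
# annotation_uri_record_first("http://example.org", "abc123", "Ed Stuff",
#                             {"mostRecentByTitle": "True"})
```

The two functions differ only in where the record ID sits. Which one reads more naturally is exactly the point under discussion.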

One thing you don't specify is how a checkpoint is set.

Martin

Cory Zue

Oct 23, 2007, 11:15:11 AM

to Martin Budden, Saq Imtiaz, Greg Wolff, Andreas Kollegger, Jonathan Jackson, Ed Schwartz, shared-...@googlegroups.com

Hi Martin,

The reason I chose to order the URI the way I did was that in the (likely) case where the annotation and "core" APIs are served by the same machine, this provides a nice separation point. Otherwise we would have some metadata requests going to both:

<Server Base Address>/<Record ID>/annotations/
and
<Server Base Address>/<Record ID>/log/

This is a bit confusing. We could mitigate this by having the separation occur at the base address level, where it would actually be something like:

<Server Base Address>/ann/<Record ID>/annotations/
and
<Server Base Address>/core/<Record ID>/log/

"core" would replace what we have now as "SRCDataStore/RESTServlet", which is a pretty clunky name, and an artifact of how the code was originally developed.
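One way to picture the separation point is a front-end that routes on the first path segment. The Python sketch below assumes the "ann"/"core" prefixes sit directly under the server root, and treats anything else, such as today's SRCDataStore/RESTServlet paths, as unrecognized. The route function and the service labels it returns are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical mapping from the first path segment to a backend service.
SERVICES = {"ann": "annotation", "core": "core"}


def route(path: str) -> Optional[Tuple[str, str, str]]:
    """Split a path into (service, record_id, rest) under the ann/core layout.

    Returns None when the path doesn't start with a known service prefix.
    """
    parts = path.strip("/").split("/", 2)
    if len(parts) < 2 or parts[0] not in SERVICES:
        return None
    rest = parts[2] if len(parts) == 3 else ""
    return SERVICES[parts[0]], parts[1], rest


# route("/ann/abc123/annotations/Ed%20Stuff")
#   -> ("annotation", "abc123", "annotations/Ed%20Stuff")
```

Because the service is decided by a single prefix, the annotation and core handlers could later move to separate machines without changing the rest of the URI.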

Either way, this is a small point, and in the meantime I'll begin working on implementing the functionality server-side.

-Cory

Martin Budden

Oct 23, 2007, 12:54:29 PM

to Cory Zue, Saq Imtiaz, Greg Wolff, Andreas Kollegger, Jonathan Jackson, Ed Schwartz, shared-...@googlegroups.com

Cory,

I agree that it is a small point and won't get in the way of
implementation. I'll think a bit more about the points you have made.

Martin