Questions about making a scanningcabinet clone with Camlistore

Jeremy Schlatter

Dec 4, 2014, 3:17:01 AM
to camli...@googlegroups.com
Hi everyone,

I like Brad's scanningcabinet project and want to have something like it in Camlistore. I'm not sure of the best way to do it, though. Can I get some design feedback?

I see scanningcabinet as being composed of these things:
  • Code that triggers scans using SANE.
    •  I don't intend to port this part. The software that came with my scanner has been way easier for me to use than configuring SANE.
  • Code that takes scanned jpegs/pdfs/whatever and uploads them to the metadata entry app.
    • "Uploading to metadata entry app" will become "store files in Camlistore". That part seems simple enough. What is a good way to make the metadata entry app (below) aware of these files? Judging from the publisher app, my first idea is to make them children of a camliRoot node. Does that seem reasonable?
  • Metadata entry app -- group scanned images into logical documents, add tags and other metadata
    • I expect this will become a standalone app, like the publisher app. My main question here is once the user creates a logical document, how should it be represented in Camlistore? My first guess is to store the scanned pages in an ordered static-set. I can also imagine creating a new PDF with all of the pages. Do you have any thoughts on this?
  • Document viewer -- search through and view documents created by the metadata entry app
    • I would ideally like to view the final documents in the standard Camlistore UI. Is there a good way to make this happen? This could involve just displaying PDFs or it could mean creating a custom view specifically for scanned documents.
Thanks!
Jeremy


Mathieu Lonjaret

Dec 4, 2014, 9:46:19 AM
to camli...@googlegroups.com
Hi,

jsyk, I've already started on the importing of the cabinet's existing
data (https://github.com/mpl/scancabimport), and there's already a Go
port of the original app at:
https://bitbucket.org/pborgeest/nometicland
That being said,

On 4 December 2014 at 09:17, Jeremy Schlatter
<jeremy.s...@gmail.com> wrote:
> Hi everyone,
>
> I like Brad's scanningcabinet project and want to have something like it in
> Camlistore. I'm not sure of the best way to do it, though. Can I get some
> design feedback?
>
> I see scanningcabinet as being composed of these things:
>
> Code that triggers scans using SANE.
>
> I don't intend to port this part. The software that came with my scanner
> has been way easier for me to use than configuring SANE.
>
> Code that uploads scanned jpegs/pdfs/whatever and uploads them to metadata
> entry app.
>
> "Uploading to metadata entry app" will become "store files in Camlistore".
> That part seems simple enough. What is a good way to make the metadata entry
> app (below) aware of these files? Judging from the publisher app, my first
> idea is to make them children of a camliRoot node. Does that seem
> reasonable?

Yes. Or there could be a specific type on the permanodes, like on the
permanodes created by the importers (picasa, twitter, etc...).

> Metadata entry app -- group scanned images into logical documents, add tags
> and other metadata
>
> I expect this will become a standalone app, like the publisher app. My main
> question here is once the user creates a logical document, how should it be
> represented in Camlistore? My first guess is to store the scanned pages in
> an ordered static-set. I can also imagine creating a new PDF with all of the
> pages. Do you have any thoughts on this?

I think the idea was: each document is a permanode, each media object
(i.e. page) is a permanode too, with some camliContent (the image
file). The document has a camliPath for each of its pages maybe?

> Document viewer -- search through and view documents created by the metadata
> entry app
>
> I would ideally like to view the final documents in the standard Camlistore
> UI. Is there a good way to make this happen? This could involve just
> displaying PDFs or it could mean creating a custom view specifically for
> scanned documents.

I think if you stick to Camlistore's data model, and work with Aaron
and Mario, then yes we should end up with something nicely usable
directly from the UI. :-)

> Thanks!
> Jeremy

Mathieu Lonjaret

Jan 7, 2015, 1:45:05 PM
to camli...@googlegroups.com
Hi,

I now have a raw prototype that successfully imports metadata (not all
of it done properly yet) + scanned files from GAE to camli
(https://github.com/mpl/scancabimport/tree/9c5de42c6fd4669452f766fb14b9ca9f931e07c5).

Basic schema is:
-each scan (a.k.a. media object) is a permanode, with its metadata
("description", "tags", "filename", etc) set as attributes on the
permanode. The actual image file is uploaded and set as camliContent of
that same permanode.
-each document (group of scans) is a permanode, with the same story as
for scans. The relation with scans (pages) is modelled as:
camliPath:pageNumber = permanode of the scan
with pageNumber starting at 1. (in GAE it's kept as the ordered list
of the key Ids of the scans).
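
The schema described above can be sketched as plain Go data, purely for illustration. The Permanode type and the two helpers are hypothetical stand-ins and not Camlistore's client API, and the "title" attribute on the document is an assumption; only the attribute names camliContent, camliPath:<pageNumber>, "description" and "tags" come from the description above.

```go
package main

import (
	"fmt"
	"strconv"
)

// Permanode is a hypothetical in-memory stand-in for a Camlistore permanode:
// a blobref plus its current attribute values.
type Permanode struct {
	Ref   string
	Attrs map[string][]string
}

// newScan models one scanned page: metadata as attributes, and the image
// file's blobref as camliContent.
func newScan(ref, fileRef, description string, tags ...string) *Permanode {
	return &Permanode{Ref: ref, Attrs: map[string][]string{
		"camliContent": {fileRef},
		"description":  {description},
		"tags":         tags,
	}}
}

// newDocument models a document whose pages are referenced as
// camliPath:<pageNumber>, with page numbers starting at 1.
// The "title" attribute is an assumption made for this sketch.
func newDocument(ref, title string, pages []*Permanode) *Permanode {
	doc := &Permanode{Ref: ref, Attrs: map[string][]string{"title": {title}}}
	for i, p := range pages {
		doc.Attrs["camliPath:"+strconv.Itoa(i+1)] = []string{p.Ref}
	}
	return doc
}

func main() {
	p1 := newScan("sha1-scan1", "sha1-file1", "invoice, front", "taxes")
	p2 := newScan("sha1-scan2", "sha1-file2", "invoice, back", "taxes")
	doc := newDocument("sha1-doc", "Invoice 2014", []*Permanode{p1, p2})
	fmt.Println(doc.Attrs["camliPath:1"], doc.Attrs["camliPath:2"])
}
```

In a real importer each of these attributes would be set through signed claims rather than by writing to a map.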

Things show up nicely in the UI, with the container view of a document
showing all the pages. Except of course that for now they're not
ordered by page number, since (afair) the UI gets its search result
ordered by time.

One easy way to cheat the UI would be to force set on each scan
permanode a nodeattr.DateCreated attr, in decreasing order with page
number. But that's gross.

Now to do the ordering properly, I'm thinking
1) We let the UI pick up on some heuristic (a nodeattrtype on the
document permanode?) to know it's dealing with a doc with ordered
pages.
2) in which case, it asks for a search ordered by camliPath:pageNumber
(or some other attribute specifically dedicated to the page number).
3) let the server do the work
4) profit in the UI?

I'm not sure yet how difficult that would be. It'd require at least
adding the new sorting order in pkg/search/query.go, and the
corresponding enumeration to the corpus I suppose.

Comments? better suggestion?

Cheers,
Mathieu


Aaron Boodman

Jan 7, 2015, 2:00:51 PM
to camlistore
I think you should use camliMember for each page in a document, not camliPath. Either will work, but I feel like a document is logically more like a set than a map, and so camliMember makes more sense to me. Also, as a side effect of that, the UI will use the title attribute of each page as the title in the UI. You could default it to something nice like 'Page 5', and the user could change it to something more useful if they prefer.

As for ordering, I think this is just a question of how to represent ordered sets in Camlistore, which we haven't needed before.

I guess there are many ways, but what about adding an optional 'previous' field to camliMember claims? That way when things are reordered, we need to make fewer mutations.

- a

Mathieu Lonjaret

Jan 7, 2015, 4:25:51 PM
to camli...@googlegroups.com
On 7 January 2015 at 20:00, Aaron Boodman <aa...@aaronboodman.com> wrote:
> I think you should use camliMember for each page in a document, not
> camliPath. Either will work, but I feel like a document is logically more
> like a set than a map, and so camliMember makes more sense to me.

ok, I agree on the logic, but I'm not sure a camliMember actually
makes things simpler in that particular case.

> Also as a
> side effect of that the UI will use the title attribute of each page as the
> title in the UI. You could default it to something nice like 'Page 5', and
> the user could change it to something more useful if they prefer.

Good to know, thanks.

> As for ordering, I think this is just a question of how to represent ordered
> sets in Camlistore, which we haven't needed before.
>
> I guess there are many ways, but what about adding an optional 'previous'
> field to camliMember claims. That way when things are reordered, we need to
> make fewer mutations.

Ok, I'll toy with that, thanks. I see now that I was attacking that
problem with a "sorting" solution, while we might indeed want
something that is explicitly ordered from the start.

Aaron Boodman

Jan 7, 2015, 4:29:31 PM
to camlistore
Another disadvantage of putting ordering metadata on the page permanode is that this metadata only makes sense in the context of the document permanode. If the page permanode was somehow referenced from elsewhere, and that elsewhere also wanted an ordering, it couldn't do that.

Mathieu Lonjaret

Jan 7, 2015, 4:51:29 PM
to camli...@googlegroups.com
that's why I would not call it ordering metadata. :-) the attribute
you choose as your ordering (or is it actually sorting this time?)
criterion does not have to be authoritative for all cases and can
totally depend on the context, can it not?

Mathieu Lonjaret

Jan 8, 2015, 11:03:11 AM
to camli...@googlegroups.com
About ordered sets, it looks like I stumbled upon a precedent that I
didn't remember about:
in pkg/schema/nodeattr/nodeattr.go , we have:

// CamliPathOrderColon is the prefix "camliPathOrder:".
// The attribute key should be followed by a uint64. The attribute value
// is an existing value of a camliPath element.
// CamliPathOrder optionally sorts sets already using "camliPath:foo" keys.
// The integers do not need to be contiguous, nor 0- (or 1-) based.
CamliPathOrderColon = "camliPathOrder:"

It looks like the picasa importer is setting it, but nothing else
does, and the UI does not take it into account. So, should we
consolidate that use whenever we want to have ordered sets? Or is the
intention that one can use CamliPathOrderColon to sort a set when _it
happens_ that your set is based on a camliPath relation, BUT that it
should not influence one's decision to use camliPath vs camliMember to
create an ordered set?
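
For illustration, here is a minimal sketch of how a consumer could apply camliPathOrder as the comment above describes it: each key is "camliPathOrder:" followed by an integer, and each value is the value of an existing camliPath element. The flattened map of single-valued attributes and the orderedMembers helper are assumptions made for this example, not existing Camlistore code.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

const camliPathOrderColon = "camliPathOrder:"

// orderedMembers returns the values of the camliPathOrder:<n> attributes,
// sorted by n. attrs is a hypothetical flattened view of a permanode's
// single-valued attributes.
func orderedMembers(attrs map[string]string) []string {
	type entry struct {
		n   uint64
		ref string
	}
	var entries []entry
	for k, v := range attrs {
		if !strings.HasPrefix(k, camliPathOrderColon) {
			continue
		}
		n, err := strconv.ParseUint(strings.TrimPrefix(k, camliPathOrderColon), 10, 64)
		if err != nil {
			continue // skip malformed keys
		}
		entries = append(entries, entry{n, v})
	}
	// The integers need not be contiguous, so sort by value rather than index.
	sort.Slice(entries, func(i, j int) bool { return entries[i].n < entries[j].n })
	refs := make([]string, len(entries))
	for i, e := range entries {
		refs[i] = e.ref
	}
	return refs
}

func main() {
	attrs := map[string]string{
		"camliPath:front":   "sha1-a",
		"camliPath:back":    "sha1-b",
		"camliPathOrder:20": "sha1-b",
		"camliPathOrder:10": "sha1-a",
	}
	fmt.Println(orderedMembers(attrs))
}
```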

Another point:
the initial model allows for a Preview attribute for a document (like
an album cover), which would be the page of the document that we want
to show on the index page as the icon for the whole document I
suppose.

server/camlistored/ui/blob_item_image_content.js says:
// Sets can have the camliContentImage attr to indicate a user-chosen
// "cover image" for the entire set. Until we have some rendering for
// those, the folder in the generic handler is a better fit than the
// single image.

but it does not exactly match the definition we have for camliContent
vs CamliContentImage in pkg/schema/nodeattr/nodeattr.go , i.e. since a
document permanode itself does not have any camliContent (its members
- regardless of whether we use camliPath or camliMember - do), we
could very well use camliContent on the document permanode to indicate
the preview page. I'd vote for using camliContentImage though, and
maybe add the preview/cover image use case as documentation in
nodeattr.go

Mathieu Lonjaret

Jan 8, 2015, 11:44:08 AM
to camli...@googlegroups.com
still on the Preview point, even though it's a field of the Document
class in the original code, I don't see it being set or used anywhere,
so Brad, do you want me to add the possibility to set it in the port
of the scanning cabinet web app? Or should I just drop that attribute
of the Document model?

My original question still stands in general though: camliContent or
camliContentImage?

Aaron Boodman

Jan 8, 2015, 5:51:19 PM
to camlistore
On Thu, Jan 8, 2015 at 8:02 AM, Mathieu Lonjaret <mathieu....@gmail.com> wrote:
> About ordered sets, it looks like I stumbled upon a precedent that I
> didn't remember about:
> in pkg/schema/nodeattr/nodeattr.go , we have:
>
> // CamliPathOrderColon is the prefix "camliPathOrder:".
> // The attribute key should be followed by a uint64. The attribute value
> // is an existing value of a camliPath element.
> // CamliPathOrder optionally sorts sets already using "camliPath:foo" keys.
> // The integers do not need to be contiguous, nor 0- (or 1-) based.
> CamliPathOrderColon = "camliPathOrder:"

heh, I could have guessed that there'd be something like that hiding somewhere. 

> It looks like the picasa importer is setting it, but nothing else
> does, and the UI does not take it into account. So, should we
> consolidate that use whenever we want to have ordered sets? Or is the
> intention that one can use CamliPathOrderColon to sort a set when _it
> happens_ that your set is based on a camliPath relation, BUT that it
> should not influence one's decision to use camliPath vs camliMember to
> create an ordered set?

These are really questions for Brad, but my opinion is that there should be one ordering mechanism and it should apply to both camliPath and camliMember attributes.

I'm also not crazy about the use of integers for ordering this way (because you have to pretend you're writing BASIC and leave space between them for future reorders), but it does work, more or less. 

> Another point:
> the initial model allows for a Preview attribute for a document (like
> an album cover), which would be the page of the document that we want
> to show on the index page as the icon for the whole document I
> suppose.
>
> server/camlistored/ui/blob_item_image_content.js says:
> // Sets can have the camliContentImage attr to indicate a user-chosen
> // "cover image" for the entire set. Until we have some rendering for
> // those, the folder in the generic handler is a better fit than the
> // single image.
>
> but it does not exactly match the definition we have for camliContent
> vs CamliContentImage in pkg/schema/nodeattr/nodeattr.go , i.e. since a
> document permanode itself does not have any camliContent (its members
> - regardless of whether we use camliPath or camliMember - do), we
> could very well use camliContent on the document permanode to indicate
> the preview page. I'd vote for using camliContentImage though, and
> maybe add the preview/cover image use case as documentation in
> nodeattr.go

The reason that I do not like using an image for containers right now is that you cannot tell when looking at a search result that an item actually has children rather than just being a single thing.

We used camliContentImage to set the "cover photo" for albums from some of the image importers. As a result, you couldn't tell whether a picture was just a picture or whether it was an album.

For a single document it makes more sense, but I still sort of think it would be better if the presentation was different in this case. Like a simple thing could be to superimpose a small folder in the corner of the image or something. Or make it look like a stack of images, or whatever.

Sorry for being persnickety.

- a

Mathieu Lonjaret

Jan 8, 2015, 6:04:38 PM
to camli...@googlegroups.com
On 8 January 2015 at 23:50, Aaron Boodman <aa...@aaronboodman.com> wrote:

<snip>

> The reason that I do not like using an image for containers right now is
> that you cannot tell when looking at a search result that an item actually
> has children rather than just being a single thing.
>
> We used camliContentImage to set the "cover photo" for albums from some of
> the image importers. As a result, you couldn't tell whether a picture was
> just a picture or whether it was an album.
>
> For a single document it makes more sense, but I still sort of think it
> would be better if the presentation was different in this case. Like a
> simple thing could be to superimpose a small folder in the corner of the
> image or something. Or make it look like a stack of images, or whatever.
>
> Sorry for being persnickety.

No problem, I see your point, and I think it's totally fine to wait
until we come up with a good presentation idea to do the preview thing
for sets.
But do you have any opinion on which one (camliContent vs
camliContentImage - or other?) to pick to store the preview image ? I
think this is something we should clarify, define, and document.

Aaron Boodman

Jan 8, 2015, 6:35:31 PM
to camlistore
On Thu, Jan 8, 2015 at 3:04 PM, Mathieu Lonjaret <mathieu....@gmail.com> wrote:
> On 8 January 2015 at 23:50, Aaron Boodman <aa...@aaronboodman.com> wrote:
>
> <snip>
>
>> The reason that I do not like using an image for containers right now is
>> that you cannot tell when looking at a search result that an item actually
>> has children rather than just being a single thing.
>>
>> We used camliContentImage to set the "cover photo" for albums from some of
>> the image importers. As a result, you couldn't tell whether a picture was
>> just a picture or whether it was an album.
>>
>> For a single document it makes more sense, but I still sort of think it
>> would be better if the presentation was different in this case. Like a
>> simple thing could be to superimpose a small folder in the corner of the
>> image or something. Or make it look like a stack of images, or whatever.
>>
>> Sorry for being persnickety.
>
> No problem, I see your point, and I think it's totally fine to wait
> until we come up with a good presentation idea to do the preview thing
> for sets.
> But do you have any opinion on which one (camliContent vs
> camliContentImage - or other?) to pick to store the preview image ? I
> think this is something we should clarify, define, and document.

It should be camliContentImage. 

Mathieu Lonjaret

Jan 8, 2015, 7:50:44 PM
to camli...@googlegroups.com
On 8 January 2015 at 23:50, Aaron Boodman <aa...@aaronboodman.com> wrote:

>> It looks like the picasa importer is setting it, but nothing else
>> does, and the UI does not take it into account. So, should we
>> consolidate that use whenever we want to have ordered sets? Or is the
>> intention that one can use CamliPathOrderColon to sort a set when _it
>> happens_ that your set is based on a camliPath relation, BUT that it
>> should not influence one's decision to use camliPath vs camliMember to
>> create an ordered set?
>
>
> These are really questions for Brad, but my opinion is that there should be
> one ordering mechanism and it should apply to both camliPath and camliMember
> attributes.

So let's say we add a "next" and/or "previous" attributes (we can
bikeshed the names later) to our vocabulary of common permanode
attributes. Those would be set as needed when one wants to construct
an ordered set with camliMember (and camliPath). You mentioned doing
so as an extra argument when setting a camliMember claim. But it seems
more complicated than setting this "previous" attribute as a distinct,
separate, claim, no?

The UI would as usual request children/members of the document. I
don't remember if the current query for that specifies an order, but I
think we would have to make sure that it does not specify any, or
maybe a new one like "naturalOrder" or whatever. So the UI would have
to know to specify that when it's dealing with a (scanning cabinet)
document (or any other time where we want an ordered set).

The search handler gets the query, sees that we want "naturalOrder".
At this point, to make things efficient we'd have to write a new
permanodes enumeration method in the corpus based on this natural
order, like the ones we have based on modtime or ctime. I'm not
sure yet how difficult that would be. But assuming the docs don't have
many pages for now, we can make it work by using the dumb full
enumeration for permanodes, apply the usual Constraint matching, and
then sort them all. Brad actually left a "sort them" TODO at the point
where this should be done :-)
We'd have to define what we do when the set of results cannot be fully
ordered (because at least one of the permanodes in the results does
not have a "next"/"previous" for some reason). Move it at the end of
the set and log? or error out? Or maybe having this "previous"
attribute is automatically added as part of the constraint when
"naturalOrder" is requested, and hence any permanode which does not
have it is actually excluded from the results.

Then we send the blobs back to the UI, which should have nothing more
to do than display them in the order they already are.

It's still blurry for me but it looks like it could work. Again, I
need to think more about an optimized enumeration for this order in
the corpus.

Comments?

Aaron Boodman

Jan 8, 2015, 11:34:51 PM
to camlistore
On Thu, Jan 8, 2015 at 4:50 PM, Mathieu Lonjaret <mathieu....@gmail.com> wrote:
> On 8 January 2015 at 23:50, Aaron Boodman <aa...@aaronboodman.com> wrote:
>
>>> It looks like the picasa importer is setting it, but nothing else
>>> does, and the UI does not take it into account. So, should we
>>> consolidate that use whenever we want to have ordered sets? Or is the
>>> intention that one can use CamliPathOrderColon to sort a set when _it
>>> happens_ that your set is based on a camliPath relation, BUT that it
>>> should not influence one's decision to use camliPath vs camliMember to
>>> create an ordered set?
>>
>>
>> These are really questions for Brad, but my opinion is that there should be
>> one ordering mechanism and it should apply to both camliPath and camliMember
>> attributes.
>
> So let's say we add a "next" and/or "previous" attributes (we can
> bikeshed the names later) to our vocabulary of common permanode
> attributes.

Just to be clear my statement about there being one ordering mechanism meant that if we do use camliPathOrder, I felt like it should apply to both camliMember and camliPath.

(OTOH, if we use next/prev, I feel the same way)
 
> Those would be set as needed when one wants to construct
> an ordered set with camliMember (and camliPath). You mentioned doing
> so as an extra argument when setting a camliMember claim. But it seems
> more complicated than setting this "previous" attribute as a distinct,
> separate, claim, no?

I was originally thinking:

{
  camliType: 'claim',
  claimType: 'add-attr',
  permanode: 'xyz',
  attr: 'camliMember',  // or camliPath:foo
  value: 'bar',
  camliOrderNext: '<hash of next claim>'
};

This has a couple problems though:

1. You have to build the claims in a particular order so that you have the blobrefs. We could also add camliOrderPrevious to partially alleviate this.

2. Every time you reorder, you have to duplicate the attr and value data.

Instead you could do this:

{
  camliType: 'claim',
  claimType: 'add-attr',
  permanode: 'xyz',
  attr: 'camliOrderNext',
  value: '<prevClaimHash>:<nextClaimHash>',
}

This way you can reorder and you only need to rewrite the ordering info, and only for the minimum nodes.

So for a given permanode you'd have a multivalued camliOrderNext attr that defines the linked list. Alternately you could use single-valued attributes of the form 'camliOrderNext:<prevHash>':'<nextHash>'. I don't know if one of these is better than the other from a performance perspective.
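
As a rough sketch of the multivalued variant: assuming each camliOrderNext value has the form "<prev>:<next>" and that blobrefs themselves contain no colon (as with "sha1-..." refs), the values can be turned into a successor map and walked like a linked list. The function names here are made up for the example; nothing like this exists in Camlistore as far as this thread says.

```go
package main

import (
	"fmt"
	"strings"
)

// parseOrderNext turns multivalued camliOrderNext values of the form
// "<prev>:<next>" into a successor map. Malformed values are ignored.
func parseOrderNext(values []string) map[string]string {
	next := make(map[string]string)
	for _, v := range values {
		parts := strings.SplitN(v, ":", 2)
		if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
			continue
		}
		next[parts[0]] = parts[1]
	}
	return next
}

// walk follows the successor map from head, stopping at the end of the
// chain or as soon as a ref repeats (a cycle).
func walk(head string, next map[string]string) []string {
	seen := map[string]bool{}
	var out []string
	for cur := head; cur != "" && !seen[cur]; cur = next[cur] {
		seen[cur] = true
		out = append(out, cur)
	}
	return out
}

func main() {
	next := parseOrderNext([]string{"sha1-a:sha1-b", "sha1-b:sha1-c"})
	fmt.Println(walk("sha1-a", next))
}
```

With this shape, moving one element only means replacing the few pairs that mention it or its old and new neighbours; the membership claims stay untouched.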

> The UI would as usual request children/members of the document. I
> don't remember if the current query for that specifies an order, but I
> think we would have to make sure that it does not specify any, or
> maybe a new one like "naturalOrder" or whatever. So the UI would have
> to know to specify that when it's dealing with a (scanning cabinet)
> document (or any other time where we want an ordered set).

If the UI does specify a sort, I think the best thing would be to remove that, and let the server pick a sort. It will basically order nodes with a comparison that favors the camliOrderNext attributes, but falls back to whatever date thing it is doing now.

> The search handler gets the query, sees that we want "naturalOrder".
> At this point, to make things efficient we'd have to write a new
> permanodes enumeration method in the corpus based on this natural
> order, like the ones we have based on modtime or ctime. I'm not
> sure yet how difficult that would be. But assuming the docs don't have
> many pages for now, we can make it work by using the dumb full
> enumeration for permanodes, apply the usual Constraint matching, and
> then sort them all. Brad actually left a "sort them" TODO at the point
> where this should be done :-)
> We'd have to define what we do when the set of results cannot be fully
> ordered (because at least one of the permanodes in the results does
> not have a "next"/"previous" for some reason). Move it at the end of
> the set and log? or error out? Or maybe having this "previous"
> attribute is automatically added as part of the constraint when
> "naturalOrder" is requested, and hence any permanode which does not
> have it is actually excluded from the results.

Yeah, I think you'd fall back to date or whatever it does now. 

> Then we send the blobs back to the UI, which should have nothing more
> to do than display them in the order they already are.
>
> It's still blurry for me but it looks like it could work. Again, I
> need to think more about an optimized enumeration for this order in
> the corpus.
>
> Comments?

Brad should weigh in.

Aaron Boodman

Jan 8, 2015, 11:52:20 PM
to camlistore
Sorry, fleshing this out a little:

I think what you want is:

1. Sort by whatever date-based thing it is doing now.
2. Go through the date-ordered set and reorder based on whatever declared ordering claims there are.

I think that this will gracefully handle things like cycles and incomplete ordering information. I could be missing something though.
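
One possible reading of those two steps, as a sketch only: take the date-ordered refs, then emit chain heads in date order and follow any declared successor links, and finally emit whatever is left (which can only be members of cycles) in date order. The naturalOrder name and the prev-to-next map are assumptions for this example, not an existing search API.

```go
package main

import "fmt"

// naturalOrder reorders byDate (refs already sorted by date) using declared
// successor links (prev -> next). Refs without ordering info keep their
// date-based position relative to each other; cycles are cut at the first
// member met in date order.
func naturalOrder(byDate []string, next map[string]string) []string {
	inSet := make(map[string]bool, len(byDate))
	for _, r := range byDate {
		inSet[r] = true
	}
	// hasPrev marks refs that some other ref in the set points to.
	hasPrev := make(map[string]bool)
	for p, n := range next {
		if inSet[p] && inSet[n] {
			hasPrev[n] = true
		}
	}
	placed := make(map[string]bool, len(byDate))
	out := make([]string, 0, len(byDate))
	emitChain := func(start string) {
		for cur := start; inSet[cur] && !placed[cur]; cur = next[cur] {
			placed[cur] = true
			out = append(out, cur)
		}
	}
	for _, r := range byDate { // chain heads and unconstrained refs, in date order
		if !hasPrev[r] {
			emitChain(r)
		}
	}
	for _, r := range byDate { // leftovers can only be members of cycles
		emitChain(r)
	}
	return out
}

func main() {
	byDate := []string{"c", "a", "b", "d"}
	next := map[string]string{"a": "b", "b": "c"}
	fmt.Println(naturalOrder(byDate, next))
}
```

Links that point outside the result set are simply ignored here, which is one way of falling back to the date order when the ordering information is incomplete.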

Mathieu Lonjaret

Jan 9, 2015, 10:08:51 AM
to camli...@googlegroups.com
On 9 January 2015 at 05:34, Aaron Boodman <aa...@aaronboodman.com> wrote:

> I was originally thinking:
>
> {
> camliType: 'claim',
> claimType: 'add-attr',
> permanode: 'xyz',
> attr: 'camliMember', // or camliPath:foo
> value: 'bar',
> camliOrderNext: '<hash of next claim>'
> };
>
> This has a couple problems though:
>
> 1. You have to build the claims in a particular order so that you have the
> blobrefs. We could also add camliOrderPrevious to partially alleviate this.
>
> 2. Everytime you reorder, you have to duplicate the attr and value data.

Yeah, I'm a bit uncomfortable with the idea of setting more than one
property (1) pn1 is a camliMember of pn0 , 2) pn1 has pn2 as a
sibling) in one shot/claim when we can do it in separate steps. I
can't justify that feeling more than it seems like taking a risk of
complicating things though.

> Instead you could do this:
>
> {
> camliType: 'claim',
> claimType: 'add-attr',
> permanode: 'xyz',
> attr: 'camliOrderNext',
> value: '<prevClaimHash>:<nextClaimHash>',
> }

You meant "value: '<prevPermanodeHash>:<nextPermanodeHash>'" , didn't you?
I mean, pointing to a claim somewhat works too since the claim you
point to indeed has as its target the previous/next permanode we're
looking for, but it seems weirdly indirect. Or am I missing an
advantage to doing this?

> This way you can reorder and you only need to rewrite the ordering info, and
> only for the minimum nodes.
>
> So for a given permanode you'd have a multivalued camliOrderNext attr that
> defines the linked list. Alternately you could use single-valued attributes
> of the form 'camliOrderNext:<prevHash>':'<nextHash>'. I don't know if one of
> these is better than the other from a performance perspective.
>
>> The UI would as usual request children/members of the document. I
>> don't remember if the current query for that specifies an order, but I
>> think we would have to make sure that it does not specify any, or
>> maybe a new one like "naturalOrder" or whatever. So the UI would have
>> to know to specify that when it's dealing with a (scanning cabinet)
>> document (or any other time where we want an ordered set).
>
> If the UI does specify a sort, I think the best thing would be to remove
> that, and let the server pick a sort. It will basically order nodes with a
> comparison that favors the camliOrderNext attributes, but falls back to
> whatever date thing it is doing now.

sgtm.

>> The search handler gets the query, sees that we want "naturalOrder".
>> At this point, to make things efficient we'd have to write a new
>> permanodes enumeration method in the corpus based on this natural
>> order, like the ones we have based on modtime or ctime. I'm not
>> sure yet how difficult that would be. But assuming the docs don't have
>> many pages for now, we can make it work by using the dumb full
>> enumeration for permanodes, apply the usual Constraint matching, and
>> then sort them all. Brad actually left a "sort them" TODO at the point
>> where this should be done :-)
>> We'd have to define what we do when the set of results cannot be fully
>> ordered (because at least one of the permanodes in the results does
>> not have a "next"/"previous" for some reason). Move it at the end of
>> the set and log? or error out? Or maybe having this "previous"
>> attribute is automatically added as part of the constraint when
>> "naturalOrder" is requested, and hence any permanode which does not
>> have it is actually excluded from the results.
>
> Yeah, I think you'd fall back to date or whatever it does now.

sgtm.

Mathieu Lonjaret

Jan 9, 2015, 10:26:17 AM
to camli...@googlegroups.com
On 9 January 2015 at 05:51, Aaron Boodman <aa...@aaronboodman.com> wrote:

>>> The search handler gets the query, sees that we want "naturalOrder".
>>> At this point, to make things efficient we'd have to write a new
>>> permanodes enumeration method in the corpus based on this natural
>>> order, like the ones we have based on modtime or ctime. I'm not
>>> sure yet how difficult that would be. But assuming the docs don't have
>>> many pages for now, we can make it work by using the dumb full
>>> enumeration for permanodes, apply the usual Constraint matching, and
>>> then sort them all. Brad actually left a "sort them" TODO at the point
>>> where this should be done :-)
>>> We'd have to define what we do when the set of results cannot be fully
>>> ordered (because at least one of the permanodes in the results does
>>> not have a "next"/"previous" for some reason). Move it at the end of
>>> the set and log? or error out? Or maybe having this "previous"
>>> attribute is automatically added as part of the constraint when
>>> "naturalOrder" is requested, and hence any permanode which does not
>>> have it is actually excluded from the results.

> Sorry, fleshing this out a little:
>
> I think what you want is:
>
> 1. Sort by whatever date-based thing it is doing now.
> 2. Go through the date-ordered set and reorder based on whatever declared
> ordering claims there are.

Yeah, that's similar to what I was calling the "dumb full
enumeration". The only problem with that is you have to wait until you
have enumerated your full set of matching permanodes before you can
sort them properly. So you possibly have enumerated and sorted way
more permanodes than you're actually going to send back as your first
result (because you probably have a limit on the number of results you
send back for each query, which is working well with the continuation
scheme thing afaiu).
As opposed to the way it is optimized right now for time sorted: that
is, the corpus keeps in memory the permanodes sorted by time, and when
the search handler asks for them we stream them through a channel. The
search handler constraint-filters them on the fly, appends them
directly to the results array (they're already sorted), and just stops
reading the channel as soon as it has enough results (as per the limit
set - or not - by the query).
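
The time-sorted path described above can be pictured roughly like this. It is a simplified sketch and not the actual corpus or search handler code: the enumerate and search functions are hypothetical, and the standard context package is used here to tell the producer to stop once the limit is reached.

```go
package main

import (
	"context"
	"fmt"
)

// enumerate streams refs, already sorted by time, until the consumer
// cancels ctx. It stands in for the corpus enumeration described above.
func enumerate(ctx context.Context, sorted []string) <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch)
		for _, r := range sorted {
			select {
			case ch <- r:
			case <-ctx.Done():
				return
			}
		}
	}()
	return ch
}

// search filters the stream on the fly and stops reading once it has
// limit results; limit <= 0 means no limit.
func search(sorted []string, match func(string) bool, limit int) []string {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // releases the producer goroutine on early return
	var results []string
	for r := range enumerate(ctx, sorted) {
		if !match(r) {
			continue
		}
		results = append(results, r) // already in sorted order
		if limit > 0 && len(results) >= limit {
			break
		}
	}
	return results
}

func main() {
	sorted := []string{"doc1", "img1", "doc2", "doc3"}
	isDoc := func(s string) bool { return len(s) >= 3 && s[:3] == "doc" }
	fmt.Println(search(sorted, isDoc, 2))
}
```

An explicit ordering that is only known after looking at all matches cannot stop early like this, which is the cost being pointed out above.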

Aaron Boodman

Jan 9, 2015, 12:56:31 PM
to camlistore
I actually meant to refer to the claims, but you're right that referring to the child permanodes works too and is more direct. 

Aaron Boodman

Jan 9, 2015, 12:57:54 PM
to camlistore
Yup. But this is a more general problem. You can already request arbitrary sorts in the search query.

András Pálinkás

Jan 9, 2015, 3:50:47 PM
to camlistore

About the prevClaimHash and nextClaimHash idea. Why not use something like this:

camliOrder is by default the unix epoch of the created date. When you reorder items, you just change it to (newPrevItem.camliOrder + newNextItem.camliOrder)/2.
Yes, the camliOrder should be a floating point thing.
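
A minimal sketch of that midpoint idea, assuming a hypothetical Item type that carries a float64 camliOrder key (neither the type nor the attribute exists in Camlistore as far as this thread says):

```go
package main

import "fmt"

// Item is a hypothetical set member with a floating-point camliOrder key,
// defaulting to the Unix time of creation.
type Item struct {
	Ref        string
	CamliOrder float64
}

// between returns a key halfway between prev and next. It sorts strictly
// between them only while the two keys are far enough apart for float64
// to represent the midpoint.
func between(prev, next Item) float64 {
	return (prev.CamliOrder + next.CamliOrder) / 2
}

func main() {
	a := Item{Ref: "sha1-a", CamliOrder: 1420000000}
	b := Item{Ref: "sha1-b", CamliOrder: 1420000100}
	moved := Item{Ref: "sha1-c"}
	moved.CamliOrder = between(a, b) // place c between a and b
	fmt.Println(a.CamliOrder < moved.CamliOrder && moved.CamliOrder < b.CamliOrder)
}
```

One caveat with this approach: since float64 has a 53-bit significand, the same gap can only be halved a limited number of times before two keys collide, so heavy reordering would eventually need the keys to be renumbered.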

Mathieu Lonjaret

Oct 30, 2015, 9:45:05 AM
to camli...@googlegroups.com
since this here is the relevant thread, I forgot to mention that I'm mostly done (waiting for review) with the scanning cabinet.

