Read API: Collections and items

Katie

May 14, 2014, 8:40:46 PM
to zoter...@googlegroups.com
I'm looking for a better way of importing collections and their relationship to items. As far as I'm aware, there is no way to do this in a single request (or one per every 99 collections). Right now, I'm making one request per collection to get its items, which really slows things down. I've tried different content types, but none seem to provide a collection's information along with its items. It would be great if the list of items was included in, say, the JSON content type.

Perhaps I've missed something in the API. If anyone can help me out, I'd appreciate it!

Dan Stillman

May 15, 2014, 2:35:12 AM
to zoter...@googlegroups.com
Could you describe the specific use case a bit more?

We've tried to model the API around common usage patterns. For
collections, those are 1) clicking on a collection and viewing the items
and 2) for a given item, wanting to know the collections it belongs to.
Both of those are satisfied by the current API: the former with
.../collections/:key:/children and the latter with the 'collections'
property in the v2 [1] item JSON. (Not sure if you're aware of the latter.)

I'm not sure what the use case is to know full library-wide
collection-item membership without also retrieving all items in the
library — and if you do the latter, you have the former.

Dan Stillman

May 15, 2014, 2:37:45 AM
to zoter...@googlegroups.com
On 5/15/14, 2:35 AM, Dan Stillman wrote:
> We've tried to model the API around common usage patterns. For
> collections, those are 1) clicking on a collection and viewing the
> items and 2) for a given item, wanting to know the collections it
> belongs to. Both of those are satisfied by the current API: the former
> with .../collections/:key:/children

Sorry, I meant .../collections/:key:/items.

Katie

May 15, 2014, 10:21:32 AM
to zoter...@googlegroups.com
Thanks for the quick reply!

The overall goal is to recreate the Zotero library in a WP database. The collections import is regularly timing out for some users, and the number of requests made is the major difference between importing items and collections.

Here's a very common case: A user wishes to display all items for a collection. Here it makes sense for collections to have a list of their items (easy, quick db call). But to import, say, even 50 collections to get this relationship information would mean 1 + 50 requests.

If the collections are included as a list for each item, that cuts down the number of requests to Zotero (which is great) but it's not ideal for using that information. For this case, the query would need to search every item in the database and use something like FIND_IN_SET or IN. (Actually, I can't think of a case where having the item-collection relationship stored in the items table is useful ...?).

That said, this would be trickier but doable. If nothing can be done, I will work with the item's JSON. But ideally each collection's JSON would come with a list of their items.

Katie

May 15, 2014, 10:27:32 AM
to zoter...@googlegroups.com
I missed your first case -- I see now how providing a list of collections with each item is useful. But it seems to me that if both cases are valid, the same information should be provided in the same way for each case, e.g. list of collections with each item and list of items with each collection.

Dan Stillman

May 15, 2014, 2:14:55 PM
to zoter...@googlegroups.com
On 5/15/14, 10:21 AM, Katie wrote:
> The overall goal is to recreate the Zotero library in a WP database.
> The collections import is regularly timing out for some users, and the
> number of requests made is the major difference between importing
> items and collections.

It sounds like you're trying to import an entire Zotero library within the
lifespan of a single request? If that's the case, I'd say that's
probably not a reasonable expectation — for smaller libraries it'd be no
problem, but you don't really have any guarantee going in of how long it
will take. I'm not sure if WP gives you any ability to schedule
background jobs, but that might be something worth looking into.

> Here's a very common case: A user wishes to display all items for a
> collection. Here it makes sense for collections to have a list of
> their items (easy, quick db call). But to import, say, even 50
> collections to get this relationship information would mean 1 + 50
> requests.

Right, but that's where looking at this as an import might not be the
best approach. Would it not be possible to simply load the
collection-item membership from the API on-demand (and cache it) when
the user requests data for that collection?
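
A minimal sketch of that on-demand approach, assuming a hypothetical
fetch_item_keys callable that performs the .../collections/:key:/items
request and returns the item keys. The class name, the one-hour lifetime,
and the in-memory dict are all illustrative assumptions; a WP plugin would
more likely keep the cache in the database.

```python
import time


class CollectionItemsCache:
    """Caches the item keys of each collection, fetching them on first use."""

    def __init__(self, fetch_item_keys, ttl_seconds=3600.0, clock=time.monotonic):
        # fetch_item_keys is a hypothetical callable:
        # collection key -> iterable of item keys (one API request per call).
        self._fetch = fetch_item_keys
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # collection key -> (expiry time, item keys)

    def get(self, collection_key):
        now = self._clock()
        entry = self._entries.get(collection_key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no API request needed
        # Missing or expired: fetch once and remember the result.
        item_keys = list(self._fetch(collection_key))
        self._entries[collection_key] = (now + self._ttl, item_keys)
        return item_keys
```

With this shape, only the first visitor to a collection page after the
cache expires pays for the API request; everyone else is served from the
cached list.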

> If the collections are included as a list for each item, that cuts
> down the number of requests to Zotero (which is great) but it's not
> ideal for using that information. For this case, the query would need
> to search every item in the database and use something like
> FIND_IN_SET or IN.

Well, there's no need to store it as it's served. The next major version
of the Zotero client will use this API for syncing, but it will still
have a collectionItems table with just collectionID and itemID for easy
local querying. So if loading collection-items on-demand isn't possible
for some reason and you do need to treat this as a full-library import,
then you can just pull the collections data out of the item JSON and
store and query it separately.
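
As a rough sketch of pulling the membership out of the item JSON: this
assumes each decoded item is a dict with an "itemKey" field and a
"collections" array, and the sample data and key values below are made up
for illustration. Check the field names against the actual API responses
before relying on them.

```python
import json


def membership_pairs(items):
    """Return (collection_key, item_key) pairs from decoded item JSON."""
    pairs = []
    for item in items:
        # Items that belong to no collection may have an empty or missing list.
        for collection_key in item.get("collections", []):
            pairs.append((collection_key, item["itemKey"]))
    return pairs


# Hypothetical sample data standing in for decoded API responses.
SAMPLE_ITEMS = json.loads("""
[
  {"itemKey": "ITEM0001", "collections": ["COLLAAAA", "COLLBBBB"]},
  {"itemKey": "ITEM0002", "collections": ["COLLAAAA"]},
  {"itemKey": "ITEM0003", "collections": []}
]
""")

PAIRS = membership_pairs(SAMPLE_ITEMS)
```

Each pair then becomes one row in a separate membership table, independent
of how the items themselves are stored.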

> That said, this would be trickier but doable. If nothing can be done,
> I will work with the item's JSON. But ideally each collection's JSON
> would come with a list of their items.

There are a couple issues with doing what you suggest:

1) Collection-item membership doesn't affect the modification time of
collections, which means the collection JSON can't be modified on
changes, since the cached data is keyed by the version. And if the mod
time did change, it would result in unnecessary work for sync clients,
who would have to download both the item and the collection on a
membership change.

2) If the collection did include item keys, what would you actually do
with that? You'd still need all the items in the library in order to
display them, and if you have all the items in the library, you already
have the collection-item membership as discussed above. The same
argument can be made in reverse, of course, but the difference is that
most libraries have orders of magnitude more items than collections, so
it makes sense to treat collections as item properties, since getting
the collection data for those keys is usually an extra one or two requests.

Anyhow, I hope that's helpful. On a general note, if you haven't yet you
should take a look at the API syncing page [1], which lays out a basic
syncing method and a few possible variations depending on use case. Let me
know if you have any questions on that.

- Dan

[1] https://www.zotero.org/support/dev/server_api/v2/syncing

Katie

May 15, 2014, 4:36:45 PM
to zoter...@googlegroups.com
The plugin used to make requests on demand and cache those results, but this process negatively impacted the user experience on the front-end for visitors, especially when many requests were needed. After talking with some users and developers, I ultimately decided to place this "loading" issue on the WP admins (through full-library import functionality) rather than their visitors. There have also been many requests to create WP content types for items. For various reasons this is not yet feasible, but it would be great to be able to do a full import of the Zotero library and convert to WP content types in the near future.

WP does have a CRON-like function but it appears to depend on a user accessing a page to trigger it, and the documentation I read warns against using it for large requests (like most Zotero libraries).

(1) I see now the issues with the JSON collection request (I expected there would be a good reason, but I was hoping for some wiggle room).

(2) My understanding is that using FIND_IN_SET etc. and searching ALL items takes more time than simply making a request to a single item. I don't know how much of a lag this would be. But it sounds like I'll have to find out.

I took a look at the syncing page a while ago but it was very much in-progress then. It looks like there's been substantial work done since. Syncing is a highly sought-after feature for the plugin. Ideally you would import once, and it would auto-sync the rest of the time. Perhaps, if I understand correctly, I can make a new table with collection-item relationships that's created when I import items and look at their JSONs.

Dan Stillman

May 16, 2014, 2:44:59 AM
to zoter...@googlegroups.com
On 5/15/14, 4:36 PM, Katie wrote:
> (2) My understanding is that using FIND_IN_SET etc. and searching ALL
> items takes more time than simply making a request to a single item. I
> don't know how much of a lag this would be. But it sounds like I'll
> have to find out.

No, FIND_IN_SET() has no relevance here — comma-separated lists would be
a terrible way to store this or most any data in a database. Just decode
the JSON, get the array of collection keys, and store them in a table
with integer collectionID and itemID columns and a primary and secondary
index across both columns in opposite order. Queries for a given
collection's items, or vice versa, would then be instantaneous.
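
A minimal sketch of such a table, using SQLite from Python only so that it
runs standalone; a WP plugin would use MySQL, where the same two-index
layout applies. The table, column, and function names are illustrative,
and mapping Zotero keys to local integer IDs is assumed to happen
elsewhere.

```python
import sqlite3


def create_membership_table(conn):
    # The primary key covers lookups by collection; the secondary index
    # (same columns, opposite order) covers lookups by item.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS collectionItems (
            collectionID INTEGER NOT NULL,
            itemID INTEGER NOT NULL,
            PRIMARY KEY (collectionID, itemID)
        )
    """)
    conn.execute("""
        CREATE INDEX IF NOT EXISTS collectionItems_item
        ON collectionItems (itemID, collectionID)
    """)


def add_memberships(conn, pairs):
    # pairs: iterable of (collectionID, itemID); duplicates are ignored.
    conn.executemany(
        "INSERT OR IGNORE INTO collectionItems (collectionID, itemID) VALUES (?, ?)",
        pairs,
    )


def items_in_collection(conn, collection_id):
    rows = conn.execute(
        "SELECT itemID FROM collectionItems WHERE collectionID = ? ORDER BY itemID",
        (collection_id,),
    )
    return [row[0] for row in rows]


def collections_of_item(conn, item_id):
    rows = conn.execute(
        "SELECT collectionID FROM collectionItems WHERE itemID = ? ORDER BY collectionID",
        (item_id,),
    )
    return [row[0] for row in rows]
```

Both lookups are then single indexed queries, with no string matching
against a list stored in the items table.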

But this really has nothing to do with the Zotero API. Again, the way
the Zotero API serves data has absolutely no bearing on how you store
the data in the database. (But also, this is JSON, so they're arrays of
integers to begin with, not strings. Even if you somehow had strings,
though, you would still split them into integers before inserting into a
database.)

> I took a look at the syncing page a while ago but it was very much
> in-progress then. It looks like there's been substantial work done
> since. Syncing is a highly sought-after feature for the plugin.
> Ideally you would import once, and it would auto-sync the rest of the
> time. Perhaps, if I understand correctly, I can make a new table with
> collection-item relationships that's created when I import items and
> look at their JSONs.

Yes. (But again, that doesn't have anything to do with the API or the
syncing page. The handful of database fields specific to syncing are
given at the top of that page.)

Katie

May 16, 2014, 10:32:34 AM
to zoter...@googlegroups.com
FIND_IN_SET (+ etc.) was just an example off the top of my head that I hoped I wouldn't have to use. How to store retrieved requests may not be relevant to the API but I'm not sure why that matters. I was looking for an optimal way to retrieve and retain information on the relationship between collections and tags given the API and you've provided one that I'll certainly try. Thank you for that.

Katie

May 16, 2014, 10:37:57 AM
to zoter...@googlegroups.com
Or collections and items. Oops. Anyways, thanks!