collections?order=title
(zotero.org uses the API. Generally speaking, if zotero.org can do it,
the API can do it.)
> 2) I have a large library (my library and group libraries combined are
> around 10,000 items). What would be the best way to retrieve only
> items that have been modified since the last successful sync?
items?order=dateModified
> 3) Is it possible to retrieve a list of modified collections or a list
> of the entire collection tree without having to do a separate API call
> for each collection that has child collections?
collections?order=dateModified
But see
http://groups.google.com/group/zotero-dev/browse_frm/thread/11602f1f8cff5d70
for some current caveats.
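As a concrete illustration, a client-side polling loop for this could look roughly like the Python sketch below. The order=dateModified part comes from above; the sort, start, limit, and key parameter names and the zapi namespace URI are assumptions taken from the read API docs, so verify them there. The same pattern works for collections?order=dateModified.

    # Sketch only: page through items newest-first and stop once we reach
    # entries that are not newer than the last successful sync.
    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"
    ZAPI = "{http://zotero.org/ns/api}"   # assumed namespace URI

    def modified_since(user_id, api_key, last_sync_utc, limit=50):
        """Yield (item key, dateModified) pairs newer than last_sync_utc (ISO 8601, UTC)."""
        start = 0
        while True:
            url = ("https://api.zotero.org/users/%s/items"
                   "?order=dateModified&sort=desc&content=json"
                   "&limit=%d&start=%d&key=%s" % (user_id, limit, start, api_key))
            feed = ET.parse(urllib.request.urlopen(url)).getroot()
            entries = feed.findall(ATOM + "entry")
            if not entries:
                return
            for entry in entries:
                updated = entry.find(ATOM + "updated").text
                if updated <= last_sync_utc:
                    return  # everything from here on is already in the cache
                yield entry.find(ZAPI + "key").text, updated
            start += limit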
> 4) Is there any estimate on when modifying attachments through the
> server api will be possible?
No, sorry. Relatively high on the to-do list, though.
> I am concerned about the number of requests that I need to make to
> synchronize ZotPad with the Zotero server. If I have understood the
> server API correctly, I would need to check attachments for
> modifications one item at a time. With my libraries this would mean a
> couple of thousand API requests, which is not feasible.
You should almost never need to make individual requests.
It does sound like you're doing pretty much exactly what I said earlier
today wasn't really the intended interaction model for the API, i.e.,
trying to create a fully offline client that syncs the complete library
with the server. That's reasonable, but it's not really what the API was
initially designed for, and it will take some time before it can support
that kind of use case well.
It also might be worth considering whether that's the ideal form for the
app to take. It may very well be, but there are other options. Is it
necessary to update a collection's contents before the collection is
opened? Is it necessary to download all attachments ahead of time by
default (and keep those on the device permanently)? For that matter, is
it necessary to pull down the entire set of items? You mentioned that a
Zotero database can be copied over via iTunes, but almost no normal
users know how to do that, and I think it would be a huge mistake to
make that the recommended method. Of course, if you're not doing that,
10,000 (top-level) items is a minimum of 100 API requests. But is it
necessary to make all of those, or can you load the beginning of the
list, auto-loading more results on scroll or tap, caching more items as
you browse around and updating them on-demand, with API-based search to
help you find specific items?
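A rough sketch of that lazy-loading approach, in Python (the content, limit, start, and key parameter names are assumptions based on the read API docs); a scroll or tap handler would call this with an increasing start offset:

    # Sketch only: fetch one page of a collection's top-level items when the
    # user opens or scrolls the list, instead of syncing everything up front.
    import json
    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"

    def fetch_page(user_id, api_key, collection_key, start=0, limit=25):
        """Return one page of item data as parsed JSON dicts."""
        url = ("https://api.zotero.org/users/%s/collections/%s/items/top"
               "?content=json&limit=%d&start=%d&key=%s"
               % (user_id, collection_key, limit, start, api_key))
        feed = ET.parse(urllib.request.urlopen(url)).getroot()
        return [json.loads(e.find(ATOM + "content").text)
                for e in feed.findall(ATOM + "entry")]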
And I think some people would absolutely want this to work on their
iPhones as well. (I personally read much more on my iPhone than my
iPad.) Are you going to do a full library sync to an iPhone? Or can you
rely on the iPhone's network connection to pull down data as it's
needed, with selective caching?
- Dan
It returns all items. items/top returns top-level items.
> Is an item marked as modified if an attachment to that item was modified?
No. There's no public support for attachment files in the API right now.
This will happen when attachment upload is supported.
> Is a collection modified when an item inside the collection is
> modified? (I assume this is not the case.)
No.
>> It also might be worth considering whether that's the ideal form for the
>> app to take. It may very well be, but there are other options. Is it
>> necessary to update a collection's contents before the collection is
>> opened? Is it necessary to download all attachments ahead of time by
>> default (and keep those on the device permanently)? For that matter, is
>> it necessary to pull down the entire set of items? You mentioned that a
>> Zotero database can be copied over via iTunes, but almost no normal
>> users know how to do that, and I think it would be a huge mistake to
>> make that the recommended method. Of course, if you're not doing that,
>> 10,000 (top-level) items is a minimum of 100 API requests. But is it
>> necessary to make all of those, or can you load the beginning of the
>> list, auto-loading more results on scroll or tap, caching more items as
>> you browse around and updating them on-demand, with API-based search to
>> help you find specific items?
> I gave some thought on the on-demand retrieving of item data. It would
> eliminate the need for a locally stored database and thus simplify the
> app, but it would also eliminate one important use case: The use case
> that I have for this app is to be able to read and annotate the PDFs
> stored in Zotero while I am traveling. Because of this, the app should
> not depend on a constantly available internet connection. If I
> implement the app with a local database that is synced with the
> server, then on demand retrieving of items and collections from the
> server API would be redundant.
I understand that's an important use case, but the choice isn't between a
complete copy of the database and no local storage at all. There's a
possible middle ground where you're caching data on-demand when a
network connection is available but still functioning offline with
cached data. Even if the app gives you the option of pulling all data
before you go offline, or downloading all PDFs, that doesn't mean
it has to or should by default, and it would still be a fundamentally
different model, such that a user could mark just certain collections or
certain items (and, eventually, maybe API-powered saved searches) to
keep cached.
Another thing: ideally this client would work with groups too. Does it
need to do a full sync of all group data as well?
>> And I think some people would absolutely want this to work on their
>> iPhones as well. (I personally read much more on my iPhone than my
>> iPad.) Are you going to do a full library sync to an iPhone? Or can you
>> rely on the iPhone's network connection to pull down data as it's
>> needed, with selective caching?
> After the iPad client is done, making an alternative iPhone user
> interface should not be difficult. I changed the project to be a
> universal application, but currently it just crashes in iPhone before
> really doing anything. And selective caching will be implemented
OK, but I wasn't really talking about the user interface. When you say
"selective caching will be implemented", do you mean there would be an
on-demand browsing mode like I describe above, or something else?
> On 11/18/11 5:42 AM, Mikko Rönkkö wrote:
>> On Nov 18, 11:21 am, Dan Stillman<dstill...@zotero.org> wrote:
>>> On 11/18/11 2:49 AM, mronkko wrote:
>>>> 2) I have a large library (my library and group libraries combined are
>>>> around 10,000 items). What would be the best way to retrieve only
>>>> items that have been modified since the last successful sync?
>>> items?order=dateModified
>> I assume that this would only return the items that are independent,
>> not the items that are attachments?
>
> It returns all items. items/top returns top-level items.
Then this will take care of getting information about the items that are added or modified.
>
>> Is an item marked as modified if an attachment to that item was modified?
>
> No. There's no public support for attachment files in the API right now. This will happen when attachment upload is supported.
I do not understand something here. I tested it and when I modify an attached file on my computer and then sync, the item for that attachment is marked as updated. (I am running the development version from a few weeks ago.)
I was thinking that the sqlite database is relatively small, so a local copy can be maintained. This also makes coding the app a bit easier (at least at this point). Also,
>
> Another thing: ideally this client would work with groups too. Does it need to do a full sync of all group data as well?
Groups are currently supported and work the same way as My Library.
>
>>> And I think some people would absolutely want this to work on their
>>> iPhones as well. (I personally read much more on my iPhone than my
>>> iPad.) Are you going to do a full library sync to an iPhone? Or can you
>>> rely on the iPhone's network connection to pull down data as it's
>>> needed, with selective caching?
>> After the iPad client is done, making an alternative iPhone user
>> interface should not be difficult. I changed the project to be a
>> universal application, but currently it just crashes in iPhone before
>> really doing anything. And selective caching will be implemented
>
> OK, but I wasn't really talking about the user interface. When you say "selective caching will be implemented", do you mean there would be an on-demand browsing mode like I describe above, or something else?
Ah, I see. I plan to keep a full copy of the zotero.sqlite locally, and caching of attachments will be an option. While an on-demand browsing mode would work for small libraries, I see difficulties with that approach for larger libraries. For example, how would I search anything from my library, which now contains 5600 items, using the on-demand model? Since the server API does not support searching, I would first need to cache all these items locally and then search the cache. I think that maintaining a local copy of the library and keeping it synced is a more practical alternative for large libraries. What I might do, instead of activating sync with a separate button, is sync collections as the user views them, which would be similar to the on-demand browsing mode from the user's perspective.
Is it possible to identify deleted items through the server API?
The most recent version now runs on iPhone, but there are a couple of bugs that make it unusable.
Mikko
Ah, yeah, you're probably right. I think that we do this so that the
client knows when to check for modified files. (In other words, the item
itself doesn't provide the file's modtime right now, but it indicates
that there might be an updated modtime.)
>> Another thing: ideally this client would work with groups too. Does it need to do a full sync of all group data as well?
> Groups are currently supported and work the same way as My Library.
If someone is in ten groups, that could easily be quite a lot of data to
pull down and process each time they open the app, even if they don't
view the groups. This is why I'm suggesting more of an on-demand
approach by default.
>>>> And I think some people would absolutely want this to work on their
>>>> iPhones as well. (I personally read much more on my iPhone than my
>>>> iPad.) Are you going to do a full library sync to an iPhone? Or can you
>>>> rely on the iPhone's network connection to pull down data as it's
>>>> needed, with selective caching?
>>> After the iPad client is done, making an alternative iPhone user
>>> interface should not be difficult. I changed the project to be a
>>> universal application, but currently it just crashes in iPhone before
>>> really doing anything. And selective caching will be implemented
>> OK, but I wasn't really talking about the user interface. When you say "selective caching will be implemented", do you mean there would be an on-demand browsing mode like I describe above, or something else?
> Ah, I see. I plan to keep a full copy of the zotero.sqlite locally and caching of attachments will be an option. While on demand browsing mode would work for small libraries, I see difficulties with that approach for larger libraries. For example, how would I search anything from my library that now contains 5600 items using the on demand model? Since the server api does not support searching, I would first need to cache all these items locally and then search from the cache.
The API supports searching as of the beginning of this month. See the docs.
> I think that maintaining a local copy of the library and keeping it synced is a more practical alternative for large libraries. What I might do is that instead of activating sync with a separate button, I would sync the collections as a user views them, which would be similar to the on-demand browsing mode from the user perspective.
Yes, that's closer to what I'm suggesting. Mostly I'm just cautioning
against assuming that you need to do what the main client does and sync
all data in all libraries all the time.
> Is it possible to identify deleted items through the server API?
No, not currently. We might be able to offer a list of deleted items
for, say, the last two weeks, and if the client hasn't checked in since
before that it can re-pull. This is obviously less of an issue with
on-demand loading.
>
>>> Another thing: ideally this client would work with groups too. Does it need to do a full sync of all group data as well?
>> Groups are currently supported and work the same way as My Library.
>
> If someone is in ten groups, that could easily be quite a lot of data to pull down and process each time they open the app, even if they don't view the groups. This is why I'm suggesting more of an on-demand approach by default.
I was initially thinking that the sync would work the same way as with Zotero and that syncing would be primarily done over wifi. But I see the value of being able to sync on a more on-demand basis.
One option is to implement syncing with a more or less on-demand approach and then allow the option to upload a full database with iTunes to get a full sync. (And later possibly allow a full sync without iTunes upload.)
>
>>>>> And I think some people would absolutely want this to work on their
>>>>> iPhones as well. (I personally read much more on my iPhone than my
>>>>> iPad.) Are you going to do a full library sync to an iPhone? Or can you
>>>>> rely on the iPhone's network connection to pull down data as it's
>>>>> needed, with selective caching?
>>>> After the iPad client is done, making an alternative iPhone user
>>>> interface should not be difficult. I changed the project to be a
>>>> universal application, but currently it just crashes in iPhone before
>>>> really doing anything. And selective caching will be implemented
>>> OK, but I wasn't really talking about the user interface. When you say "selective caching will be implemented", do you mean there would be an on-demand browsing mode like I describe above, or something else?
>> Ah, I see. I plan to keep a full copy of the zotero.sqlite locally and caching of attachments will be an option. While on demand browsing mode would work for small libraries, I see difficulties with that approach for larger libraries. For example, how would I search anything from my library that now contains 5600 items using the on demand model? Since the server api does not support searching, I would first need to cache all these items locally and then search from the cache.
>
> The API supports searching as of the beginning of this month. See the docs.
I checked this documentation page and did not find anything about searching other than by tags and item types. Is there somewhere else I should look?
http://www.zotero.org/support/dev/server_api/read_api
>
>> I think that maintaining a local copy of the library and keeping it synced is a more practical alternative for large libraries. What I might do is that instead of activating sync with a separate button, I would sync the collections as a user views them, which would be similar to the on-demand browsing mode from the user perspective.
>
> Yes, that's closer to what I'm suggesting. Mostly I'm just cautioning against assuming that you need to do what the main client does and sync all data in all libraries all the time.
I thought that this would be the easiest approach, but I seem to have been wrong. However, for the traveling use case some kind of full sync needs to be included. Initially this can be done with iTunes.
>
>> Is it possible to identify deleted items through the server API?
>
> No, not currently. We might be able to offer a list of deleted items for, say, the last two weeks, and if the client hasn't checked in since before that it can re-pull. This is obviously less of an issue with on-demand loading.
How long does the Zotero server store this information? Is this tied to when the clients have synced most recently?
One way to identify deleted items would be to include an API call that just returns all the item IDs that are in use, without providing any other data for these items. This data could then be used to delete items from the local database.
Mikko
See the 'q' parameter. Other fields will be added down the line.
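A quick search is then a single request. A minimal sketch, with the key parameter and the exact fields 'q' matches treated as assumptions:

    # Sketch: build a quick-search URL using the 'q' parameter.
    from urllib.parse import quote

    def search_url(user_id, api_key, query):
        return ("https://api.zotero.org/users/%s/items/top"
                "?q=%s&content=json&key=%s" % (user_id, quote(query), api_key))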
>>> Is it possible to identify deleted items through the server API?
>> No, not currently. We might be able to offer a list of deleted items for, say, the last two weeks, and if the client hasn't checked in since before that it can re-pull. This is obviously less of an issue with on-demand loading.
> How long does the Zotero server store this information? Is this tied to when the clients have synced most recently?
It stores it permanently now, but that needs to be changed, since
extensive delete histories can cause sync problems, particularly with
large libraries. But the Zotero client itself currently lacks a
mechanism for dealing with a purged delete history.
> One way to identify deleted items would be to just include an API call that would just return all the item IDs that are in use without providing any other data for these items. This data can then be used to delete items from the local database.
I was thinking yesterday that a 'format' mode that returned just item
keys or URIs (itemIDs are not global identifiers) would probably be
useful for a number of things, such as getting the full contents of a
collection without retrieving item details that the client already has.
(content=none helps a little with this, but just keys/URIs would be much
smaller and faster, and we might be able to return those without any
limits at all.) And there's actually an undocumented way to get data for
an arbitrary set of keys that we can probably expose.
It still might be good to provide limited deletion history, though,
which would be more efficient for both the server and client. Need to
think about this more.
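If a keys-only format does land, the deletion check discussed above becomes a plain set difference on the client. A sketch, assuming a hypothetical format=keys mode that returns one item key per line:

    # Sketch only: find local items that no longer exist on the server by
    # diffing a keys-only listing against the local cache.
    import urllib.request

    def keys_to_delete(user_id, api_key, local_keys):
        url = ("https://api.zotero.org/users/%s/items?format=keys&key=%s"
               % (user_id, api_key))
        server_keys = set(urllib.request.urlopen(url).read().decode().split())
        return set(local_keys) - server_keys  # cached locally, gone on the server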
I would be quite interested in having both of these for Zandy as well.
Avram
> On 11/18/11 4:05 PM, Rönkkö Mikko wrote:
>>> The API supports searching as of the beginning of this month. See the docs.
>> I checked this documentation page and did not find anything about searching other than by tags and item types. Is there somewhere else I should look?
>>
>> http://www.zotero.org/support/dev/server_api/read_api
>
> See the 'q' parameter. Other fields will be added down the line.
How did I miss that? This will be sufficient searching for now, and it is actually what I thought the search in ZotPad would do.
>
>>>> Is it possible to identify deleted items through the server API?
>>> No, not currently. We might be able to offer a list of deleted items for, say, the last two weeks, and if the client hasn't checked in since before that it can re-pull. This is obviously less of an issue with on-demand loading.
>> How long does the Zotero server store this information? Is this tied to when the clients have synced most recently?
>
> It stores it permanently now, but that needs to be changed, since extensive delete histories can cause sync problems, particularly with large libraries. But the Zotero client itself currently lacks a mechanism for dealing with a purged delete history.
>
>> One way to identify deleted items would be to just include an API call that would just return all the item IDs that are in use without providing any other data for these items. This data can then be used to delete items from the local database.
>
> I was thinking yesterday that a 'format' mode that returned just item keys or URIs (itemIDs are not global identifiers) would probably be useful for a number of things, such as getting the full contents of a collection without retrieving item details that the client already has. (content=none helps a little with this, but just keys/URIs would be much smaller and faster, and we might be able to return those without any limits at all.) And there's actually an undocumented way to get data for an arbitrary set of keys that we can probably expose.
Both of these would be great. The UI library that I use for ZotPad needs the item count (and in practice also the item IDs) when a view is shown, but it loads cell content only as needed.
>
> It still might be good to provide limited deletion history, though, which would be more efficient for both the server and client. Need to think about this more.
This would be useful. If the item deletion history is stored indefinitely, the API could provide a list of items deleted since a timestamp given as part of the API request.
Mikko
You don't need this for an item count. All API responses include the
total result count in zapi:totalResults.
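For example, a limit=1 request is enough to read just the count. A sketch, treating content=none and the zapi namespace URI as assumptions:

    # Sketch: get the total item count of a collection from zapi:totalResults
    # without downloading the items themselves.
    import urllib.request
    import xml.etree.ElementTree as ET

    ZAPI = "{http://zotero.org/ns/api}"  # assumed namespace URI

    def total_items(user_id, api_key, collection_key):
        url = ("https://api.zotero.org/users/%s/collections/%s/items"
               "?limit=1&content=none&key=%s" % (user_id, collection_key, api_key))
        feed = ET.parse(urllib.request.urlopen(url)).getroot()
        return int(feed.find(ZAPI + "totalResults").text)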
>> I was thinking yesterday that a 'format' mode that returned just item keys
>> or URIs (itemIDs are not global identifiers) would probably be useful for a
>> number of things, such as getting the full contents of a collection without
>> retrieving item details that the client already has. (content=none helps a
>> little with this, but just keys/URIs would be much smaller and faster, and
>> we might be able to return those without any limits at all.) And there's
>> actually an undocumented way to get data for an arbitrary set of keys that
>> we can probably expose.
These would be very useful features to have. I would like to know whether these features could be available in the near future, because this affects a couple of architectural decisions in ZotPad. Having these two features would make item caching much more efficient.
>>
>> It still might be good to provide limited deletion history, though, which
>> would be more efficient for both the server and client. Need to think about
>> this more.
>
> I would be quite interested in having both of these for Zandy as well.
This would also be useful, but in the case of ZotPad it would not affect the caching architecture as much.
Mikko
I didn't mean on its own. The 'limit' parameter still applies.
> I've been working separately on a project that uses the read API, and
> I'm trying to emulate the 'resumption token' facility of OAI feeds so
> I don't waste time re-reading records that haven't been updated. What
> I'd like is to be able to use something like 'order=dateModified
> ASC&modifiedSince=2011-01-01' as parameters in my search. Is there any
> facility to filter by date range at the moment [I know there's nothing
> in the Read API documentation] or should that go as a feature request?
No, but see this discussion:
http://groups.google.com/group/zotero-dev/msg/51c4de26c27c1a39
You should never need to read more than [limit] extra items.
I think we'll be able to offer a newline-separated format=keys mode and
selective itemKey-based pulling within the next few days.
Are there any updates on this? If it is coming in a day or two, I will wait for that before releasing ZotPad. If it will be a few weeks, then I will implement it in the next version of ZotPad.
I posted some screenshots with descriptions here:
http://sblsrv.org.aalto.fi/zotpad/
We should be rolling out format=keys later today.
How about the selective itemKey-based pulling and multi-format response (https://github.com/zotero/dataserver/commit/cfc9015f11c09e0b8194a9c6e301cf07b8f731e3)?
All now live on api.zotero.org. I've updated the read API docs with
details. Let me know if you have any questions or run into any problems.
The documentation is very clear and explains everything that I need to know.
Just one question: how would you compare the performance of retrieving 3000 items with bib and json content using the following two methods?
1) First retrieving all keys from the library and then going over this list 50 items at a time using key-based retrieval
2) Retrieving all items from the library 50 items at a time using the offset option
I am asking because I am going to mainly use option 1, since I can then use the cache more efficiently. The question is whether a potential performance gain justifies the additional complexity of switching to a different retrieval mode when the cache is either empty or not used.
Mikko
We haven't done extensive testing on itemKey yet, but I would expect it
to be as fast as or faster than the offset approach, since it uses an
index to retrieve just the specified items rather than finding
potentially thousands of rows, sorting them, and returning a subset.
Right now there's a limit of a little over 50 items on 'itemKey', but
that's actually a function of a security-related limitation on variable
length rather than a hard-coded limit. We might be able to increase that
to 100 for consistency with the 'limit' parameter.
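For what it's worth, option 1 then reduces to something like the sketch below: list all keys cheaply via format=keys, then pull full data only for keys missing from the local cache, 50 at a time via itemKey. The comma-separated itemKey syntax and the other parameter names are assumptions based on the updated docs.

    # Sketch of "option 1": key listing first, then selective batched retrieval.
    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"
    BASE = "https://api.zotero.org/users/%s/items"

    def fetch_missing(user_id, api_key, cached_keys, batch=50):
        """Yield Atom entries for items the local cache does not have yet."""
        url = (BASE + "?format=keys&key=%s") % (user_id, api_key)
        all_keys = urllib.request.urlopen(url).read().decode().split()
        missing = [k for k in all_keys if k not in cached_keys]
        for i in range(0, len(missing), batch):
            chunk = ",".join(missing[i:i + batch])
            url = (BASE + "?itemKey=%s&content=json&key=%s") % (user_id, chunk, api_key)
            feed = ET.parse(urllib.request.urlopen(url)).getroot()
            yield from feed.findall(ATOM + "entry")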
This sounds really good.
One more thing that would be useful is retrieving the set of items that have changed after a given timestamp. Currently the way to get modified items is to retrieve items until hitting one that is already up to date in the cache. Doing this check on the server end would save a little processing time on both the server and the client, and also a bit of bandwidth.
Mikko