<updated>2011-06-29T14:29:32Z</updated>
(From https://api.zotero.org/users/475425/collections?pprint=1)
What does this updated field mean?
For example, if I retrieve collection members, does it mean
-The last time the collection itself was updated?
-The last time an item was added or removed from the collection?
Or something else?
Mikko
Both.
The updated field for groups can be retrieved using the URI /users/<userID>/groups
Is there a way to retrieve the updated field for My Library? (Without retrieving a list of items from the library root and checking it from there.)
Mikko
>
> On Dec 18, 2011, at 02:27, Dan Stillman wrote:
>
>> On 12/17/11 5:18 AM, Rönkkö Mikko wrote:
>>> Almost every request to the read API returns a response that contains an element called updated
>>>
>>> <updated>2011-06-29T14:29:32Z</updated>
>>>
>>> (From https://api.zotero.org/users/475425/collections?pprint=1)
>>>
>>> What does this updated-field mean?
>>>
>>> For example, If I retrieve collection members, does it mean
>>> -The last time the collection itself was updated?
>>> -The last time an item was added or removed from the collection?
>>>
>>> Or something else?
>>
>> Both.
>>
>
> The update field for groups can be retrieved using the URI /users/<userID>/groups
It seems that the updated field for groups does not tell when something in that group library was updated. We have a fairly active group that still has the updated timestamp set to last January
<updated>2011-01-13T10:35:27Z</updated>
What information does the updated field for a group store, if not when the content of the group was last updated?
Mikko
A follow-up question: what does the updated field in an item list tell? It seems to change depending on the offset
wget -O- --no-check-certificate https://api.zotero.org/groups/6184/items/top?key=XXXXXXXXXXXXXXXXXXXXXXXX\&format=atom\&content=bib\&style=apa\&limit=50\&start=5
<updated>2011-10-14T12:11:38Z</updated>
wget -O- --no-check-certificate https://api.zotero.org/groups/6184/items/top?key=XXXXXXXXXXXXXXXXXXXXXXXX\&format=atom\&content=bib\&style=apa\&limit=50\&start=0
<updated>2011-12-09T08:18:09Z</updated>
Is there a reliable way to determine that an item list has not changed on the server while it is being retrieved?
Consider the following example: A library contains 100 entries and I retrieve data 50 items at a time ordering in descending order by modification time:
1) Retrieve first 50 items
2) Someone edits an item that was not included in the initial list of items and syncs with the server
3) Retrieve items 51-100
I assume that in this case, the last item in the first retrieval and the first item in the second retrieval would be the same because editing an item would make that item appear first in the list. If I join these two item lists, this would result in the edited item missing completely and one item being a duplicate.
What I would like to do in this case is to start retrieving the list again from the first index.
Mikko
All it indicates is when the group metadata (e.g., title) was last
changed. Nothing to do with the group library, which is why there's no
equivalent timestamp for the personal library.
There actually is a timestamp that's updated when any library data
changes, but it's not exposed by the API. We can think more about this,
but it might not be necessary to expose that as long as we make a few
more actions (e.g., collection-item changes) update item timestamps.
Then the first result of an items request for a library sorted by date
modified descending would always indicate whether library data had changed.
The feed updated timestamp is just equal to the most recent timestamp in
the visible results. I don't think it makes sense for it to be anything
else, when you consider all the kinds of requests (searches,
non-date-based ordering) that could be made against the API.
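As an illustration of the point above, the feed-level timestamp can be reproduced on the client by taking the newest entry timestamp in the page that was returned. This is only a sketch: the sample feed is hypothetical (only the two timestamps come from this thread), and the function names are made up for the example.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical two-entry feed; only the timestamps are taken from the thread.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <updated>2011-12-09T08:18:09Z</updated>
  <entry><id>a</id><updated>2011-10-14T12:11:38Z</updated></entry>
  <entry><id>b</id><updated>2011-12-09T08:18:09Z</updated></entry>
</feed>"""


def feed_timestamp(feed_xml):
    """Return the feed-level <updated> value."""
    return ET.fromstring(feed_xml).findtext(ATOM + "updated")


def newest_entry_timestamp(feed_xml):
    """Return the newest <updated> among the entries visible in this page."""
    root = ET.fromstring(feed_xml)
    stamps = [e.findtext(ATOM + "updated") for e in root.findall(ATOM + "entry")]
    # UTC timestamps in this fixed ISO 8601 layout sort correctly as strings.
    return max(stamps) if stamps else None
```

With start=5 the newest visible entry is older than with start=0, which is why the two wget calls above report different values.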
> Is there a reliable way to determine that an item list has not changed on the server while it is being retrieved?
>
> Consider the following example: A library contains 100 entries and I retrieve data 50 items at a time ordering in descending order by modification time:
> 1) Retrieve first 50 items
> 2) Someone edits an item that was not included in the initial list of items and syncs with the server
> 3) Retrieve items 51-100
>
> I assume that in this case, the last item in the first retrieval and the first item in the second retrieval would be the same because editing an item would make that item appear first in the list. If I join these two item lists, this would result in the edited item missing completely and one item being a duplicate.
That sounds right.
> What I would like to do in this case is to start retrieving the list again from the first index.
Yes, but you could process to the end (or as far as you were going to
go) first, potentially ignoring a couple duplicates at the beginning of
each set, and then at the end return to the beginning and process until
you hit the first key/timestamp combo that you already have.
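The approach described above could be sketched roughly as follows. This is a minimal sketch, not the client's actual code: fetch_page is a hypothetical callable that returns a list of (key, timestamp) pairs for one page, sorted by modification time descending, and an empty list past the end.

```python
def fetch_all(fetch_page, limit=50):
    """Page through a list sorted by date modified (descending).

    Pass 1 walks to the end and skips keys that were already seen
    (items shifted down by a concurrent edit). Pass 2 returns to the
    start and stops at the first key/timestamp pair already held.
    """
    seen = {}      # key -> timestamp
    ordered = []   # keys in first-seen order, never containing duplicates

    start = 0
    while True:
        page = fetch_page(start, limit)
        if not page:
            break
        for key, stamp in page:
            if key in seen:
                continue  # duplicate pushed onto this page by an edit
            seen[key] = stamp
            ordered.append(key)
        start += limit

    start = 0
    caught_up = False
    while not caught_up:
        page = fetch_page(start, limit)
        if not page:
            break
        for key, stamp in page:
            if seen.get(key) == stamp:
                caught_up = True  # everything below this is already held
                break
            if key not in seen:
                ordered.append(key)
            seen[key] = stamp  # new item, or newer version of a known one
        start += limit

    return ordered, seen
```

If nothing changed during retrieval, the second pass costs one request and stops at its first entry.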
> On 12/19/11 5:42 AM, Rönkkö Mikko wrote:
>> A follow-up question: what does the updated field in an item list tell? It seems to change depending on the offset
>>
>> wget -O- --no-check-certificate https://api.zotero.org/groups/6184/items/top?key=XXXXXXXXXXXXXXXXXXXXXXXX\&format=atom\&content=bib\&style=apa\&limit=50\&start=5
>>
>> <updated>2011-10-14T12:11:38Z</updated>
>>
>>
>> wget -O- --no-check-certificate https://api.zotero.org/groups/6184/items/top?key=XXXXXXXXXXXXXXXXXXXXXXXX\&format=atom\&content=bib\&style=apa\&limit=50\&start=0
>>
>> <updated>2011-12-09T08:18:09Z</updated>
>
> The feed updated timestamp is just equal to the most recent timestamp in the visible results. I don't think it makes sense for it to be anything else, when you consider all the kinds of requests (searches, non-date-based ordering) that could be made against the API.
In my use case it would make sense. An example: I first retrieve a collection based on the default order so that most recently edited items come first and write this to cache. Then I want to have the same collection sorted by author. Now there is no way of knowing if the data on the server have changed while I am retrieving the new data. If the updated timestamp reflected the time the collection was modified, I could stop retrieving the sorted data after the first set of results and completely rely on cache.
>
>> Is there a reliable way to determine that an item list has not changed on the server while it is being retrieved?
>>
>> Consider the following example: A library contains 100 entries and I retrieve data 50 items at a time ordering in descending order by modification time:
>> 1) Retrieve first 50 items
>> 2) Someone edits an item that was not included in the initial list of items and syncs with the server
>> 3) Retrieve items 51-100
>>
>> I assume that in this case, the last item in the first retrieval and the first item in the second retrieval would be the same because editing an item would make that item appear first in the list. If I join these two item lists, this would result in the edited item missing completely and one item being a duplicate.
>
> That sounds right.
>
>> What I would like to do in this case is to start retrieving the list again from the first index.
>
> Yes, but you could process to the end (or as far as you were going to go) first, potentially ignoring a couple duplicates at the beginning of each set, and then at the end return to the beginning and process until you hit the first key/timestamp combo that you already have.
This is what I am doing now. The problem with this approach is that for large views (e.g., the root of my library is around 3500 items), scanning for duplicate keys is an expensive operation considering the limited processing power of the iPad. If it were possible to check the consistency of the results by comparing timestamps, this would be much more efficient.
Mikko
I don't really understand what you're doing here or what timestamp you
think this should be displaying. Can you explain again?
Among other things, what's doing the sorting by author? Your client or
the API?
>>> Is there a reliable way to determine that an item list has not changed on the server while it is being retrieved?
>>>
>>> Consider the following example: A library contains 100 entries and I retrieve data 50 items at a time ordering in descending order by modification time:
>>> 1) Retrieve first 50 items
>>> 2) Someone edits an item that was not included in the initial list of items and syncs with the server
>>> 3) Retrieve items 51-100
>>>
>>> I assume that in this case, the last item in the first retrieval and the first item in the second retrieval would be the same because editing an item would make that item appear first in the list. If I join these two item lists, this would result in the edited item missing completely and one item being a duplicate.
>> That sounds right.
>>
>>> What I would like to do in this case is to start retrieving the list again from the first index.
>> Yes, but you could process to the end (or as far as you were going to go) first, potentially ignoring a couple duplicates at the beginning of each set, and then at the end return to the beginning and process until you hit the first key/timestamp combo that you already have.
> This is what I am doing now. The problem with this approach is that for large views (e.g., the root of my library is around 3500 items), scanning for duplicate keys is an expensive operation considering the limited processing power of the iPad. If it were possible to check the consistency of the results by comparing timestamps, this would be much more efficient.
I have a hard time imagining that you can't optimize this. A check for a
cached library/key/timestamp seems pretty integral to any sort of
caching model.
> On 12/20/11 11:43 PM, Rönkkö Mikko wrote:
>> On Dec 20, 2011, at 23:44, Dan Stillman wrote:
>>> The feed updated timestamp is just equal to the most recent timestamp in the visible results. I don't think it makes sense for it to be anything else, when you consider all the kinds of requests (searches, non-date-based ordering) that could be made against the API.
>> In my use case it would make sense. An example: I first retrieve a collection based on the default order so that most recently edited items come first and write this to cache. Then I want to have the same collection sorted by author. Now there is no way of knowing if the data on the server have changed while I am retrieving the new data. If the updated timestamp reflected the time the collection was modified, I could stop retrieving the sorted data after the first set of results and completely rely on cache.
>
> I don't really understand what you're doing here or what timestamp you think this should be displaying. Can you explain again?
If I retrieve items in a collection, I would expect the time stamp to reflect the last time an item was added or removed from the collection or an item that was included in the collection was changed. (Having separate time stamps for item modification and collection membership modification might be useful)
>
> Among other things, what's doing the sorting by author? Your client or the API?
What I am doing now is that I write almost everything that I receive from the server into cache. After retrieving a complete collection, I mark this collection as completely cached in the local database and store the time stamp of the most recently modified item of that collection and the number of items that I got.
When a user taps the button that sorts the currently listed items by author, things happen in the following order
1) I retrieve a sorted item list from the cache and show it to the user
2) I start retrieving a sorted item list 50 items at a time from the server
3) When data comes in from the server, the items in the user interface are updated as soon as data about them becomes available.
The issue here is that if the cache is up to date, I want to stop retrieving the data from the server as soon as I have determined that this is the case. Of course, I could always do one extra API call to request the number of items in the collection and the most recently edited item and then decide if I should retrieve the item list from the server. The problem is that if the data on the server was changed, this would delay getting refreshed data. It would be better if I could retrieve the first 50 items and determine if the cache is still valid with just one API request.
>
>>>> Is there a reliable way to determine that an item list has not changed on the server while it is being retrieved?
>>>>
>>>> Consider the following example: A library contains 100 entries and I retrieve data 50 items at a time ordering in descending order by modification time:
>>>> 1) Retrieve first 50 items
>>>> 2) Someone edits an item that was not included in the initial list of items and syncs with the server
>>>> 3) Retrieve items 51-100
>>>>
>>>> I assume that in this case, the last item in the first retrieval and the first item in the second retrieval would be the same because editing an item would make that item appear first in the list. If I join these two item lists, this would result in the edited item missing completely and one item being a duplicate.
>>> That sounds right.
>>>
>>>> What I would like to do in this case is to start retrieving the list again from the first index.
>>> Yes, but you could process to the end (or as far as you were going to go) first, potentially ignoring a couple duplicates at the beginning of each set, and then at the end return to the beginning and process until you hit the first key/timestamp combo that you already have.
>> This is what I am doing now. The problem with this approach is that for large views (e.g., the root of my library is around 3500 items), scanning for duplicate keys is an expensive operation considering the limited processing power of the iPad. If it were possible to check the consistency of the results by comparing timestamps, this would be much more efficient.
>
> I have a hard time imagining that you can't optimize this. A check for a cached library/key/timestamp seems pretty integral to any sort of caching model.
For technical reasons, I need to keep the item keys in an array. The options for checking duplicates would be to do a sequential scan over this array every time I receive an item key from the server or to have a parallel hash map for duplicate checking. The first alternative is very inefficient and the second is not attractive either because it adds complexity to the code. (I have another developer also working on the code and the cache code is already difficult to understand as it is.) At no point can I have duplicate keys in the array that contains the visible item keys, because this would cause a crash or result in duplicate items being shown to the user.
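For reference, the second option (an array with a parallel hash structure) can be kept fairly small if both are hidden behind one type. The sketch below is illustrative only; the class and method names are hypothetical and not taken from any existing client.

```python
class UniqueKeyList:
    """Ordered list of item keys with constant-time duplicate checks."""

    def __init__(self):
        self.keys = []        # the visible, ordered array of item keys
        self._seen = set()    # parallel set used only for membership tests

    def add(self, key):
        """Append key if it is new; return True if it was appended."""
        if key in self._seen:
            return False
        self._seen.add(key)
        self.keys.append(key)
        return True
```

Because the list and the set are only modified together inside add, callers never see them out of step, which keeps the extra complexity in one place.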
But a request could just as easily include, say, a keyword search (?q=),
in which case a timestamp like you're describing doesn't make sense.
(There's also not even really a way for the server to provide it,
because the items that match the search could have changed since the
last time the search was run, making any given timestamp meaningless.)
> (Having separate time stamps for item modification and collection membership modification might be useful)
Collection membership modification is part of the
/collections/<collectionKey> timestamp.
>> Among other things, what's doing the sorting by author? Your client or the API?
> What I am doing now is that I write almost everything that I receive from the server into cache. After retrieving a complete collection, I mark this collection as completely cached in the local database and store the time stamp of the most recently modified item of that collection and the number of items that I got.
>
> When a user taps the button that sorts the currently listed items by author, things happen in the following order
> 1) I retrieve a sorted item list from the cache and show it to the user
> 2) I start retrieving a sorted item list 50 items at a time from the server
> 3) When data comes in from the server, the items in the user interface are updated as soon as data about them becomes available.
>
> The issue here is that if the cache is up to date, I want to stop retrieving the data from the server as soon as I have determined that this is the case. Of course, I could always do one extra API call to request the number of items in the collection and the most recently edited item and then decide if I should retrieve the item list from the server. The problem is that if the data on the server was changed, this would delay getting refreshed data. It would be better if I could retrieve the first 50 items and determine if the cache is still valid with just one API request.
The extra call doesn't seem like too big a deal, particularly since
you'll already be showing cached results. This would be
limit=1&order=dateModified&content=none, right? That's taking about a
second for me right now on a small library. We should be able to make it
faster.
I'd say you would also need /collections/<collectionKey> in case items
had been removed, but format=keys will probably help you out there.
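A rough sketch of the extra call discussed above: build the one-item request and compare the timestamp it returns with the one stored in the cache. The query parameters are the ones mentioned in this thread; the function names and the idea of passing in the server timestamp as a plain string are assumptions made for the example.

```python
from urllib.parse import urlencode


def latest_item_url(items_url, api_key=None):
    """Build a request for only the most recently modified item."""
    params = {
        "limit": 1,
        "order": "dateModified",
        "content": "none",
        "format": "atom",
    }
    if api_key is not None:
        params["key"] = api_key
    return items_url + "?" + urlencode(params)


def needs_refresh(cached_stamp, server_stamp):
    """True if the newest item on the server differs from the cached one.

    This does not detect items removed from a collection; that needs a
    separate check, as noted above.
    """
    return cached_stamp != server_stamp
```

For example, latest_item_url("https://api.zotero.org/groups/6184/items/top") yields a URL whose feed timestamp can be fed straight into needs_refresh.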
> On 12/22/11 2:00 AM, Rönkkö Mikko wrote:
>> On Dec 22, 2011, at 07:13, Dan Stillman wrote:
>>> On 12/20/11 11:43 PM, Rönkkö Mikko wrote:
>>>> On Dec 20, 2011, at 23:44, Dan Stillman wrote:
>>>>> The feed updated timestamp is just equal to the most recent timestamp in the visible results. I don't think it makes sense for it to be anything else, when you consider all the kinds of requests (searches, non-date-based ordering) that could be made against the API.
>>>> In my use case it would make sense. An example: I first retrieve a collection based on the default order so that most recently edited items come first and write this to cache. Then I want to have the same collection sorted by author. Now there is no way of knowing if the data on the server have changed while I am retrieving the new data. If the updated timestamp reflected the time the collection was modified, I could stop retrieving the sorted data after the first set of results and completely rely on cache.
>>> I don't really understand what you're doing here or what timestamp you think this should be displaying. Can you explain again?
>> If I retrieve items in a collection, I would expect the time stamp to reflect the last time an item was added or removed from the collection or an item that was included in the collection was changed.
>
> But a request could just as easily include, say, a keyword search (?q=), in which case a timestamp like you're describing doesn't make sense. (There's also not even really a way for the server to provide it, because the items that match the search could have changed since the last time the search was run, making any given timestamp meaningless.)
I think that it would make sense even in that case. If an item in a collection was changed resulting in the results of a keyword search changing, wouldn't it also update the collection level timestamp?
This would make it very easy to use the timestamp for determining when a cache needs to be refreshed: If the timestamp in the server response was the same as the one I have in cache, I would know that the cache is up to date. If the timestamp was different, I would know that I need to update the cache because an item was added to, removed from, or changed in the collection.
>
>> (Having separate time stamps for item modification and collection membership modification might be useful)
>
> Collection membership modification is part of the /collections/<collectionKey> timestamp.
This is what I am currently using: First retrieving the collection time stamp and if it differs from cache, then retrieving a list of items. What I wanted to have is to get these in a single query, but I am starting to think that this is probably unnecessary optimization after all.
>
>>> Among other things, what's doing the sorting by author? Your client or the API?
>> What I am doing now is that I write almost everything that I receive from the server into cache. After retrieving a complete collection, I mark this collection as completely cached in the local database and store the time stamp of the most recently modified item of that collection and the number of items that I got.
>>
>> When a user taps the button that sorts the currently listed items by author, things happen in the following order
>> 1) I retrieve a sorted item list from the cache and show it to the user
>> 2) I start retrieving a sorted item list 50 items at a time from the server
>> 3) When data comes in from the server, the items in the user interface are updated as soon as data about them becomes available.
>>
>> The issue here is that if the cache is up to date, I want to stop retrieving the data from the server as soon as I have determined that this is the case. Of course, I could always do one extra API call to request the number of items in the collection and the most recently edited item and then decide if I should retrieve the item list from the server. The problem is that if the data on the server was changed, this would delay getting refreshed data. It would be better if I could retrieve the first 50 items and determine if the cache is still valid with just one API request.
>
> The extra call doesn't seem like too big a deal, particularly since you'll already be showing cached results. This would be limit=1&order=dateModified&content=none, right? That's taking about a second for me right now on a small library. We should be able to make it faster.
That is what I am currently doing.
>
> I'd say you would also need /collections/<collectionKey> in case items had been removed, but format=keys will probably help you out there.
Definitely. format=keys, particularly if I can get all keys with a single call even for very large libraries, would be a huge help.
Mikko
I guess one could argue that, but to me, this is a feed timestamp, and a
feed comprises a particular set of parameters.
It's irrelevant, though, because items changing don't update any
timestamp other than their own and the library's. We don't calculate or
store the kind of collection-level timestamp you're asking for, so the
only timestamp that would exist would be at request time, and it would
be expensive to calculate.
Also, this isn't just about collections. The feed timestamp is a
universal feature of all format=atom requests for all objects. The only
consistent thing that makes sense across all API requests is for it to
concern the current set of results. That might not make it particularly
useful for a lot of consumers, but that doesn't really matter. It's
there mainly because it's in the Atom spec.
And consumers that want to know the most recent item have another way to
get that.
>
> It's irrelevant, though, because items changing don't update any timestamp other than their own and the library's. We don't calculate or store the kind of collection-level timestamp you're asking for, so the only timestamp that would exist would be at request time, and it would be expensive to calculate.
I thought that the collection's timestamp is updated when an item inside the collection is updated, but it seems that I was incorrect.
>
> Also, this isn't just about collections. The feed timestamp is a universal feature of all format=atom requests for all objects. The only consistent thing that makes sense across all API requests is for it to concern the current set of results. That might not make it particularly useful for a lot of consumers, but that doesn't really matter. It's there mainly because it's in the Atom spec.
>
> And consumers that want to know the most recent item have another way to get that.
Makes sense.
I think that the main conclusion from this exchange is that the updated field needs to be better documented in the read API documentation, so that users of the API do not attempt to use it for things that it is not meant for.
"4.2.15. The "atom:updated" Element
The "atom:updated" element is a Date construct indicating the most
recent instant in time when an entry or feed was modified in a way
the publisher considers significant. Therefore, not all
modifications necessarily result in a changed atom:updated value."
If the updated timestamp contained either the collection's updated timestamp or that of the most recently updated item, it would be compatible with the Atom standard and more useful for clients.
Mikko
In other words, if the most recently edited item in a library has a timestamp
<updated>2012-01-01T22:26:36Z</updated>
And I have a new item on my client that I added on the last day of 2011 and synchronize it now, is the timestamp for that item on the server 2012-01-16T19:15:00Z or 2011-12-30T22:26:36Z?
Mikko
It's currently dateModified, but serverDateModified might make more sense.
Is it possible to get serverDateModified through the API?
No. But actually, if we changed this then dateModified wouldn't be
available anywhere else. (I was thinking it was in the JSON, but it's
not, of course, since it's not editable data.)
I guess the ability to sort by and view serverDateModified would be helpful?
Yes. This would be really helpful in keeping a local cache of items. Also limiting the items in API request to items that have the serverDateModified after some value would be helpful.
I am currently determining which items to retrieve to the cache by retrieving all the keys from the server ordered by dateModified. Then I get the key with the most recent dateModified from the cache and retrieve item details for the keys that come before this key in the list that I retrieved from the server. This works well in most cases, but there are cases that this algorithm cannot handle, resulting in an inconsistent cache.
Mikko
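The key-list approach described above might look roughly like this. It is a sketch under assumptions: server_keys is the full key list already fetched from the server in dateModified-descending order, and the function name and the fall-back when the cached key is missing are choices made for the example, not something stated in the thread.

```python
def keys_to_fetch(server_keys, newest_cached_key):
    """Return the keys listed before the newest cached key.

    server_keys must be ordered by dateModified, newest first.
    If the cached key is no longer on the server, fetch everything
    (an assumed fall-back).
    """
    try:
        cut = server_keys.index(newest_cached_key)
    except ValueError:
        return list(server_keys)
    return server_keys[:cut]
```

One case this cannot catch, if dateModified reflects the client-side edit time as discussed earlier in the thread: an item edited before the newest cached item but synced afterwards would sort below the cut and be skipped.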
Is there any reason you need to view that value, or would sorting by it
be enough?
I want to retrieve only the items that have changed on the server. Having this variable available in the client will help in determining which items to retrieve.
But it is easy to do a workaround where I just use the dateModified variable instead when determining how far an item key list should be retrieved, so this is not a big issue. It just adds a bit of complexity to the client code and will result in the unnecessary retrieval of a small number of items.