OK, it took us longer than expected, but I think we've now got
convergence on the AITC API.
https://wiki.mozilla.org/Apps/AITC
The device API is there for completeness, but we agree that it is *not*
a June requirement.
Let's discuss on #aitc or on this email thread.
-Ben
Specific changes I would like to discuss/apply:
1) Remove any/all auth specifications from this spec, instead saying that clients should follow the Sagrada Token model (the same thing, but with a shared discovery/auth flow).
This is outlined at https://wiki.mozilla.org/Services/Sagrada/ServiceClientFlow (most relevant is the Access section) and is exactly the same details, except for discovery. I would like to use a standard model for all Services-hosted services, and as the protocols are effectively the same, we should not be duplicating documentation (and potentially diverging from the same model).
2) App Records must never be deleted (remove DELETE from Apps API)
Given that this data is extremely high value, and cannot be lost, I do not believe that the API should allow direct deletion of this data. If I purchase a $50 app, my expectation is that I will continue to be able to use this app in the future, even if I uninstall it from all of my devices at some point. From the discussion a few weeks back, I was surprised to see that deletion of these records was in the API.
It is my understanding from Ian that this is present as a reflection of the DOM API [1]. From reading that spec, my assumption would be that "installed" is a per-client state, and uninstall should never result in deletion of App records, most importantly the receipts. Assuming this is true, the app.uninstall() call should modify the Device State record, and not the App record.
Assuming we adopt the previous point:
3) Remove the "partial" or "abbreviated" app record options from the API, and only support requests for "all app records" and "all app records added+changed since last update."
If Apps are never deleted, this is a purely additive (but still canonical!) store. At that point, there is no need to refetch data that was already downloaded, making the vast majority of requests smaller and faster. If a client needs all app records (new setup, local cache lost, etc), they can simply fetch them all with a single request.
In addition, by removing the need to return multiple data formats, you can specify a transport format that the server doesn't need to understand or parse. Which brings up my next comment:
4) Adopt a metadata+payload format for transport.
Right now the format is flat, which means the server has to have specific awareness of the data format, in order to reconstruct a record for a client. I would strongly prefer to see a transport format that focuses on a separation of payload and metadata, and then treat the payload as opaque to the server. This means that the Apps team ends up with much more flexibility to add features down the road, without requiring server changes.
What we have now is:
{
  origin: "https://example.com",
  manifestPath: "/manifest.webapp",
  installOrigin: "https://marketplace.mozilla.org",
  installedTime: 1330535996745,
  modificationTime: 1330535996945,
  receipts: ["...", "..."]
}
I would like to see something more like this, where "payload" is an Apps-defined format that the server simply stores and returns with the server-side metadata:
{
  appid: "aHR0cHM6Ly9leGFtcGxlLmNvbQ", // base64 conversion of the origin
  modified: 1330535996945, // server-set (server is canonical!)
  // payload is a JSON-encoded string for the remaining fields, and is stored as-is by the server
  payload: "{\"origin\":\"https://example.com\",\"manifestPath\":\"/manifest.webapp\",\"installOrigin\":\"https://marketplace.mozilla.org\",\"installedTime\":1330535996745,\"receipts\":[\"...\",\"...\"]}"
}
The key advantage here is forward compatibility without any requirement for server changes, and a much more straightforward parsing/storage model.
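As a sketch of how little the server then needs to know (function names and the in-memory storage shape are made up for illustration), the server parses only the envelope and stores the payload string verbatim:

```python
import json
import time

def store_record(db, appid, envelope_json):
    """Server side: parse only the envelope; the payload stays an opaque string."""
    envelope = json.loads(envelope_json)
    payload = envelope["payload"]           # opaque, Apps-defined; never parsed here
    modified = int(time.time() * 1000)      # server-set (server is canonical)
    db[appid] = {"appid": appid, "modified": modified, "payload": payload}
    return modified

# Client side: the payload format can evolve without any server change.
db = {}
payload = json.dumps({"origin": "https://example.com", "receipts": ["..."]})
store_record(db, "aHR0cHM6Ly9leGFtcGxlLmNvbQ", json.dumps({"payload": payload}))
stored = db["aHR0cHM6Ly9leGFtcGxlLmNvbQ"]
assert json.loads(stored["payload"])["origin"] == "https://example.com"
```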
Thoughts?
-- Mike
[1] https://developer.mozilla.org/en/Apps/Apps_JavaScript_API
FYI, "really forget this" is looking like a policy requirement from EU
guidelines about the right to be forgotten. So it's a use case that I
would argue the services platform should support, and let the product
folks figure out when/how to call it.
> On Mar 07 '12 3:03 PM, Mike Connor wrote:
>> 2) App Records must never be deleted (remove DELETE from Apps API)
>>
>> Given that this data is extremely high value, and cannot be lost, I do not believe that the API should allow direct deletion of this data. If I purchase a $50 app, my expectation is that I will continue to be able to use this app in the future, even if I uninstall it from all of my devices at some point. From the discussion a few weeks back, I was surprised to see that deletion of these records was in the API.
>
> FYI, "really forget this" is looking like a policy requirement from EU guidelines about the right to be forgotten. So it's a use case that I would argue the services platform should support, and let the product folks figure out when/how to call it.
>
Is that for individual pieces of data? Or for "forget me as a user"? In the latter case, having an explicit "delete all of my data" call is a good thing. Sync has this now (https://account.services.mozilla.com/account/delete) and I intend to keep this even in a BID-centric world, but I'd be curious to know if there's a more fine-grained requirement.
-- Mike
We need both.
>> Is that for individual pieces of data? Or for "forget me as a user"?
>
> We need both.
Links would be good, because I think there's a gap in how Sync works and what you're saying. If there are any other implications of that, I'd love to be in those loops.
-- Mike
I don't have much in the way of links, sorry.
Data Safety review asked for the right for users to really forget an
app purchase. That should be in the Data Safety minutes, but I don't
think you'll find much more detail.
EU has made not-yet-official moves towards a right to be forgotten,
details are AFAIK not existing yet.
--da
> On Mar 7, 2012, at 3:05 PM, David Ascher wrote:
>> On Mar 07 '12 3:03 PM, Mike Connor wrote:
>>> 2) App Records must never be deleted (remove DELETE from Apps API)
>>>
>>> Given that this data is extremely high value, and cannot be lost, I do not believe that the API should allow direct deletion of this data. If I purchase a $50 app, my expectation is that I will continue to be able to use this app in the future, even if I uninstall it from all of my devices at some point. From the discussion a few weeks back, I was surprised to see that deletion of these records was in the API.
>>
>> FYI, "really forget this" is looking like a policy requirement from EU guidelines about the right to be forgotten. So it's a use case that I would argue the services platform should support, and let the product folks figure out when/how to call it.
>
> Agreed, Ben originally added the DELETE call for user privacy reasons, i.e. the user explicitly wants the server to "forget" that they installed a certain app. It is certainly not meant to be called when the uninstall() API method is invoked by the dashboard or anyone else.
Clarification is good! I'm not sure how you guys plan to expose this, since most of the UX/docs I've seen have been very much focused on per-device state, but I can understand the desire. I think it's a relatively rare case, but I'm on board with providing this feature.
A followup question: in a world where we have Device State as a server-side store, is removing the app from all devices and removing the record from the Apps store on the server sufficient for the requirement?
-- Mike
> A followup question: in a world where we have Device State as a server-side store, is removing the app from all devices and removing the record from the Apps store on the server sufficient for the requirement?
What else could be implied? (In general, there are likely long term
complicated things like log expunging, etc., but I expect that's not
services-specific).
That's a nice symmetry, and potentially useful in environments in which the same code is syncing apps (this protocol) and the rest of your data (Sync protocol).
-----Original Message-----
From: Mike Connor [mco...@mozilla.com]
Received: Wednesday, 07 Mar 2012, 3:03pm
To: Ben Adida [bena...@mozilla.com]
CC: Bill Walker [bwa...@mozilla.com]; Anant Narayanan [an...@mozilla.com]; servic...@mozilla.org
Subject: Re: AITC API
I've ignored the Device State elements in this response, mostly since they're not blocking launch.
Specific changes I would like to discuss/apply:
1) Remove any/all auth specifications from this spec, instead saying that clients should follow the Sagrada Token model (the same thing, but with a shared discovery/auth flow).
This is outlined at https://wiki.mozilla.org/Services/Sagrada/ServiceClientFlow (most relevant is the Access section) and is exactly the same details, except for discovery. I would like to use a standard model for all Services-hosted services, and as the protocols are effectively the same, we should not be duplicating documentation (and potentially diverging from the same model).
2) App Records must never be deleted (remove DELETE from Apps API)
Given that this data is extremely high value, and cannot be lost, I do not believe that the API should allow direct deletion of this data. If I purchase a $50 app, my expectation is that I will continue to be able to use this app in the future, even if I uninstall it from all of my devices at some point. From the discussion a few weeks back, I was surprised to see that deletion of these records was in the API.
It is my understanding from Ian that this is present as a reflection of the DOM API [1]. From reading that spec, my assumption would be that "installed" is a per-client state, and uninstall should never result in deletion of App records, most importantly the receipts. Assuming this is true, the app.uninstall() call should modify the Device State record, and not the App record.
Thoughts?
-- Mike
[1] https://developer.mozilla.org/en/Apps/Apps_JavaScript_API
Agreed, Ben originally added the DELETE call for user privacy reasons, i.e. the user explicitly wants the server to "forget" that they installed a certain app. It is certainly not meant to be called when the uninstall() API method is invoked by the dashboard or anyone else.
-Anant
Hi folks,
Mike and I had a chat today and I think we're converging.
> 1) Remove any/all auth specifications from this spec
Agreed on referencing, though I would prefer to reference:
https://wiki.mozilla.org/Identity/BrowserIDSync#BrowserID_.2B_REST
Specifically, I think we should *not* reference Mozilla specific
hostnames or implementation specifics (e.g. token server). We also
should not mention discovery here. The AITC protocol begins at "you have
an AITC endpoint."
> 2) App Records must never be deleted (remove DELETE from Apps API)
We've agreed that this is a requirement and that the contract is "be
careful, once it's gone it's gone."
> 3) Remove the "partial" or
> "abbreviated" app record options from the API, and only support
We've agreed that we need this feature. Mike correctly pointed out that
(a) we may need to revisit the exact fields included in "abbreviated"
depending on when we think we'll use that call.
(b) it will be worth tracking the last addition time and last deletion
time in the user account for potential future optimizations.
> 4) Adopt a metadata+payload format for transport.
>
> Right now the format is flat, which means the server has to have
> specific awareness of the data format
We didn't specifically discuss the payload, but I feel fairly strongly
about an API that does JSON natively, without enveloping. See for example:
https://graph.facebook.com/search?q=watermelon&type=post
All that said, I believe the spec here:
https://wiki.mozilla.org/Apps/AITC
is ready enough to begin implementing. We may tweak the description of
auth, as mentioned above, but that shouldn't change the implementation
path much, if at all.
On 2012-03-08, at 3:47 PM, Ben Adida wrote:
> Hi folks,
>
> Mike and I had a chat today and I think we're converging.
>
>> 1) Remove any/all auth specifications from this spec
>
> Agreed on referencing, though I would prefer to reference:
>
> https://wiki.mozilla.org/Identity/BrowserIDSync#BrowserID_.2B_REST
I'd prefer https://wiki.mozilla.org/Services/Sagrada/ServiceClientFlow#Access as a concrete "how does a client actually make these requests" document to link to from the API.
> Specifically, I think we should *not* reference Mozilla specific hostnames or implementation specifics (e.g. token server). We also should not mention discovery here. The AITC protocol begins at "you have an AITC endpoint."
In general, where we've included hostnames it's been for example purposes only, but we can/should use something like example.com to avoid this confusion. I don't think we've mentioned implementation specifics in developer docs, the client flow relies on discovery of fully qualified URLs for endpoints, and should be implementation-agnostic. If there's a specific case where we expose implementation details in the API contract or client calls, please let me know.
>> 2) App Records must never be deleted (remove DELETE from Apps API)
>
> We've agreed that this is a requirement and that the contract is "be careful, once it's gone it's gone."
Followup question: can I delete everything, or only one record at a time? If yes, do we want some sort of header-based hoop to ensure that a "delete all" call isn't just a malformed request? (i.e. DELETE apps/{appid} where appid is an empty string due to sloppy coding/error-checking)
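For illustration, the "hoop" could be as simple as a required confirmation header on the delete-everything path. This is only a sketch of the idea; the header name (X-Confirm-Delete) is made up here, not part of any spec:

```python
def handle_delete(path_appid, headers, db):
    """Sketch: refuse a wipe-everything DELETE unless explicitly confirmed."""
    if path_appid:                            # DELETE /apps/{appid}
        db.pop(path_appid, None)
        return 204
    # Empty appid means "delete all" -- refuse unless confirmed, so a
    # sloppily built URL can't silently wipe the account.
    if headers.get("X-Confirm-Delete") != "1":
        return 412
    db.clear()
    return 204

db = {"a": 1, "b": 2}
assert handle_delete("", {}, db) == 412 and len(db) == 2   # malformed delete-all refused
assert handle_delete("a", {}, db) == 204 and "a" not in db # single delete works
assert handle_delete("", {"X-Confirm-Delete": "1"}, db) == 204 and not db
```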
>> 3) Remove the "partial" or
>> "abbreviated" app record options from the API, and only support
>
> We've agreed that we need this feature. Mike correctly pointed out that
>
> (a) we may need to revisit the exact fields included in "abbreviated" depending on when we think we'll use that call.
I'd like to get clarity on this bit ASAP. Can we get an answer locked in by the end of the week? It will matter for getting the data model optimized.
> (b) it will be worth tracking the last addition time and last deletion time in the user account for potential future optimizations.
+1
>> 4) Adopt a metadata+payload format for transport.
>>
>> Right now the format is flat, which means the server has to have
>> specific awareness of the data format
>
> We didn't specifically discuss the payload, but I feel fairly strongly about an API that does JSON natively, without enveloping. See for example:
>
> https://graph.facebook.com/search?q=watermelon&type=post
If we're doing this, I think we need some form of versioning/typing of requests, for forward compatibility if nothing else.
Greg had suggested using content-type negotiation as a RESTful way of differentiating between "full" and "abbreviated" versions of resources. This would allow implicit versioning, as we could define a new type in the future if we need or want to return a different format, without requiring an explicit version in the URL.
-- Mike
I've been talking to Tarek/Alexis about defining a base set of semantics for Sagrada apps based on what we have for Sync 2.0. For those who are unfamiliar, http://docs.services.mozilla.com/storage/apis-2.0.html#response-headers is the latest version. We may not include all of these, but most are very generic and can be used broadly.
These headers and codes extend slightly from the base HTTP semantics, but in general are very simple and powerful. We've learned the hard way that not having well-defined and well-tested semantics (especially around backoff intervals!) makes it much more difficult to effectively manage services during traffic spikes and maintenance, so I would like to bake in those semantics from the beginning.
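As a sketch of why it helps to bake the backoff semantics in early: a client that honors a server-advertised backoff interval stops hammering the service during an incident. The X-Backoff header (value in seconds) follows the Sync convention linked above; treat the exact header name as an assumption:

```python
import time

def request_with_backoff(send, url, state):
    """Client sketch: honor a server-advertised backoff interval.

    `send` is any callable returning (status, headers, body); `state` is a
    dict the client persists between requests.
    """
    now = time.time()
    if now < state.get("blocked_until", 0):
        return None                        # still inside the backoff window
    status, headers, body = send(url)
    if "X-Backoff" in headers:             # assumed header name, seconds
        state["blocked_until"] = now + int(headers["X-Backoff"])
    return status, body

state = {}
fake_send = lambda url: (503, {"X-Backoff": "3600"}, "")
assert request_with_backoff(fake_send, "/apps", state)[0] == 503
assert request_with_backoff(fake_send, "/apps", state) is None  # backing off
```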
Any concerns?
-- Mike
On 2012-03-06, at 8:18 PM, Ben Adida wrote:
I think there are some details indeed worth discussing, but can we agree
to discuss these in parallel with an implementation?
I'd like to report, at tomorrow's apps meeting, that we've started the
implementation, and next week that we have a working v0.1.
Let's get a repo going, some tests that we can all agree are testing the
right behavior, and some working code.
-Ben
That said, I think it's premature to talk schedule or dates without even a draft implementation plan, but I would expect that plan by EOD tomorrow, at the latest. My hopes match yours for 0.1, but we'll have a much crisper story tomorrow.
-- Mike
I have a couple of questions/clarifications/comments on the AITC spec
that will affect the initial implementation. All based on the spec as
it currently exists here:
https://wiki.mozilla.org/Apps/AITC
* Are the "modificationTime" app field and "modifiedAt" device field
set by the client, or should the server be setting these fields to its
local time with each write to the database?
* Is the timestamp used in the "detailsafter" parameter matched
against the "modificationTime" field from the apps JSON record, or
against an independent timestamp maintained by the server?
* Can we please separate the unrelated tasks of (a) selecting output
formats and (b) filtering by timestamp? Currently the presence of the
"detailsafter" parameter implies both.
Suggestion: use a "newer" query parameter to filter by timestamp,
and the "Accept" header to request either full or abbreviated records.
* Services apps currently provide an "X-Timestamp" header with all
responses, giving the current timestamp on the server. Can we use this
instead of including a "time" field in the JSON? (i.e. instead of
returning {apps: [...], time: <ts>})
* Can we set a maximum length on appids? Currently they are
unbounded, since the id is just an encoding of the origin URL.
Suggestion: appid = b64urlencode(sha1(originUrl))
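One plausible reading of that suggestion (assuming urlsafe base64 with the padding stripped, matching the unpadded appids shown earlier in the thread):

```python
import base64
import hashlib

def appid_for_origin(origin_url):
    """Derive a fixed-length app id from an origin URL, per the suggestion above."""
    # sha1 always yields a 20-byte digest, so the id is always 27 chars
    digest = hashlib.sha1(origin_url.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

aid = appid_for_origin("https://example.com")
assert len(aid) == 27            # bounded, unlike an encoding of the raw origin
```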
* Please clarify exactly what UUID format the server is expected to
validate for device ids.
* Existing services apps are using an "X-If-Modified-Since" header
with value in integer milliseconds, rather than the standard
"If-Modified-Since" header with its complex date-parsing logic. Can we
do the same here in the interests of both simplicity and precision?
* Regarding "this could do basic validation of the app document",
what sort of validation is required? Should it error out on
unrecognized fields, or only on known fields with badly-formed values?
Cheers,
Ryan
> * Are the "modificationTime" app field and "modifiedAt" device field
> set by the client, or should the server be setting these fields to its
> local time with each write to the database?

I believe the server should be setting these based on last PUT.
> * Is the timestamp used in the "detailsafter" parameter matched against
> the "modificationTime" field from the apps JSON record, or against an
> independent timestamp maintained by the server?
I believe those two options should be the same given the above point.
> * Can we please separate the unrelated tasks of (a) selecting output
> formats and (b) filtering by timestamp? Currently the presence of the
> "detailsafter" parameter implies both.
>
> Suggestion: use a "newer" query parameter to filter by timestamp, and
> the "Accept" header to request either full or abbreviated records.
Let's discuss this. Accept header is meant for mime-type, right? Are we
actually defining new mime types?
> * Services apps currently provide an "X-Timestamp" header with all
> responses, giving the current timestamp on the server. Can we use this
> instead of including a "time" field in the JSON? (i.e. instead of
> returning {apps: [...], time: <ts>})
Hmmm. I guess I'm not a fan of the extra headers approach. Thoughts on
why this is better? Happy to be convinced, just worried about lots o'
headers.
> * Can we set a maximum length on appids? Currently they are unbounded,
> since the id is just an encoding of the origin URL.
>
> Suggestion: appid = b64urlencode(sha1(originUrl))
I like this. Let's do that. Feel free to update the spec accordingly.
> * Please clarify exactly what UUID format the server is expected to
> validate for device ids.
Pick one that you think is best, add to the spec. I don't have strong
feelings on this.
> * Existing services apps are using an "X-If-Modified-Since" header with
> value in integer milliseconds, rather than the standard
> "If-Modified-Since" header with its complex date-parsing logic. Can we
> do the same here in the interests of both simplicity and precision?
I like this the least. Do we really need to break with the standard here?
> * Regarding "this could do basic validation of the app document", what
> sort of validation is required? Should it error out on unrecognized
> fields, or only on known fields with badly-formed values?
Presence of important fields. I don't think anything more is required.
-Ben
I'd also settle for a ?full=1 header, if we're not actually returning two different MIME types. Splitting the two is the important part.
> Hmmm. I guess I'm not a fan of the extra headers approach. Thoughts on why this is better? Happy to be convinced, just worried about lots o' headers.
Processing code can be the same for a variety of payload bodies. Can include timestamps in failure responses. Allows processing of these values without JSON-parsing a (potentially large) body. And it's what we do today for the other apps we host and the other clients we support.
> I like this the least. Do we really need to break with the standard here?
Speaking as a client implementer, I-M-S is awful. You now need to round the milliseconds you use everywhere else, compose an appropriate value to send, and post-process the results to exclude things that happened in the wrong part of that 1000ms window (assuming you even get an accurate timestamp for the response contents!).
Uniformly handling milliseconds since epoch saves a lot of effort and avoids bugs in both client and server.
I-M-S is one of those "web browsers are for people" things that shows up as onions in HTTP's varnish. Seconds?! What is this, the 80s?! :D
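To make the contrast concrete: with a millisecond X-If-Modified-Since value, server-side filtering is a single integer comparison (the record shape here is assumed for illustration):

```python
def records_newer_than(records, header_value):
    """Filter on an integer-millisecond X-If-Modified-Since: one comparison."""
    since_ms = int(header_value)          # no date parsing, no precision loss
    return [r for r in records if r["modified"] > since_ms]

records = [{"id": "a", "modified": 1330535996745},
           {"id": "b", "modified": 1330535996746}]
# A 1ms difference is visible, which If-Modified-Since (seconds) cannot express.
assert records_newer_than(records, "1330535996745") == [records[1]]
```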
And by "header" of course I meant "param". Please excuse me, I have a bad case of the dumb.
This all happens in the server, too.
The net effect here is that the client takes the millisecond value it is using as internal representation, does a bunch of date string processing to get a date, sends that over to the server, which has to parse it and turn it back into milliseconds. That's a lot of overhead to net remove some precision from the system.
Aside from making it human readable, dates are a depressingly bad communication format.
Toby
I tend to agree with splitting the two. Ian wanted a single call for
performance on mobile. Ian, if you're available, want to jump in?
I am okay with implementing this as two calls, eventually adding Ian's
optimization. Ian, I hope that's okay?
> Processing code can be the same for a variety of payload bodies. Can
> include timestamps in failure responses. Allows processing of these
> values without JSON-parsing a (potentially large) body. And it's what we
> do today for the other apps we host and the other clients we support.
OK, fair point. So why not the existing HTTP-date header? Again the
second vs. millisecond issue?
Is this really that hard? Date parsers abound, and getting around the
1-second lag time can be done by rounding up for lastModified, and down
for If-Modified. No?
I understand epoch would be easier. But I place a good bit of value on
sticking with existing specs where we can, if the inconvenience isn't
too high.
-Ben
Parsing overhead, inadequate precision, no legacy clients to support.
It's verging on ridiculous to have a computer figure out if it's Tuesday so it can talk to another computer, who'll just need to parse that string back down to a (no longer precise!) long to query a database.
> Is this really that hard? Date parsers abound, and getting around the 1-second lag time can be done by rounding up for lastModified, and down for If-Modified. No?
Nope.
As I mentioned, if I want to process the records modified since a millisecond timestamp, but I can only make a request with second precision, I need to filter the response, I can't trust record counts, the response has to include millisecond timestamps, etc.
And of course with If-Modified-Since you will get either false positives or false negatives. Both are bad.
> I understand epoch would be easier. But I place a good bit of value on sticking with existing specs where we can, if the inconvenience isn't too high.
These are automated systems that intrinsically work in milliseconds. Using HTTP has advantages as a transport protocol, but I see no advantage here in working with these particular HTTP headers -- there's no caching/proxy win, there are no existing deployed clients that use those headers with earlier versions, etc.
Using HTTP-native date headers for automated systems is like chaining power adapters from US->UK->Euro->US. You could do it, but what's the point?
> On 3/13/12 6:34 PM, Richard Newman wrote:
>>> Let's discuss this. Accept header is meant for mime-type, right?
>>> Are we actually defining new mime types?
>>
>> I'd also settle for a ?full=1 header, if we're not actually returning
>> two different MIME types. Splitting the two is the important part.
>
> I tend to agree with splitting the two. Ian wanted a single call for performance on mobile. Ian, if you're available, want to jump in?
>
> I am okay with implementing this as two calls, eventually adding Ian's optimization. Ian, I hope that's okay?
It doesn't need to be two calls, it's two optional params, that can be combined in a single request: ?newer=timestamp&full=1
However, before we do that, let's address a different question (from a different response):
>> * Can we please separate the unrelated tasks of (a) selecting output
>> formats and (b) filtering by timestamp? Currently the presence of the
>> "detailsafter" parameter implies both.
>>
>> Suggestion: use a "newer" query parameter to filter by timestamp, and
>> the "Accept" header to request either full or abbreviated records.
>
> Let's discuss this. Accept header is meant for mime-type, right? Are we actually defining new mime types?
I'm leaning towards yes, as this allows us to add new data formats without changing the protocol/calling convention at all. My understanding of REST best practices is that a URL should point to a resource, not a specific format of that resource. i.e. {endpoint}/apps/{appid} should return a representation of the app based on the Accept header. I'm not a purist, as you know, but this feels like a nice bit of flexibility we may want later, if we don't want to version the API because of data format changes.
We could have the following:
application/x-webapp-full
application/x-webapp-minimal
and then, at a later date, add something like |application/x-webapp-full-2| if there is a need. This also means we could do smaller PUT requests and only upload changed fields by specifying a mime type like application/x-webapp-delta so the server knows to only modify the fields that are present.
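A sketch of what server-side dispatch could look like under this proposal (media type names as above; the "minimal" field set is a placeholder, since per point 3a the abbreviated fields are still being decided):

```python
FULL_FIELDS = ("origin", "manifestPath", "installOrigin", "installedTime", "receipts")
MINIMAL_FIELDS = ("origin", "installedTime")   # placeholder abbreviated set

def represent(record, accept):
    """Return a representation of the same resource chosen by the Accept header."""
    if accept == "application/x-webapp-minimal":
        fields = MINIMAL_FIELDS
    else:                                       # default to the full record
        fields = FULL_FIELDS
    return {k: record[k] for k in fields if k in record}

record = {"origin": "https://example.com", "installedTime": 1330535996745,
          "receipts": ["..."]}
assert set(represent(record, "application/x-webapp-minimal")) == {"origin", "installedTime"}
assert "receipts" in represent(record, "application/x-webapp-full")
```

Adding a new representation later (e.g. a delta format) would then only mean registering a new media type, with the URL space untouched.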
-- Mike
>> 3) Remove the "partial" or
>> "abbreviated" app record options from the API, and only support
>
> We've agreed that we need this feature. Mike correctly pointed out that
>
> (a) we may need to revisit the exact fields included in "abbreviated" depending on when we think we'll use that call.
Something I remembered we discussed, but didn't make it into this email, was the idea that dashboards would use one or both calls to show a list of apps. The data formats defined point to apps, but don't contain any sort of descriptive content about the apps.
I think this is likely going to be a common use case (pull down and display all of my apps), but with the current data formats I assume a dashboard author would need to directly fetch data for each and every app they wanted to display on screen, in order to get key information such as app names, and possibly icons/descriptions. If a user has a lot of apps, this would be fairly slow if we're hitting the network dozens of times.
I would suggest that AITC should cache some sort of *non-authoritative* metadata about an app, for the quick display use case. If, on loading an app, that metadata has changed, the AITC record would need to be updated on the server so the next dashboard load would be current. We could definitely tweak/add formats depending on implementation/performance concerns.
Thoughts?
-- Mike
Yes, I agree. This was not a problem in the old format because the whole
manifest was stored on the AITC server. But storing at least some
non-authoritative metadata would speed things up greatly.
The concern around storing things like icons in the apprecord was that
we were wasting durable storage on information that didn't need to be
durable. So it's probably best for us to narrow down on only the name
and description and skip icons (as icons may also be data URIs, not just
URIs) to lower costs.
-Anant
On Tue, Mar 13, 2012 at 9:13 PM, Ben Adida <bena...@mozilla.com> wrote:
>>> Let's discuss this. Accept header is meant for mime-type, right?
>>> Are we actually defining new mime types?
>>
>> I'd also settle for a ?full=1 header, if we're not actually returning
>> two different MIME types. Splitting the two is the important part.
>
> I tend to agree with splitting the two. Ian wanted a single call for performance on mobile. Ian, if you're available, want to jump in?
>
> I am okay with implementing this as two calls, eventually adding Ian's optimization. Ian, I hope that's okay?
I don't care that much, but in addition to mobile I was working under the principle that the server protocol should match the client's needs, not the other way around, since there's no purpose to the server without the client. The client always needs that information, so it just seems sensible to me. Also, with two requests it's a half-optimization – the totally unoptimized version would be N+1 requests to get N new/updated apps. I don't actually know much about mobile performance, only that I've been told that latency is particularly bad, and so the number of requests (and whether those requests must be serialized) has a significant effect on speed. But I'm working mostly on hearsay.
OK, I might be confused by the exact proposal at this point. The reason for detailsafter (as a name) is that all records are returned, but only abbreviated records for items before that time. The abbreviated list allows the client to see deletes. I would expect (but maybe this isn't correct) that full=1&newer=<time> would only return records after that time. The two cases aren't very different in Sync because it uses tombstones, but we aren't proposing tombstones for AITC.
Ian
This is another point to be decided: write conflict detection. Do we
want to use ETags or If-Unmodified-Since? I would prefer to have a
single blessed mechanism rather than supporting both.
One advantage of If-Unmodified-Since is that you only have to remember a
single value (as in "I have all updates as of time XYZ") rather than
keeping the last-seen ETag for each individual app.
(Of course, I'm immediately going to suggest X-If-Unmodified-Since with
a millisecond timestamp instead of the standard date-based header - and
this is a case where that extra precision really does matter)
Cheers,
Ryan
On 14/03/12 10:11, Ian Bicking wrote:
> This is another point to be decided: write conflict detection. Do we want to use ETags or If-Unmodified-Since? I would prefer to have a single blessed mechanism rather than supporting both.
> ETag works well for most parts of this (and
> a little better for preconditions) but isn't so good for
> detailsafter/newer, where we want to think about a time sequence
> and not just identity.
One advantage of If-Unmodified-Since is that you only have to remember a single value (as in "I have all updates as of time XYZ") rather than keeping the last-seen ETag for each individual app.
Ian: I believe we nixed that together, you and me, in the final review,
because as Toby notes, that's not what the spec says anymore (and I
didn't change it after our discussion.)
So I think it is two calls, one to list abbreviated apps, and one to
list details of apps after a certain date.
I'm totally okay with splitting the one GET parameter into two.
Indeed. I've been trying to drum up support for this semantic using "if
unmodified since the beginning of time" like so:
PUT /app/XXX
X-If-Unmodified-Since: 0
But it has been met with (not undeserved) skepticism amongst the
services folk so far.
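Sketched server-side, the precondition this header expresses is simple. The following is a minimal illustration (function and variable names are mine, not from any spec); a value of 0 succeeds only when the record does not yet exist, which gives the "create-only" semantics above:

```python
def write_allowed(stored_modified_ms, if_unmodified_since):
    """Decide whether a conditional PUT may proceed.

    stored_modified_ms: the record's last-modified time in integer
    milliseconds since the epoch, or None if no record exists yet.
    if_unmodified_since: the client's X-If-Unmodified-Since value.

    Returns True to accept the write, False to reject it
    (e.g. with 412 Precondition Failed).
    """
    client_ms = int(if_unmodified_since)
    if stored_modified_ms is None:
        # No existing record, so nothing can conflict. This is the
        # only case where "X-If-Unmodified-Since: 0" succeeds: any
        # existing record was necessarily modified after time 0.
        return True
    return stored_modified_ms <= client_ms
```

So PUT /app/XXX with X-If-Unmodified-Since: 0 acts as "create, but never overwrite."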
Aha! Now I think the talk of "two calls" makes sense to me. You mean
one call to get the list of all the apps, so that you can detect
deletions. Then a second call to get the details of all the apps you
don't have yet.
Is that accurate?
Ryan
On 14/03/12 11:08, Ben Adida wrote:
> Aha! Now I think the talk of "two calls" makes sense to me. You mean one call to get the list of all the apps, so that you can detect deletions. Then a second call to get the details of all the apps you don't have yet.
On 3/14/12 10:48 AM, Toby Elliott wrote:
OK, I might be confused by the exact proposal at this point. The
reason for detailsafter (as a name) is that /all/ records are
returned, but only abbreviated records for items before that time.
Ian: I believe we nixed that together, you and me, in the final review,
because as Toby notes, that's not what the spec says anymore (and I
didn't change it after our discussion.)
So I think it is two calls, one to list abbreviated apps, and one to
list details of apps after a certain date.
Is that accurate?
> Yes, I agree. This was not a problem in the old format because the whole manifest was stored on the AITC server. But storing at least some non-authoritative metadata would speed things up greatly.
>
> The concern around storing things like icons in the apprecord was that we were wasting durable storage on information that didn't need to be durable. So it's probably best for us to narrow down on only the name and description and skip icons (as icons may also be data URIs, not just URIs) to lower costs.
This brings up a good question about size constraints. Sync has a max record size of 256k. This is probably excessive for this use case, but we may/will want to adopt a max size per record for cost control/abuse prevention reasons. Given that the data's fairly minimal, but we want to enable some degree of flexibility, how does 8k sound?
-- Mike
Following on from this, Device records timestamps are currently defined
as HTTP Date strings:
addedAt: "2012-02-28 12:23:35Z",
modifiedAt: "2012-03-05 13:23:34Z",
Can we make them integer milliseconds for consistency with the
similarly-named App record fields?
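For what it's worth, the conversion is a one-liner in most languages. A sketch in Python, assuming the exact string format shown above (second precision only, since the strings carry no sub-second information):

```python
import calendar
import time

# Format of the Device record timestamps as currently spec'ed.
DEVICE_DATE_FMT = "%Y-%m-%d %H:%M:%SZ"

def to_epoch_ms(date_string):
    # Parse the UTC string and convert to integer milliseconds
    # since the Unix epoch.
    return calendar.timegm(time.strptime(date_string, DEVICE_DATE_FMT)) * 1000

def from_epoch_ms(ms):
    # Format back to the current string representation, truncating
    # any sub-second component.
    return time.strftime(DEVICE_DATE_FMT, time.gmtime(ms // 1000))
```

The round trip is lossless for these strings, so switching the wire format costs nothing on the server side.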
Ryan
While we're on the topic, this is as good a time as any for my annual
"don't store numerics as strings" rant.
My i7-2600K using CPython 2.7.2 saturates one full core in
time.strptime() for the above date format at a rate of ~71,000/s. This
may sound like a lot, but 71k * 20 bytes = ~1.4MB/s. That's a lot slower
than a network or disk can deliver data! Granted, there is non-date
string data in the payload, so the example is a bit contrived. But, the
next time you wonder why your big data analysis tool is running so slowly,
you may want to start by pointing a finger at string parsing (especially
in higher-level languages).
If machines are talking to machines, get as close to binary as you can.
For dates over text-based protocols like HTTP headers or JSON, that
means reducing the number of conversions to 1. i.e. store time units
since an epoch.
(Yes, this specific case is probably premature optimization. But, good
engineers shouldn't waste cycles so carelessly.)
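Numbers like Greg's are easy to reproduce informally. A rough sketch (the rates you see will vary by machine; the epoch value below denotes the same instant as the date string, computed by hand):

```python
import time
import timeit

DATE_STR = "2012-03-05 13:23:34Z"
EPOCH_MS_STR = "1330953814000"  # the same instant, in ms since epoch

def parse_date():
    # The expensive path: locale-aware string parsing.
    return time.strptime(DATE_STR, "%Y-%m-%d %H:%M:%SZ")

def parse_int():
    # The cheap path: a single integer conversion.
    return int(EPOCH_MS_STR)

n = 20000
date_rate = n / timeit.timeit(parse_date, number=n)
int_rate = n / timeit.timeit(parse_int, number=n)
print("strptime: %10.0f/s" % date_rate)
print("int():    %10.0f/s" % int_rate)
print("speedup:  %.0fx" % (int_rate / date_rate))
```

The exact ratio is machine- and runtime-dependent, but the integer path wins by orders of magnitude everywhere I am aware of.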
Greg
I had a good chat with RyanK over lunch today, got to dig into some of
the issues. In the interest of rapidly moving forward, I'd like to
conclude the following:
- two API calls: one for app list, one for items modified since a date.
Agreed on the separation of two parameters, full=1&since=<>.
- accept headers: no, we'd have to define new mime types, and that seems
like overkill plus somewhat atypical when the list of *fields* of the
returned data changes, rather than just the representation format (e.g.
XML vs. JSON.)
- If-Modified-Since: I would rather we stick with IMS rather than X-IMS.
I hear arguments about the standard being weak, but I value the
standards-compliance more than the millisecond resolution. Plus, no
one's writing date parsers, right, it's all standard libraries? So this
doesn't seem that problematic. I am willing to give up on this if need
be, but please take a moment to consider the value of standards compliance.
- conflict detection: yes, let's do it, and let's do it with RyanK's
proposal If-Unmodified-Since. It's the most consistent /
easy-to-understand approach. (Of course if we use X-IMS, then this
becomes X-IUS.)
- dates in JSON data structures: have them be HTTP dates if we use IMS,
otherwise epochs if X-IMS.
If you can live with the above and you don't think it's going to be too
much trouble, then let's go with that. If you really feel strongly and
differently after thinking about it for 10 minutes, then send some thoughts.
-Ben
Extra points if these -- especially the former -- return 304. I imagine that plus keepalive would bring the cost of the additional request to near zero.
> - If-Modified-Since: I would rather we stick with IMS rather than X-IMS. I hear arguments about the standard being weak, but I value the standards-compliance more than the millisecond resolution. Plus, no one's writing date parsers, right, it's all standard libraries? So this doesn't seem that problematic. I am willing to give up on this if need be, but please take a moment to consider the value of standards compliance.
Could you help me understand why you think compliance with the HTTP spec is more important than alignment with the domain (which uses Unix timestamps exclusively), or alignment with our other services (which either already use, or are in the process of switching to, millisecond headers)?
(That is: how is AITC different enough from all of our other machine-consumed services to warrant using HTTP dates rather than 'standard' milliseconds?)
In the absence of some reason to reuse this part of the HTTP spec, my vote continues to be for using milliseconds everywhere. I'm a (somewhat recovered) ReST-head and semweb nerd, so I'm not normally one to kick arcane specs in the teeth, but I can only see costs and bugs lurking in this decision, and I haven't yet heard any reason to choose standards-compliance in the face of those costs.
If I find myself at 2am writing a date parser in Objective C, what value does standards-compliance bring with which I can console myself?
> - dates in JSON data structures: have them be HTTP dates if we use IMS, otherwise epochs if X-IMS.
I only agree with this matching up if the exact same strings are used in the body as in the headers, as the HTTP spec advises for dates (and thus we're OK with taking 31/62 bytes to express a time in JSON). If we're going to be serializing from a long instead, then this seems wasteful and expensive, and we should just use seconds or milliseconds since epoch.
Ahah, so this came in after I wrote (but before I clicked "send" on) my
thoughts on moving forward on the API. I didn't mean to ignore your note.
I think this is a bit of a premature optimization as you mention (also,
70K/s sounds quite slow, I suspect we can do a lot better.)
My take is that abiding by existing standards when they fit the bill is
almost always a good idea. If we were discussing a brand new standard,
there is no doubt that your proposed X-IMS solution would be
significantly better.
rnewman added:
> Could you help me understand why you think compliance with the HTTP
> spec is more important than alignment with the domain (which uses
> Unix timestamps exclusively), or alignment with our other services
> (which either already use, or are in the process of switching to,
> millisecond headers)?
Standards compliance, and thus interoperability with existing
toolchains, existing expectations, readability by other developers, etc.,
is, in my opinion, a principle that should override other small-to-medium-sized
advantages. There are reasons to deviate, of course, and I'm open
to this being one of them.
But my default is to use the HTTP standard and deviate when there's a
very good reason. I still don't see one.
> In the absence of some reason to reuse this part of the HTTP spec, my
> vote continues to be for using milliseconds everywhere.
Right, and this is our key difference of opinion: what is our default
design? I think deviation from standards needs justification, not the
other way around.
> costs and bugs lurking in this decision,
Serious ones? I can't see any serious issues from per-second
granularity, and I'm not all that swayed by mismatch with existing
implementations since I think that's the tail wagging the dog.
So, again, on this issue, I'm still willing to be swayed, but I do want
to make the point that we should default to standards, and deviate only
with good reason.
> If I find myself at 2am writing a date parser in Objective C, what
> value does standards-compliance bring with which I can console
> myself?
If we actually have to write gnarly date parsers, that would be a good
argument. But do we have to? Isn't this standard library stuff?
-Ben
... we could, by not handling those strings :P
> My take is that abiding by existing standards when they fit the bill is almost always a good idea. If we were discussing a brand new standard, there is no doubt that your proposed X-IMS solution would be significantly better.
IMO the HTTP headers *don't* fit the bill, because they lack precision and use unnecessarily expensive human-centric values. That's my point: we're incurring a pile of paper cuts in order to use a header that kinda-sorta does the same thing that we're trying to do, but in a very 90s way.
> standards compliance, and thus interoperability with existing toolchains, existing expectations, readability by other developers, etc. is, in my opinion, an idea that should override other small-to-medium sized advantages. There are reasons to deviate of course, and I'm open to this being one of them.
I used to think like that.
At some point during my semweb career I realized that (for example) having fifty implementations all use epoch internally, and serialize to a big imprecise human-readable string for interchange, was crazy talk. Next up we'll be switching Firefox's bookmark store to use RDF/XML and XSD -- after all, those parsers are really widespread!
(Apologies for snark; couldn't resist poking fun at myself. I have spent more time than I care to think about writing XSD and RDF parsers.)
>> In the absence of some reason to reuse this part of the HTTP spec, my
>> vote continues to be for using milliseconds everywhere.
>
> Right, and this is our key difference of opinion: what is our default design? I think deviation from standards needs justification, not the other way around.
I guess we'll just have to agree to disagree on this. I don't see a problem with *extending* the HTTP spec when it's not a good match for requirements and convention, rather than warping the design and implementation to match the spec, and I would only be persuaded that it's worthwhile to warp in that way if there's some concrete benefit beyond "don't deviate".
(It's worth bearing in mind that this is a legitimate, spec-supported extension, not a deviation. A deviation would be cramming milliseconds into HTTP date headers.)
Do we have any evidence of a consumer of this spec or this software that would have an easier time understanding, implementing, auditing, or otherwise supporting it if we use HTTP dates instead of milliseconds?
For example, are there plans to use standard HTTP caches that understand I-U-S? Client libraries that forbid adding non-standard headers? Languages that can parse date strings, but don't work in epoch internally? Humans that need to read these dates and make decisions?
Those would sway me.
But as far as I can see, using the existing HTTP headers will have a zero-or-negative efficiency and complexity impact on *all* clients and servers, current and future; the only things that'll be pleased are the RFC Gods. "Haha! Another group of puny mortals has made a worthy sacrifice of a billion CPU cycles!".
>> costs and bugs lurking in this decision,
>
> Serious ones? I can't see any serious issues from per-second granularity, and I'm not all that swayed by mismatch with existing implementations since I think that's the tail wagging the dog.
Not serious (as far as I know). This part of my argument really just comes down to friction/papercuts: using dates rather than timestamps just makes everything a little more difficult and expensive.
Just off the top of my head:
* Minor bugs around failure to multiply or divide by 1000 (I have seen these first hand! Sync 1.1 works in decimal seconds. Urgh.)
* Potential race conditions that are exacerbated one thousandfold
* Obligation to do filtering of any records downloaded by date limiting, if clients care about millisecond resolution
* At least a ten times throughput reduction in parsing, let alone GC pauses etc. (based on informal timing in Javascript; for illustration only)
* Space increases for storing and transmitting long strings
* Requirement to parse on the server (so it can index on a date column)
* Additional server logic to handle the HTTP headers, rather than our standard headers
I'm sure there are more, but probably little point me going on.
> If we actually have to write gnarly date parsers, that would be a good argument. But do we have to? Isn't this standard library stuff?
I doubt that correctly parsing and validating "Sun, 06 Nov 1994 08:49:37 GMT" comes before parsing and validating "784111777000" in a library implementation. This is a statistical argument, rather than an exhaustive examination of available programming languages: I would rather take the safe bet.
We *know* that every language can parse the latter with blazing speed and perfect precision, and convey more information in less space by doing so.
It's just another potential for more work and more errors, which I feel should be balanced against some concrete benefit.
I've done a good bit of RDF myself, so I fully appreciate the snark :)
I don't think we have enough to move away from the standard but ... I
think your alternative approach is not terrible, and I haven't been very
convincing, so I give in on this point.
To summarize, then:
=====
- two API calls as currently spec'ed: one for app list, one for items
modified since a date. Agreed on the separation of two parameters,
full=1&since=<>.
- no accept headers.
- Use X-If-Modified-Since as milliseconds since epoch. API calls return
304 when appropriate, just like standard IMS. This means we also need
X-Timestamp or the like (what did you call it?).
- conflict detection: yes, using RyanK's proposal X-If-Unmodified-Since
with a value of 0. No etags.
- dates in JSON data structures: milliseconds-since-epoch.
=====
I'll update the wiki accordingly, and I think we've resolved all
currently outstanding issues?
-Ben
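The X-If-Modified-Since/304 behavior summarized above could look like this on the server side (a sketch only; the handler shape and the X-Timestamp header name are taken from this thread, not a final spec):

```python
def get_collection_response(collection_modified_ms, ims_header):
    """Compute (status, headers) for a GET against a collection.

    collection_modified_ms: integer ms-since-epoch of the
    collection's last change.
    ims_header: the client's X-If-Modified-Since value, or None.
    """
    headers = {"X-Timestamp": str(collection_modified_ms)}
    if ims_header is not None and collection_modified_ms <= int(ims_header):
        # Nothing has changed since the client's snapshot: standard
        # If-Modified-Since semantics, but with millisecond integers.
        return 304, headers
    return 200, headers
```

A client then stores the X-Timestamp it last saw and replays it as X-If-Modified-Since on the next poll, getting a near-free 304 when nothing has changed.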
I think this is done, in part thanks to Ryan Kelly who already went in
and fixed some things :)
Some tweaks to make field naming consistent, too.
Please look things over:
https://wiki.mozilla.org/Apps/AITC
I think the main questions to answer here, afaict:
* Is the first-run experience good enough, especially on web dashboards (and _especially_ on mobile) if we have to make dozens of requests?
* What is the expected client mix? How many clients will be rich clients with solid caching, how many clients will be web-based with minimal persistence?
Based on the discussion, it seems worthwhile to include the name and the icon URL in the app record as a non-authoritative cache, so a dashboard/new client only needs to make a couple of requests to have a human-readable list (albeit without icons). I don't think we need to create a minimal manifest format; this is just a matter of clients being aware enough to detect absence/staleness in the app record when loading the corresponding manifest, and updating the cache.

The goal I think I'm advocating here is to avoid having to fetch/parse manifests in order to display usable UI. If we can get it in a single request, that feels ideal to me, but it's mostly an Apps product call.
-- Mike
I think you're right to be concerned about this: it's likely to be the
first of our needed optimizations, and I don't want to downplay the
concern too much. But the only time this is a real problem is when
you're setting up a new device, which should be a rare occasion, so
complicating the client logic this soon may be a premature optimization.
In the spirit of iteration and conservative design, I think we can start
without it, and see how badly this bites us in 3 weeks once we've got
some code going. The path to caching some manifest data is fairly clear,
if we need to do it.
-Ben
That said, I'm pretty sure we need to have a cutoff for changes before launch to control risk. I'm going to propose April 30th as a fairly hard stop for changes to the API and data formats until after the Marketplace launch, unless there is a critical product experience issue we have to fix before launch. Once we get through the launch, we will go back to being iterative and flexible, but there will be so much focus on initial launch that I want to ensure we are very deliberate about what we have in production on the day.
Anyone concerned with/opposed to that? If not, I'll schedule something about a week before the cutoff to checkpoint and make sure all issues have been flagged by then.
-- Mike
So, Ben's latest revision doesn't address this. Do we want to address this?
I think the main questions to answer here, afaict:
* Is the first-run experience good enough, especially on web dashboards (and _especially_ on mobile) if we have to make dozens of requests?
* What is the expected client mix? How many clients will be rich clients with solid caching, how many clients will be web-based with minimal persistence?
This looks good to me.
One observation: is it worth clarifying what undocumented verb/path combinations do, and whether the operations in this doc are an exhaustive list?
E.g., DELETE {endpoint}/devices, or DELETE {endpoint}/apps. Method Not Allowed, or does this do what I would expect? It would be acceptable to me to point to a suitable Services doc saying "falls back to this logic".
I think that's right.
> That said, I'm pretty sure we need to have a cutoff for changes
> before launch to control risk.
You're right, we should absolutely set a date.
> I'm going to propose April 30th as a
> fairly hard stop for changes to the API and data formats
Since beta is 4/26, I think that's a good idea.
-Ben
FWIW, I have an in-progress "technical spec" for this in the style of our
detailed SyncStorage document:
https://github.com/mozilla-services/docs/blob/rfk/aitc-server-spec/source/aitc/apis-1.0.rst
Cheers,
Ryan