At the meetup last week Will Norris and I began an effort to refactor the AtomActivity spec for clarity. It's clear that this spec has evolved piecemeal from its first draft and has become crufty and confusing along the way, so this is an attempt to explain the concepts behind activity streams and the Atom activity extensions in a new way which will hopefully be clearer to the uninitiated.
Although I understand the plan is to ship with the existing spec, due to a desire to rush this out before SXSW, I hope we will follow up with this new version, which should be functionally equivalent and compatible, not long afterward.
We're now working in a "refactor" branch in the new atomactivity repository in the "activitystreams" GitHub account. I've published an HTML version of this document here so that folks can see what we're working on:
You will note that as of this writing it's not complete, but the outline and what exists so far of the content show the general approach: first we define in an abstract sense what an activity is and what an object is, and then go on to define how to extract data from an Atom feed to create instances of those abstract data types.
The goal is to make it clearer to a newcomer what this spec considers to be the data model for an activity, before delving into the guts of how to serialize it. The section on the serialization is written in more of an imperative style so that it's easier to follow the intended processing steps for a consumer.
This rewrite also features much more concrete references to specific sections of the Atom specification (RFC4287) rather than referencing vague features of that specification as a whole, and will hopefully also feature fewer ambiguities regarding the intended processing model.
At present I'm not planning to refactor the activity schema specification, though if in future a JSON serialization is incorporated into our ecosystem further refactoring may be desirable there, since right now that specification is inextricably tied to Atom.
At first glance, I like this. (It's also what I'm kind of forced into when trying to talk about magic signatures which have XML and JSON serialization; the abstract object model comes first, then the serializations do a lot of cross-references, and try to keep the naming conventions somewhat aligned wherever possible.)
I have a query about the cardinality of Atom entries and activities which this new spec has thrown into high relief (that's a good result!) and I'll take that query to a separate thread.
On Thu, Mar 11, 2010 at 1:03 PM, Martin Atkins <m...@degeneration.co.uk> wrote:
> --
> You received this message because you are subscribed to the Google Groups "Activity Streams" group.
> To post to this group, send email to activity-streams@googlegroups.com.
> To unsubscribe from this group, send email to activity-streams+unsubscribe@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/activity-streams?hl=en.
On Thu, Mar 11, 2010 at 1:03 PM, Martin Atkins <m...@degeneration.co.uk> wrote:
> The goal is to make it clearer to a newcomer what this spec considers to be > the data model for an activity, before delving in to the guts of how to > serialize it. The section on the serialization is written in more of an > imperative style so that it's easier to follow the intended processing steps > for a consumer.
I personally find the imperative / prescriptive style a bit more difficult to read. I'm also not sure that writing it in terms of processing is the best overall approach. In some sense it may be helpful for consumers, but it makes it a particularly awkward read for publishers. Instead, if it's presented simply as a mapping between AS concepts and Atom XML, it's relatively simple to conceptually map in either direction. If the desire is to have normative text in there to try and ensure interop between implementations, we can certainly find another place to add that... a 'conformance' section or something.
> I personally find the imperative / prescriptive style a bit more > difficult to read. I'm also not sure that writing it in terms of > processing is the best overall approach. In some sense it may be > helpful for consumers, but it makes it a particularly awkward read for > publishers. Instead, if it's presented simply as a mapping between AS > concepts and Atom XML, it's relatively simple to conceptually map in > either direction. If the desire is to have normative text in there to > try and ensure interop between implementations, we can certainly find > another place to add that... a 'conformance' section or something.
My goal was to write it in roughly the expected processing order but to write it in such a way that it can also be read out of order to discover how to publish.
Whether I've succeeded in that goal is of course debatable.
>> I personally find the imperative / prescriptive style a bit more >> difficult to read. I'm also not sure that writing it in terms of >> processing is the best overall approach. In some sense it may be >> helpful for consumers, but it makes it a particularly awkward read for >> publishers. Instead, if it's presented simply as a mapping between AS >> concepts and Atom XML, it's relatively simple to conceptually map in >> either direction. If the desire is to have normative text in there to >> try and ensure interop between implementations, we can certainly find >> another place to add that... a 'conformance' section or something.
> My goal was to write it in roughly the expected processing order but to > write it in such a way that it can also be read out of order to discover how > to publish.
> Whether I've succeeded in that goal is of course debatable.
It occurred to me yesterday that the schema spec is also written in a more descriptive style, mapping FROM the conceptual TO the Atom representation. It would certainly be good to be consistent between the two, whichever approach is taken.
Could you guys give us an update as to where you are with this rewrite and let us know what else needs to be done?
I would like to target end of April as the final deadline for 1.0 — so knowing how far along you are with the rewrite, and what help you need, would be very useful.
On Wed, Mar 17, 2010 at 9:50 AM, Will Norris <w...@willnorris.com> wrote:
> On Thu, Mar 11, 2010 at 4:19 PM, Martin Atkins <m...@degeneration.co.uk> wrote:
>> On 03/11/2010 03:03 PM, Will Norris wrote:
>>> I personally find the imperative / prescriptive style a bit more >>> difficult to read. I'm also not sure that writing it in terms of >>> processing is the best overall approach. In some sense it may be >>> helpful for consumers, but it makes it a particularly awkward read for >>> publishers. Instead, if it's presented simply as a mapping between AS >>> concepts and Atom XML, it's relatively simple to conceptually map in >>> either direction. If the desire is to have normative text in there to >>> try and ensure interop between implementations, we can certainly find >>> another place to add that... a 'conformance' section or something.
>> My goal was to write it in roughly the expected processing order but to >> write it in such a way that it can also be read out of order to discover how >> to publish.
>> Whether I've succeeded in that goal is of course debatable.
> It occurred to me yesterday that the schema spec is also written in a more > descriptive style, mapping FROM the conceptual TO the Atom representation. > It would certainly be good to be consistent between the two, whichever > approach is taken.
> -will
Chris Messina wrote:
> Could you guys give us an update as to where you are with this rewrite and let us know what else needs to be done?
> I would like to target end of April as the final deadline for 1.0 so > knowing how far along you are with the rewrite, and what help you need, > would be very useful.
Yesterday Monica made me aware of a misunderstanding where I thought you were going to unveil the existing draft as the first release at SXSW and so this rewrite was not a pressing concern. However, I now understand that you're waiting for the rewrite to be completed.
I don't expect the remaining object representation sections to take very long, since they're basically the same as the "object as atom:entry" section with a few element names twiddled.
I also need to bring in the requirements around RSS parsing, which should hopefully be easier to do now that there's a cleaner separation between the abstract data model and the Atom serialization in the prose we already have. I plan to write this in the new spec as a mapping between RSS and the abstract data model, rather than a mapping between RSS and Atom as it was before.
(I also need to get RSS support in the Activity Streams Tester app I wrote, but that's a separate issue entirely.)
On Sun, Mar 21, 2010 at 8:27 PM, Martin Atkins <m...@degeneration.co.uk> wrote:
> Chris Messina wrote:
>> Could you guys give us an update as to where you are with this rewrite and >> let us know what else needs to be done?
>> I would like to target end of April as the final deadline for 1.0 — so >> knowing how far along you are with the rewrite, and what help you need, >> would be very useful.
> Yesterday Monica made me aware of a misunderstanding where I thought you > were going to unveil the existing draft as the first release at SXSW and so > this rewrite was not a pressing concern. However, I now understand that > you're waiting for the rewrite to be completed.
Yes, sorry for not making that clear. Since we want the 1.0 to be as good as it can be (within some reasonable time constraints), I'd rather wait for the rewrite to be done and launch with that than end up in the OAuth scenario, where the 1.0 comes out, no one understands it, and we have to rely on an "editor's cut" to explain to new or uninitiated audiences what it is that we're trying to accomplish.
So, I was able to pitch ActivityStreams as a concept at SXSW, and rather than announce 1.0, we get until the end of next month to get'er done.
Please keep us posted on your progress — and let us know what document we should be reviewing and what kind of feedback you and Will need!
"Although there is not a one-to-one mapping between an Atom entry and an
> activity, consumers SHOULD retain the value of the atom:id element from > which each activity was produced such that these activities can be > identified should an entry be referred to by another Atom feature which is > not aware of the extensions defined in this specification. This id is not > unique across all activities and has no semantic significance for Activity > Streams processing."
It's very hard to read this sentence; perhaps breaking it up would help. But I believe it's saying that, as a side effect of the potential one-to-many mapping between an Atom entry and activities, the atom:id is not guaranteed to be unique and should not be relied upon.
This unfortunately breaks Atom semantics, especially when you start to move into the AtomPub case (read/write editing of activities). Of course in that case you almost certainly have an activity-aware client but it makes life more difficult. It also makes it hard to mix activities and non-activities in the same library or feed.
De-duping is also affected. I'm not sure what semantics I could rely on for detecting loops in feed syndication, for example (a case where you may well have mixed activities and non-activities).
If there's a good reason for all this then it'd be okay -- it's not a huge deal -- but it seems like this is not really necessary, unless I'm missing something: If you define (at the abstract level, in section 3, probably as an additional section) that an activity can also be an "aggregate activity" that has its own synthesized id, then you have a 1:1 mapping to entries again. You also define the semantics for aggregate activities to be the same for JSON and Atom (different serializations, of course).
The synthesized id for the aggregate activity is useful for (a) detecting simple loops, deletions, and updates; (b) editing and deleting, probably programmatically as these are machine-generated.
The Salmon spec assumes that the atom:id can be used for all of these purposes, especially if the entry is signed so you are protected from spoofing attacks (as is the case with Salmon).
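To make the aggregate idea a little more concrete, here is a minimal Python sketch. Nothing in it comes from the spec: the `AggregateActivity` type, its field names, the `first_sighting` helper, and the choice of a random `urn:uuid:` value as the synthesized id are all assumptions made for illustration. The only point is that one aggregate maps to one entry, and that its id is minted once and then kept stable so it can serve loop detection, updates, and deletions.

```python
import uuid
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass(frozen=True)
class AggregateActivity:
    """Hypothetical wrapper: one aggregate per Atom entry."""

    # Descriptions of the member activities carried by the entry.
    members: Tuple[str, ...]
    # Synthesized id, minted once by the publisher (assumption: a random
    # urn:uuid value). It would be published as the entry's atom:id.
    id: str = field(default_factory=lambda: uuid.uuid4().urn)


def first_sighting(aggregate: AggregateActivity, seen_ids: Set[str]) -> bool:
    """Return True the first time an id is seen, False on a repeat (a loop)."""
    if aggregate.id in seen_ids:
        return False
    seen_ids.add(aggregate.id)
    return True
```

A consumer would keep `seen_ids` per subscription; an update or deletion would refer to the same stable id rather than to any individual member activity.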
-John
Looking at the Magic Signatures spec[1], I ended up doing the same thing that Martin did (abstract model, then serialization with serialization-thing -> abstract thing) purely because then the serialization section read naturally, the way you'd write it independently, except that you have a common abstract model to fall back on.
You do have to pick one way or the other to do it, though. I agree that the model should be the same across the two documents...
> "Although there is not a one-to-one mapping between an Atom entry > and an activity, consumers SHOULD retain the value of the atom:id > element from which each activity was produced such that these > activities can be identified should an entry be referred to by > another Atom feature which is not aware of the extensions defined in > this specification. This id is not unique across all activities and > has no semantic significance for Activity Streams processing."
> It's very hard to read this sentence, perhaps breaking it up would help. > But I believe it's saying that, as a side effect of the potential > one-to-many mapping between an Atom entry and activities, the atom:id is > not guaranteed to be unique and should not be relied upon.
> This unfortunately breaks Atom semantics, especially when you start to > move into the AtomPub case (read/write editing of activities). Of > course in that case you almost certainly have an activity-aware client > but it makes life more difficult. It also makes it hard to mix > activities and non-activities in the same library or feed.
Right. It's trying to say that the atom:id is for the entry itself and not for the activity. So by publishing multiple activities in one entry you lose the ability to address them independently when using technologies like tombstones... tombstones will only be able to "delete" the set of activities as one atomic unit.
This paragraph means to say that consumers SHOULD keep a record of the atom:id of the entry that the activity came from, but that this is NOT the id of the activity itself.
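As a rough illustration of that distinction, the Python sketch below keeps the entry's atom:id alongside each activity without treating it as the activity's own id. The `Activity` and `ActivityStore` names and their fields are hypothetical and are not taken from the spec; the sketch only shows why a tombstone for an entry can remove that entry's activities as a unit and nothing finer.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Activity:
    verb: str
    object_id: str
    # atom:id of the entry this activity was produced from.
    # This is NOT an identifier for the activity itself.
    source_entry_id: str


class ActivityStore:
    """Hypothetical consumer-side store, grouped by source entry."""

    def __init__(self) -> None:
        self._by_entry: Dict[str, List[Activity]] = defaultdict(list)

    def add(self, activity: Activity) -> None:
        self._by_entry[activity.source_entry_id].append(activity)

    def apply_tombstone(self, entry_id: str) -> List[Activity]:
        # A tombstone names an entry, so every activity produced from
        # that entry is removed together; none can be addressed alone.
        return self._by_entry.pop(entry_id, [])

    def all(self) -> List[Activity]:
        return [a for group in self._by_entry.values() for a in group]
```

If two activities share a `source_entry_id`, `apply_tombstone` returns and removes both, which is the "one atomic unit" behaviour described above.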
> De-duping is also affected. I'm not sure what semantics I could rely on > for detecting loops in feed syndication, for example (a case where you > may well have mixed activities and non-activities).
There's no such thing as a non-activity. All Atom entries represent at least one activity, though that activity may of course be posting an object of an undefined type.
In my toy implementation I used a tuple of (verbs, object id, actor id, target id, time) as the activity's natural "primary key", but of course this is not resilient to passing through services which do not preserve the object ids. I'm not convinced that services which cannot preserve the object ids would be able to preserve the activity ids either, though. I think in practice deduping needs to be done with heuristics on the content of objects in order to get good results.
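For anyone who wants to see the shape of that key, here is a minimal Python sketch of tuple-based deduping. It is a reconstruction for illustration, not the toy implementation itself: the `Activity` class, its field names, and the `dedupe` helper are assumptions, and sorting the verbs so that their order does not matter is an extra choice made here. It has exactly the weakness described above, since any service that rewrites object ids will defeat it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Activity:
    verbs: Tuple[str, ...]
    object_id: str
    actor_id: str
    target_id: Optional[str]
    time: datetime

    def natural_key(self) -> tuple:
        # (verbs, object id, actor id, target id, time), with the verbs
        # sorted so that publishing order does not affect equality.
        return (
            tuple(sorted(self.verbs)),
            self.object_id,
            self.actor_id,
            self.target_id,
            self.time,
        )


def dedupe(activities: Iterable[Activity]) -> Iterator[Activity]:
    """Yield each activity once, keyed on its natural key."""
    seen = set()
    for activity in activities:
        key = activity.natural_key()
        if key not in seen:
            seen.add(key)
            yield activity
```

Content-based heuristics would replace `natural_key` with something fuzzier; the surrounding loop would stay the same.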
> If there's a good reason for all this then it'd be okay -- it's not a > huge deal -- but it seems like this is not really necessary, unless I'm > missing something: If you define (at the abstract level, in section 3, > probably as an additional section) that an activity can also be an > "aggregate activity" that has its own synthesized id, then you have a > 1:1 mapping to entries again. You also define the semantics for > aggregate activities to be the same for JSON and Atom (different > serializations, of course).
> The synthesized id for the aggregate activity is useful for (a) > detecting simple loops, deletions, and updates; (b) editing and > deleting, probably programmatically as these are machine-generated.
> The Salmon spec assumes that the atom:id can be used for all of these > purposes, especially if the entry is signed so you are protected from > spoofing attacks (as is the case with Salmon).
The ability to represent multiple activities as a single entry is intended for feeds that are primarily aimed at non-activity-aware consumers but that wish to add some annotations to get nice results in an activity-aware reader. In practice very few systems have published feeds in which an entry represents several activities, so we could decide to remove this ability, at the cost of no longer being able to retrofit activity annotations onto such feeds.
Of course, we'd still need to define some different behavior for the implied activity case, since right now the atom:id in there is defined to be the id of the object itself and not the id of the activity.