Refactoring the AtomActivity spec

Martin Atkins

Mar 11, 2010, 4:03:42 PM

to activity...@googlegroups.com

Hi all,

At the meetup last week Will Norris and I began an effort to refactor
the AtomActivity spec for clarity. It's clear that this spec has evolved
piecemeal from its first draft and has become crufty and confusing along
the way, so this is an attempt to explain the concepts behind activity
streams and the Atom activity extensions in a new way which will
hopefully be clearer to the uninitiated.

Although I understand the plan is to ship with the existing spec, due to
a desire to rush this out before SXSW, I hope we will follow up with this
new version, which should be functionally equivalent and compatible, not
long afterward.

We're now working in a "refactor" branch in the new atomactivity
repository in the "activitystreams" GitHub account. I've published an
HTML version of this document here so that folks can see what we're
working on:

http://martin.atkins.me.uk/specs/activitystreams/refactored-atomactivity

You will note that as of this writing it's not complete, but the outline
and what exists so far of the content show the general approach: first
we define in an abstract sense what an activity is and what an object
is, and then go on to define how to extract data from an Atom feed to
create instances of those abstract data types.
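
As a rough illustration of that two-step approach (this is not taken from the draft), the sketch below first declares abstract Activity and ActivityObject types and then fills them in from a single atom:entry. The class names, the field selection, the sample entry, and the namespace and verb URIs are all assumptions made for the example; the draft defines the actual processing rules.

```python
from dataclasses import dataclass
from typing import Optional
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
# Assumed namespace for the activity extension elements.
ACTIVITY = "http://activitystrea.ms/spec/1.0/"


@dataclass
class ActivityObject:
    """Abstract object: independent of any serialization."""
    id: Optional[str]
    object_type: Optional[str] = None
    title: Optional[str] = None


@dataclass
class Activity:
    """Abstract activity: an actor performs a verb on an object."""
    verb: Optional[str]
    obj: ActivityObject
    actor_name: Optional[str] = None


def activity_from_entry(entry: ET.Element) -> Activity:
    """Build one abstract Activity from an atom:entry (simplified)."""
    obj_el = entry.find(f"{{{ACTIVITY}}}object")
    if obj_el is None:
        raise ValueError("entry has no activity:object element")
    obj = ActivityObject(
        id=obj_el.findtext(f"{{{ATOM}}}id"),
        object_type=obj_el.findtext(f"{{{ACTIVITY}}}object-type"),
        title=obj_el.findtext(f"{{{ATOM}}}title"),
    )
    return Activity(
        verb=entry.findtext(f"{{{ACTIVITY}}}verb"),
        obj=obj,
        actor_name=entry.findtext(f"{{{ATOM}}}author/{{{ATOM}}}name"),
    )


# Hypothetical sample entry, for illustration only.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:activity="http://activitystrea.ms/spec/1.0/">
  <id>tag:example.org,2010:entry-1</id>
  <author><name>Alice</name></author>
  <activity:verb>http://activitystrea.ms/schema/1.0/post</activity:verb>
  <activity:object>
    <id>tag:example.org,2010:photo-1</id>
    <title>A photo</title>
    <activity:object-type>http://activitystrea.ms/schema/1.0/photo</activity:object-type>
  </activity:object>
</entry>"""

activity = activity_from_entry(ET.fromstring(SAMPLE))
```

Keeping the dataclasses free of any XML knowledge is the point of the split: a second serialization would only need its own extraction function.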

The goal is to make it clearer to a newcomer what this spec considers to
be the data model for an activity, before delving into the guts of how
to serialize it. The section on the serialization is written in more of
an imperative style so that it's easier to follow the intended
processing steps for a consumer.

This rewrite also features much more concrete references to specific
sections of the Atom specification (RFC4287) rather than referencing
vague features of that specification as a whole, and will hopefully also
feature fewer ambiguities regarding the intended processing model.

At present I'm not planning to refactor the activity schema
specification, though if in future a JSON serialization is incorporated
into our ecosystem further refactoring may be desirable there, since
right now that specification is inextricably tied to Atom.

John Panzer

Mar 11, 2010, 4:22:48 PM

to activity...@googlegroups.com

At first glance, I like this. (It's also what I'm kind of forced into when trying to talk about magic signatures which have XML and JSON serialization; the abstract object model comes first, then the serializations do a lot of cross-references, and try to keep the naming conventions somewhat aligned wherever possible.)

I have a query about the cardinality of Atom entries and activities which this new spec has thrown into high relief (that's a good result!) and I'll take that query to a separate thread.

--
You received this message because you are subscribed to the Google Groups "Activity Streams" group.
To post to this group, send email to activity...@googlegroups.com.
To unsubscribe from this group, send email to activity-strea...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/activity-streams?hl=en.

Will Norris

Mar 11, 2010, 6:03:12 PM

to activity...@googlegroups.com

On Thu, Mar 11, 2010 at 1:03 PM, Martin Atkins <ma...@degeneration.co.uk> wrote:

> The goal is to make it clearer to a newcomer what this spec considers to be the data model for an activity, before delving into the guts of how to serialize it. The section on the serialization is written in more of an imperative style so that it's easier to follow the intended processing steps for a consumer.

I personally find the imperative / prescriptive style a bit more difficult to read. I'm also not sure that writing it in terms of processing is the best overall approach. In some sense it may be helpful for consumers, but it makes it a particularly awkward read for publishers. Instead, if it's presented simply as a mapping between AS concepts and Atom XML, it's relatively simple to conceptually map in either direction. If the desire is to have normative text in there to try and ensure interop between implementations, we can certainly find another place to add that... a 'conformance' section or something.

Martin Atkins

Mar 11, 2010, 6:19:23 PM

to activity...@googlegroups.com

On 03/11/2010 03:03 PM, Will Norris wrote:
>
> I personally find the imperative / prescriptive style a bit more
> difficult to read. I'm also not sure that writing it in terms of
> processing is the best overall approach. In some sense it may be
> helpful for consumers, but it makes it a particularly awkward read for
> publishers. Instead, if it's presented simply as a mapping between AS
> concepts and Atom XML, it's relatively simple to conceptually map in
> either direction. If the desire is to have normative text in there to
> try and ensure interop between implementations, we can certainly find
> another place to add that... a 'conformance' section or something.
>

My goal was to write it in roughly the expected processing order but to
write it in such a way that it can also be read out of order to discover
how to publish.

Whether I've succeeded in that goal is of course debatable.

Will Norris

Mar 17, 2010, 12:50:32 PM

to activity...@googlegroups.com

It occurred to me yesterday that the schema spec is also written in a more descriptive style, mapping FROM the conceptual TO the Atom representation. It would certainly be good to be consistent between the two, whichever approach is taken.

-will

Chris Messina

Mar 20, 2010, 4:17:59 PM

to activity...@googlegroups.com

Could you guys give us an update as to where you are with this rewrite and let us know what else needs to be done?

I would like to target end of April as the final deadline for 1.0 — so knowing how far along you are with the rewrite, and what help you need, would be very useful.

Chris

--
Chris Messina
Open Web Advocate, Google

Personal: http://factoryjoe.com
Follow me on Buzz: http://buzz.google.com/chrismessina
...or Twitter: http://twitter.com/chrismessina

This email is: [ ] shareable [X] ask first [ ] private

Martin Atkins

Mar 21, 2010, 11:27:45 PM

to activity...@googlegroups.com

Chris Messina wrote:
> Could you guys give us an update as to where you are with this rewrite
> and let us know what else needs to be done?
>
> I would like to target end of April as the final deadline for 1.0 -- so
> knowing how far along you are with the rewrite, and what help you need,
> would be very useful.
>

Yesterday Monica made me aware of a misunderstanding where I thought you
were going to unveil the existing draft as the first release at SXSW and
so this rewrite was not a pressing concern. However, I now understand
that you're waiting for the rewrite to be completed.

I don't expect the remaining object representation sections to take very
long, since they're basically the same as the "object as atom:entry"
section with a few element names twiddled.

I also need to bring in the requirements around RSS parsing, which should
hopefully be easier to do now that there's a cleaner separation between
the abstract data model and the Atom serialization in the prose we
already have. I plan to write this in the new spec as a mapping between
RSS and the abstract data model rather than a mapping between RSS and
Atom as it was before.

(I also need to get RSS support in the Activity Streams Tester app I
wrote, but that's a separate issue entirely.)

Chris Messina

Mar 22, 2010, 3:03:32 PM

to activity...@googlegroups.com

On Sun, Mar 21, 2010 at 8:27 PM, Martin Atkins <ma...@degeneration.co.uk> wrote:

> Chris Messina wrote:
> > Could you guys give us an update as to where you are with this rewrite and let us know what else needs to be done?
> >
> > I would like to target end of April as the final deadline for 1.0 -- so knowing how far along you are with the rewrite, and what help you need, would be very useful.
>
> Yesterday Monica made me aware of a misunderstanding where I thought you were going to unveil the existing draft as the first release at SXSW and so this rewrite was not a pressing concern. However, I now understand that you're waiting for the rewrite to be completed.

Yes, sorry for not making that clear. Since we want the 1.0 to be as good as it can be (within some reasonable time constraints) I'd rather wait for us to get the rewrite done and launch with that, rather than ending up in the OAuth scenario where the 1.0 comes out, no one understands it, and we have to rely on an "editor's cut" to explain to new/uninitiated audiences what it is that we're trying to accomplish.

So, I was able to pitch ActivityStreams as a concept at SXSW, and rather than announce 1.0, we get until the end of next month to get'er done.

Please keep us posted on your progress — and let us know what document we should be reviewing and what kind of feedback you and Will need!

Thanks,

Chris

John Panzer

Mar 22, 2010, 5:29:43 PM

to activity...@googlegroups.com

In section 4.1:

"Although there is not a one-to-one mapping between an Atom entry and an activity, consumers SHOULD retain the value of the atom:id element from which each activity was produced such that these activities can be identified should an entry be referred to by another Atom feature which is not aware of the extensions defined in this specification. This id is not unique across all activities and has no semantic significance for Activity Streams processing."

It's very hard to read this sentence, perhaps breaking it up would help. But I believe it's saying that, as a side effect of the potential one-to-many mapping between an Atom entry and activities, the atom:id is not guaranteed to be unique and should not be relied upon.

This unfortunately breaks Atom semantics, especially when you start to move into the AtomPub case (read/write editing of activities). Of course in that case you almost certainly have an activity-aware client but it makes life more difficult. It also makes it hard to mix activities and non-activities in the same library or feed.

De-duping is also affected. I'm not sure what semantics I could rely on for detecting loops in feed syndication, for example (a case where you may well have mixed activities and non-activities).

If there's a good reason for all this then it'd be okay -- it's not a huge deal -- but it seems like this is not really necessary, unless I'm missing something: If you define (at the abstract level, in section 3, probably as an additional section) that an activity can also be an "aggregate activity" that has its own synthesized id, then you have a 1:1 mapping to entries again. You also define the semantics for aggregate activities to be the same for JSON and Atom (different serializations, of course).

The synthesized id for the aggregate activity is useful for (a) detecting simple loops, deletions, and updates; (b) editing and deleting, probably programmatically as these are machine-generated.

The Salmon spec assumes that the atom:id can be used for all of these purposes, especially if the entry is signed so you are protected from spoofing attacks (as is the case with Salmon).

-John

John Panzer

Mar 22, 2010, 5:32:19 PM

to activity...@googlegroups.com

Looking at the Magic Signatures spec[1], I ended up doing the same thing that Martin did (abstract model, then serialization with serialization-thing -> abstract thing) purely because then the serialization section read naturally, the way you'd write it independently, except that you have a common abstract model to fall back on.

You do have to pick one or the other way to do it though, I agree that the model should be the same across the two documents...

[1] http://salmon-protocol.googlecode.com/svn/trunk/draft-panzer-magicsig-00.html#anchor3

Martin Atkins

Mar 22, 2010, 6:15:44 PM

to activity...@googlegroups.com

On 03/22/2010 02:29 PM, John Panzer wrote:
> In section 4.1:
>
> "Although there is not a one-to-one mapping between an Atom entry
> and an activity, consumers SHOULD retain the value of the atom:id
> element from which each activity was produced such that these
> activities can be identified should an entry be referred to by
> another Atom feature which is not aware of the extensions defined in
> this specification. This id is not unique across all activities and
> has no semantic significance for Activity Streams processing."
>
>
> It's very hard to read this sentence, perhaps breaking it up would help.
> But I believe it's saying that, as a side effect of the potential
> one-to-many mapping between an Atom entry and activities, the atom:id is
> not guaranteed to be unique and should not be relied upon.
>
> This unfortunately breaks Atom semantics, especially when you start to
> move into the AtomPub case (read/write editing of activities). Of
> course in that case you almost certainly have an activity-aware client
> but it makes life more difficult. It also makes it hard to mix
> activities and non-activities in the same library or feed.
>

Right. It's trying to say that the atom:id is for the entry itself and
not for the activity. So by publishing multiple activities in one entry
you lose the ability to address them independently when using
technologies like tombstones... tombstones will only be able to "delete"
the set of activities as one atomic unit.

This paragraph means to say that consumers SHOULD keep a record of the
atom:id of the entry that the activity came from, but that this is NOT
the id of the activity itself.
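
A minimal sketch of what that could look like on the consumer side, under stated assumptions: the store class, its method names, and the "one activity per verb" expansion are hypothetical and only there to show the shape of the problem. Each stored activity remembers the atom:id of its source entry, and a tombstone that names that entry removes every activity extracted from it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StoredActivity:
    verb: str
    object_id: str
    # atom:id of the entry this activity was extracted from.
    # It identifies the entry, not the activity.
    source_entry_id: str


class ActivityStore:
    """Hypothetical consumer-side store, indexed by source entry id."""

    def __init__(self) -> None:
        self._by_entry: Dict[str, List[StoredActivity]] = {}

    def add_entry(self, entry_id: str, verbs: List[str], object_id: str) -> None:
        # Assumption for illustration: one activity per verb in the entry.
        self._by_entry[entry_id] = [
            StoredActivity(verb, object_id, entry_id) for verb in verbs
        ]

    def apply_tombstone(self, entry_id: str) -> int:
        # Every activity from the entry disappears as one unit;
        # returns how many activities were removed.
        return len(self._by_entry.pop(entry_id, []))

    def activities(self) -> List[StoredActivity]:
        return [a for group in self._by_entry.values() for a in group]


store = ActivityStore()
store.add_entry(
    "tag:example.org,2010:entry-1", ["post", "share"], "tag:example.org,2010:photo-1"
)
store.add_entry("tag:example.org,2010:entry-2", ["post"], "tag:example.org,2010:note-1")

# A tombstone for entry-1 cannot target "post" or "share" separately.
removed = store.apply_tombstone("tag:example.org,2010:entry-1")
```

Nothing in this store gives an individual activity its own address, which is exactly the limitation described above.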

> De-duping is also affected. I'm not sure what semantics I could rely on
> for detecting loops in feed syndication, for example (a case where you
> may well have mixed activities and non-activities).

There's no such thing as a non-activity. All Atom entries represent at
least one activity, though that activity may of course be posting an
object of an undefined type.

In my toy implementation I used a tuple of (verbs, object id, actor id,
target id, time) as the activity's natural "primary key", but of course
this is not resilient to passing through services which do not preserve
the object ids. I'm not convinced that services which cannot preserve
the object ids would be able to preserve the activity ids either,
though. I think in practice deduping needs to be done with heuristics on
the content of objects in order to get good results.
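
For concreteness, here is a sketch of that kind of natural-key de-duping. It is not the toy implementation itself; the type name, field names, and sample values are made up for the example. It also shows the weakness just described: once an intermediary rewrites the object id, the key no longer matches.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Optional, Tuple


class ObservedActivity(NamedTuple):
    verbs: FrozenSet[str]
    object_id: str
    actor_id: str
    target_id: Optional[str]
    time: str  # timestamp exactly as published
    title: str  # display text; deliberately not part of the key


Key = Tuple[FrozenSet[str], str, str, Optional[str], str]


def natural_key(a: ObservedActivity) -> Key:
    """The (verbs, object id, actor id, target id, time) tuple."""
    return (a.verbs, a.object_id, a.actor_id, a.target_id, a.time)


def dedupe(activities: Iterable[ObservedActivity]) -> List[ObservedActivity]:
    """Keep the first activity seen for each natural key, in input order."""
    seen = set()
    unique = []
    for a in activities:
        key = natural_key(a)
        if key not in seen:
            seen.add(key)
            unique.append(a)
    return unique


original = ObservedActivity(
    frozenset({"post"}), "obj-1", "alice", None,
    "2010-03-22T18:15:44Z", "Alice posted a photo",
)
# Same activity re-syndicated with different display text: caught.
retitled = original._replace(title="alice posted a photo (via aggregator)")
# Same activity after a service rewrote the object id: not caught.
rewritten = original._replace(object_id="rewritten-obj-1")

result = dedupe([original, retitled, rewritten])
```

Here `result` keeps `original` and `rewritten` but drops `retitled`, so the duplicate with the rewritten object id slips through, which is why content heuristics would still be needed.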

> If there's a good reason for all this then it'd be okay -- it's not a
> huge deal -- but it seems like this is not really necessary, unless I'm
> missing something: If you define (at the abstract level, in section 3,
> probably as an additional section) that an activity can also be an
> "aggregate activity" that has its own synthesized id, then you have a
> 1:1 mapping to entries again. You also define the semantics for
> aggregate activities to be the same for JSON and Atom (different
> serializations, of course).
>
> The synthesized id for the aggregate activity is useful for (a)
> detecting simple loops, deletions, and updates; (b) editing and
> deleting, probably programmatically as these are machine-generated.
>
> The Salmon spec assumes that the atom:id can be used for all of these
> purposes, especially if the entry is signed so you are protected from
> spoofing attacks (as is the case with Salmon).
>

The ability to represent multiple activities as a single entry is
intended for use in feeds that are primarily aimed at non-activity-aware
consumption but wish to add some annotations to get nice results in an
activity-aware reader. In practice very few systems have published feeds
representing several activities, so we could decide to remove this
ability at the expense of being able to activity-retrofit feeds that do.

Of course, we'd still need to define some different behavior for the
implied activity case, since right now the atom:id in there is defined
to be the id of the object itself and not the id of the activity.
