PROPOSAL: Associated or Additional Content

Yehuda Katz

Apr 30, 2012, 1:48:23 PM

to Collection+JSON

I've been reviewing Collection+JSON as a possible transport format for
ember-data. I ran into a few issues that I would like to discuss.

The first one is the question of associated content, that is,
additional resources that you would like to include along with the
requested resource to avoid making multiple requests. You would want to
treat them as full-on resources and not simple embedded content, so
that they could be updated separately.

I've been thinking about something like this:

GET /posts

{ "collection" :
  {
    "version" : "1.0",
    "href" : "http://example.org/posts/",

    "items" : [
      {
        "href" : "http://example.org/posts/1",
        "data" : [
          {"name" : "full-name", "value" : "J. Doe", "prompt" : "Full Name"},
          {"name" : "email", "value" : "jd...@example.org", "prompt" : "Email"}
        ],
        "many": [
          { "name": "comments", "href": "http://example.org/comments/1,2,3,4", "length": 4 }
        ]
      }
    ],
    "additional": {
      "http://example.org/comments/1,2,3,4": { /* embedded collection+json */ }
    }
  }
}

The basic idea is that there is additional content that should be
treated as regular resources, but which the server knows the client
will want and therefore wants to preload.
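
For illustration only, here is a rough sketch of how a client might consume such a response. It assumes the proposed "many" and "additional" keys, neither of which is part of the Collection+JSON spec. The names resolve_many and fetch are hypothetical, and the shape of the embedded value is a placeholder.

```python
from typing import Callable, Optional


def resolve_many(collection: dict, item: dict, name: str,
                 fetch: Callable[[str], dict]) -> Optional[dict]:
    """Return the resource behind the item's "many" link called `name`.

    The preloaded copy under "additional" is used when the server sent
    one; otherwise the client falls back to `fetch(href)`, a hypothetical
    callback that performs a normal HTTP GET.
    """
    for link in item.get("many", []):
        if link.get("name") == name:
            href = link["href"]
            preloaded = collection.get("additional", {})
            if href in preloaded:
                return preloaded[href]
            return fetch(href)
    return None  # the item has no such association


# Sample data mirroring the example above (embedded body is a placeholder).
COMMENTS = "http://example.org/comments/1,2,3,4"
doc = {"collection": {
    "href": "http://example.org/posts/",
    "items": [{
        "href": "http://example.org/posts/1",
        "many": [{"name": "comments", "href": COMMENTS, "length": 4}],
    }],
    "additional": {COMMENTS: {"items": []}},
}}
```

The point of the design is that the calling code is the same whether or not the server chose to preload: the URI is the only link between the item and the extra content.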

Some other scenarios:

* An article has an author. The client GETs /articles/?page=1, which
returns 10 articles. All of those articles have the same author.
Embedding the author in each article means that all 10 articles will
contain a copy of the same author. By including the author as additional content
(via a "one" key), we can avoid duplication and make it easy for
clients to deserialize the content into a local store.
* A server exposes a list of all of the events that occur on all
clients. There are hundreds of millions of events, so the GET API will
be paginated and the clients will POST new events as they occur.
Because there are so many events, and the client doesn't have access
to all of them, event statistics cannot be performed on the client.
The server would like to provide a statistics resource alongside the
response to any new event POST. In this case, the statistics are not
part of an association, but they are still additional content.
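
As a rough sketch of the first scenario, a client could index everything it receives by URI, so that a shared author is stored once no matter how many articles point at it. The "one" and "additional" keys are the proposed ones (not part of the spec). The author URI, load_into_store and author_of are hypothetical names used only for this example.

```python
from typing import Optional


def load_into_store(collection: dict, store: dict) -> None:
    """Index items and preloaded resources by URI in `store`."""
    for item in collection.get("items", []):
        store[item["href"]] = item
    for href, resource in collection.get("additional", {}).items():
        store[href] = resource


def author_of(article: dict, store: dict) -> Optional[dict]:
    """Look up the article's author through its "one" link, if loaded."""
    for link in article.get("one", []):
        if link.get("name") == "author":
            return store.get(link["href"])
    return None


AUTHOR = "http://example.org/authors/7"  # hypothetical URI
page = {
    "items": [
        {"href": f"http://example.org/articles/{n}",
         "one": [{"name": "author", "href": AUTHOR}]}
        for n in range(1, 11)
    ],
    # The author appears once, however many articles reference it.
    "additional": {AUTHOR: {"items": []}},
}

store: dict = {}
load_into_store(page, store)
```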

Again, my overriding concern is to treat the additional content simply
as preloaded resources, and not embedded content whose remote resource
needs to be semantically extracted.

mca

Apr 30, 2012, 1:59:32 PM

to collect...@googlegroups.com

Yehuda:

Good to see your post here.

If I understand you correctly, you are proposing something that looks
like this (high-level view):

// sample collection object
{
  "collection" :
  {
    "version" : "1.0",
    "href" : URI,
    "links" : [ARRAY],
    "items" : [ARRAY],

    "additional" : {OBJECT}, // <-- a single "inline" collection

    "queries" : [ARRAY],
    "template" : {OBJECT},
    "error" : {OBJECT}
  }
}

Is that right? Any reason we should consider this as an array of one
or more inline collections?
"additional" : [ARRAY], // <-- an array of "inline" collection

Also, how will the client know that the inline collection is "linked"
to the items array? Is that what the "many" array is for? why is this
an array, btw? would there be a "many" for each item in a collection?

maybe there should/could be a more explicit connection between this
"many" array and the inline collection (via name or rel sharing?)

mca
http://amundsen.com/blog/
http://twitter.com@mamund
http://mamund.com/foaf.rdf#me

Yehuda Katz

Apr 30, 2012, 2:06:24 PM

to collect...@googlegroups.com

Yehuda Katz
(ph) 718.877.1325

On Mon, Apr 30, 2012 at 10:59 AM, mca <m...@amundsen.com> wrote:

> Yehuda:
>
> Good to see your post here.
>
> If I understand you correctly, you are proposing something that looks
> like this (high-level view):
>
> // sample collection object
> {
>   "collection" :
>   {
>     "version" : "1.0",
>     "href" : URI,
>     "links" : [ARRAY],
>     "items" : [ARRAY],
>
>     "additional" : {OBJECT}, // <-- a single "inline" collection
>
>     "queries" : [ARRAY],
>     "template" : {OBJECT},
>     "error" : {OBJECT}
>   }
> }
>
> Is that right? Any reason we should consider this as an array of one
> or more inline collections?
> "additional" : [ARRAY], // <-- an array of "inline" collection

Not quite. The additional key is a hash of additional content whose keys are URIs. This means that the link between the content is the same as a regular HTTP link.

Mark Burns

Apr 30, 2012, 2:51:44 PM

to collect...@googlegroups.com

Awesome to hear you guys weighing in on this.

Really good to hear about ember as well. We've just chosen ember for an ember/rails project and I've been following the Collection+JSON/hypermedia stuff closely and was contemplating how much work might be involved in tying the two together.
So good to hear you may be putting some weight behind the efforts.

Something I'd brought up on this list in the past was validation messages per field in the error response, but I've not had chance to look at that since chatting to Mike when he was over in London. I've also mentioned the concept of nested resources.

Also Nick Sutterer mentioned providing support for Collection+JSON in his representer/roar gem
https://github.com/apotonick/roar so it might be worth keeping track of that project to see if he does so.

And I don't feel I can necessarily articulate this very clearly, but I somehow feel that the concepts of

* validation messages at this level,
* Wycats's https://gist.github.com/1974187 (proposal for mass-assignment),
* having Forms as first-class object-y citizens in Rails in a Django way, as mentioned by Norbert Wójtowicz (@pithyless) at wroc_love.rb,

are all inter-related in some way. They all feel to me like they point at an underlying missing pattern or present anti-pattern or something.

Anyway, I'm just throwing these thoughts out there really on the off-chance that they are useful ideas to have/spark some debate.

Also apologies to all the non-rubyists on this list for the heavy ruby-bias around the things I've mentioned.

mca

Apr 30, 2012, 2:52:43 PM

to collect...@googlegroups.com

> Not quite. The additional key is a hash of additional content whose keys are
> URIs. This means that the link between the content is the same as a regular
> HTTP link.

so, using your first example: additional[many[0].href] returns the
"inline collection", right?
1) are you sure you want to use a URL for this and not an internal id or rel?
2) i see the "name", "href", and "length" for the "many" object. are
there other possible values that would appear here?
3) you have "many" as an array - does that mean you see possible
multiple "inline collections" in a single response representation?
4) it seems like the only difference between the links:[] array for an
item and the many:[] array for an item is that one is inline, the
other is not. maybe we just need a decoration on the links:[] items to
indicate that the link can be used to find the inline collection.

thoughts?


Yehuda Katz

Apr 30, 2012, 3:08:37 PM

to collect...@googlegroups.com


On Mon, Apr 30, 2012 at 11:52 AM, mca <m...@amundsen.com> wrote:

> > Not quite. The additional key is a hash of additional content whose keys are
> > URIs. This means that the link between the content is the same as a regular
> > HTTP link.
>
> so, using your first example: additional[many[0].href] returns the
> "inline collection", right?
> 1) are you sure you want to use a URL for this and not an internal id or rel?

I think so? It makes it easy to include these associations optionally, and not need a lot of different code depending on whether it was included or not.

> 2) i see the "name", "href", and "length" for the "many" object. are
> there other possible values that would appear here?

Not that I can think of. Can you?

> 3) you have "many" as an array - does that mean you see possible
> multiple "inline collections" in a single response representation?

Yes. You could imagine an article has many comments and tags, for instance.

> 4) it seems like the only difference between the links:[] array for an
> item and the many:[] array for an item is that one is inline, the
> other is not. maybe we just need a decoration on the links:[] items to
> indicate that the link can be used to find the inline collection.

That sounds great.

mca

Apr 30, 2012, 3:19:51 PM

to collect...@googlegroups.com

ok, so how about this:
https://gist.github.com/2561720

does this provide the support you are looking for?

feel free to edit as needed.


mca

Apr 30, 2012, 3:48:27 PM

to collect...@googlegroups.com

quick follow up.

is the length property needed on the link(many) item?
collection.item.length returns this, right (or
inline[links[x].href].items.length)?


Yehuda Katz

May 1, 2012, 12:52:41 AM

to collect...@googlegroups.com

On Apr 30, 2012 12:48 PM, "mca" <m...@amundsen.com> wrote:
>
> quick follow up.
>
> is the length property needed on the link(many) item?
> collection.item.length returns this, right (or
> inline[links[x].href].items.length)?
>

Yes, but again, the goal is to make the inline embedding optional. It's essentially just a preloaded resource, so the main resource should work agnostically of any prefetched embedded content.

mca

May 1, 2012, 1:13:52 AM

to collect...@googlegroups.com

> Yes, but again, the goal is to make the inline embedding optional. It's
> essentially just a preloaded resource, so the main resource should work
> agnostically of any prefetched embedded content.

Yep, I agree: inline is completely optional.

So, in summary, are you satisfied w/ an extension such that:
collection.items[x].links[y].inline=true // new optional property
and
collection.inline[collection.items[x].links[y].href] // returns the
inline collection

Does this give you the functionality you are looking for?
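
To make the two expressions above concrete, here is a minimal sketch of a client-side lookup. It assumes the extension exactly as summarized here: an optional "inline" flag on a link, and a top-level "inline" object keyed by href. The function name inline_for and the sample URIs are hypothetical.

```python
from typing import Optional


def inline_for(collection: dict, link: dict) -> Optional[dict]:
    """Return the inline collection for `link`, or None.

    None means either the link is not flagged inline or the server did
    not include the content; the client would then follow link["href"].
    """
    if not link.get("inline"):
        return None
    return collection.get("inline", {}).get(link["href"])


COMMENTS = "http://example.org/comments/1,2,3,4"
collection = {
    "items": [{
        "href": "http://example.org/posts/1",
        "links": [
            {"rel": "comments", "href": COMMENTS, "inline": True},
            {"rel": "tags", "href": "http://example.org/tags/5,6"},
        ],
    }],
    "inline": {COMMENTS: {"items": [{"href": "http://example.org/comments/1"}]}},
}

comments_link, tags_link = collection["items"][0]["links"]
preloaded = inline_for(collection, comments_link)  # the inline collection
missing = inline_for(collection, tags_link)        # None: follow the href
```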


Yehuda Katz

May 1, 2012, 1:45:57 AM

to collect...@googlegroups.com

Is this with or without the additional length metadata?


mca

May 1, 2012, 1:56:11 AM

to collect...@googlegroups.com

If you need the length property, I have no problem w/ you adding it to
the link element and I suggest using a name that links the two
properties (inlineLength, inlineCount, or something else that works
for you).


Yehuda Katz

May 1, 2012, 1:59:48 AM

to collect...@googlegroups.com

I don't see how clients in general could avoid needing the length property. Assume that inline is an optimization and imagine the API without any inline content.

Related note: what are the rules for extensions to the spec?


mca

May 1, 2012, 2:07:47 AM

to collect...@googlegroups.com

> I don't see how clients in general could avoid needing the length property.
> Assume that inline is an optimization and imagine the API without any inline
> content.

:) imagine a representation where the length value does not match the
number of items in the inline collection.
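
A rough sketch of how a client could detect that situation, assuming a "length" property on the link, a top-level "inline" object keyed by href, and an inline value that exposes "items" directly. All of these are shapes under discussion in this thread, not settled spec, and length_mismatches is a hypothetical name.

```python
from typing import List, Tuple


def length_mismatches(collection: dict) -> List[Tuple[str, int, int]]:
    """Return (href, declared, actual) for every link whose declared
    length differs from the item count of its inline collection."""
    inline = collection.get("inline", {})
    problems = []
    for item in collection.get("items", []):
        for link in item.get("links", []):
            href = link.get("href")
            if "length" not in link or href not in inline:
                continue  # nothing to compare against
            actual = len(inline[href].get("items", []))
            if link["length"] != actual:
                problems.append((href, link["length"], actual))
    return problems


COMMENTS = "http://example.org/comments/1,2,3,4"
collection = {
    "items": [{
        "href": "http://example.org/posts/1",
        "links": [{"href": COMMENTS, "inline": True, "length": 4}],
    }],
    # Only three items were inlined although the link declares four.
    "inline": {COMMENTS: {"items": [{}, {}, {}]}},
}

problems = length_mismatches(collection)
```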

> Related note: what are the rules for extensions to the spec?

There are no rules at this point. since the media type does not use
any type of namespaces, the extensions are really just a "negotiated"
mod; what we are doing now<g>.

probably the smart strategy is to write up a short doc similar to the
current specs and post it either at the spec site
(http://amundsen.com/media-types/collection/) or in the github repo
(https://github.com/mamund/collection-json).

this is new for me and i'm open to suggestions and willing to do any
heavy-lifting that will make this work for you.


Dennis Roof

Jul 10, 2012, 11:01:07 AM

to collect...@googlegroups.com

The inline extension looks like a smart optimization. But I don't really understand the length property. Since the embedded collection is an object, can't I simply check how many items the collection contains? Or does it have some other purpose?

Also, is the href "http://example.org/comments/1,2,3,4" supposed to be a valid URL? As in, if I'd execute that URL, should I expect the same collection as the one that was preloaded?

Yehuda Katz

Jul 10, 2012, 11:10:51 AM

to collect...@googlegroups.com


On Tue, Jul 10, 2012 at 8:01 AM, Dennis Roof <dennis....@gmail.com> wrote:

> The inline extension looks like a smart optimization. But I don't really understand the length property. Since the embedded collection is an object, can't I simply check how many items the collection contains? Or does it have some other purpose?

The embedded information is optional, and you'd want the parent resource to look the same regardless of whether the embedded information was provided.

> Also, is the href "http://example.org/comments/1,2,3,4" supposed to be a valid URL? As in, if I'd execute that URL, should I expect the same collection as the one that was preloaded?

Exactly.

Dennis Roof

Jul 11, 2012, 9:40:20 AM

to collect...@googlegroups.com

I want to like this idea, but I wonder what impact inline collections would have on the server-side cacheability of the API calls. I was thinking of caching entire collection outputs keyed on a unique URL plus a selection of HTTP header fields. Inline collections add another variable inside a collection, which could make the cache expire faster. For example, if an item in an inline collection gets updated, the link won't change, but the inline collection will.

The same goes for a link such as "http://example.org/comments/1,2,3,4" as opposed to four separate links; the separate links are less specific to one response and therefore cache better. It would mean making five API calls to retrieve the article with four comments rather than two, but all five API calls would be cacheable. With an HTTP cache like Varnish, those calls would not even reach the web server. Which is more important: minimizing the number of API calls, or minimizing the amount of request processing for the web server, application and database?

mca

Jul 11, 2012, 9:48:24 AM

to collect...@googlegroups.com

Dennis:

Extensive use of inlining can create large, complex-to-cache responses.
Making messages very tiny can create overly chatty interactions, resulting in poor performance.
There is a "happy medium" that you, as the dev/arch, have to find for your API.

Keep in mind you are designing representations, not echoing data rows and tables. Your API is the opportunity to "shape" the client-server interaction to optimize messages for the HTTP protocol (a lossy, large-grained message model).

I, myself, do not use inlining. I *do*, however, very often use compositing (i.e. returning heterogeneous collections, such as users and companies, in the same response).

mca

+1.859.757.1449

http://amundsen.com/blog/
http://twitter.com@mamund
http://mamund.com/foaf.rdf#me

Greg Knapp

Nov 6, 2013, 7:00:50 AM

to collect...@googlegroups.com

Could you provide an example of the heterogeneous collections you use?

I understand from a conceptual level what you mean but I'd be interested in seeing the semantics.

Thanks

mca

Nov 6, 2013, 9:49:27 AM

to collect...@googlegroups.com

here's an example i whipped up: https://gist.github.com/mamund/7337187

note it holds multiple "kinds" of items: user, account, activity.

there is quite a bit more that can be done in this area, too.

let me know if you want to discuss further.

mca

+1.859.757.1449
skype: mca.amundsen
http://amundsen.com/blog/
http://twitter.com/mamund
https://github.com/mamund
http://www.linkedin.com/in/mamund


Larry Marburger

Nov 6, 2013, 10:48:36 AM

to collect...@googlegroups.com

Here's another example that we recently implemented: https://gist.github.com/lmarburger/7169444

--
Larry Marburger
Homeopathic Code Remedyologist

Dennis Roof

Dec 16, 2013, 5:27:11 AM

to collect...@googlegroups.com

Can I assume both the heterogeneous collection and inline collection are used for API optimization?

In both cases, they reduce the number of API calls required to get related items, and both create complex-to-cache outputs.

Maybe a difference would be that inline collections introduce a hard hierarchy among items where heterogeneous collections could group items together that are never retrieved separately or used in another context. For example, I could put an article with comments in a heterogeneous collection since the comments always belong to a specific article. But I would put the tags of an article in links, as tags are also used for other resources and it would be an advantage to have separate caches for them.

On Wednesday, November 6, 2013 at 15:49:27 UTC+1, Mike Amundsen wrote:

mca

Dec 16, 2013, 8:22:06 AM

to collect...@googlegroups.com

I think it's important not to fall into the trap of thinking about "objects" or "strong types" here.

Cj is designed to offer up collections of _items_, not collections of "users", "products", "customers", etc.

if a representation of items is not "all the same kind" that's no big issue from the Cj POV.

mixing "kinds" in a representation is done primarily to improve UX/DX, not to expose any internal object hierarchy or relationships.

does this help?


Dennis Roof

Dec 16, 2013, 8:47:16 AM

to collect...@googlegroups.com

Thank you for the response. Yes, I think it does. Mixing items in CJ for convenience, not relationships. I'll keep that in mind.

On Monday, December 16, 2013 at 14:22:06 UTC+1, Mike Amundsen wrote:

Dennis Roof

Dec 18, 2013, 4:52:18 AM

to collect...@googlegroups.com

When dealing with a homogeneous collection, I could define API URLs like the following:

Get all articles:

api.domain.com/articles

Get a specific article with id 100:

api.domain.com/articles/100

If I create a heterogeneous collection, does it still make sense to retrieve a specific item REST-style? Or should the queries deal with all item filtering in a collection?

On Monday, December 16, 2013 at 14:22:06 UTC+1, Mike Amundsen wrote:

> I think it's important not to fall into the trap of thinking about "objects" or "strong types" here.

mca

Dec 18, 2013, 9:06:31 AM

to collect...@googlegroups.com

you can use whatever you wish.

i've even returned collections that have items on different servers.

URLs are not important here.

