So, what is M2M (machine to machine) communication anyway?

Jørn Wildt

Feb 9, 2013, 4:14:30 AM

to api-...@googlegroups.com

The recent discussions about a new media type for APIs have some underlying assumptions about M2M communication. But do we really agree on what M2M communication is anyway? Let me try to illustrate with some different examples:

* Team A is implementing a background service that reads stuff from one API, transforms it in some way, and writes it back to another API - a "typical" systems integration service that keeps two legacy systems synchronized.

* Team B is implementing a classic mobile application that interacts with a public API. Let's say something like implementing a Twitter client for mobile phones. Team B hard codes the UI at compile time with the knowledge of what the API can do for them. They build a UI with X specific input fields and transfer these inputs to the API when the user clicks "Submit".

* Team C is also implementing a mobile application for a public API, just like team B. But this time the API exposes hyper media controls that define inputs and buttons and so on. They take that information and render a dynamic UI that changes at runtime based on what information the server returns.

* Team D is also implementing a mobile application. But they decide to simply embed the HTML version of the service exposed by the API (so they simply ignore that an API exists).
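
The difference between Team B and Team C can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "form", "fields" and "label" keys and the "/statuses" URL are made up for the example and do not come from any particular media type.

```python
# Team B: the list of input fields is baked into the client at build time.
HARD_CODED_FIELDS = ["status"]


def team_b_fields():
    """Return the fields the client was compiled with."""
    return list(HARD_CODED_FIELDS)


# Team C: the list of input fields is read from the server's response.
# The "form" / "fields" / "label" keys are hypothetical.
def team_c_fields(response):
    """Return (name, label) pairs for whatever inputs the server describes."""
    return [
        (field["name"], field.get("label", field["name"]))
        for field in response["form"]["fields"]
    ]


# A hypothetical hypermedia response describing a form with two inputs.
response = {
    "form": {
        "action": "/statuses",
        "method": "POST",
        "fields": [
            {"name": "status", "label": "What's happening?"},
            {"name": "location", "label": "Where are you?"},
        ],
    }
}
```

If the server adds a third field, the Team C client shows it without a new release, while the Team B client has to be rebuilt.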

Now, all four of these implementations have M2M communication - but in very different ways. Before we examine the different approaches I would like to define a few terms:

- End user documentation: labels and help texts intended for the end user.

- Technical documentation: human readable text intended for the developers that write clients for the API.

- Transport level documentation: information about URLs, concrete URL templates, encodings, formats, HTTP methods and other stuff that a client can read at runtime and base its requests on.

- Inline documentation: any kind of documentation which is part of a response (and I consider an embedded link to documentation as "inline")

- Out-of-band documentation: any kind of documentation which is not part of a response

So let's look at the different implementations:

* Team A's implementation is a 100% autonomous "robot" without any human interaction. It has absolutely no need for any kind of inline end user documentation or technical documentation. That would just be a waste of bandwidth. It is also very tightly coupled to the API.

* Team B's implementation requires human interaction. But since it has a hard coded UI it won't need any kind of inline end user documentation or technical documentation. It is also very tightly coupled to the API. The upside is that they can design the UI to fit exactly their needs - but it comes at the cost of higher coupling.

* Team C's implementation requires human interaction. But since their implementation relies on a hyper media based API it can benefit from inline end user documentation embedded in the response. This implementation is more loosely coupled to the API and allows the server to change the layout of the pages. But if the client is hard coded with the expectation of a certain flow (set of pages to go through) then there is still some coupling here. This implementation leaves the UI style to be defined by the client (colors and fonts and so on can freely be selected by the client).

* Team D's implementation is obviously the most flexible and loosely coupled implementation. The server is free to change anything it wants in the problem domain as well as the UI layout - as long as it sticks to the technical HTML docs. The upside is maximal loose coupling - the downside is that the client has absolutely no control of styling and branding.

* All implementations can benefit from inline transport documentation. That means the server is free to change URL structure and so on.

* None of the implementations need inline technical documentation. This is always something the developers go through while creating the client.
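
A minimal sketch of what "inline transport documentation" buys a client, assuming the response carries a list of links with "rel" and "href" keys (that layout, the "orders" relation and both URLs are assumptions made for this example): the client knows only the relation name and looks the URL up at runtime.

```python
def resolve(links, rel):
    """Return the URL the server advertised for the given link relation."""
    for link in links:
        if link["rel"] == rel:
            return link["href"]
    raise KeyError(f"server did not advertise a link with rel={rel!r}")


# Two hypothetical responses from the same API, before and after the
# server reorganised its URL structure.
old = {"links": [{"rel": "orders", "href": "/orders"}]}
new = {"links": [{"rel": "orders", "href": "/v2/customer-orders"}]}

# The client code is identical in both cases; only the relation name
# "orders" is hard coded, never the URL itself.
old_url = resolve(old["links"], "orders")
new_url = resolve(new["links"], "orders")
```

The coupling that remains is to the relation name, which is exactly the part that still has to be learned from the technical documentation.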

So, the different implementations have different characteristics, going from autonomous/tightly coupled/low bandwidth to human driven/loosely coupled/high bandwidth. Your choice of media type will be affected by these characteristics.

What kind of M2M project are you involved in?

/Jørn

sune jakobsson

Feb 9, 2013, 11:43:22 AM

to api-...@googlegroups.com

But your use cases here vary from "copying data" to "providing UI".
The problem is that M2M can be aggregated at "any" level; it all depends
on the audience, their goals, or the "package" they are paying for.

Sune


Jørn Wildt

Feb 9, 2013, 2:32:46 PM

to api-...@googlegroups.com

Sune, exactly :-) That's my whole point - when people say "we need a new media type for M2M communication" they forget that M2M communication can be a lot of different things.

We should instead speak about a media type for "systems integration" or "mobile apps with predefined UI" - or whatever other scenario it is.

/Jørn

Mike Schinkel

Feb 10, 2013, 3:39:59 AM

to api-...@googlegroups.com

Hi Jørn,

Since you created a new thread I'm replying separately.

On Feb 9, 2013, at 4:14 AM, Jørn Wildt <j...@fjeldgruppen.dk> wrote:

> The recent discussions about a new media type for APIs have some underlying assumptions about M2M communication. But do we really agree on what M2M communication is anyway?

Is it important to define a new term (M2M) for discussion? Doesn't API (in the context of this group) or Web API in general mean the same thing as what you mean by M2M? I think adding more terms, if they don't add important insight, only makes things more confusing.

> Let me try to illustrate with some different examples:
>
> * Team A is implementing a background service that reads stuff from one API, transforms it in some way, and writes it back to another API - a "typical" systems integration service that keeps two legacy systems synchronized.
>
> * Team B is implementing a classic mobile application that interacts with a public API. Let's say something like implementing a Twitter client for mobile phones. Team B hard codes the UI at compile time with the knowledge of what the API can do for them. They build a UI with X specific input fields and transfer these inputs to the API when the user clicks "Submit".
>
> * Team C is also implementing a mobile application for a public API, just like team B. But this time the API exposes hyper media controls that define inputs and buttons and so on. They take that information and render a dynamic UI that changes at runtime based on what information the server returns.
>
> * Team D is also implementing a mobile application. But they decide to simply embed the HTML version of the service exposed by the API (so they simply ignore that an API exists).
>
> ...
>
> So, the different implementations have different characteristics, going from autonomous/tightly coupled/low bandwidth to human driven/loosely coupled/high bandwidth. Your choice of media type will be affected by these characteristics.
>
> What kind of M2M project are you involved in?

For what I believe you meant as the purpose of your examples, I think they are contrived and I think ultimately they all have the same characteristics if viewed in a different context:

- Team A is simply reading from API #1 and writing to API #2. It's not a special case.

- Team B simply implements read and write of an API which for argument's sake we'll say is API #1.

- Team C is doing the same thing as Team B only they are doing it with API #2, for example.

- Team D is using only the services that could be a subset of services provided by either API #1 or API #2.

The only real difference is that the implementor of API #1 published docs for URL construction and API #2 published a hypermedia API. But there's no reason other than their choices that they did so; API #1 could have been hypermedia and API #2 could have been URL construction.

The entire point of the proposal in the other thread was that the differences between the four teams and the two APIs are arbitrary and we'd be better off without arbitrary differences.

How do we get rid of arbitrary differences? We create a set of standards so all teams and all API developers have guidance for doing it the same way, and we do our best to cultivate open-source implementations of these standards so that the easiest thing for everyone to do is to follow the standards rather than attempt to roll their own.

-Mike

P.S. Yes, there are differences between what you can do with low bandwidth/high latency and small memory/CPU devices vs. high bandwidth/low latency and large memory/CPU devices, but I believe those differences can be addressed with one initiative rather than assuming they need to be segmented. Ultimately it's about one machine talking to another, and those limitations are just constraints to optimize for.

Jørn Wildt

Feb 10, 2013, 3:06:13 PM

to api-...@googlegroups.com

Seems like it's a bit difficult for me to convey my thoughts ... and maybe it's just me thinking too loudly in public ...

Let me start with your last sentence: "Ultimately it's about one machine talking to another" ... well, no, there are different approaches to how they talk together - some more loosely coupled than others and some requiring "in line" end user documentation whereas others do not.

Let's take HAL as an example. It is perfectly suited for autonomous background integrations like Team A is doing. It is also a perfect match for Team B. But Team C will be left without any kind of support for in line end user documentation, input definitions, labels, buttons and so on. So, it's M2M communication, yes, but HAL (or similar) lacks something that Team C needs. You can of course layer your own interpretation of end user docs properties on top of HAL - but then you go beyond the media type specs.

So, where am I going with all this? Maybe I am trying to say that using a "low fidelity" media type like HAL (that only concerns itself with links and embedded resources) will force clients to be more tightly coupled to the server implementation than clients that use a "high fidelity" media type which extends itself to include end user documentation, UI generation and various kinds of form definitions that allow the client to build dynamic user interfaces.
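
To make the "low fidelity" point concrete, here is a small sketch of what a client can learn from a HAL document. The order, customer and item data are invented for the example; the only parts taken from HAL itself are the reserved "_links" and "_embedded" properties.

```python
import json

# A hypothetical HAL representation of an order.
hal_document = json.loads("""
{
  "_links": {
    "self": {"href": "/orders/123"},
    "customer": {"href": "/customers/7"}
  },
  "total": 30.0,
  "_embedded": {
    "items": [
      {"_links": {"self": {"href": "/items/1"}}, "name": "Widget"}
    ]
  }
}
""")


def link_relations(doc):
    """Names of the link relations the document advertises."""
    return sorted(doc.get("_links", {}))


def embedded_relations(doc):
    """Names of the relations that have embedded resources."""
    return sorted(doc.get("_embedded", {}))


def state_properties(doc):
    """Everything that is neither a link nor an embedded resource."""
    return {k: v for k, v in doc.items() if k not in ("_links", "_embedded")}
```

Links, embedded resources and plain state are all the format gives you. Nothing in it tells the client which inputs to show, what to label them, or which button to render, so Team C would have to invent those conventions on top.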

I am also trying to say that it is a trade-off. If you want to mash things together and build your own interface then you must get "closer to the metal" to get the raw data (compared to simply using the HTML UI).

Furthermore, I am trying to say that, well, perhaps these are simply the constraints that we have to live with - IF we want to create clients with hard coded UIs, because we want to go beyond what some existing HTML UI offers, then our implementations will be more tightly coupled to the service compared to a dynamic UI generated, on the fly, by the server.

And - in the opposite corner - if we really want to make truly loosely coupled clients then we need a "high fidelity" media type ... and we will end up re-inventing HTML.

My final point is that, due to this "spectrum" of flexibility and loose coupling we can gain by including or excluding hyper media elements, input forms and so on, there will never be such a thing as a media type that fits all M2M needs - unless we can parameterize it with the level of inline docs and so on the client requires ... and maybe that's an idea worth exploring? Or, perhaps, it is just two media types where one is a subset of the other?
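
One way the "parameterized" idea could look, purely as a sketch: the client states the level of inline docs it wants as a media type parameter in its Accept header, and the server leaves out what was not asked for. The media type "application/vnd.example+json", the "docs" parameter and the "data" / "links" / "forms" keys are all hypothetical.

```python
def parse_media_type(value):
    """Split 'type/subtype; a=b; c=d' into (type, {param: value})."""
    parts = [part.strip() for part in value.split(";")]
    params = {}
    for part in parts[1:]:
        name, _, val = part.partition("=")
        params[name.strip().lower()] = val.strip().strip('"')
    return parts[0].lower(), params


def render(resource, accept):
    """Build a representation with the level of inline docs the client asked for."""
    _, params = parse_media_type(accept)
    level = params.get("docs", "none")  # hypothetical parameter name
    body = {"data": resource["data"], "links": resource["links"]}
    if level == "full":
        # Only UI-building clients (like Team C) pay the bandwidth for this.
        body["forms"] = resource["forms"]
    return body


# A hypothetical resource as the server holds it.
resource = {
    "data": {"status": "hello"},
    "links": [{"rel": "self", "href": "/statuses/1"}],
    "forms": [{"name": "status", "label": "What's happening?"}],
}

lean = render(resource, "application/vnd.example+json")
full = render(resource, "application/vnd.example+json; docs=full")
```

Here the lean representation is a strict subset of the full one, which is also what the "two media types where one is a subset of the other" variant would amount to.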

/Jørn


Mike Schinkel

Feb 10, 2013, 5:06:54 PM

to api-...@googlegroups.com

On Feb 10, 2013, at 3:06 PM, Jørn Wildt <j...@fjeldgruppen.dk> wrote:

> Seems like it's a bit difficult for me to convey my thoughts ... and maybe it's just me thinking too loudly in public ...

I think we might both be struggling with a bit of that.

> Let me start with your last sentence: "Ultimately it's about one machine talking to another" ... well, no, there are different approaches to how they talk together - some more loosely coupled than others and some requiring "in line" end user documentation whereas others do not.

I understand you see them as different; however, I see them as potentially just aspects of a unified approach. For example, why can't we have an API that uses a HAL-like format but that can also serve HTML forms when needed for human interaction?

> Let's take HAL as an example. It is perfectly suited for autonomous background integrations like Team A is doing. It is also a perfect match for Team B. But Team C will be left without any kind of support for in line end user documentation, input definitions, labels, buttons and so on. So, it's M2M communication, yes, but HAL (or similar) lacks something that Team C needs. You can of course layer your own interpretation of end user docs properties on top of HAL - but then you go beyond the media type specs.
>
> So, where am I going with all this? Maybe I am trying to say that using a "low fidelity" media type like HAL (that only concerns itself with links and embedded resources) will force clients to be more tightly coupled to the server implementation than clients that use a "high fidelity" media type which extends itself to include end user documentation, UI generation and various kinds of form definitions that allow the client to build dynamic user interfaces.

I assert you are describing what *is* today, but what *is* does not invalidate what *could be* tomorrow. For example, we can choose to accept that HAL can only be used for low fidelity, or we can demand greater, i.e. that HAL evolve or that we give up on HAL and use something else. (I just inadvertently paraphrased the SyFy channel's motto. How apropos. :)

> I am also trying to say that it is a trade-off. If you want to mash things together and build your own interface then you must get "closer to the metal" to get the raw data (compared to simply using the HTML UI).

I believe the same arguments were made to explain why TBL's vision of the web was not attainable. And like TBL I assert that it is not a given.

> Furthermore, I am trying to say that, well, perhaps these are simply the constraints that we have to live with - IF we want to create clients with hard coded UIs, because we want to go beyond what some existing HTML UI offers, then our implementations will be more tightly coupled to the service compared to a dynamic UI generated, on the fly, by the server.

A lot of clients today have hard-coded UIs and I would assert that many of them have hard-coded UIs because that was the path of least resistance. I'm proposing we focus on giving them an easier path to dynamic solutions.

There were many people who said forms had to be hardcoded in the late 80s and early 90s. Yet here we are; HTML5 provides rich functionality for declarative forms.

Nothing I'm proposing keeps anyone from continuing to hard-code UIs. If what I'm proposing comes to pass then there will still be people who choose to hard-code. But there will also be many in this future who do not hardcode because it will be easier to use the dynamic, uncoupled, hypermedia approach.

I think it's good for me to restate one aspect of the proposal: this is for the 80th percentile[1] use-case. The other 20 percent will continue with business as usual; no capabilities will be taken away from them. I sense that most of the pushback is because those pushing back cannot envision how we could do this for 100% of use-cases, and we probably cannot. But that doesn't mean addressing 80% is not doable or not a good idea.

So we would be building a highway but that wouldn't mean the side roads would be closed. (And my analogy fails in a happy way in that we wouldn't have to take anyone's home to build our highway.)

> And - in the opposite corner - if we really want to make truly loosely coupled clients then we need a "high fidelity" media type ... and we will end up re-inventing HTML.

No, for web APIs we don't need the vast majority of things that HTML provides for the benefit of human visitors who rely on sight, hearing or other senses. We can continue to leverage HTML for those things that require a human UI.

What we instead need are machine affordances that allow for hands-off workflow orchestration. We need standard ways to discover authentication methods for APIs, standard ways to recognize the services offered by APIs, and standard ways to transition from one API to another. Each of these "standards" would and should be small. An example might be a way to define a fragment of JSON that can represent the list of entities exposed by an API. HAL might store that fragment in one location and Siren in another.
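
A rough sketch of that last example, with loud caveats: the fragment name "x-entity-list" and its contents are invented here, and no such standard exists. The only assumptions taken from the formats themselves are that HAL keeps resource state at the top level of the document while Siren keeps it under "properties".

```python
# Hypothetical name for the small, shared fragment standard.
FRAGMENT_KEY = "x-entity-list"

# Where each media type would carry the fragment: a path of keys to walk.
LOCATIONS = {
    "application/hal+json": (),                    # top level of the document
    "application/vnd.siren+json": ("properties",)  # inside Siren's properties
}


def find_entity_list(media_type, document):
    """Return the entity list fragment, wherever this media type stores it."""
    node = document
    for key in LOCATIONS[media_type]:
        node = node[key]
    return node[FRAGMENT_KEY]


# The same fragment embedded in two hypothetical API root documents.
hal_root = {
    "_links": {"self": {"href": "/"}},
    "x-entity-list": ["order", "customer"],
}
siren_root = {
    "class": ["api-root"],
    "properties": {"x-entity-list": ["order", "customer"]},
    "links": [{"rel": ["self"], "href": "/"}],
}

from_hal = find_entity_list("application/hal+json", hal_root)
from_siren = find_entity_list("application/vnd.siren+json", siren_root)
```

The fragment itself is identical in both documents; only the lookup path differs, which is why each such standard could stay small.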

> My final point is that, due to this "spectrum" of flexibility and loose coupling we can gain by including or excluding hyper media elements, input forms and so on, there will never be such a thing as a media type that fits all M2M needs - unless we can parameterize it with the level of inline docs and so on the client requires ... and maybe that's an idea worth exploring? Or, perhaps, it is just two media types where one is a subset of the other?

My dad taught me never to say "never" (except in this one context. :)

Again, let's focus on the goal of the 80th percentile, not 100%. The former is doable, the latter is (realistically speaking) not.

-Mike

[1] Of course after we resolve the 80th percentile some time will pass and then I expect others will look at the remaining 20%, divide it up again and tackle that remaining 80th percentile. And so on.
