versioning as an anti-pattern

Andrei Neculau

Sep 28, 2012, 3:38:10 AM
to api-...@googlegroups.com
Since reading Steve Klabnik's https://secure.designinghypermediaapis.com/nodes/fdivisitjqwp
I have shared it in almost every second discussion on APIs and versioning.

In order to clarify some things, I still went with a versioning pattern (on a media-type, not API level) to mark behavior change.
Until yesterday, I had always thought that adding new properties to a representation is not a behavior change, so I'm picking your brains on this.

A clear example: today an address is represented as name, street, city, country.
Tomorrow, you will need to handle care_of or state or some other "trivial" attribute.

When you introduce care_of, the address undergoes a behavior change.
Until tomorrow, users have no care_of field in the form, so they may overload the street (address line) with a "c/o X".
The API user is happy, because he doesn't really care about the address structure as long as he can do address = name + street + city + country, print it, and the recipient receives the letter.
Tomorrow, though, the server will be able to send name, care_of, street, city, country. The API user will still glue the address together without care_of, and the recipient will fail to receive the letter because of missing address information.
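
To make the failure concrete, here is a minimal Java sketch of such a client. The class name, method name, and sample values are hypothetical and only illustrate the scenario above; the field names come from the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical old client: it only knows the four original address fields.
class OldAddressClient {
    // Glue the known fields into a mailing label; any other field is silently skipped.
    static String label(Map<String, String> address) {
        return String.join("\n",
                address.get("name"),
                address.get("street"),
                address.get("city"),
                address.get("country"));
    }

    public static void main(String[] args) {
        // Tomorrow's representation, with the new care_of field (sample data is made up).
        Map<String, String> tomorrow = new LinkedHashMap<>();
        tomorrow.put("name", "Jane Doe");
        tomorrow.put("care_of", "c/o Acme Corp");
        tomorrow.put("street", "1 Main Street");
        tomorrow.put("city", "Springfield");
        tomorrow.put("country", "US");
        // The care_of line never reaches the printed label.
        System.out.println(label(tomorrow));
    }
}
```

Nothing fails at the protocol level: the client keeps working, and the label is simply missing a line.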

Looking forward to hearing your take on this.

David Eriksson

Sep 28, 2012, 8:37:46 AM
to api-...@googlegroups.com
Hej!

I'm not one of the scholars of the field, but I have been thinking about the same thing.

In your example an important new (but optional) field is added to a resource that will be returned to the client from the server.

If the media type is still the same, all existing clients continue to work, but will most likely ignore the new field. They might detect an unknown field and log it for manual inspection, but would they stop execution? I doubt it. Yes, it will cause the clients to misbehave when the new important behavior-changing field is present.

If the media type is new, but the client sends Accept: */* and ignores the Content-Type in the response, it will continue to work for most cases just like above and misbehave when the new field is present.

If the media type is new, and the client is strict about it, the client will ask the server for the old media type (through the Accept header) and get the old one back. If you don't support the old media type anymore, the old clients will not work at all. If you support the old media type, you might have to support it for eternity unless you can know for sure that there aren't any old clients still out there. You also have to figure out how to represent the new important field in the old media type. Maybe append it to the name or street fields...? Now go through a couple of version changes like this and you will have a lot of server code to support old media types, but very little incentive for client developers to upgrade to the latest and greatest.
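
A rough Java sketch of the server side of these three cases follows. The media type names are made up for illustration, and the Accept parsing is deliberately simplified: q-values and partial wildcards such as application/* are ignored.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical media types; a real API would define its own names.
class AddressNegotiation {
    static final String V1 = "application/vnd.example.address.v1+json";
    static final String V2 = "application/vnd.example.address.v2+json";
    // Newest first: clients that accept anything get the latest representation.
    static final List<String> SUPPORTED = Arrays.asList(V2, V1);

    // Returns the media type to serve, or null when nothing matches (an HTTP 406 case).
    static String choose(String acceptHeader) {
        if (acceptHeader == null || acceptHeader.trim().isEmpty()) {
            return SUPPORTED.get(0);
        }
        for (String part : acceptHeader.split(",")) {
            // Drop parameters such as ";q=0.8" (simplification: they are not honoured).
            String type = part.split(";")[0].trim();
            if (type.equals("*/*")) {
                return SUPPORTED.get(0);
            }
            if (SUPPORTED.contains(type)) {
                return type;
            }
        }
        return null;
    }

    // Serving the old type forces a decision about the new field;
    // here it is folded into the street line, one of the options mentioned above.
    static String streetFor(String mediaType, String careOf, String street) {
        if (V1.equals(mediaType) && careOf != null) {
            return "c/o " + careOf + ", " + street;
        }
        return street;
    }
}
```

Every additional version adds another branch like the one in streetFor, which is the maintenance cost described above.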

Of course if both server and client are developed by the same project or company, old media types may be deprecated quite soon after new ones have been released, but in such a situation it is also very easy to add another field to the existing media type and then make the client applications start using it!

Personally I would rather have my clients add support for the new behavior than maintain versioning on the server side, but I have also learned that I don't always get what I want...


Cheers,

David Eriksson

Jon Moore

Sep 28, 2012, 9:41:17 AM
to api-...@googlegroups.com, api-...@googlegroups.com
When we talk about extensible media types, it's important for the media type to define the semantics of elements a client doesn't understand. Broadly, if you are adding something new, its semantics can be either "must-ignore" or "must-understand".

If you make it "must-understand", meaning that ignoring the element will make you process it incorrectly, you will need to detect client capabilities somehow before including these elements. Having different media types and doing content negotiation with Accept is one way. The HTTP spec in RFC 2616 has several behaviors that will not work for 1.0 clients or servers, and the 1.1 version of the protocol requires detecting client or server version before using those features.

On the other hand, if you can describe your feature in a "must-ignore" way, then these new elements are safe to send to the old clients, who will happily ignore them. New clients will take advantage of them. HTTP extension headers are common examples of this. This may require thinking harder about how to encode this feature, but it ultimately simplifies server implementation (just include the latest stuff you support). 
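
As a small illustration of the two policies, here is a Java sketch of a client-side check. The class, the set of known fields, and the idea of passing a "must-understand" set alongside the representation are all assumptions made for this example, not part of any particular media type.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical client that understands only the original address fields.
class ExtensionPolicy {
    static final Set<String> KNOWN =
            new HashSet<>(Arrays.asList("name", "street", "city", "country"));

    // Must-ignore: unknown fields are skipped and processing continues.
    // Must-understand: an unknown field flagged as essential means the client
    // cannot safely process the representation at all.
    static boolean canProcess(Set<String> fieldsPresent, Set<String> mustUnderstand) {
        for (String field : fieldsPresent) {
            if (!KNOWN.contains(field) && mustUnderstand.contains(field)) {
                return false;
            }
        }
        return true;
    }
}
```

With an empty must-understand set, care_of is just another ignorable extension; once care_of is placed in that set, the old client has to refuse the representation, which is why the server needs to know the client's capabilities first.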

Jon

Darrel Miller

Sep 29, 2012, 9:50:36 PM
to api-...@googlegroups.com
Andrei,

The answer that I have found to avoiding problems such as the one you
describe is to be absolutely clear on intent. Once you understand why
you are including certain pieces of information, it makes it much
easier to change without impacting your clients.
Let me try and explain what I mean. You state:

"address is represented today as name, street, city, country"

This is standard practice, but why is that? Why are we delivering to
the client this information as distinct fields? What will the user do
with that information? Is the client going to display this address to
a user in a UI? Are they going to print a mailing address label?
Are they going to use some of that information as demographic
information?

You suggest the API user may glue the fields together and make a mailing
label out of it. Is this something that we want our users doing with
the address? If so then why not give it to them formatted? Do they
know how to format it properly? Do they know that German street
addresses are rendered very differently than Canadian ones? Do they
realize British addresses will often include two towns, the actual
town and a nearby major town? Why are we breaking this down just so
the client has to put it back together? Are we just offloading this
work to the client because we don't want to think too hard, under the
guise of "flexibility"?

On the other hand maybe they do want demographic information? Maybe
they want postal code, maybe they want country, but care_of is not
going to affect those demographics. Maybe there is other demographic
information that we could include in the representation that is far
more important than the information that can be gleaned from the
address.

When we build a representation with information that has a specific
intent it becomes much easier to change that information in a way that
makes sense to the client. Adding care_of information into a textual,
completely formatted address means the client will naturally include
that information when it displays the address or prints a mailing
label. If address is included for demographic information, you know
the client doesn't care about care_of.
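
As a sketch of the "formatted address" option, in Java: the server composes the label, so a later care_of shows up without any client change. The class and method names are hypothetical, and the single fixed layout is a simplification; real country-specific formatting is exactly the work this would keep on the server.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical server-side formatter: the representation carries the finished label.
class MailingLabel {
    // One fixed line order for illustration only; a missing care_of is simply omitted.
    static String format(String name, String careOf, String street, String city, String country) {
        List<String> lines = new ArrayList<>();
        lines.add(name);
        if (careOf != null && !careOf.isEmpty()) {
            lines.add("c/o " + careOf);
        }
        lines.add(street);
        lines.add(city);
        lines.add(country);
        return String.join("\n", lines);
    }
}
```

A client that prints whatever text it receives picks up the new line automatically.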

Great API's are designed with intent. They are not just a dump of
internal data. Once you understand the intent, you can design the API
to give the client just what they want even as change occurs, because
you know how that data is being used.

Personally, I don't believe in this attitude of "we'll expose our data
and see what people do with it". It's a fast food API, it's quick to
get going, you don't have to think hard, but it's not a long term
healthy choice. It will bring versioning issues, and it will bring
performance issues.

I do believe that client developers will find new uses for our APIs
that we had never considered, and it is critical that we have a
responsive process in place to be able to enhance our API to provide
new capabilities. But I believe those new features must be added in
reaction to an expressed demand and clear intent. HTTP based APIs
have a very interesting quality that it can be very cheap to add new
resources to satisfy new needs. These new resources can address
completely new requirements, or they can provide enhanced versions of
existing resources. I'm not so convinced that we need to be so quick
at removing old resources that are non-optimal.

Anyway, just my 2c on the subject.

Darrel

Steve Klabnik

Sep 29, 2012, 10:47:21 PM
to api-...@googlegroups.com
Great post, Darrel.

> Great API's are designed with intent. They are not just a dump of
> internal data.

This is a great quote, but it also has a very important word:
"internal." If you expose your internals to others, and you couple
them to your internals, of course you're going to have breakage when
you change your internals. This is almost universally considered bad
software design.

Glenn Block

Sep 30, 2012, 11:25:23 AM
to api-...@googlegroups.com
Well said Darrel!

This illustrates really why exposing a set of CRUD resources that mirror a db is a bad idea. Data is the result of some sort of intent, but the intent itself is not there. Resources can be used to break out of that mold and to allow the client and server to express intention. 

Jørn Wildt

Sep 30, 2012, 3:33:48 PM
to api-...@googlegroups.com
> Great API's are designed with intent.  They are not just a dump of
> internal data.

Interesting statement! I like it :-) But you have to start somewhere, right? I am working on one of those "dump the data for others to use" APIs - in parallel with a few guys who are designing an API with a specific intent (a custom mobile/iPad client).

What I see from this is that the mobile API is so narrow that nobody else will be able to use it since it is driven by very specific client needs. It also means the mobile API puts less effort into backwards compatibility, expecting customers to upgrade their iPad apps ASAP when a new version is out (but maybe they will regret this ... time will show). But you are certainly right about performance - that is one thing which is high priority on that project.

Me, on the other hand, I am trying to create an open API for third party clients to work with - and since there is no specific use case here, well, I end up dumping from one end to another. With a twist though - it's not the internal data structures I dump, but a well designed choice of names and structures that I expect to live through multiple internal versions of our data.

I am also adding features for adding, deleting and changing stuff in the system - but these focus on the business operations available (like "add bug report", "attach document", "close a bug report" or "add comment" or "assign responsibility"). So, well, it does have some kind of intent.

What am I trying to say? That sometimes all you have is the intent of, well, dumping the data for others to inspect ...

Just my two cents :-)

/Jørn

mca

Sep 30, 2012, 3:49:17 PM
to api-...@googlegroups.com
Jørn:

good observations regarding mobile, etc. here's the basic guidance that I advocate and teach in workshops, etc.:

1) Take advantage of Separation of Concerns when designing (yes, i used that word) your programming interfaces (APIs).

2) build a solid set of private components (storage, class libraries, business layer, etc.) that knows nothing about connectors (HTTP, XMPP, WebSockets, etc.)

3) when setting out an API, start from the use cases, not the components. What do ppl (devs, etc.) want to accomplish? what workflow makes sense for these use cases? Keep in mind platform and/or device usually represent different use cases even when attempting to complete the same task

4) implement your interface as a thin layer between the private components (DB, etc.) and the public connectors (Web Server, etc.). This is where you "script" your component calls into a useful solution for the targeted use cases.

5) treat representation work (XML, JSON, CSV, HTML, etc) as a separate layer so that future calls for new formats and media types do not disrupt other parts of the system

This can be done very quickly and easily, even on small scales.
a. create a single component to handle a "to-do" list. a class lib, an ORM against a DB, etc.
b. create an api facade that has the routes and use cases all laid out including sorts, filters, actions, etc. here is the "design" part
c. script the api facade against the component (easy at this point)
d. pass the results to the representation layer to output the requested format/media-type (lots of tooling exists for this)
e. rinse & repeat for any other device/use cases you encounter along the way.
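
Steps a. through d. can be sketched in a few lines of Java. Everything here (class names, the in-memory list standing in for storage, the two output formats, the lack of CSV escaping) is an assumption made to keep the example small; it is one possible shape, not a prescribed one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// (a) private component: a to-do list that knows nothing about HTTP or formats
class TodoStore {
    private final List<String> items = new ArrayList<>();

    void add(String title) {
        items.add(title);
    }

    List<String> all() {
        return new ArrayList<>(items);
    }
}

// (b) + (c) api facade: use cases (filter, sort) scripted against the component
class TodoFacade {
    private final TodoStore store;

    TodoFacade(TodoStore store) {
        this.store = store;
    }

    List<String> listMatching(String filter) {
        return store.all().stream()
                .filter(title -> title.contains(filter))
                .sorted()
                .collect(Collectors.toList());
    }
}

// (d) representation layer: the only place that knows about output formats
class TodoRepresentation {
    static String render(List<String> titles, String mediaType) {
        if ("text/csv".equals(mediaType)) {
            // Simplification: no quoting or escaping of commas.
            return String.join(",", titles);
        }
        // Default: plain text, one item per line.
        return String.join("\n", titles);
    }
}
```

Adding a new format touches only TodoRepresentation, and a new device use case adds a facade method without changing TodoStore.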

good news is this pattern works at the small level and still scales well; even in large organizations.

bad news is that there is no magic here; no silver bullet. it still involves attention to detail, focus on users, not data, and iterating to create great APIs.

Cheers.

mca


Glenn Block

Sep 30, 2012, 4:13:46 PM
to api-...@googlegroups.com
"bad news is that there is no magic here; no silver bullet."

What I want to know is where are the silver bullets? They must exist somewhere. :-)

Jørn Wildt

Sep 30, 2012, 4:15:40 PM
to api-...@googlegroups.com
> What I want to know is where are the silver bullets? They must exist somewhere. :-)

In the works of B. Stoker et al. :-)

/Jørn

Glenn Block

Sep 30, 2012, 4:49:44 PM
to api-...@googlegroups.com
Nice one :-) 

Brian Topping

Sep 30, 2012, 5:17:47 PM
to api-...@googlegroups.com
Very interesting thread!  This is my first post on this list, with all appropriate gratitude to Brian Mulloy for the introduction.

One architectural analysis that I've found very helpful is to recognize that some coupling is inevitable in complex, evolving systems, and in turn it's quite practical in *closed* systems to decouple internally (at a server-side connector level) instead of at the network level by pairing a versioned client connector on the server side with the version of the client itself.  In that case, the canonicalization of the data happens entirely on the server, and the client is free (because of the coupling between the connector and the client) to do what it needs to get the job done.

This is especially helpful in "under-resourced agile environments" (wink, wink) that don't have the benefit of proper initial requirements gathering, yet still want to benefit from modular environments in the long term.  In time, as resource constraints are removed and de facto requirements emerge, a more noble API such as discussed here can be implemented without the front-loaded risk that might be incurred by teams with less aggregate experience (or more burdensome management that radically changes requirements based on the phase of the moon).

Key here is that the server stack is capable of efficient multiplexing of a multitude of connectors.  For instance, if the SDLC that falls out of a particular stack can't also manage the packaging and deployment of these connectors, all bets are off that this will be a reasonable solution.  For instance, I use Java, Maven and OSGi, where OSGi provides the runtime version mechanics and Maven provides the packaging and distribution mechanics.  I don't have enough current knowledge about non-JVM based solutions to speak of how this might be done elsewhere.

Anyway, my point is that organizational sustainability often trumps masterful APIs, but all is not lost for teams that have limitations on getting everything right in the first pass.  

Brian

mca

Sep 30, 2012, 6:07:54 PM
to api-...@googlegroups.com
Brian:

"...it's quite practical in *closed* systems to decouple internally (at a server-side connector level) instead of at the network level by pairing a versioned client connector on the server side with the version of the client itself."

I'd like to hear more about this POV. I'm not sure i can conjure up a tangible example of this; can you elaborate?

Brian Topping

Sep 30, 2012, 7:20:51 PM
to api-...@googlegroups.com
Hi Mike,

Implementation of such an environment moves away from versioning at an overall application level and towards versioning at a domain component level, allowing versions to float between different domain components.  These "domain components" are connectors that are paired for client(s) that speak that API and translate RESTful requests to business logic API presentations that exist.  They provide additional services when necessary such as mediation of mismatched transactional requirements to the internal APIs.

At the REST API level, a container path handler first multiplexes on domain scope, then on version, such as "/userdomain/v5/*".  This reduces complexity in path resolution and allows for multiple concurrent versions of domain components to be resolved cleanly.  So long as a handler for a specific domain and version is available for resolution by the container, a given client can connect (regardless of how many releases an individual connector has participated in).  
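
A minimal Java sketch of that two-step lookup, assuming paths shaped like "/userdomain/v5/users". The router class and its handler type are hypothetical stand-ins for whatever the container actually provides.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical router: multiplex first on domain, then on version.
class VersionedRouter {
    // domain -> version -> handler that receives the rest of the path
    private final Map<String, Map<String, Function<String, String>>> handlers = new HashMap<>();

    void register(String domain, String version, Function<String, String> handler) {
        handlers.computeIfAbsent(domain, d -> new HashMap<>()).put(version, handler);
    }

    // Returns the handler's result, or null when no handler exists for that domain and version.
    String dispatch(String path) {
        // "/userdomain/v5/users" splits into "", "userdomain", "v5", "users"
        String[] parts = path.split("/", 4);
        if (parts.length < 3) {
            return null;
        }
        Map<String, Function<String, String>> versions = handlers.get(parts[1]);
        if (versions == null) {
            return null;
        }
        Function<String, String> handler = versions.get(parts[2]);
        if (handler == null) {
            return null;
        }
        String rest = parts.length == 4 ? parts[3] : "";
        return handler.apply(rest);
    }
}
```

Several versions of the same domain can be registered side by side, and a client keeps working for as long as its version stays registered.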

At the business logic level, the selected client connector has a selection and/or range of versions of a given API that it can work with.  This binding happens as a part of the packaging lifecycle as provided by the container, assisted by metadata automatically embedded at build-time ("what version range does this package need for each dependency").  It's a lot easier to think of as a dependency graph rather than as multiple bags of APIs.
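
The "version range" idea can be illustrated with a small Java check. This is a hand-rolled sketch with an inclusive lower bound and an exclusive upper bound; it is not the OSGi API, and the class name is made up.

```java
// Hypothetical range check: does an available API version satisfy a connector's requirement?
class SimpleVersionRange {
    private final int[] low;   // inclusive lower bound
    private final int[] high;  // exclusive upper bound

    SimpleVersionRange(String lowInclusive, String highExclusive) {
        this.low = parse(lowInclusive);
        this.high = parse(highExclusive);
    }

    // Parses "major.minor.micro"; missing parts default to 0.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] numbers = new int[3];
        for (int i = 0; i < parts.length && i < 3; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    // Compares two parsed versions component by component.
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    }

    boolean includes(String version) {
        int[] v = parse(version);
        return compare(low, v) <= 0 && compare(v, high) < 0;
    }
}
```

A connector built against range 1.2 up to (but not including) 2.0 would bind to a 1.5.3 business API and reject 2.0.0.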

Once established, the projects evolve with very little maintenance as the build environment and container automatically resolve dependency paths, indicating missing dependency bindings early on component load and unused / orphaned dependencies with simple inspections.  In degenerate cases, many different versions of the same code may be loaded and used by different clients as they have evolved.  But when considered as a component graph, it's not hard to simplify by releasing and deploying a new version of an older connector, maintaining the exposed REST API whilst updating to newer business APIs that a majority of other components in the deployment are already connecting to, then removing the now-orphaned components.

Again, this isn't to diminish the value of well-designed REST APIs.  When such foresight is available from inception, this kind of deployment flexibility gets far less exercise.  But when that's not possible or temporal constraints close in, it's a great tool to have available.  

Brian

mca

Sep 30, 2012, 7:50:51 PM
to api-...@googlegroups.com
"versioning at a domain component level"
not sure i understand yet what a "domain component" is in your model.

""domain components" are connectors that are paired for client(s)"
that confuses me since I understand components and connectors using the language of Taylor[1]. I suspect i am blocked by this mismatch of simple terms, tho.

i suspect you mean that you have versioned the components of the system (supporting side-by-side versions of the same component) and that you publicly expose this version information via URIs (rather than message/media-types), i.e. "/userdomain/v5/*".

 
"the selected client connector has a selection and/or range of versions of a given API that it can work with. "
I can't come up w/ an example of this. can you help me out? do you have some routing that knows which component works w/ which request (via the URI)? or is some other mechanism in place here?


"the build environment and container automatically resolve dependency paths,"
yep, no clue how that is working. any way you can give me something tangible on this? is this happening behind the public HTTP interface? or at the HTTP routing space?


"Again, this isn't to diminish the value of well-designed REST APIs"
not sure where this is coming from here. you think what you are describing is "not|less RESTful" than some other implementation pattern?

Maybe this is not the right forum for helping me learn the details of what you have going here. maybe there are some papers or documentation, etc. that you can point me to.

Thanks.

[1] http://www.isr.uci.edu/architecture/c2.html

Kevin Swiber

Sep 30, 2012, 8:31:54 PM
to api-...@googlegroups.com
So in this scenario, there are multiple versions running on the server. Clients talk to their supported version on the server. The server is free to add new versions. The client can take advantage of a new version after it is upgraded.

Am I understanding this correctly?


Brian Topping

Oct 1, 2012, 1:36:43 AM
to api-...@googlegroups.com
> "versioning at a domain component level"
> not sure i understand yet what a "domain component" is in your model.

A domain such as user management might be one, commerce transactions another, accounting as a third.  These components might be mixed and matched in a deployment, depending on local needs, with primitive types as keys across domains.

> ""domain components" are connectors that are paired for client(s)"
> that confuses me since I understand components and connectors using the language of Taylor[1]. I suspect i am blocked by this mismatch of simple terms, tho.

Apologies.  That's an interesting article that I'll have to read sometime.  What I did gather from a quick scan sounds a bit like the expectations of dependency injection.  And I see how the terms I was using would be very confusing in that context.

I generally use the language of GoF[1] or EIP[2], which would consider a "component" to be a generalization of a "connector", which would be an integration endpoint for a specialized resource like a database.

> i suspect you mean that you have versioned the components of the system (supporting side-by-side versions of the same component) and that you publicly expose this version information via URIs (rather than message/media-types), i.e. "/userdomain/v5/*".

Yes, with the exception of the "public" modifier.  I don't believe this is an ideal style of development for public APIs, but more of a means to get an under-resourced organization on a modular development path.  In my experience, one can usually restructure / rewrite a private client API much more easily than they can introduce robust modularity after a system is written without it, hence my personal priority on modularity first.

Once requirements fall out of such "agile" approaches, public APIs that follow the principles of where this thread started can more easily follow.

> "the selected client connector has a selection and/or range of versions of a given API that it can work with. "
> I can't come up w/ an example of this. can you help me out? do you have some routing that knows which component works w/ which request (via the URI)? or is some other mechanism in place here?

I typically use CXF[3].  It provides for declarative annotations that help a runtime introspector to create a syntax tree for URIs.  The HTTP service will select the CXF subsystem by the first element of the URI, then CXF uses the syntax tree to find the resource object that is being referenced, then further select from methods in that object depending on whatever additional parameter declarations are present.  A resource available at /cxf/userdomain/v5/users might have an annotated class as:

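import java.util.List;
import javax.ws.rs.DELETE;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;

@Path("users")
public class UserManager {
    // GET .../users
    @GET
    public List<User> getAllUsers() {
        // return a list of all users
        ...
    }

    // DELETE .../users/{id}
    @DELETE
    @Path("{id}")
    public void deleteUser(@PathParam("id") String id) {
        // delete the user with a specified ID
        ...
    }
    ...
}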

Thus, a GET request on /cxf/userdomain/v5/users.xml would return a list of User objects serialized in XML, and /cxf/userdomain/v5/users.json would return the same data serialized in JSON.  A DELETE request on /cxf/userdomain/v5/users/brian would remove my record from the system by selecting that method in the service class and parameterizing the call with an id of 'brian'.

Multiple service classes representing different resources are similarly created.  I typically bundle these together in the concept of a domain, with users, groups and permissions as the three resources in such a domain.  These three resources are versioned independently of other domains, such as commerce or accounting, and releasing a new version of the accounting REST API can be done without impacting other domains such as user management.

> "the build environment and container automatically resolve dependency paths,"
> yep, no clue how that is working. any way you can give me something tangible on this? is this happening behind the public HTTP interface? or at the HTTP routing space?

This is analogous to dynamic linking, whereby a component has a versioned runtime requirement on other components (e.g. an app depending on glibc 3.2.1).  Traditionally, Java runtimes do not provide this kind of versioned API linkage, but OSGi containers can use supplemental metadata so these linkages can be accomplished.  Of course, this kind of resolution has been available for a long time at the operating system executable level; the one advantage OSGi has is that this kind of resolution is available down to separate call chains within the same executable process.

> "Again, this isn't to diminish the value of well-designed REST APIs"
> not sure where this is coming from here. you think what you are describing is "not|less RESTful" than some other implementation pattern?

I don't understand what you are asking.  I was attempting to eliminate the possible interpretation that use of these facilities might somehow substitute for good API design.  They *do* provide a pragmatic path for certain styles of project development to eventually get to good API design with less total risk, especially when stakeholder communication is poor or the problem being solved is a moving target.  In such cases, "analysis paralysis" can easily set in.  With moving targets, opportunities can be lost; with suboptimal stakeholders, jobs might be lost.  This type of deployment structure allows the can to be kicked down the road somewhat.

Cheers, Brian

Brian Topping

Oct 1, 2012, 1:49:20 AM
to api-...@googlegroups.com

On Sep 30, 2012, at 5:31 PM, Kevin Swiber <ksw...@gmail.com> wrote:

> So in this scenario, there are multiple versions running on the server. Clients talk to their supported version on the server. The server is free to add new versions. The client can take advantage of a new version after it is upgraded.
>
> Am I understanding this correctly?

Yes, with the additional nuance that a server can be upgraded without being shut down, providing the capability for these versions to be added concurrently with older versions.

mca

Oct 1, 2012, 3:34:41 PM
to api-...@googlegroups.com
Brian:

what i am getting here is deeper into the weeds w/o more clarity.

so far what i have is:
1) you version components
2) you have clients that call the proper version of the component via URIs.

am i missing anything else of interest here?

Brian Topping

Oct 1, 2012, 4:15:09 PM
to api-...@googlegroups.com
Mike,

It seems that I'm in the same situation.  I presented generalities addressed to nobody in particular in deference to the context of this list and you asked twice in various ways for increasing specifics ("papers or documentation, etc").  I did my best to provide them.  Now I'm told I'm leading you into the weeds and you've rather flippantly summarized what I've written down to two sentences.

If I somehow offended you, I apologize, as that certainly was not my intent.  Rather, that intent was to offer experience and different ways of looking at the same problem to those that might be interested in the subject but otherwise do not post.  If that was inappropriate or outside the mission of this list, maybe someone could point me to guidelines that the list follows.  I certainly would prefer to minimize my investment if my input is not appropriate.  No judgement there, just being realistic.

If you have a genuine interest in what I am doing, let's take it off list so we can work it out.  I'm sure after that, if you still find the information worth inquiring about, your experience with the list would be helpful to formulate it in a manner others could benefit from more easily than I have apparently done so far.

Cheers, Brian

mca

Oct 1, 2012, 4:20:46 PM
to api-...@googlegroups.com
my replies are not meant to be flip, just trying to get at the "nut" of what you are doing. my trip to the weeds is likely self-led, too. i tried to acknowledge that earlier in the thread.

i'd like to explore this more if you have the time and will ping you offline.

Brian Topping

Oct 1, 2012, 4:26:18 PM
to api-...@googlegroups.com
Absolutely open to it and happy to share with anyone interested!

Daniel Crenna

Oct 2, 2012, 3:43:07 PM
to api-...@googlegroups.com
I struggle with two things.

One is whether pushing the problem of versioning down to the media type is worth it (the risk of being stuck with required fields when you are earlier in development is less than just releasing a completely new endpoint, which happens to end with 'v2'), and whether something like message versioning is too much of a burden on infrastructure (adaptive domain modelling becomes another layer of indirection). Steve's book talks about HTML being a successful implementation of message-based versioning. So far, URI versioning is the least painful, and the least prone to paying for mistakes over and over, and it's also the most panned. Over-designing clients to consume new media types gracefully doesn't sound like solving the problem, to me. Flipping to a new URI endpoint is opt-in for the client developer.

Second, there are a lot of folks who are against exposing data in favour of providing behaviour. This sounds right, but then when you look at behaviour like that provided by Twitter, "tweeting" is just an alias to CRUD (POST /status). Fancy query spaces, i.e. /BlueWidgets is just the server doing the RPC for you, i.e. /widgets?colour=blue. Demis Bellot said on Twitter, "Services are about exposing remote functionality, the most reusable form of which is its canonical data". And that resonates with me, because whether we're providing a formatted address because we believe the intent is to put it on a label, that behavior is still representing canonical data, however we want to break it up in the request or formalize it in the response. Right now, I don't see any reason not to expose API-specific models as CRUD, and layer behaviour on top of this as a good starting point. Obviously you don't want to send your first class domain model to the client, but you can create something akin to "view models", that you map back to your data store.
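
A rough Java sketch of what I mean by "view models" and by /BlueWidgets being an alias: all class names and fields here are hypothetical, and an in-memory list stands in for the data store.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical internal record, including a field that should never leave the server.
class WidgetRecord {
    final long id;
    final String colour;
    final String supplierCode;

    WidgetRecord(long id, String colour, String supplierCode) {
        this.id = id;
        this.colour = colour;
        this.supplierCode = supplierCode;
    }
}

// API-specific "view model": only what clients are meant to see.
class WidgetView {
    final long id;
    final String colour;

    WidgetView(long id, String colour) {
        this.id = id;
        this.colour = colour;
    }

    static WidgetView from(WidgetRecord record) {
        return new WidgetView(record.id, record.colour);
    }
}

class WidgetQueries {
    // Plays the role of GET /widgets?colour=blue; a null colour means no filter.
    static List<WidgetView> widgets(List<WidgetRecord> store, String colour) {
        return store.stream()
                .filter(r -> colour == null || colour.equals(r.colour))
                .map(WidgetView::from)
                .collect(Collectors.toList());
    }

    // Plays the role of GET /BlueWidgets: the same query with the parameter fixed.
    static List<WidgetView> blueWidgets(List<WidgetRecord> store) {
        return widgets(store, "blue");
    }
}
```

The internal record can change shape freely as long as WidgetView.from keeps producing the same view.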

Daniel