How to refactor data safely?


Jakub Holy

May 22, 2014, 4:17:52 AM
I have a nested data structure used by a number of functions that presume knowledge of its structure, and I wonder how to change part of the structure safely, preferably in small incremental steps, rather than having my code broken until I have updated all the functions and tests for the new structure. I believe many of you must have experience with this; would you care to share some tips?

The data structure is first built incrementally, and the collected data is later summarized. Instead of replacing the raw data with its summary, I want to keep both, so I want to wrap the data in a map, i.e. change it from:
    {<id> [data...]}   ;; later replaced with {<id> summary}
to:
    {<id> {:data [data...], :summary ...}}
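To make the change concrete, here is a minimal sketch of the summarization step for both shapes. It assumes, purely for illustration, that the data items are numbers and that the summary is their sum; `summarize-old` and `summarize-new` are hypothetical names, not functions from the original code.

```clojure
;; Old shape {<id> [data...]}: summarizing REPLACES the data with the summary.
;; (Assumption for illustration: the summary is the sum of the items.)
(defn summarize-old [db]
  (reduce-kv (fn [m id data] (assoc m id (reduce + data))) {} db))

;; New shape {<id> {:data [data...]}}: summarizing KEEPS the data and adds :summary.
(defn summarize-new [db]
  (reduce-kv (fn [m id entry]
               (assoc m id (assoc entry :summary (reduce + (:data entry)))))
             {} db))

(summarize-old {:a [1 2 3]})          ;; => {:a 6}
(summarize-new {:a {:data [1 2 3]}})  ;; => {:a {:data [1 2 3], :summary 6}}
```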

I have a number of functions operating on the structure and tests for those functions (with test data that also needs to be updated for the refactoring).

When I change one of the functions to produce the new data structure (i.e. the data wrapped in a map instead of the data itself), everything else breaks. So I fix some tests and another function and get even more failures. This does not feel like a good way to do it, as I prefer to keep the red (failing) phase limited and am fond of parallel change for that reason.
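Parallel change (expand, migrate, contract) could be applied here roughly as follows: the producer switches to the new shape, and a temporary adapter derives the old shape for consumers that have not been migrated yet; once no consumer uses the adapter, it is deleted. This is a sketch under assumptions: `collect`, `new->old` and the two consumers are hypothetical names, and the data is assumed to be kept in vectors.

```clojure
;; Expand: the producer already builds the new shape {<id> {:data [...]}}.
(defn collect [db id item]
  (update-in db [id :data] (fnil conj []) item))

;; Temporary adapter: derives the old shape {<id> [data...]} from the new one.
;; Contract: delete it once no consumer calls it any more.
(defn new->old [db]
  (reduce-kv (fn [m id entry] (assoc m id (:data entry))) {} db))

;; Consumer not migrated yet: still reads the old shape, via the adapter.
(defn item-count-old [old-db id]
  (count (get old-db id)))

;; Migrated consumer: reads the new shape directly.
(defn item-count-new [db id]
  (count (get-in db [id :data])))

(def db (-> {} (collect :a 1) (collect :a 2)))
(item-count-old (new->old db) :a)  ;; => 2
(item-count-new db :a)             ;; => 2
```

Each consumer (and its tests) can then be migrated on its own while the rest of the suite keeps passing.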

Ideally, I would have an automated refactoring, or the possibility to wrap the data in some kind of two-faced proxy that could behave either as a vector (towards the old code) or as a map containing the vector (towards the updated code) [something like lenses/cursors?!]. I have neither, so I guess the only remaining option is a well-controlled process of updating the structure and code. Any advice?

Thank you! /Jakub
Forget software. Strive to make an impact, deliver a valuable change.

Please help me improve my Norwegian, both written and spoken. Thanks!

Jakub Holy
Solutions Engineer | +47 966 23 666
Iterate AS |
The Lean Software Development Consultancy


May 22, 2014, 4:21:30 AM
How are you accessing the data?

I suppose that if you were accessing the data via helper functions (maybe you already are), that's where most of the refactoring should happen.

You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
For more options, visit this group at

Bertrand Dechoux

May 23, 2014, 11:43:08 AM
It's only a reformulation of Ulises's comment, but I would say:

1) abstract away how the data is accessed
2) introduce the change one function at a time by swapping the old accessor for the new one (really, it is like a getter/setter)
3) if you have final consumer(s), you might need to introduce a data structure (a map?) stating which ids should be accessed with the new implementation of the accessor
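The three steps above might look like this in a minimal sketch. `entry-data`, `entry-total` and `migrated-ids` are hypothetical names, and a set is used as the registry from step 3 (the "which id uses the new accessor" structure).

```clojure
;; 3) Hypothetical registry: ids whose entries already use the new shape.
(def migrated-ids #{:b})

;; 1) Consumers read entries only through this accessor, never the raw shape.
(defn entry-data
  "Returns the data vector for id, whichever shape its entry currently has."
  [db id]
  (if (contains? migrated-ids id)
    (get-in db [id :data])   ; new shape: {<id> {:data [...] :summary ...}}
    (get db id)))            ; old shape: {<id> [data...]}

;; 2) A consumer switched over to the accessor no longer cares about the shape.
(defn entry-total [db id]
  (reduce + (entry-data db id)))

(def db {:a [1 2 3]                      ; not migrated yet
         :b {:data [4 5] :summary 9}})   ; already migrated
(entry-total db :a)  ;; => 6
(entry-total db :b)  ;; => 9
```

Once every id is in the registry, the old branch and the registry itself can be deleted. Checking `(map? (get db id))` instead of a registry is an alternative, but only if old-shape entries are never maps.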


Armando Blancas

May 23, 2014, 5:52:04 PM

I'll be interested to learn how you work this out. I also work with data whose structure is known to functions in various modules, so its very shape is a contract. This is the opposite end from encapsulating everything in Java classes and interfaces. Also, I write test cases at a high level rather than as true unit tests, which avoids rewriting tests after a refactoring; but I'd like to know how you handle that too, so as to reduce any rework there, or else whether it's worth the maintenance.

Short of a massive refactoring of data and code, maybe write a data-transform function? I'm not sure about the proxy concept (is that data?), but if a function can produce the new format from the old, you can start changing one consumer function at a time, then work on the producers until you can switch over and remove the transform.
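A minimal sketch of such a transform, assuming the raw data is always stored as a vector (so anything that is not a vector is taken to be an entry already in the new shape). `old->new` and `entry-count` are hypothetical names.

```clojure
;; Temporary transform: old shape {<id> [data...]} -> new shape {<id> {:data [...]}}.
;; Non-vector entries pass through unchanged, so the transform is idempotent
;; and keeps working while producers are switched over one by one.
(defn old->new [db]
  (reduce-kv (fn [m id v]
               (assoc m id (if (vector? v) {:data v} v)))
             {} db))

;; An upgraded consumer accepts only the new shape...
(defn entry-count [db id]
  (count (get-in db [id :data])))

;; ...and is fed through the transform while producers still emit the old shape.
(entry-count (old->new {:a [1 2 3]}) :a)          ;; => 3
;; After a producer emits the new shape, the transform is a no-op for its entries.
(entry-count (old->new {:a {:data [1 2 3]}}) :a)  ;; => 3
```

When all producers emit the new shape, every call to `old->new` can be removed.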

Laurent PETIT

May 24, 2014, 4:19:31 AM
So you want a transition phase where existing consuming code can work with both data shapes, and then start fixing functions one at a time to use the new data shape, until you reach the point where you have only the new data shape produced / consumed in your application?

Current consumer functions expect the data to behave as a vector: calling seq on it should produce a deterministically ordered sequence of property values, calling nth should work, etc.

Upgraded consumer functions will expect the data to behave as a map, i.e. to support lookup by keyword keys.

You could implement a transitional protocol which would behave differently depending on how data is accessed (then delegating to the underlying fields).

You would probably have to make some compromises during this transition phase. E.g. refactored consumer functions would not be able to call (seq) on the map (they would get the data vector back instead).
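One way to sketch such a transitional type is a `deftype` that implements Clojure's own `clojure.lang` interfaces rather than a new protocol: `Seqable`, `Counted` and `Indexed` give old consumers the vector view, and `ILookup` gives upgraded consumers keyword lookup. `TransitionalEntry` is a hypothetical name, and only the operations shown are supported.

```clojure
;; Vector-like for old consumers, map-like (keyword lookup) for upgraded ones.
(deftype TransitionalEntry [data summary]
  clojure.lang.Seqable
  (seq [_] (seq data))                          ; old code: seq, first, map, apply...
  clojure.lang.Counted
  (count [_] (count data))
  clojure.lang.Indexed
  (nth [_ i] (nth data i))                      ; old code: nth, vector destructuring
  (nth [_ i not-found] (nth data i not-found))
  clojure.lang.ILookup
  (valAt [this k] (.valAt this k nil))
  (valAt [_ k not-found]                        ; new code: keywords, get, {:keys [...]}
    (case k
      :data    data
      :summary summary
      (get data k not-found))))                 ; other keys (e.g. indices) go to the vector

(def entry (->TransitionalEntry [1 2 3] 6))

;; Old consumers (vector view)
(let [[a b] entry] [a b])           ;; => [1 2]
(apply + entry)                     ;; => 6
;; Upgraded consumers (map view)
(:summary entry)                    ;; => 6
(let [{:keys [data]} entry] data)   ;; => [1 2 3]
```

The compromise shows up directly: `(seq entry)` yields the data items, not map entries, `(vector? entry)` is false, and `conj`/`assoc` are not implemented, so code relying on those must be migrated before the type is introduced or handled separately.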


-- Laurent


Jakub Holy

May 27, 2014, 5:07:11 AM
Hello, thank you all for your tips! I'm away from my PC for a month so I will experiment with your proposals when I come back.

@Ulises, @Bertrand: Hiding the access behind functions to limit the change to them is something I have also thought of. The disadvantage is that I lose the benefit of destructuring and of more self-documenting function signatures. But I want to try it out to see how it turns out.

@Armando: Thank you, that is a neat idea: use a temporary transformation fn that can produce the old structure from the new, so that I can propagate the change slowly, one step at a time. I will look more into that. And if I have the time, I will share my experience in a blog post.

@Laurent: I want a safe way to change the data structure; this temporary ability to work with both shapes of data is one possible way of achieving it. Otherwise you captured it quite well. "You could implement a transitional protocol which would behave differently depending on how data is accessed": yes, that is what I have been thinking about. I have never done such a thing, but I guess it should not be too difficult. You have a good point about the necessity of compromises. I guess I can draw inspiration from Om's cursors, which behave as data but are more than that: they know their place in the larger structure and produce new cursors when accessed. They too have limits, such as not working with lazy seqs (so e.g. (remove ...) must be wrapped in (vec ...)) or sets.

I hope I will eventually get back with a report on how the different approaches worked. More ideas are still welcome :-)