[SC] Usage of "Replace Temp with Query" refactoring

137 views
Skip to first unread message

philip schwarz

unread,
Jul 3, 2011, 9:39:50 AM7/3/11
to software_craftsmanship
Hi all,

do you have any wisdom/experience to share regarding Fowler's "Replace
Temp with Query" refactoring? http://sourcemaking.com/refactoring/replace-temp-with-query.
Although the refactoring is often a vital step before "Extract
Method", that is not its only motivation.

In the first example of refactoring presented in Fowler's
"Refactoring: improving the design of existing code"
http://books.google.com/books/about/Refactoring.html?id=1MsETFPD3I0C
(the "Video Store" example) the "Replace Temp with Query" refactoring
is used no less than three times.

E.g. at some point in the example we have a statement method which has
temporary variable 'totalAmount': https://gist.github.com/1059890.
Fowler applies "Replace Temp with Query" to 'totalAmount' and the
result is this: https://gist.github.com/1059894. He then applies the
same refactoring to temporary variable 'frequentRenterPoints', and we
are left with this: https://gist.github.com/1059899.

In both cases the refactoring was applied not because it was a vital
step before "Extract Method" http://sourcemaking.com/refactoring/extract-method
but just for the benefits it brings. E.g. Fowler says:

###########################################
By replacing the temp with the query method, any method in the class
can get at the information. That helps a lot in coming up with cleaner
code for the class.
###########################################

He also says that one concern with the refactoring we have just seen
lies in performance:

###########################################
The old code executed the "while loop" once, the new code executes it
three times. A while loop that takes a long time might impair
performance. Many programmers would not do this refactoring simply for
this reason. But note the words _if_ and _might_. Until I profile I
cannot tell how much time is needed for the loop to calculate or
whether the loop is called often enough for it to affect the overall
performance of the system. Don't worry about this while refactoring.
When you optimize you will have to worry about it, but you will then
be in a much better position to do something about it, and you will
have more options to optimize efectively...
###########################################

The fact that the refactoring (especially this more radical variant
involving loop duplication) appears three times in this first example
suggests to me that the refactoring is significant, is commonly
applied, and is exemplary about what we do when we refactor, and why
we do it: what we gain by doing it. But is it? What do you normally
do? Do you always or typically perform this type of refactoring? Do
you typically wait for 'the first bullet'? Do you avoid this
refactoring as much as possible? Do you face objections from greener
programmers?

In "One Thing: Extract till you Drop"
http://blog.objectmentor.com/articles/2009/09/11/one-thing-extract-till-you-drop,
Unce Bob says (about another refactoring: Extract Method):

###########################################
Perhaps you think this is taking things too far. I used to think so
too. But after programming for over 40+ years, I’m beginning to come
to the conclusion that this level of extraction is not taking things
too far at all. In fact, to me, it looks just about right.
###########################################

Do you think the example of "Replace Temp with Query" at hand is
taking it too far? Do you think it would only happen when doing
'Refactor till you Drop'? Recently I watched Uncle Bob's "Clean Code
Episode III - Screencast 3: Video Store"
http://www.cleancoders.com/codecast/clean-code-episode-3-screencast3-videostore/show,
in which he shows us his own refactoring of the video store example.
He does not apply "Replace Temp with Query" to 'totalAmount' and
'totalFrequentrenterPoints', but that seems to be because he goes down
another route in which he makes the two variables member fields, and
their usage ends up in a very short helper method, in which their
presence is less conpicuous and their updating does not seem to make
the code unnecessarily harder to read/understand.

In Corey Haines' 'Taking it too far' v 'Taking too much time'
http://programmingtour.blogspot.com/2010/12/taking-it-too-far-v-taking-too-much.html
he says:

###########################################
As I've traveled around and had the opportunity to work with people of
widely varying experience and skill level, I've noticed that the big
difference is which corners get cut. For people with less experience,
say 1-5 years, the 'get it done' code is significantly different, less
clean, than the code written by those with a lot more experience, say
15-20 years, yet written in the same amount of time. This makes sense
and isn't a knock against beginners, it is just a good indication that
there is always more to learn. The gap between what we actually write
and what we think we would write given enough time is always there,
just at a different level. As we gain more experience, that gap simply
moves upward, so the code is cleaner given the same constraints.
###########################################

Do you, or people you have worked with object to "Replace Temp with
Query" because it is 'taking it too far', or because 'it takes too
much time'?

Taking Ron Jeffries' "Clean Code: Too Much of a Good Thing?"
http://xprogramming.com/articles/too-much-of-a-good-thing/ into
account: do you think we are we cutting corners when we don't apply
"Replace Temp with Query"?

At which of the Dreyfus levels do you think developers use "Replace
Temp with Query":

* Novice Programmer
* Advanced Beginner
* Competent
* Proficient
* Expert

Thanks,

Philip

Carlo Pescio

unread,
Jul 3, 2011, 2:46:20 PM7/3/11
to software_cr...@googlegroups.com
Replace temp with query is a way to decompose code. Decomposition is not inherently good or evil. It's good when it gives birth to new, useful abstractions. It's evil when it's done for its own sake, creating lots of tiny, tangled methods that we need to follow to understand what is actually going on. 

We should also look for different opportunities to create useful abstractions, and clean up our code, without resorting to needless decomposition.

I'll use an example from Corey Haines' code. He replaces

def will_die_with?(number_of_neighbors)
  (number_of_neighbors < 2) || (number_of_neighbors > 3)
end

with

def will_die_with?(number_of_neighbors)
  underpopulated_with?(number_of_neighbors) || overpopulated_with?(number_of_neighbors)
end

def underpopulated_with?(number_of_neighbors)
  number_of_neighbors < 2
end

def overpopulated_with?(number_of_neighbors)
  number_of_neighbors > 3
end

Which is more verbose and not necessarily easier to read. 
I would have taken the opportunity to create two constants, which are really part of the problem domain and as such deserve a meaningful name, so that I could write:

def will_die_with?(number_of_neighbors)
  (number_of_neighbors < MinimumPopulation) || 
  (number_of_neighbors > MaximumPopulation)
end

Do I need to replace a temp with query here? I actually don't think so, and would rather have to read this code than the more verbose version, or the one with naked literals.

Along the same lines (decompose to useful abstractions, not for its own sake) functional decomposition is not necessarily the best decomposition. 
Looking at Martin's code ("extract till you drop") he ends up with 7 private methods. Of those, 5 take symbolName as a parameter. Do I smell "feature envy" here? :-). Do we need a TranslatedSymbol class? Should nextSymbol automatically skip over already returned symbols? Are we blinded by trees of small functions, or by rote repetition of a technique?

It's weird to see experienced programmers pushing a single technique to the limit, when we should be aware that good design stems from balance and from using a variety of techniques.
Usage of RTWQ can happen at any level beyond the Novice Programmer; abusing RTWQ, however, should happen below the Expert level :-)

Carlo




Ted M. Young [@jitterted]

unread,
Jul 3, 2011, 3:40:29 PM7/3/11
to software_cr...@googlegroups.com
Hi Philip,

I've found that I don't think about refactorings in this way anymore, i.e., I don't think about "let me get rid of this temp variable by using a query", I think "I don't like this loop here, I'll extract it into a method". However, I do end up extracting these types of calculations into methods in order to (primarily) make the method easier to read, and (secondarily) to reduce duplication.

Fowler's point about premature optimization (i.e., worrying about looping over the same data over and over) is correct, but you also have to know how your application is used in terms of volumes of data. This would be fine if there were dozens or even hundreds of rentals, but what if you knew there were going to be accounts with 100,000 rentals? In that case, it'd be silly to ignore that information, so you might instead refactor towards SQL sum queries -- it's still a query, just a different kind. But then you'd have two very similar SQL queries (one summing over the charge amounts, another over the rental points), which might be faster as a single query, but might not. At that point, I'll leave it alone as two well-named methods and optimize later if I find that the two queries are taking too long separately.

As for cutting corners, if a developer isn't constantly trying to make methods as short as possible, while also reducing duplication, they're not doing as much as they could. As Corey has noted, less-experienced developers won't do this as much (take it as far) as more experienced developers, and will take longer doing it. This is why pairing is so helpful: even inexperienced developers can see the loops and long methods and (often) duplication and work with the other developer to eliminate them.

Finally, Corey's note on abstractions is key: the more comfortable you are with abstractions, the more likely you are to factor out methods appropriately and see duplication. I've seen lots of refactorings that are technically correct, but slice the code in a way that leaves you with just a bunch of methods instead of a cohesive set of abstractions that match the problem. I've seen these slices cause an increase in subtle duplication because other developers don't see those methods as doing what they need to do, so they write new methods. I'll have to find some examples of this, because it's a more insidious problem in that it can make the code appear clean, when in fact there's duplication all over the place.

;ted


--
You received this message because you are subscribed to the Google Groups "software_craftsmanship" group.
To post to this group, send email to software_cr...@googlegroups.com.
To unsubscribe from this group, send email to software_craftsma...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/software_craftsmanship?hl=en.


J. B. Rainsberger

unread,
Jul 4, 2011, 6:59:09 AM7/4/11
to software_cr...@googlegroups.com
On Sun, Jul 3, 2011 at 16:40, Ted M. Young [@jitterted]
<tedy...@gmail.com> wrote:

> Finally, Corey's note on abstractions is key: the more comfortable you are
> with abstractions, the more likely you are to factor out methods
> appropriately and see duplication. I've seen lots of refactorings that are
> technically correct, but slice the code in a way that leaves you with just a
> bunch of methods instead of a cohesive set of abstractions that match the
> problem. I've seen these slices cause an increase in subtle duplication
> because other developers don't see those methods as doing what they need to
> do, so they write new methods. I'll have to find some examples of this,
> because it's a more insidious problem in that it can make the code appear
> clean, when in fact there's duplication all over the place.

…which is why I focus so much on good names. Even if you have a bunch
of methods, if you name them accurately and precisely, then you'll see
duplication again, and your compulsion to remove that duplication will
cause you to increase cohesion.

Hm: removing structural duplication leads to low coupling; removing
naming duplication leads to high cohesion. I think I'll use that.
--
J. B. (Joe) Rainsberger :: http://www.jbrains.ca ::
http://blog.thecodewhisperer.com
Author, JUnit Recipes
Free Your Mind to Do Great Work :: http://www.freeyourmind-dogreatwork.com

philip schwarz

unread,
Jul 4, 2011, 6:52:55 PM7/4/11
to software_craftsmanship
Hi Carlo,

thanks a lot for your input.

You said:

> I'll use an example from Corey Haines' code. He replaces
>
> def will_die_with?(number_of_neighbors)
>   (number_of_neighbors < 2) || (number_of_neighbors > 3)
> end
>
> with
>
> def will_die_with?(number_of_neighbors)
>   underpopulated_with?(number_of_neighbors) ||
> overpopulated_with?(number_of_neighbors)
> end
>
> def underpopulated_with?(number_of_neighbors)
>   number_of_neighbors < 2
> end
>
> def overpopulated_with?(number_of_neighbors)
>   number_of_neighbors > 3
> end
>
> Which is more verbose and not necessarily easier to read.

My guess is that he found it easier to read, and that he also reckoned
that the majority of potential readers would find it easier to read.

I find it easier to read. Don't you think that "query1(variable) ||
query2(variable)" has a lower cognitive load than "(variable infix-
operator-1 constant1) || (variable infix-operator-2 constant2)"? Isn't
the former easier to read/understand in the same way that it is easier
to read/understand when a newspaper says overpopulated rather than
'the population is too large', or 'the population is larger than X'.
Doesn't extracting (number_of_neighbors < 2) and (number_of_neighbors
> 3) introduce the following useful abstractions:

overpopulated_with? = lambda(x)(x > 3)
underpopulated_with? = lambda(x)(x < 2)

Taking this further, how about introducing another level of
abstraction?:

def will_die_with?(number_of_neighbors)
unsustainably_populated_with?(number_of_neighbors)
end

def unsustainably_populated_with?(number_of_neighbors)
underpopulated_with?(number_of_neighbors) ||
overpopulated_with?(number_of_neighbors)
end

Philip

On Jul 3, 7:46 pm, Carlo Pescio <carlo.pes...@gmail.com> wrote:
> Replace temp with query is a way to decompose code. Decomposition is not
> inherently good or evil. It's good when it gives birth to new, useful
> abstractions. It's evil when it's done for its own sake, creating lots of
> tiny, tangled methods that we *need *to follow to understand what is
> sake) *functional *decomposition is not necessarily the best decomposition.
> Looking at Martin's code ("extract till you drop") he ends up with 7 private
> methods. Of those, 5 take symbolName as a parameter. Do I smell "feature
> envy" here? :-). Do we need a *TranslatedSymbol *class? Should *nextSymbol *automatically

J. B. Rainsberger

unread,
Jul 4, 2011, 7:01:49 PM7/4/11
to software_cr...@googlegroups.com
On Mon, Jul 4, 2011 at 19:52, philip schwarz
<philip.joh...@googlemail.com> wrote:
> def will_die_with?(number_of_neighbors)
>    unsustainably_populated_with?(number_of_neighbors)
> end
>
> def unsustainably_populated_with?(number_of_neighbors)
>    underpopulated_with?(number_of_neighbors) ||
>    overpopulated_with?(number_of_neighbors)
> end

class LocalPopulation
def sustainable?

end
end

philip schwarz

unread,
Jul 4, 2011, 7:43:04 PM7/4/11
to software_craftsmanship
neat.

On Jul 5, 12:01 am, "J. B. Rainsberger" <m...@jbrains.ca> wrote:
> On Mon, Jul 4, 2011 at 19:52, philip schwarz
>

Carlo Pescio

unread,
Jul 5, 2011, 4:06:07 AM7/5/11
to software_cr...@googlegroups.com
def will_die_with?(number_of_neighbors) 
    unsustainably_populated_with?(number_of_neighbors) 
end 

def unsustainably_populated_with?(number_of_neighbors) 
    underpopulated_with?(number_of_neighbors) || 
    overpopulated_with?(number_of_neighbors) 
end 
------------
I'll start with this. In one of the comments in Martin's post, it was noted that if you decompose a function, you should actually decompose :-) the function. I tend to agree. If you just do a forwarding, why not use the proper name in the first place?

About the cognitive load, I would be wary of saying that without some serious experiments + cognitive model. Generally speaking, I find
( x + y ) * ( x - y )
far more readable than
TakeTheSumOf( x, y ) * TakeTheDifferenceOf( x, y )
or
Multiply( Sum( x, y ), Difference( x, y ) )
and I guess I'm not the only one around.

But again, we're talking about mental models without a real experimental setting to back this stuff up, and I find this a little uncomfortable, I don't want to fall into pseudo-science. I can easily say what I consider clean. Extending the reasoning to "the majority" of programmers requires experiments, classification, etc.

Borrowing from the extensive research in program understanding, there is a well-known notion of "delocalized plan" (see "Delocalized plans and program comprehension", IEEE Software, vol 3, 1986, yeap, it's that old :-). A plan is an intention. For instance, in Martin's code there is a plan of using a symbol cache to avoid repetitive substitutions (which is probably part of an unspoken plan to avoid an endless substitution loop). 
That plan, according to Soloway, is delocalized / scattered in the final version of Martin's code (between shouldReplaceSymbol and replaceSymbol), making that code harder to read. 
Interestingly, in one of the less decomposed versions, the plan was entirely present within replaceAllInstances. When I said that we should decompose in two classes, TranslatedSymbol would ignore the cache problem, and the outer class would deal with multiple substitutions, keeping the plan together.

The notion of "plan as an intention" may also guide toward a different decomposition for will_die_with. The "real plan" in will_die_with was to check a range. Perhaps changing the name to WillSurvive and implementing that as 

number_of_neighbors.Within( MinimumPopulation, MaximumPopulation)

would provide the best value (also from a reusability perspective, as Within can be reused elsewhere).

Generally speaking, the notion of "delocalized plan" was meant to capture the feeling / opinion, expressed by several comments in Martin's post, that decomposing code may make program understanding harder. Reading code (often for maintenance purpose) requires a level of understanding far beyond newspaper reading. There is a large body of knowledge on the psychology of programming that we should consider before proposing to decompose ad libitum. A sensible guideline, in my opinion, would be to decompose (and recompose!) code to keep all parts of a plan "near". Other clues (like using properly named constants instead of literals) may then add some readability without adding just another function call. 
However, as I mentioned balance in my previous post, it should also be noted that some valuable techniques / patterns (e.g. Template Method) notoriously introduce a delocalized plan, but that doesn't make them any less useful. Balance is the key: I may trade a delocalized plan for extendibility. I won't introduce a delocalized plan just becase I can (which is what I'll end up doing by decomposing until I drop).

Interestingly, there is also some (disturbing) experimental evidence that small functions are proportionally more error-prone than long functions. While I think (and actually hope, because I'm usually going for short functions :-) that there is an experimental bias in that work, here is a reference: "An Investigation into the Functional Form of the Size-Defect Relationship for Software Modules", IEEE Transactions on Software Engineering, March/April 2009 (it's not the old Goldilock stuff that was proven mathematically wrong long ago).

Carlo

PS: my previous post didn't appear, if not as a quote in yours. Perhaps it was never approved?

philip schwarz

unread,
Jul 5, 2011, 2:11:17 PM7/5/11
to software_craftsmanship
Hi Carlo,

thanks for your feedback.

> def will_die_with?(number_of_neighbors)
>     unsustainably_populated_with?(number_of_neighbors)
> end
>
> def unsustainably_populated_with?(number_of_neighbors)
>     underpopulated_with?(number_of_neighbors) ||
>     overpopulated_with?(number_of_neighbors)
> end
> ------------
> I'll start with this. In one of the comments in Martin's post, it was noted
> that if you decompose a function, you should actually decompose :-) the
> function. I tend to agree. If you just do a forwarding, why not use the
> proper name in the first place?

It seems to me I might want to do that kind of forwarding if I want to
separate the concern of establishing whether something will happen,
e.g. by calling will_die_with?(x) from the concern of how to establish
whether it will happen. e.g. by calling unsustainably_populated_with?
(x). I did not consider questioning Corey's decision to write a
will_die_with?(x) method: I simply explored introducing an extra
level of indirection to separate two concerns. The pay-off of the
extra indirection is the separation of concerns, which may or may not
be judged a sufficient reward.

Philip
> is an *intention*. For instance, in Martin's code there is a plan of using a
> symbol cache to avoid repetitive substitutions (which is probably part of an
> unspoken plan to avoid an endless substitution loop).
> That plan, according to Soloway, is delocalized / scattered in the final
> version of Martin's code (between shouldReplaceSymbol and replaceSymbol),
> making that code harder to read.
> Interestingly, in one of the less decomposed versions, the plan was entirely
> present within replaceAllInstances. When I said that we should decompose in
> two classes, TranslatedSymbol would ignore the cache problem, and the outer
> class would deal with multiple substitutions, keeping the plan together.
>
> The notion of "plan as an intention" may also guide toward a different
> decomposition for will_die_with. The "real plan" in will_die_with was to
> check a range. Perhaps changing the name to WillSurvive and implementing
> that as
>
> number_of_neighbors.Within( MinimumPopulation, MaximumPopulation)
>
> would provide the best value (also from a reusability perspective, as Within
> can be reused elsewhere).
>
> Generally speaking, the notion of "delocalized plan" was meant to capture
> the feeling / opinion, expressed by several comments in Martin's post, that
> decomposing code may make program understanding harder. Reading code (often
> for maintenance purpose) requires a level of understanding far beyond
> newspaper reading. There is a large body of knowledge on the psychology of
> programming that we should consider before proposing to decompose *ad
> libitum*. A sensible guideline, in my opinion, would be to decompose (and
> recompose!) code to keep all parts of a plan "near". Other clues (like using
> properly named constants instead of literals) may then add some readability
> without adding just another function call.
> However, as I mentioned balance in my previous post, it should also be noted
> that some valuable techniques / patterns (e.g. Template Method) notoriously
> introduce a delocalized plan, but that doesn't make them any less useful.
> Balance is the key: I may trade a delocalized plan for extendibility. I
> won't introduce a delocalized plan* just becase* *I can* (which is what I'll

Michael Feathers

unread,
Jul 7, 2011, 1:03:44 PM7/7/11
to software_cr...@googlegroups.com
I think that in Fowler's example, there is a little bit of
mis-direction. He talked about 'Replace Temp with Query' (RTQ) as a
general strategy for cleaning up long methods, but when I look at that
example, I know my motivation would be responsibility separation.

I don't have any issue with RTQ. To me, it is perfectly fine as a
strategy on on long methods. We can always inline them if we find
that we do not like where we've gone. In Dreyfus terms, what I do
like about pushing that example is that it can get newbies into the
state of being aware that mutation of temporaries often prevents clean
Extract Method, and that making parts of code immutable can often be
as clarifying in much the same way as improving the name.

Michael

Ralf Westphal

unread,
Jul 8, 2011, 5:33:46 AM7/8/11
to software_craftsmanship
I agree with Michael. Especially the video store example code
(statement method) is a mixture of different responsibilities.
And such a mixture, I´d say, is the result of not thinking in terms of
"action sequence": What are the distinct actions that make up "create
rental statement"? How can those actions be arranged in a manner to
easily understand them?

For example

Calculation
*Calculate item prices
*Calculate frequent renter points
*Calculate totals

Initialize_statement
*List items
*List frequent renter points
*List totals

Sure, the creator of the code knew all this needed to be done -
implicitly at least. But firstly he did not use language features to
express the structure, and secondly he intermixed the actions.

On 7 Jul., 19:03, Michael Feathers <michael.feath...@gmail.com> wrote:
> I think that in Fowler's example, there is a little bit of
> mis-direction.  He talked about 'Replace Temp with Query' (RTQ) as a
> general strategy for cleaning up long methods, but when I look at that
> example, I know my motivation would be responsibility separation.
>
> I don't have any issue with RTQ.  To me, it is perfectly fine as a
> strategy on on long methods.  We can always inline them if we find
> that we do not like where we've gone.   In Dreyfus terms, what I do
> like about pushing that example is that it can get newbies into the
> state of being aware that mutation of temporaries often prevents clean
> Extract Method, and that making parts of code immutable can often be
> as clarifying in much the same way as improving the name.
>
> Michael
>
> On Sun, Jul 3, 2011 at 8:39 AM, philip schwarz
>
>
>
>
>
>
>
> <philip.johann.schw...@googlemail.com> wrote:
> > Hi all,
>
> > do you have any wisdom/experience to share regarding Fowler's "Replace
> > Temp with Query" refactoring?http://sourcemaking.com/refactoring/replace-temp-with-query.
> >http://blog.objectmentor.com/articles/2009/09/11/one-thing-extract-ti...,
> > Unce Bob says (about another refactoring: Extract Method):
>
> > ###########################################
> > Perhaps you think this is taking things too far. I used to think so
> > too. But after programming for over 40+ years, I’m beginning to come
> > to the conclusion that this level of extraction is not taking things
> > too far at all. In fact, to me, it looks just about right.
> > ###########################################
>
> > Do you think the example of "Replace Temp with Query" at hand is
> > taking it too far? Do you think it would only happen when doing
> > 'Refactor till you Drop'? Recently I watched Uncle Bob's "Clean Code
> > Episode III - Screencast 3: Video Store"
> >http://www.cleancoders.com/codecast/clean-code-episode-3-screencast3-...,
> > in which he shows us his own refactoring of the video store example.
> > He does not apply "Replace Temp with Query" to 'totalAmount' and
> > 'totalFrequentrenterPoints', but that seems to be because he goes down
> > another route in which he makes the two variables member fields, and
> > their usage ends up in a very short helper method, in which their
> > presence is less conpicuous and their updating does not seem to make
> > the code unnecessarily harder to read/understand.
>
> > In Corey Haines' 'Taking it too far' v 'Taking too much time'
> >http://programmingtour.blogspot.com/2010/12/taking-it-too-far-v-takin...
Reply all
Reply to author
Forward
0 new messages