
A design problem I met again and again.


一首诗

Apr 1, 2009, 3:44:01 AM
Hi all,

I am a programmer who works with several different programming
languages, such as Python, C++ (in COM), ActionScript, C#, etc.

Today I realized that, whatever language I use, I always run into the
same problem, and I don't think I have ever solved it very well.

The problem is: how do I break my app into functional pieces?

I know it's important to break an application into lots of pieces to
make it flexible. But that's easier said than done. I can split an
application into 4 or 5 pieces based on "programming functions", for
example logging, sockets, strings, math, ...

When it comes to the business logic, though, I find I always end up
with one big class with many methods, and it grows bigger whenever new
functions are added.

Recently I used Twisted to write a server. It has several protocol
classes, which decode and encode different kinds of network protocols,
and a protocol-independent service class, which handles requests from
clients according to the business logic.

The protocol classes receive a message from the client, decode it, call
a method of the service, encode the result and send it back to the client.

There are also some utility packages, such as logging, as I mentioned
before.

So far so good, everything is clear.

Until one day I found that the service has nearly 100 methods and 6000
lines of code. I don't need to read any programming book to know that
it's too big.

But I cannot find an easy way to split it. Here are some solutions
I came up with:

1. Add several business classes and move the code in the service into
them. But this means that, although the service will contain much less
code, it still has to keep lots of methods whose only job is to call
the corresponding methods in the business classes. The number of
methods in the service will keep growing forever.

2. Move the code in the service completely into business classes
containing only classmethods. The protocol classes call these
classmethods directly instead of calling the service. But this pattern
doesn't look very OO.

3. Move the code in the service completely into business classes.
Instantiate these classes and pass the instances to the protocol classes.
The protocol classes call these instances of the business classes
instead of calling the service. This means that whenever I add a new
business class, I have to add a parameter to the __init__ method of
every protocol class. Not very clear either.

==========================================

I have the same problem when writing C#/C++, when I have to provide a
lot of methods to the users of my code. So I create a big class as the
entry point of my code. Although these big classes don't contain much
logic, they do grow bigger and bigger.

Lawrence D'Oliveiro

Apr 1, 2009, 4:55:23 AM
In message <48506803-a6b9-432b-acef-
b75f76...@v23g2000pro.googlegroups.com>, 一首诗 wrote:

> Until one day I found that the service has nearly 100 methods and 6000
> lines of code. I don't need to read any programming book to know that
> it's too big.

The question is not how many lines or how many methods, but whether it makes
sense to remain as one piece or not. In one previous project, I had one
source file with nearly 15,000 lines in it. Did it make sense to split that
up? Not really.

andrew cooke

Apr 1, 2009, 6:40:02 AM
to newp...@gmail.com, pytho...@python.org
一首诗 wrote:
> 3. Move the code in the service completely into business classes.
> Instantiate these classes and pass the instances to the protocol classes.
> The protocol classes call these instances of the business classes
> instead of calling the service. This means that whenever I add a new
> business class, I have to add a parameter to the __init__ method of
> every protocol class. Not very clear either.

i don't fully understand your problem, but i would guess (3) is the
correct solution. you can probably avoid adding a new parameter by
writing code in a generic way (using lists of arguments, perhaps using
introspection to find method names, etc)

andrew
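
A minimal sketch of what such a generic approach might look like, assuming the protocols can name the business object and the method they want. ServiceRegistry, AccountService, TextProtocol and the "service.method arg" wire format are all hypothetical names made up for illustration; none of them come from the original server.

```python
class ServiceRegistry:
    """Holds business objects by name so protocols take a single argument."""

    def __init__(self):
        self._services = {}

    def register(self, name, service):
        self._services[name] = service

    def dispatch(self, name, method, *args, **kwargs):
        # Look up the business object, then find the method by introspection.
        service = self._services[name]          # KeyError if unknown
        handler = getattr(service, method, None)
        if method.startswith("_") or not callable(handler):
            raise LookupError(f"unknown request {name}.{method}")
        return handler(*args, **kwargs)


class AccountService:               # hypothetical business class
    def balance(self, user_id):
        return {"user": user_id, "balance": 0}


class TextProtocol:                 # hypothetical protocol class
    def __init__(self, registry):
        # One parameter, however many business classes are added later.
        self.registry = registry

    def handle(self, line):
        # Assumed wire format: "<service>.<method> <arg>"
        target, arg = line.split(" ", 1)
        name, method = target.split(".", 1)
        return self.registry.dispatch(name, method, arg)


registry = ServiceRegistry()
registry.register("account", AccountService())
protocol = TextProtocol(registry)
print(protocol.handle("account.balance alice"))
```

Adding a new business class then means one more register() call; the __init__ signatures of the protocol classes stay unchanged.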


一首诗

Apr 1, 2009, 10:38:43 AM
I also think that's my best choice. Before I wrote my mail, I
already knew that this was not a good question. It lacks details, and
it is too broad.

But I think the first step in resolving a problem is to describe it. In
that way, I might find the answer myself.

On Apr 1, 6:40 pm, "andrew cooke" <and...@acooke.org> wrote:

一首诗

Apr 1, 2009, 10:40:40 AM
On Apr 1, 4:55 pm, Lawrence D'Oliveiro <l...@geek-
central.gen.new_zealand> wrote:
> In message <48506803-a6b9-432b-acef-

What is the average size of the source files in your project? If it's
far lower than 15,000 lines, don't you feel that's a little unbalanced?

Nick Craig-Wood

Apr 1, 2009, 2:30:05 PM
一首诗 <newp...@gmail.com> wrote:
> But I think the first step in resolving a problem is to describe it. In
> that way, I might find the answer myself.

:-) That is a great saying!

To answer your original question, split your code up into sections
that can be tested independently. If you can test code in an isolated
way then it belongs in a class / module of its own.

If you have a class that is too big, then factor independent classes
out of it until it is the right size. That is easier said than done
and may require some creativity on your part. It will pay dividends
though as the level of abstraction in your program will rise.

I've noticed some programmers think in big classes and some think in
small classes. Train yourself to do the other thing and your
programming will improve greatly!

--
Nick Craig-Wood <ni...@craig-wood.com> -- http://www.craig-wood.com/nick
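
As a hedged illustration of the "can it be tested in isolation?" rule: SessionTable below is a hypothetical piece factored out of a large service class (the name and its methods are assumptions, not taken from the thread). It knows nothing about sockets or protocols, so its tests need no server at all.

```python
class SessionTable:
    """Hypothetical bookkeeping class factored out of a big service."""

    def __init__(self):
        self._sessions = {}

    def login(self, user):
        if user in self._sessions:
            raise ValueError(f"{user} is already logged in")
        self._sessions[user] = {"messages": 0}

    def logout(self, user):
        # Logging out an unknown user is deliberately a no-op.
        self._sessions.pop(user, None)

    def is_online(self, user):
        return user in self._sessions


def test_login_then_logout():
    table = SessionTable()
    table.login("alice")
    assert table.is_online("alice")
    table.logout("alice")
    assert not table.is_online("alice")


def test_double_login_rejected():
    table = SessionTable()
    table.login("alice")
    try:
        table.login("alice")
    except ValueError:
        return
    raise AssertionError("second login should have been rejected")


test_login_then_logout()
test_double_login_rejected()
print("all tests passed")
```

If a candidate piece cannot be tested like this without constructing the whole service first, that is a hint the boundary is in the wrong place.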

Martin P. Hellwig

Apr 1, 2009, 3:14:59 PM
一首诗 wrote:
<cut>

> But I think the first step in resolving a problem is to describe it. In
> that way, I might find the answer myself.
<cut>
That is an excellent approach. Knowing you have a problem and describing
it is actually the hardest part of a design; the rest is more like a puzzle.

What I guess so far is that you tried to (re)design your work by
grouping on functionality and by using classes to make the work clearer.
From what you wrote (that is, if I understood you correctly), neither of
these approaches really seems to get 'there'.

It might be worth trying another approach. Instead of focussing on the
characteristics of the functions and using them as a guideline for your
design, you could try this:

Step 1:
Write a Functional Design from a user perspective, restrain yourself
from implying anything technical or choosing specific tools. Imagine
yourself as an end-user and not as a developer.

Pick a random person off the street who looks literate but is not
working in IT (secretaries are usually great for this!), let them
comment on your language, and then quiz them about the content to see if
they actually understood what you wrote.

If commenting on language seems strange, in my experience if I can't
properly describe what I want to say then there is a good chance that I
haven't thought about it sufficiently or I was lazy in describing it.


Step 2:
Take this functional design and write a functional specification.
This is much like the design but focusses instead on the business
processes and their interdependencies. Write out implied constraints
and things you might think are obvious. Although the specification is
technical in nature, you should still avoid naming specific tools unless
it is to describe functionality, e.g. a Google-like approach to indexing
data. Use plain English (or whatever language you want to write it in)
for this; don't use any diagrams, SQL table layouts, UML etc.

Pick a random IT-related colleague (network administrators are usually
my preferred choice), let them read it and quiz them to make sure the
specification is clear enough.


Step 3:
When you have your functional specification, write a technical design.
Here you make a choice on the tools you are going to use, based on
evidence-based research, and describe the general outline of your solution.

Pour your co-worker a nice cup of beverage of their choice and let them
read it and of course quiz them.

Step 4:
Finally, use the technical design for writing a technical specification.
Design your program using UML (or whatever thing that makes you look like
you are developing without writing code). Specify deep, down to the names
of all 'public' functions.

Step 5:
Let it rest for the weekend.

Step 6:
Reread your technical specification, if it still makes sense, continue.
If it doesn't, go back to step 1 and repeat the process with the changes
you made.

Step 7:
Do what you usually do (I write my unit-tests first and then solve them).

Step 8:
Compare the end product with your original functional design.
If they do not align go back to Step 1.


Some hints I found useful during step 4: I try to take into account that
it is not me who is going to develop it but a team of reasonably
qualified developers. Thus I split up the work into parts that can be
done simultaneously by more than one person without them needing to know
exactly what the other one is doing. If there is a need to know what the
other developer is doing, then the specification was not precise enough.

If during the whole process something comes up that shows a better way,
change your documentation accordingly.


When all of this still results in an 'ugly' design, try letting more
people read your documentation. If that doesn't help, then one or more of
the following may apply:
- Despite its ugliness it is the most elegant design possible.
- You are working on something that is fundamentally broken.
- You haven't met the person that can give you more insight.

YMMV
--
mph

Carl Banks

Apr 1, 2009, 5:58:55 PM
On Apr 1, 12:44 am, 一首诗 <newpt...@gmail.com> wrote:
> I have the same problem when writing C#/C++, when I have to provide a
> lot of methods to the users of my code. So I create a big class as the
> entry point of my code. Although these big classes don't contain much
> logic, they do grow bigger and bigger.


This seems to be a classic result of "code-based organization", that
is, you are organizing your code according to how your functions are
used. That's appropriate sometimes. Procedural libraries are often
organized by grouping functions according to use. The os module is a
good example.

However, it's usually much better to organize code according to what
data it acts upon: "data-based organization". In other words, go
through your big class and figure out what data belongs together
conceptually, make a class for each conceptual set of data, then
assign methods to classes based on what data the methods act upon.

Consider the os module again. It's a big collection of functions, but
there is a group of functions in os that all act on a particular
piece of data, namely a file descriptor. This suggests that all the
functions that act upon file descriptors (os.open, os.close, os.lseek,
etc.) could instead be methods of a single class, with the file
descriptor as a class member.

(Note: the os library doesn't do that because functions like os.open
are supposed to represent low-level operations corresponding to the
underlying system calls, but never mind that. Ordinarily a bunch of
functions operating on common data should be organized as a class.)


Carl Banks
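
To make the file-descriptor example concrete, here is a rough sketch of such a class. FileDescriptor is a hypothetical wrapper written for this illustration, not part of the standard library; it only groups real os calls (os.open, os.write, os.lseek, os.read, os.close) around the one piece of data they share.

```python
import os
import tempfile


class FileDescriptor:
    """Groups the os functions that act on a single file descriptor."""

    def __init__(self, path, flags=os.O_RDWR | os.O_CREAT, mode=0o600):
        self._fd = os.open(path, flags, mode)

    def write(self, data):
        return os.write(self._fd, data)

    def seek(self, offset, whence=os.SEEK_SET):
        return os.lseek(self._fd, offset, whence)

    def read(self, size):
        return os.read(self._fd, size)

    def close(self):
        # Safe to call more than once.
        if self._fd is not None:
            os.close(self._fd)
            self._fd = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with FileDescriptor(path) as fd:
    fd.write(b"hello")
    fd.seek(0)
    data = fd.read(5)
print(data)
```

The methods fall out of the data: everything that needs the descriptor lives on the class, and nothing else does.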

Lawrence D'Oliveiro

Apr 2, 2009, 1:47:29 AM
In message <158986a9-b2d2-413e-9ca0-
c58429...@f1g2000prb.googlegroups.com>, 一首诗 wrote:

Why?

Steven D'Aprano

Apr 2, 2009, 2:28:04 AM
On Thu, 02 Apr 2009 18:47:29 +1300, Lawrence D'Oliveiro wrote:

>>> The question is not how many lines or how many methods, but whether it
>>> makes sense to remain as one piece or not. In one previous project, I
>>> had one source file with nearly 15,000 lines in it. Did it make sense
>>> to split that up? Not really.
>>
>> What is the average size of the source files in your project? If it's
>> far lower than 15,000 lines, don't you feel that's a little unbalanced?
>
> Why?

If you have too much code in one file, it will upset the balance of the
spinning hard drive platter, and it will start to wobble and maybe even
cause a head-crash.

--
Steven

Martin P. Hellwig

Apr 2, 2009, 4:50:33 AM
Steven D'Aprano wrote:
<cut>

> If you have too much code in one file, it will upset the balance of the
> spinning hard drive platter, and it will start to wobble and maybe even
> cause a head-crash.
>
That is why properly designed operating systems, like Windows 95, rarely
write one continuous block but spread the file all over the HD.

--
mph

Tim Rowe

Apr 2, 2009, 6:01:48 AM
to pytho...@python.org
2009/4/1 一首诗 <newp...@gmail.com>:

> Hi all,
>
> I am a programmer who works with several different programming
> languages, such as Python, C++ (in COM), ActionScript, C#, etc.
>
> Today I realized that, whatever language I use, I always run into the
> same problem, and I don't think I have ever solved it very well.
>
> The problem is: how do I break my app into functional pieces?

One approach is to go through the specification of the program,
underline all of the significant nouns and try to implement each of
the nouns as a class. That won't take you all the way to a good design
-- some of the resulting classes will be too trivial, and it won't
give you the derived classes you need, but it's a good first step to
breaking a problem down, and might help break your one big class
habit.

--
Tim Rowe
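
A small sketch of the noun-underlining exercise, assuming a made-up one-line specification: "A client joins a room and posts messages to it." The nouns room and message become first-cut classes; the spec sentence and all names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    text: str


@dataclass
class Room:
    name: str
    members: set = field(default_factory=set)
    history: list = field(default_factory=list)

    def join(self, client_name):
        self.members.add(client_name)

    def post(self, message):
        # Only members may post; this rule lives with the data it guards.
        if message.sender not in self.members:
            raise PermissionError(f"{message.sender} is not in {self.name}")
        self.history.append(message)


room = Room("lobby")
room.join("alice")
room.post(Message("alice", "hi"))
print(len(room.history))
```

Here the third noun, client, is only a plain string, which matches the caveat above that some candidate classes turn out to be too trivial to keep.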

andrew cooke

Apr 2, 2009, 7:45:46 AM
to pytho...@python.org
Lawrence D'Oliveiro wrote:
>> What is the average size of the source files in your project? If it's
>> far lower than 15,000 lines, don't you feel that's a little unbalanced?
>
> Why?

one reason is that it becomes inefficient to find code. if you structure
code as a set of nested packages, then a module, and finally classes and
methods, then you have a tree structure. and if you divide the structure
along semantic lines then you can efficiently descend the tree to find
what you want. if you choose the division carefully you can get a
balanced tree, giving O(log(n)) access time. in contrast a single file
means a linear scan, O(n).

(i am talking about human use here - people reading and trying to
understand code, perhaps during debugging or code review or whatever).

andrew

(you could argue that the file contents can be sorted in some way - you
could even map from the tree to the file via a traversal - but in practice
humans seem to be a lot better at making a series of decisions descending
a tree than holding the entire structure in their head as a sort order)
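
To put rough numbers on the tree-versus-scan argument, here is a back-of-the-envelope sketch. The figures (500 functions, about 8 choices per level) are assumptions picked for illustration, not measurements from anyone's project, and "names read" is only a crude stand-in for human effort.

```python
def levels_needed(n_items, branching):
    """Smallest depth at which a tree offering `branching` choices per
    level has room for at least `n_items` leaves."""
    levels, capacity = 0, 1
    while capacity < n_items:
        capacity *= branching
        levels += 1
    return levels


n_functions = 500   # assumed size of the code base
branching = 8       # assumed packages/modules/classes per level

depth = levels_needed(n_functions, branching)
tree_worst = depth * branching   # names read while descending the tree
scan_worst = n_functions         # names read scanning one flat file

print(depth, tree_worst, scan_worst)
```

Under these assumptions a reader descends 3 levels and looks at no more than 24 names, against up to 500 in a flat scan. The gap only appears if the divisions are meaningful enough that each choice is easy to make.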

Steven D'Aprano

Apr 2, 2009, 10:30:55 AM
On Thu, 02 Apr 2009 07:45:46 -0400, andrew cooke wrote:

> Lawrence D'Oliveiro wrote:
>>> What is the average size of the source files in your project? If it's
>>> far lower than 15,000 lines, don't you feel that's a little unbalanced?
>>
>> Why?
>
> one reason is that it becomes inefficient to find code. if you
> structure code as a set of nested packages, then a module, and finally
> classes and methods, then you have a tree structure. and if you divide
> the structure along semantic lines then you can efficiently descend the
> tree to find what you want. if you choose the division carefully you
> can get a balanced tree, giving O(log(n)) access time. in contrast a
> single file means a linear scan, O(n).

What's n supposed to be? The number of lines in a file? No, I don't think
so -- you said it yourself: "if you divide the structure along semantic
lines then you can efficiently descend the tree to find what you want".

Not "arbitrarily divide the files after n lines". If one semantic
division requires 15,000 lines, and another semantic division requires 15
lines, then the most efficient way to divide the code base is 15,000
lines in one module and 15 lines in another.

Admittedly, I'd expect that any python module with 15,000 lines
(approximately 900KB in size) could do with some serious refactoring into
modules and packages, but hypothetically it could genuinely make up a
single logical, semantic whole. That's "only" four and a half times
larger than decimal.py.

I can't imagine what sort of code would need to be that large without
being divided into modules, but it could be possible.

--
Steven

一首诗

Apr 2, 2009, 11:02:23 AM
You get it. Sometimes I feel that my head is trained to work in a
procedural way. I use a big class just as a container of functions.

About the "data-based" approach, what if these functions all share a
little data, e.g. a socket, but nothing else?

Jorgen Grahn

Apr 2, 2009, 12:11:56 PM
[top-posting fixed]

On Thu, 2 Apr 2009 08:02:23 -0700 (PDT), 一首诗 <newp...@gmail.com> wrote:
> On Apr 2, 5:58 am, Carl Banks <pavlovevide...@gmail.com> wrote:

>> On Apr 1, 12:44 am, 一首诗 <newpt...@gmail.com> wrote:
>>
>> > I have the same problem when writing C#/C++, when I have to provide a
>> > lot of methods to the users of my code. So I create a big class as the
>> > entry point of my code. Although these big classes don't contain much
>> > logic, they do grow bigger and bigger.
>>
>> This seems to be a classic result of "code-based organization", that
>> is, you are organizing your code according to how your functions are
>> used. That's appropriate sometimes. Procedural libraries are often
>> organized by grouping functions according to use. The os module is a
>> good example.
>>
>> However, it's usually much better to organize code according to what
>> data it acts upon: "data-based organization". In other words, go
>> through your big class and figure out what data belongs together
>> conceptually, make a class for each conceptual set of data, then
>> assign methods to classes based on what data the methods act upon.
>>
>> Consider the os module again. It's a big collection of functions, but
>> there is a group of functions in os that all act on a particular
>> piece of data, namely a file descriptor. This suggests that all the
>> functions that act upon file descriptors (os.open, os.close, os.lseek,
>> etc.) could instead be methods of a single class, with the file
>> descriptor as a class member.

...

> You get it. Sometimes I feel that my head is trained to work in a
> procedural way. I use a big class just as a container of functions.

If that's true, then your problems are not surprising.
A real class normally doesn't get that big.

> About the "data-based" approach, what if these functions all share a
> little data, e.g. a socket, but nothing else?

If that is true, then those functions *are* the Python socket class
and everything has already been done for you.

Turn your question around and it makes more sense (to me, at least).
You don't primarily work with functions: you work with data, a.k.a.
state, a.k.a. objects. The functions follow from the data.

To me, if I can find something with a certain lifetime, a certain set
of invariants, and a suitable name and catchphrase describing it, then
that's probably a class. Then I keep my fingers crossed and hope it
works out reasonably well. If it doesn't, I try another approach.

/Jorgen

--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.se> R'lyeh wgah'nagl fhtagn!

Carl Banks

Apr 2, 2009, 1:51:30 PM
On Apr 2, 8:02 am, 一首诗 <newpt...@gmail.com> wrote:
> You get it. Sometimes I feel that my head is trained to work in a
> procedural way. I use a big class just as a container of functions.
>
> About the "data-based" approach, what if these functions all share a
> little data, e.g. a socket, but nothing else?

Then perhaps your problem is that you are too loose with the
interface. Do you write new functions that are very similar to
existing functions all the time? Perhaps you should consolidate, or
think about how existing functions could do the job.

Or perhaps you don't have a problem. There's nothing wrong with large
classes per se, it's just a red flag. If you have all these functions
that really all operate on only one piece of data, and really all do
different things, then a large class is fine.


Carl Banks

Emile van Sebille

Apr 2, 2009, 7:51:24 PM
to pytho...@python.org
一首诗 wrote:
> Hi all,
>
> I am a programmer who works with several different programming
> languages, such as Python, C++ (in COM), ActionScript, C#, etc.
>
> Today I realized that, whatever language I use, I always run into the
> same problem, and I don't think I have ever solved it very well.
>
> The problem is: how do I break my app into functional pieces?

My question would be why? Refactoring adds nothing to a functioning app
but clarity and maintainability -- both admirable qualities, granted,
and both unnecessary until needed. When I need to update an app is when
I start refactoring, and then just those areas that need it. Certainly
I refactor constantly during development to avoid code reuse through
cut-n-paste, but once I've got it going, whether it's 1000 or 6000
lines, it doesn't matter as long as it works. I'll tease it out when
the upgrades are needed, new applications can reuse pieces, or sooner if
business refactoring requires it.

Emile, writing in the role of sole developer and maintainer of 500k
lines of code dating back 35 years...

Steven D'Aprano

Apr 3, 2009, 12:23:42 AM
On Thu, 02 Apr 2009 16:51:24 -0700, Emile van Sebille wrote:

> 一首诗 wrote:
>> Hi all,
>>
>> I am a programmer who works with several different programming
>> languages, such as Python, C++ (in COM), ActionScript, C#, etc.
>>
>> Today I realized that, whatever language I use, I always run into the
>> same problem, and I don't think I have ever solved it very well.
>>
>> The problem is: how do I break my app into functional pieces?
>
> My question would be why? Refactoring adds nothing to a functioning app
> but clarity and maintainability -- both admirable qualities, granted,
> and both unnecessary until needed.

But they're always needed, except possibly for use-once throw-away
scripts.


> When I need to update an app is when
> I start refactoring, and then just those areas that need it. Certainly
> I refactor constantly during development

Well, that pretty much disproves your assertion that refactoring is only
needed when updating an application.


> to avoid code reuse through
> cut-n-paste, but once I've got it going, whether it's 1000 or 6000
> lines, it doesn't matter as long as it works.

If you've been refactoring during development, and gotten to the point
where it is working, clear and maintainable, then there's very little
refactoring left to do. I don't think anyone suggests that you refactor
code that doesn't need refactoring. Once it is already split into
functional pieces, there's no need to continue breaking it up further.

--
Steven

Emile van Sebille

Apr 3, 2009, 1:18:02 AM
to pytho...@python.org
Steven D'Aprano wrote:
> On Thu, 02 Apr 2009 16:51:24 -0700, Emile van Sebille wrote:
<snip>
>> I refactor constantly during development to avoid code reuse through
>> cut-n-paste, but once I've got it going, whether it's 1000 or 6000
>> lines, it doesn't matter as long as it works.
>
> If you've been refactoring during development, and gotten to the point
> where it is working,

yes, but

> clear and maintainable,

not necessarily

> then there's very little refactoring left to do.

Again, not necessarily. I often find it easier to refactor old code
when I'm maintaining it to better understand how to best implement the
change I'm incorporating at the moment. The refactoring certainly may
have been done when the code was originally written, but at that time
refactoring would have only served to pretty it up as it already worked.

> I don't think anyone suggests that you refactor
> code that doesn't need refactoring.

That's exactly what I read the OP as wanting to do. That's why I was
asking why. So, I think the question becomes, when does code need
refactoring?

Emile

Michele Simionato

Apr 3, 2009, 1:55:14 AM
On Apr 3, 7:18 am, Emile van Sebille <em...@fenx.com> wrote:
>  So, I think the question becomes, when does code need
> refactoring?

I would say that 99.9% of the time a single class with 15,000
lines of code is a signal that something is wrong,
and refactoring is needed.

M. Simionato

一首诗

Apr 3, 2009, 2:25:42 AM
Consolidate existing functions?

I've thought about it.

For example, I have two functions:

#=========================

def startXXX(id):
    pass

def startYYY(id):
    pass
#=========================

I could turn it into one:

#=========================
def start(type, id):
    if type == "XXX":
        pass
    elif type == "YYY":
        pass
#=========================

But isn't the first style clearer for the users of my code?

That's one reason why my interfaces grow fast.
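
One possible middle ground, sketched under assumptions: keep a single generic function and derive the explicitly named entry points from it, so callers get the clearer names without the logic being written twice. The return strings are placeholders for real work, and the parameter is renamed to item_id here only to avoid shadowing the built-in id.

```python
from functools import partial


def start(kind, item_id):
    """Single generic entry point; `kind` selects the behaviour."""
    if kind == "XXX":
        return f"XXX {item_id} started"    # placeholder for real work
    elif kind == "YYY":
        return f"YYY {item_id} started"    # placeholder for real work
    raise ValueError(f"unknown kind: {kind!r}")


# Named wrappers keep the explicit style for callers without
# duplicating any logic.
startXXX = partial(start, "XXX")
startYYY = partial(start, "YYY")

print(startXXX(1))
print(start("YYY", 2))
```

The interface still grows by one name per kind, but each new name costs one line rather than a new method body.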

Steven D'Aprano

Apr 3, 2009, 6:55:06 AM
On Thu, 02 Apr 2009 22:18:02 -0700, Emile van Sebille wrote:

> Steven D'Aprano wrote:
>> On Thu, 02 Apr 2009 16:51:24 -0700, Emile van Sebille wrote:
> <snip>
>>> I refactor constantly during development to avoid code reuse through
>>> cut-n-paste, but once I've got it going, whether it's 1000 or 6000
>>> lines, it doesn't matter as long as it works.
>>
>> If you've been refactoring during development, and gotten to the point
>> where it is working,
>
> yes, but
>
>> clear and maintainable,
>
> not necessarily

If it's not clear and maintainable, then there *is* refactoring left to
do. Whether you (generic you) choose to do so or not is a separate issue.

>> then there's very little refactoring left to do.
>
> Again, not necessarily. I often find it easier to refactor old code
> when I'm maintaining it to better understand how to best implement the
> change I'm incorporating at the moment. The refactoring certainly may
> have been done when the code was originally written, but at that time
> refactoring would have only served to pretty it up as it already worked.
>
>> I don't think anyone suggests that you refactor code that doesn't need
>> refactoring.
>
> That's exactly what I read the OP as wanting to do. That's why I was
> asking why. So, I think the question becomes, when does code need
> refactoring?

(1) When the code isn't clear and maintainable.

(2) When you need to add or subtract functionality which would leave the
code unclear or unmaintainable.

(3) When refactoring would make the code faster, more efficient, or
otherwise better in some way.

(4) When you're changing the API.

--
Steven

Emile van Sebille

Apr 3, 2009, 9:36:33 AM
to pytho...@python.org
Steven D'Aprano wrote:
> On Thu, 02 Apr 2009 22:18:02 -0700, Emile van Sebille wrote:
>
>> Steven D'Aprano wrote:
>>> On Thu, 02 Apr 2009 16:51:24 -0700, Emile van Sebille wrote:
>> <snip>
>>>> I refactor constantly during development to avoid code reuse through
>>>> cut-n-paste, but once I've got it going, whether it's 1000 or 6000
>>>> lines, it doesn't matter as long as it works.
>>> If you've been refactoring during development, and gotten to the point
>>> where it is working,
>> yes, but
>>
>>> clear and maintainable,
>> not necessarily
>
> If it's not clear and maintainable, then there *is* refactoring left to
> do.

Agreed.

> Whether you (generic you) choose to do so or not is a separate issue.

Also agreed - and that is really my point. Doing so feels to me like
continuing to look for a lost object once you've found it.

<snip>

>> So, I think the question becomes, when does code need refactoring?
> (1) When the code isn't clear and maintainable.
>
> (2) When you need to add or subtract functionality which would leave the
> code unclear or unmaintainable.
>
> (3) When refactoring would make the code faster, more efficient, or
> otherwise better in some way.
>
> (4) When you're changing the API.

Certainly agreed on (2) and (4). (1) follows directly from (3). And (3)
only after an issue has been observed.

Emile

andrew cooke

Apr 3, 2009, 9:58:55 AM
to Emile van Sebille, pytho...@python.org
Emile van Sebille wrote:
>>> Whether you (generic you) choose to do so or not is a separate issue.
>
> Also agreed - and that is really my point. Doing so feels to me like
> continuing to look for a lost object once you've found it.

i can see your point here, but there's two things more to consider:

1 - if you do need to refactor it later, because there is a bug say, it
will be harder to do so because you will have forgotten much about the
code. so if it is likely that you will need to refactor in the future, it
may pay to do some of that work now.

2 - if someone else needs to work with the code then the worse state it is
in - even if it works - the harder time they will have understanding it.
which could lead to them using or extending it incorrectly, for example.

both of the above fall under the idea that code isn't just a machine that
produces a result, but also serves as documentation. and working code
isn't necessarily good documentation.

i don't think there's a clear, fixed answer to this (i don't think "stop
refactoring as soon as all tests work" can be a reliable general rule any
more than "refactor until it is the most beautiful code in the world" can
be). you need to use your judgement on a case-by-case basis.

in fact, the thing i am most sure of in this thread is that 15000 lines of
code in one module is a disaster. the likelihood of that being ok seems
so small, compared to all the other uncertainties in software development,
that i cannot see why people are even discussing it (well, i can
understand, because human nature is what it is, and software development
seems to attract a certain kind of pedantic, rigid mind, but even so...)

andrew


Emile van Sebille

Apr 3, 2009, 12:21:41 PM
to pytho...@python.org
andrew cooke wrote:

> Emile van Sebille wrote:
>>>> Whether you (generic you) choose to do so or not is a separate issue.
>> Also agreed - and that is really my point. Doing so feels to me like
>> continuing to look for a lost object once you've found it.
>
> i can see your point here, but there's two things more to consider:
>
> 1 - if you do need to refactor it later, because there is a bug say, it
> will be harder to do so because you will have forgotten much about the
> code.

Yes, I generally count on it. Refactoring at that time is precisely
when you get the most benefit, as it will concisely focus your
attentions on the sections of code that need to be clearer to support
the debugging changes. Face it, you'll have to get your head around the
code anyway, be it 1, 5, or 10k lines and all beautifully structured or
not. Remember, proper refactoring by definition does not change
functionality -- so that bug in the code will be there regardless.

> so if it is likely that you will need to refactor in the future, it
> may pay to do some of that work now.

Certainly -- and I envy those who know which sections to apply their
attentions to and when to stop. Personally, I stop when it works and
wait for feedback.

> 2 - if someone else needs to work with the code then the worse state it is
> in - even if it works - the harder time they will have understanding it.
> which could lead to them using or extending it incorrectly, for example.

Assuming you're talking about non-refactored code when you say worse,
consider Zope vs Django. I have no doubt that both meet an acceptable
level of organization and structure intended in part to facilitate
maintenance. I've got multiple deployed projects of each. But I'll hack
on Django if it doesn't do what I want and I find that easy, while
hacking on Zope ranks somewhere behind having my mother-in-law come for
a three-week stay on my favorite-things-to-do list. Refactored code
doesn't necessarily relate to easier understanding.

> both of the above fall under the idea that code isn't just a machine that
> produces a result, but also serves as documentation. and working code
> isn't necessarily good documentation.

Here I agree. Once I've got it working and I have the time I will add
minor clean up and some notes to help me the next time I'm in there.
Clean up typically consists of dumping unused cruft, relocating imports
to the top, and adding a couple lines of overview comments. On the
other hand, I do agree with Aahz's occasional tagline that treats all
comments in code as lies. Believing them is akin to believing a user --
do so only at your own peril. Users are really bad witnesses.

> i don't think there's a clear, fixed answer to this (i don't think "stop
> refactoring as soon as all tests work" can be a reliable general rule any
> more than "refactor until it is the most beautiful code in the world" can
> be). you need to use your judgement on a case-by-case basis.

Well said.

> in fact, the thing i am most sure of in this thread is that 15000 lines of
> code in one module is a disaster.

Agreed. I took a quick scan and the largest modules I'm working with
look to be closer to 1500 lines. Except tiddlywiki of course, which
comes in at 9425 lines in the current download before adding anything to
it. I bet I'd prefer even hacking that to Zope though.

One programmer's disaster is another programmer's refactoring dream :)

Emile


paul

Apr 3, 2009, 2:02:39 PM4/3/09
to pytho...@python.org
一首诗 wrote:

> Consolidate existing functions?
>
> I've thought about it.
>
> For example, I have two functions:
>
> #=========================
>
> def startXXX(id):
>     pass
>
> def startYYY(id):
>     pass
> #=========================
>
> I could turn it into one:
>
> #=========================
> def start(type, id):
>     if type == "XXX":
>         pass
>     elif type == "YYY":
>         pass
> #=========================
>
> But isn't the first style more clear for my code's user?
Depends ;)

There are more ways to structure code than using classes. To avoid the
if-elif-elif-elif-else problem you could start using a dispatch table
which maps types to functions (for example, using a dict):

start_methods = {
    'type1': startXXX,
    'type2': startYYY,
}

def start(type, id):
    func = start_methods.get(type, None)
    if func:
        func(id)
    else:
        raise ...
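
A self-contained variant of the dispatch table, as a rough sketch in Python 3. The `register` decorator, the handler bodies, and the choice of `ValueError` for unknown keys are assumptions made for illustration; none of them come from the post above.

```python
# Sketch only: handler names and return values are hypothetical.
START_METHODS = {}


def register(kind):
    """Return a decorator that stores a handler under the key `kind`."""
    def decorator(func):
        START_METHODS[kind] = func
        return func
    return decorator


@register("XXX")
def start_xxx(id):
    return "started XXX %s" % id


@register("YYY")
def start_yyy(id):
    return "started YYY %s" % id


def start(kind, id):
    """Look up the handler registered for `kind` and call it with `id`."""
    try:
        func = START_METHODS[kind]
    except KeyError:
        # ValueError is an assumption; the post leaves the exception open.
        raise ValueError("unknown kind: %r" % (kind,))
    return func(id)
```

With a registry like this, adding a new kind means writing one decorated function; `start` itself never has to change.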

Or maybe look at trac's (http://trac.edgewall.com) use of Components and
Interfaces. Very lightweight and modular. You can start reading here:
http://trac.edgewall.org/browser/trunk/trac/core.py

cheers
Paul

>
> That's one reason why my interfaces grow fast.
>
> On Apr 3, 1:51 am, Carl Banks <pavlovevide...@gmail.com> wrote:
>> On Apr 2, 8:02 am, 一首诗 <newpt...@gmail.com> wrote:
>>
>>> You get it. Sometimes I feel that my head is trained to work in a
>>> procedural way. I use a big class just as a container of functions.
>>> About the "data-based" approach, what if these functions all shares a
>>> little data, e.g. a socket, but nothing else?
>> Then perhaps your problem is that you are too loose with the
>> interface. Do you write new functions that are very similar to
>> existing functions all the time? Perhaps you should consolidate, or
>> think about how existing functions could do the job.
>>
>> Or perhaps you don't have a problem. There's nothing wrong with large
>> classes per se, it's just a red flag. If you have all these functions
>> that really all operate on only one piece of data, and really all do
>> different things, then a large class is fine.
>>
>> Carl Banks
>


Carl Banks

Apr 4, 2009, 12:10:30 AM4/4/09
On Apr 2, 11:25 pm, 一首诗 <newpt...@gmail.com> wrote:
> Consolidate existing functions?
>
> I've thought about it.
>
> For example, I have two functions:
>
> #=========================
>
> def startXXX(id):
>     pass
>
> def startYYY(id):
>     pass
> #=========================
>
> I could turn it into one:
>
> #=========================
> def start(type, id):
>     if type == "XXX":
>         pass
>     elif type == "YYY":
>         pass
> #=========================
>
> But isn't the first style more clear for my code's user?

Not necessarily, especially if the user wants to dynamically choose
which start*** function to call.

I have one more suggestion. Consider whether there are groups of
methods that are used together but aren't used with other groups of
functions. For instance, maybe there is a group of methods that can
only be called after a call to startXXX. If that's the case, you
might want to separate those groups into different classes. The
branched-off class would then act as a sort of session handler.

A piece of user code that looked like this (where sc is an instance of
your enormous class):

sc.startX()
sc.send_data_via_X()
sc.receive_data_via_X()
sc.stopX()

might look like this after you factor it out:

session = sc.startX() # creates and returns a new XSession object
session.send_data() # these are methods of the XSession
session.receive_data()
session.stop()


Any methods that are callable any time, you can retain in the big
class, or put in a base class of all the sessions.
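
One possible shape for that factoring, as a minimal sketch in Python 3. `ServiceClient`, the list standing in for real I/O, and the `RuntimeError` on use after stop are all assumptions for illustration; only `startX`, `send_data`, `receive_data` and `stop` come from the snippet above.

```python
# Sketch only: ServiceClient and the echo behaviour are hypothetical.
class XSession:
    """Holds the state that only exists between start and stop."""

    def __init__(self, owner):
        self.owner = owner    # the big object that created this session
        self.sent = []        # stand-in for real I/O
        self.active = True

    def _check_active(self):
        if not self.active:
            raise RuntimeError("session already stopped")

    def send_data(self, data):
        self._check_active()
        self.sent.append(data)

    def receive_data(self):
        self._check_active()
        # Echo the last item sent; a real session would read from a socket.
        return self.sent[-1] if self.sent else None

    def stop(self):
        self.active = False


class ServiceClient:
    """The formerly enormous class; here it only hands out sessions."""

    def startX(self):
        return XSession(self)
```

The methods that were only valid after `startX()` now cannot be called at all without a session object, so the ordering rule is enforced by the structure rather than by documentation.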


Carl Banks

Michele Simionato

Apr 4, 2009, 1:21:07 AM4/4/09
On Apr 4, 6:10 am, Carl Banks <pavlovevide...@gmail.com> wrote:
> A piece of user code that looked like this (where sc is an instance of
> your enormous class):
>
> sc.startX()
> sc.send_data_via_X()
> sc.receive_data_via_X()
> sc.stopX()
>
> might look like this after you factor it out:
>
> session = sc.startX()  # creates and returns a new XSession object
> session.send_data()    # these are methods of the XSession
> session.receive_data()
> session.stop()
>
> Any methods that are callable any time, you can retain in the big
> class, or put in a base class of all the sessions.

That's good advice. A typical refactoring technique when
working with blob classes is to extract groups of methods
with common functionality, put them in a helper
class, make a helper object and pass it to the
original blob. In other words, you split the blob
object into a composition of small, logically independent
objects. BTW, is there anybody on this list who
can suggest good books about refactoring in Python?
There are plenty of books about refactoring for Java
and C++, but off the top of my head I cannot think of a Python
book right now.
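
The helper-object technique described above could be sketched roughly like this in Python 3. Every class and method name here (`Authenticator`, `MessageStore`, `Service`, `post_message`) is hypothetical and chosen only to illustrate the idea.

```python
# Sketch only: all names below are hypothetical.
class Authenticator:
    """Helper owning the login-related methods pulled out of the blob."""

    def __init__(self):
        self.users = set()

    def login(self, name):
        self.users.add(name)

    def is_logged_in(self, name):
        return name in self.users


class MessageStore:
    """Helper owning the storage-related methods pulled out of the blob."""

    def __init__(self):
        self.messages = []

    def post(self, name, text):
        self.messages.append((name, text))


class Service:
    """The former blob: it receives the helpers and delegates to them."""

    def __init__(self, auth, store):
        self.auth = auth
        self.store = store

    def post_message(self, name, text):
        if not self.auth.is_logged_in(name):
            raise PermissionError("%s is not logged in" % name)
        self.store.post(name, text)
```

Because the helpers are passed in rather than created inside `Service`, each one can be tested on its own or swapped for a stub.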

一首诗

Apr 4, 2009, 7:54:48 AM4/4/09
That's clever. I never thought of that. Not only something concrete,
like a person, can be a class; a procedure, like a session, can be a
class too.

Thanks to all of you who replied. I learned a lot from this thread, and
I even took notes on all your advice because I think I will review it
many times in my future work.

andrew cooke

Apr 4, 2009, 8:29:40 AM4/4/09
to 一首诗, pytho...@python.org
Not sure who wrote:
>> > Consolidate existing functions?
>>
>> > I've thought about it.
>>
>> > For example, I have two functions:
>>
>> > #=========================
>>
>> > def startXXX(id):
>> >     pass
>>
>> > def startYYY(id):
>> >     pass
>> > #=========================
>>
>> > I could turn it into one:
>>
>> > #=========================
>> > def start(type, id):
>> >     if type == "XXX":
>> >         pass
>> >     elif type == "YYY":
>> >         pass
>> > #=========================

in general, if you are testing type that is an indication you should
refactor to use a base class with subclassing.

for example:

def my_print(x):
    print 'i have a ',
    if type(x) is Foo:
        print 'furry Foo'
    elif type(x) is Bar:
        print 'bubbly Bar'
    else:
        print 'strange beast'

might be refactored as

class MyPrintable(object):
    def my_print(self):
        print 'i have a ', self.describe()

class Foo(MyPrintable):
    def describe(self):
        return 'furry Foo'

etc etc.

it's not always possible, but any type() or isinstance() in an OO program
is a big red flag that the design is wrong.
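
A fuller version of the refactoring sketched above, written for Python 3. The `Bar` and `Other` classes and the default `describe()` are filled in as assumptions, and `my_print` returns the string instead of printing it so the result is easy to check.

```python
# Sketch only: Bar, Other and the default description are assumptions.
class MyPrintable:
    def describe(self):
        # Default used by subclasses that do not override describe().
        return "strange beast"

    def my_print(self):
        # Returns the text rather than printing it, to keep it testable.
        return "i have a " + self.describe()


class Foo(MyPrintable):
    def describe(self):
        return "furry Foo"


class Bar(MyPrintable):
    def describe(self):
        return "bubbly Bar"


class Other(MyPrintable):
    """Inherits the default description unchanged."""
```

Each class now carries its own description, so adding a new kind of object no longer means editing a chain of type tests.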

andrew

andrew cooke

Apr 4, 2009, 8:32:00 AM4/4/09
to 一首诗, pytho...@python.org
andrew cooke wrote:
[...]

>>> > #=========================
>>> > def start(type, id):
>>> >     if type == "XXX":
>>> >         pass
>>> >     elif type == "YYY":
>>> >         pass
>>> > #=========================

i just realised i am assuming type is a type of an object, but you might
be using it to mean something else here, in which case my advice
might be irrelevant (but not necessarily - the value of type might still
depend on an object type in some way, and often does in oo programming).

andrew
