Opinions on mostly superfluous refactoring

Erik Bray

unread,

May 30, 2016, 12:22:53 PM5/30/16

to sage-...@googlegroups.com

Hi all,

I recently needed to dive into the sage_setup.autogen.interpreters
module in order to make some small changes. The file is over 4000
lines long, which is a bit on the long side for your typical Python
file, though not egregious by any means. That said, when trying to
understand some relatively complicated code I find it helpful to break
up into smaller bite-sized logical chunks that are easy to get around
in an editor and reason about. When and how to do this can of course
be highly subjective.

In the case of autogen.interpreters, in the process of understanding
it, it was my immediate instinct, perhaps a bit impulsive, to start
breaking it into multiple files anyways, and about half an hour later
I've done so with success.

I think it would be a good change to feed back into sage, but it's
also a bit frivolous since there are no other substantive changes. I
think it makes the code easier to understand. But of course the main
downside to this kind of refactoring is that it makes the history
harder to follow--not impossible--just harder.

How does this community feel about this sort of refactoring? On the
outset it could be seen as frivolous, but in the long term it can be
for the best, especially as development continues and some of the
resulting modules grow larger on their own.

Thanks,
Erik

Volker Braun

unread,

May 30, 2016, 12:29:41 PM5/30/16

to sage-devel

Imho many sage modules have grown too long, including but not limited to some 20kloc monsters. +1 to splitting things up!

Erik Bray

unread,

May 30, 2016, 12:35:53 PM5/30/16

to sage-...@googlegroups.com

On Mon, May 30, 2016 at 6:29 PM, Volker Braun <vbrau...@gmail.com> wrote:
> Imho many sage modules have grown too long, including but not limited to
> some 20kloc monsters. +1 to splitting things up!

Yikes! I'm not sure I've even run into any of those yet. 20kloc is
definitely worth splitting up. ~4k is a lot more arguable I think,
but I personally don't like files to get much longer than 1kloc
depending of course on how practical it is to further subdivide, which
it isn't always.

sage_setup.autogen.interpreters had the advantage of already being
pretty well structured, and lots of small classes with few
interdependencies that were very logically subdivided.

> --
> You received this message because you are subscribed to the Google Groups
> "sage-devel" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sage-devel+...@googlegroups.com.
> To post to this group, send email to sage-...@googlegroups.com.
> Visit this group at https://groups.google.com/group/sage-devel.
> For more options, visit https://groups.google.com/d/optout.

William Stein

unread,

May 30, 2016, 12:43:21 PM5/30/16

to sage-devel

On Mon, May 30, 2016 at 9:22 AM, Erik Bray <erik....@gmail.com> wrote:
> Hi all,
>
> I recently needed to dive into the sage_setup.autogen.interpreters
> module in order to make some small changes.

Quick link:
https://github.com/sagemath/sage/blob/master/src/sage_setup/autogen/interpreters.py

It's the fast_callable stuff.

Were you diving in to make (sage expression)(numpy array) work? If so, awesome!

> The file is over 4000
> lines long, which is a bit on the long side for your typical Python
> file, though not egregious by any means. That said, when trying to
> understand some relatively complicated code I find it helpful to break
> up into smaller bite-sized logical chunks that are easy to get around
> in an editor and reason about. When and how to do this can of course
> be highly subjective.
>
> In the case of autogen.interpreters, in the process of understanding
> it, it was my immediate instinct, perhaps a bit impulsive, to start
> breaking it into multiple files anyways, and about half an hour later
> I've done so with success.
>
> I think it would be a good change to feed back into sage, but it's
> also a bit frivolous since there are no other substantive changes. I
> think it makes the code easier to understand. But of course the main
> downside to this kind of refactoring is that it makes the history
> harder to follow--not impossible--just harder.
>
> How does this community feel about this sort of refactoring? On the
> outset it could be seen as frivolous, but in the long term it can be
> for the best, especially as development continues and some of the
> resulting modules grow larger on their own.

+1 -- my only concern is for code that does something like:

"from sage_setup.autogen.interpreters import blah"

Maybe this isn't an issue in this case, but in general it could be.

Sage devs are very good at git, so dealing with the history issue
isn't at all a show stopper.

Separating code into many smaller modules really does provide a lot of
genuine value, in that it makes scope and dependencies much, much
clearer, which makes the code potentially way easier to understand.

-- William

--
William (http://wstein.org)

Erik Bray

unread,

May 30, 2016, 12:47:09 PM5/30/16

to sage-...@googlegroups.com

On Mon, May 30, 2016 at 6:42 PM, William Stein <wst...@gmail.com> wrote:
> On Mon, May 30, 2016 at 9:22 AM, Erik Bray <erik....@gmail.com> wrote:
>> Hi all,
>>
>> I recently needed to dive into the sage_setup.autogen.interpreters
>> module in order to make some small changes.
>
> Quick link:
> https://github.com/sagemath/sage/blob/master/src/sage_setup/autogen/interpreters.py
>
> It's the fast_callable stuff.
>
> Were you diving in to make (sage expression)(numpy array) work? If so, awesome!

Sadly, not this time. It's to make the generated code build on
Windows. However, I also want to work on the issue you mentioned, and
cleaning up this code a bit will likely help.

>> The file is over 4000
>> lines long, which is a bit on the long side for your typical Python
>> file, though not egregious by any means. That said, when trying to
>> understand some relatively complicated code I find it helpful to break
>> up into smaller bite-sized logical chunks that are easy to get around
>> in an editor and reason about. When and how to do this can of course
>> be highly subjective.
>>
>> In the case of autogen.interpreters, in the process of understanding
>> it, it was my immediate instinct, perhaps a bit impulsive, to start
>> breaking it into multiple files anyways, and about half an hour later
>> I've done so with success.
>>
>> I think it would be a good change to feed back into sage, but it's
>> also a bit frivolous since there are no other substantive changes. I
>> think it makes the code easier to understand. But of course the main
>> downside to this kind of refactoring is that it makes the history
>> harder to follow--not impossible--just harder.
>>
>> How does this community feel about this sort of refactoring? On the
>> outset it could be seen as frivolous, but in the long term it can be
>> for the best, especially as development continues and some of the
>> resulting modules grow larger on their own.
>
> +1 -- my only concern is for code that does something like:
>
> "from sage_setup.autogen.interpreters import blah"
>
> Maybe this isn't an issue in this case, but in general it could be.

Yep. I made sure that it still fills out the same namespace more or
less (as far as anything seems to care about).

> Sage devs are very good at git, so dealing with the history issue
> isn't at all a show stopper.

I've seen people complain about this before which is partly why I ask.
I think it's a valid complaint too, but definitely not impossible to
overcome.

> Separating code into many smaller modules really does provide a lot of
> genuine value, in that it makes scope and dependencies much, much
> clearer, which makes the code potentially way easier to understand.

Yep, I agree completely.

Thanks,
Erik

Jori Mäntysalo

unread,

May 30, 2016, 12:48:03 PM5/30/16

to sage-...@googlegroups.com

On Mon, 30 May 2016, Erik Bray wrote:

>> Imho many sage modules have grown too long, including but not limited to
>> some 20kloc monsters. +1 to splitting things up!
>
> Yikes! I'm not sure I've even run into any of those yet. 20kloc is
> definitely worth splitting up. ~4k is a lot more arguable I think,
> but I personally don't like files to get much longer than 1kloc
> depending of course on how practical it is to further subdivide, which
> it isn't always.

What you mean by kloc? Lines of *code*, or lines of whole file?

There is 21312 lines at src/sage/graphs/generic_graph.py. But there just
is so much functions defined for (di)graphs, with different options and
algorithms and so on.

--
Jori Mäntysalo

William Stein

unread,

May 30, 2016, 12:51:11 PM5/30/16

to sage-devel

Definitely the intention was just file length. For example,

https://github.com/sagemath/sage/blob/master/src/sage_setup/autogen/interpreters.py

is *mostl* (maybe 80%) comments/docstrings! It's some of the
best-written code I've ever seen.

--
William (http://wstein.org)

Erik Bray

unread,

May 30, 2016, 12:55:12 PM5/30/16

to sage-...@googlegroups.com

On Mon, May 30, 2016 at 6:50 PM, William Stein <wst...@gmail.com> wrote:
> On Mon, May 30, 2016 at 9:47 AM, Jori Mäntysalo <Jori.Ma...@uta.fi> wrote:
>> On Mon, 30 May 2016, Erik Bray wrote:
>>
>>>> Imho many sage modules have grown too long, including but not limited to
>>>> some 20kloc monsters. +1 to splitting things up!
>>>
>>>
>>> Yikes! I'm not sure I've even run into any of those yet. 20kloc is
>>> definitely worth splitting up. ~4k is a lot more arguable I think,
>>> but I personally don't like files to get much longer than 1kloc
>>> depending of course on how practical it is to further subdivide, which
>>> it isn't always.
>>
>>
>> What you mean by kloc? Lines of *code*, or lines of whole file?
>>
>> There is 21312 lines at src/sage/graphs/generic_graph.py. But there just is
>> so much functions defined for (di)graphs, with different options and
>> algorithms and so on.
>
> Definitely the intention was just file length. For example,
>
> https://github.com/sagemath/sage/blob/master/src/sage_setup/autogen/interpreters.py

I definitely tend to consider most reasonably informative comments and
docstrings as part of the "code". Especially in Sage considering the
prevalence of doctests.

> is *mostl* (maybe 80%) comments/docstrings! It's some of the
> best-written code I've ever seen.

Yeah, it's really good! As evidenced by how easy it was to break into
more files.

Reply all

Reply to author

Forward