Documenting Factory Functions (RE trac #13282)

99 views
Skip to first unread message

samgonshaw

unread,
Jul 30, 2012, 12:02:52 PM7/30/12
to sage-...@googlegroups.com
Say I wrote a suite of functions and made them available through a factory class such as:

{{{
class MyFunctions():
    def FunctionA(self,*args,**kwargs):
        ...
    def Functionb(self,*args,**kwargs):
        ...
    def FunctionC(self,*args,**kwargs):
        ...

my_functions=MyFunctions()
}}}

and importing my_functions on opening sage.
When writing documentation for FunctionA,B,C,... for examples and tests should I begin these for each function with a new instance of the class, or should I use the standard instance that the user will access?

i.e.
{{{
sage: inst=MyFunctions()
sage: inst.FunctionA(*args,**kwargs)
}}}

or plain ``sage: my_functions.FunctionA(*args,**kwargs)``

This is for the work on #13282, providing a suite of functions to access the graded ring databases of Fano polytopes.

Thanks,

Samuel Gonshaw

Nils Bruin

unread,
Jul 30, 2012, 1:31:49 PM7/30/12
to sage-devel
On Jul 30, 9:02 am, samgonshaw <samgons...@gmail.com> wrote:
> Say I wrote a suite of functions and made them available through a factory
> class such as:
>
> {{{
> class MyFunctions():
>     def FunctionA(self,*args,**kwargs):
>         ...
>     def Functionb(self,*args,**kwargs):
>         ...
>     def FunctionC(self,*args,**kwargs):
>         ...
>
> my_functions=MyFunctions()
>
> }}}

In python there is the concept of a "module" as well as a class.
Unless the functions above make nontrivial use of the "self" parameter
(store information on the instance etc.), your functions should
probably just be functions in a module, not methods on a class that
only gets instantiated once and whose only function is to serve as a
container for your functions. Modules can do that much better.

> and importing my_functions on opening sage.
> When writing documentation for FunctionA,B,C,... for examples and tests
> should I begin these for each function with a new instance of the class, or
> should I use the standard instance that the user will access?

Write the tests as typical usage. That's the behaviour you mainly want
to test. Include non-typical use as edge cases if you think that will
be useful in catching regressions.

Again, if there is a "standard instance that the user will access",
you have a strong indication that your functions should be living in a
module, not as methods of a class.

Rob Beezer

unread,
Jul 30, 2012, 3:06:51 PM7/30/12
to sage-...@googlegroups.com
On Monday, July 30, 2012 10:31:49 AM UTC-7, Nils Bruin wrote:
In python there is the concept of a "module" as well as a class.
Modules can do that much better.

The pattern Samuel suggests shows up several places in Sage, such as for examples of graphs and posets.  I'm doing something similar for groups at

http://trac.sagemath.org/sage_trac/ticket/13115

which I should probably rethink and retool slightly in light of Nils' comments (thanks, Nils).

Then the thing to do would be to write

my_functions = <full/relative name of the module with the functions>

A small amount of experimenting suggests this approach is very compatible with tab-completion, which is one of the main features of building such a collection.  However, tab-completion exposes some irrelevant module-level cruft, like anything that gets imported at the top of the module (eg ZZ, say).  So the module would need to be fairly "clean" I'd think, with *just* the desired functions, though a module-level docstring is a nice thing to employ.

Where should the statement defining  my_functions  go?  Someplace at a level above the module itself?  In the  all.py  at that level?

Rob
 

Nils Bruin

unread,
Jul 30, 2012, 4:31:53 PM7/30/12
to sage-devel
On Jul 30, 12:06 pm, Rob Beezer <goo...@beezer.cotse.net> wrote:
> Then the thing to do would be to write
>
> my_functions = <full/relative name of the module with the functions>

No, I would think that the module should appear exactly in the spot
where you want the names imported. "import my_functions" would be the
appropriate thing to do, unless there is some architectural reason why
the module cannot occur in the location that would induce the
appropriate namespace name.

> A small amount of experimenting suggests this approach is very compatible
> with tab-completion, which is one of the main features of building such a
> collection.  However, tab-completion exposes some irrelevant module-level
> cruft,

If you want a clean tab completion, simply write the module that
implements the thing as

my_function_module_implementation.py

and then write another module

my_function_module_interface.py containing the one line

from my_function_module_implementation import function1, function2,
function3

then my_function_module_interface provides a namespace free from
cruft, at the expense of having to repeat yourself. Note that tab
completion on instances and classes also exposes attributes that have
to do with implementation rather than interface.

> Where should the statement defining  my_functions  go?  Someplace at a
> level above the module itself?  In the  all.py  at that level?

Ideally, nowhere! The relevant "import" already defines them and if
you can ensure that the natural name of the module is the desired one,
no further action is required.

While I agree that tab completion is very useful as a low-cost
documentation tool, I find it a little worrisome if our design
decisions come to be dominated by having tab completion generate
appropriate indices. In lots of cases it will go hand in hand with
writing clean code to begin with (messy namespaces are often a sign of
messy code), but if it forces us to go through strange contortions
(like a separate index namespace module as above) perhaps we have to
rethink our use of tab completion.

Mike Hansen

unread,
Jul 30, 2012, 4:37:20 PM7/30/12
to sage-...@googlegroups.com
On Mon, Jul 30, 2012 at 10:31 AM, Nils Bruin <nbr...@sfu.ca> wrote:
> While I agree that tab completion is very useful as a low-cost
> documentation tool, I find it a little worrisome if our design
> decisions come to be dominated by having tab completion generate
> appropriate indices. In lots of cases it will go hand in hand with
> writing clean code to begin with (messy namespaces are often a sign of
> messy code), but if it forces us to go through strange contortions
> (like a separate index namespace module as above) perhaps we have to
> rethink our use of tab completion.

The new version of IPython has an option to have the tab completion
respect the __all__ variable in the module. This would be one way to
deal with this. See https://github.com/ipython/ipython/pull/1529

--Mike

Rob Beezer

unread,
Jul 30, 2012, 5:07:59 PM7/30/12
to sage-...@googlegroups.com
On Monday, July 30, 2012 1:31:53 PM UTC-7, Nils Bruin wrote:
While I agree that tab completion is very useful as a low-cost
documentation tool, I find it a little worrisome if our design
decisions come to be dominated by having tab completion generate
appropriate indices. In lots of cases it will go hand in hand with
writing clean code to begin with (messy namespaces are often a sign of
messy code), but if it forces us to go through strange contortions
(like a separate index namespace module as above) perhaps we have to
rethink our use of tab completion.

Thanks, Nils, for the explication - that is very helpful.  And thanks, Mike, for the IPython pointer.

I agree that we shouldn't be making messes to drive tab-completion.  But as always, we have design decisions from the past, which we might want to change with hindsight.  In this case, most groups are now available via imports into the global namespace.  They are easy to find, if you already know they are there and what their names are.  ;-)  As usual, I'm trying to help the beginner get started quickly and easily with a large collection of examples.

The current approach I've been taking (and perhaps the one originally proposed in this thread) means a lot of (unnecessary) duplication.  And I have written new/additional/consistent/modern documentation that would be better added to improve the original classes.  While perhaps galling, it sounds to me like layering a "separate index namespace module" on top of the current disorganization will be a small price to pay for a big improvement in usability and will be much more maintainable down the road.

Thanks again,
Rob
 

samgonshaw

unread,
Jul 31, 2012, 7:01:24 AM7/31/12
to sage-...@googlegroups.com
Thanks for all the advice and ideas guys!

It maybe though doesn't seem a good idea at this time to embark on having a second indexing module for this when this doesn't seem to have been implemented elsewhere at this time, though to update to this format in the future?

Definitely right on not using a class, doesn't seem right! Another thought on importing the module, if all the imports I make within that module are performed with an `` as`` statement to a private name e.g. ``from sage.rings.all import ZZ as _ZZ`` then the new module can be imported wholesale and the private names are not included in tab-completion, saving clutter.

Andrey Novoseltsev

unread,
Jul 31, 2012, 10:07:40 AM7/31/12
to sage-devel
On Jul 31, 5:01 am, samgonshaw <samgons...@gmail.com> wrote:
> Thanks for all the advice and ideas guys!
>
> It maybe though doesn't seem a good idea at this time to embark on having a
> second indexing module for this when this doesn't seem to have been
> implemented elsewhere at this time, though to update to this format in the
> future?
>

Well, someone has to try and start using it, so why not now?-)

> Definitely right on not using a class, doesn't seem right! Another thought
> on importing the module, if all the imports I make within that module are
> performed with an `` as`` statement to a private name e.g. ``from
> sage.rings.all import ZZ as _ZZ`` then the new module can be imported
> wholesale and the private names are not included in tab-completion, saving
> clutter.

I think that this definitely should NOT be done, as it means that the
whole code in your module will be written using this "custom" names
rather than standard ones throughout Sage. This will make it more
difficult both to read and maintain the code.

I guess you can also have imports inside the functions - if they are
not called too often it should not be a big performance penalty. Of
course, if the same import is needed by many functions or there are a
lot of them this approach is not convenient.

Thank you,
Andrey

samgonshaw

unread,
Jul 31, 2012, 12:49:53 PM7/31/12
to sage-...@googlegroups.com
Well, I've gone for this attempt and have ended up with two files, polytopes_grdb_backend.py that contains all my functions, and polytopes_grdb.py that just imports the functions we want to appear to the user, and then all.py imports polytopes_grdb.

The problem here is that even if I write the documentation in polytopes_grdb_backend as though they were functions in polytopes_grdb  this is fine for in-sage documentation, what about for the reference documentation?

If I tell Sphinx to add polytopes_grdb to the documentation, nothing will appear, as the documentation is in polytopes_grdb_backend; but if I tell it to bring up polytopes_grdb_backend then the functions are shown to the user as living there and not in polytopes_grdb as we want them to believe.

Thoughts?

Sam.

Rob Beezer

unread,
Jul 31, 2012, 1:43:28 PM7/31/12
to sage-...@googlegroups.com
Dear Samuel,

See if the following might work.

1.  Add a module-level docstring in polytopes_grdb.py.  The should appear when one does "polytopes_grdb?" and it should appear in the documentation when included properly.

2.  In the docstring, make an organized "table of contents" for each of the methods you are importing from polytopes_grdb_backend.py with a ReST-formatted link to each (syntax like :func:`~sage.foo.bar`), and explain the purpose of this object/module.

3.  Document the individual functions like normal, but maybe include a   "note ::" saying "You can get  bar()  easily with  polytopes_grdb.bar()"  Include a test in each function that creates the object via the polytope_grdb module.

I have not tried out all of this, but I have used the module level docstring for a table of contents to the object for groups that I was building (but will refactor into a module), so I have some confidence in that part.  Others may have better ideas (or corrections).

One big advantage is you can organize the module docstring however you like with the links to the individual functions.  This is in contrast to the HTML/PDF documentation for  polytopes_grdb_backend.py  which will list the functions in alphabetical order.

Rob

Andrey Novoseltsev

unread,
Jul 31, 2012, 1:46:56 PM7/31/12
to sage-devel
Hmmm, I think we don't really want the users to think anything
particular about the location of functions module-wise. Those who need
it to write new library code should be able to figure things out. For
"regular users" it is important to see how to access the functions. So
if polytopes_grdb_backend is included into the documentation under a
suitable title (which does not need to mention the module name) and
all examples there are done as

sage: polytopes_grdb.stuff()

there should not be any confusion. The current approach with factory
objects is similar: things are documented in a module which is not
called by the same name as the factory.

Andrey

Jason Grout

unread,
Aug 1, 2012, 9:56:02 AM8/1/12
to sage-...@googlegroups.com
On 7/31/12 7:07 AM, Andrey Novoseltsev wrote:
> I guess you can also have imports inside the functions - if they are
> not called too often it should not be a big performance penalty.

IIRC, the imports are cached automatically in Python, so it would be a
performance hit the first time the function was called (if that module
hadn't been imported somewhere already), but after that, it should be
pretty fast.

Thanks,

Jason


Simon King

unread,
Aug 1, 2012, 2:01:51 PM8/1/12
to sage-...@googlegroups.com
Hi Jason,
No, the overhead is considerable:

sage: def test1(s):
....: from sage.all import ZZ
....: return ZZ(s)
....:
sage: def test2(s):
....: return ZZ(s)
....:
sage: %timeit a = test1('123')
625 loops, best of 3: 6.47 µs per loop
sage: %timeit a = test2('123')
625 loops, best of 3: 2.8 µs per loop

So, if you have a "small" function or method that is called very often
then better avoid local imports.

Best regards,
Simon


samgonshaw

unread,
Aug 24, 2012, 10:12:02 AM8/24/12
to sage-...@googlegroups.com
Hey all, posted up a new patch on http://trac.sagemath.org/sage_trac/ticket/13282; would appreciate if someone would give it a quick review. Tom Coates will be maintaining it after I finish so we would really like someone else to have a look!

Thanks
Reply all
Reply to author
Forward
0 new messages