cached_function: a filter to only cache some inputs ?

148 views
Skip to first unread message

Nathann Cohen

unread,
May 14, 2014, 5:52:45 AM5/14/14
to Sage devel
Helloooooooo everybody !!!

If you play with the designs.orthogonal_array(k,n) function, you will see that it returns three kinds of things :

1) A matrix (potentially big)
2) A boolean (telling you whether the matrix CAN be built)
3) An integer (telling you how large the matrix can be)

Now, it would be COOL if this function could cache its boolean/integer answers (for we ask the same questions very often), but caching the matrices is a really (really) bad idea.

The return type is determined by an argument of orthogonal_array, existence=True, and we only want to cache answers to that.

1) Is there a way to obtain this behaviour with @cached_function ?
2) If not, do you think that implementing this is a nice idea ? (I am worried of making the standard use case slower)

THaaaaaaaaaaaanks for your help !

Nathann

Volker Braun

unread,
May 14, 2014, 6:18:09 AM5/14/14
to sage-...@googlegroups.com
Don't overload functions by return type. This is generally a bad idea. If designs.orthogonal_array doesn't return some kind of array then it is poorly named. Use three different functions, e.g. orthogonal_array() / is_constructible() / size_of_array()

This basic design pattern also fixes your caching problem.

Nathann Cohen

unread,
May 14, 2014, 6:32:10 AM5/14/14
to Sage devel
Yooooooooo !


> Don't overload functions by return type. This is generally a bad idea.

Having general ideas is also a bad idea in general.


> If
> designs.orthogonal_array doesn't return some kind of array then it is poorly
> named. Use three different functions, e.g. orthogonal_array() /
> is_constructible() / size_of_array()
>
> This basic design pattern also fixes your caching problem.

Yes, but it duplicates the code horribly. Because orthogonal_array is a long sequence of tests and answers,and you do not want to find this long collection of tests and answers copied and pasted into three different functions, nor to find out that "for some reason" they are not always consistent with each other.

More precisely :


> designs.orthogonal_array doesn't return some kind of array then it is poorly named

No. Because if you want to build orthogonal arrays, you *will* find a function named orthogonal_array. You may not find a function named size_of_maximal_orthogonal_array.

Besides, this is pretty clear :

sage: designs.orthogonal_array(4,3,existence=True)
True

By the way, I love how nobody has a problem with

sage: designs.OrthogonalArray(4,3).exists()
True
sage: designs.OrthogonalArray(4,3).matrix()
BigMatrix

while on the other hand

sage: designs.orthogonal_array(4,3,existence=True)
True
sage: designs.orthogonal_array(4,3)
BigMatrix

is "clearly bad design". Come on guys, we write Python code ! The point is not only to make our code dead slow, it is ALSO to use the advantages of the language. Stop writing Java and C wrapped in Python ! Do you complain when "wc -l" returns an integer and "wc" returns three integers and a filename ? Do you prefer wc.complete_info() and wc.just_the_number_of_lines() ?

Anyway, my question still stands. And if it does not seem "realistic", you can imagine I hid the truth and asked "How can I only cache output when the integer input is small, and not cache it otherwise" ?

... as this is a more "technically acceptable question" :-P

Nathann

Nathann Cohen

unread,
May 14, 2014, 6:38:35 AM5/14/14
to Sage devel
> By the way, I love how nobody has a problem with
>
> sage: designs.OrthogonalArray(4,3).exists()
> True
> sage: designs.OrthogonalArray(4,3).matrix()
> BigMatrix

(Of course designs.OrthogonalArray(4,3) would return a
UniqueRepresentation object worth 10 or 20 bytes, and this stays
cached somewhere for absolutely no reason, but this is "good" design,
right ? :-P)

Nathann

Volker Braun

unread,
May 14, 2014, 6:40:18 AM5/14/14
to sage-...@googlegroups.com
On Wednesday, May 14, 2014 11:32:10 AM UTC+1, Nathann Cohen wrote:
Yes, but it duplicates the code horribly. Because orthogonal_array is a long sequence of tests and answers

Thats another red flag. If it doesn't fit on a screen break it up into separate function.
 
Besides, this is pretty clear :
sage: designs.orthogonal_array(4,3,existence=True)
True

Its terrible. Misleading function name (doesn't return an array), not discoverable. The fact that it also hard to cache is a direct corollary of attempting to overload by return type. Don't do it. 

Just because you can (in Python) doesn't mean that its a good idea.

Simon King

unread,
May 14, 2014, 6:41:50 AM5/14/14
to sage-...@googlegroups.com
Hi Nathann,

On 2014-05-14, Nathann Cohen <nathan...@gmail.com> wrote:
>> This basic design pattern also fixes your caching problem.
>
> Yes, but it duplicates the code horribly. Because orthogonal_array is a
> long sequence of tests and answers,and you do not want to find this long
> collection of tests and answers copied and pasted into three different
> functions, ...

... which means that you should probably turn this long collection of
tests and answers into a cached method, which is then called by the
other methods. So, no code duplication.

Best regards,
Simon


Nathann Cohen

unread,
May 14, 2014, 6:43:04 AM5/14/14
to Sage devel
> ... which means that you should probably turn this long collection of
> tests and answers into a cached method, which is then called by the
> other methods. So, no code duplication.

?

This is precisely the function I want to turn into a cached function.

Nathann

Nathann Cohen

unread,
May 14, 2014, 6:45:31 AM5/14/14
to Sage devel
> Thats another red flag. If it doesn't fit on a screen break it up into
> separate function.

Volker, you should read the code, and then we discuss it. I just can't code your conditionned reflexes.

> Its terrible. Misleading function name (doesn't return an array), not
> discoverable. 

Not "discoverable" ? What does it means ?
Are you actually trying to tell me that somebody who wants to build orthogonal arrays will not find a function named orthogonal_array ?

> The fact that it also hard to cache is a direct corollary of
> attempting to overload by return type. Don't do it.

Okay, then I just need to change the caching mechanism and this will justify my whole behaviour since the beginning.


> Just because you can (in Python) doesn't mean that its a good idea.

Just because you tell me I can't does not mean that it is a bad idea either.

Nathann

Volker Braun

unread,
May 14, 2014, 7:18:53 AM5/14/14
to sage-...@googlegroups.com
On Wednesday, May 14, 2014 11:45:31 AM UTC+1, Nathann Cohen wrote:
Volker, you should read the code, and then we discuss it.

I did, and I needed some eye bleach to wash out the 100-line long if/elif/elif/...  ;-)

> Just because you can (in Python) doesn't mean that its a good idea.
Just because you tell me I can't does not mean that it is a bad idea either.

But the fact that it interacts badly with decorators should be a hint that my argument is true nevertheless.

Nathann Cohen

unread,
May 14, 2014, 7:21:24 AM5/14/14
to Sage devel
> I did, and I needed some eye bleach to wash out the 100-line long if/elif/elif/... ;-)

Yeah I know.. But I swear that it comes from the Mathematics
behind.... Just look at the code of the steiner_quadruple_systems.py
file. That's the math talking, and what it says is not pretty.

> But the fact that it interacts badly with decorators should be a hint that my argument is true nevertheless.

Why should it be ? We write functions, we update them, we change them
... That's what we do here ! When something misses a feature, we add
it. That's what I wanted to do ...

Nathann

Nathann Cohen

unread,
May 14, 2014, 7:23:33 AM5/14/14
to Sage devel
> Yeah I know.. But I swear that it comes from the Mathematics
> behind.... Just look at the code of the steiner_quadruple_systems.py
> file. That's the math talking, and what it says is not pretty.

Or even the brand-new combinat/designs/database.py. It took me a week to write, and it probably took years of research before T_T

Nathann

Daniel Krenn

unread,
May 14, 2014, 7:31:01 AM5/14/14
to sage-...@googlegroups.com
Am 2014-05-14 11:52, schrieb Nathann Cohen:
> Now, it would be COOL if this function could cache its boolean/integer
> answers (for we ask the same questions very often), but caching the
> matrices is a really (really) bad idea.
>
> The return type is determined by an argument of orthogonal_array,
> existence=True, and we only want to cache answers to that.
>
> 1) Is there a way to obtain this behaviour with @cached_function ?
> 2) If not, do you think that implementing this is a nice idea ? (I am
> worried of making the standard use case slower)

+1 to implement filtering what is cached and what not (if it does not
slow down the code significantly; from which I think is not the case).

FYI: I have a similar problem at the moment: I performing an operation
on symbolic expressions. This is done recursively and I want to cache it
to improve the speed of my function. The problem is, that I ran out of
memory, since the input is sometimes very large, so I want to cache only
"smaller" parts. For me "smaller" is whenever operator!=operator.mul of
the symbolic input.
I wrote a workaround to delete out the not wanted elements of cache
directly (every couple of "iterations"), but this is not a very nice
solution. So I'd appreciate if cached_function can do that for me.

Maybe one could call the function then by:

@cached_function(filter=lambda s: s.operator() != operator.mul)
def my_function(s):
pass

Maybe there should be input_filter (=filter) and output_filter instead.

Daniel

leif

unread,
May 14, 2014, 9:15:28 AM5/14/14
to sage-...@googlegroups.com
Nathann Cohen wrote:
> Helloooooooo everybody !!!
>
> If you play with the designs.orthogonal_array(k,n) function, you will
> see that it returns three kinds of things :
>
> 1) A matrix (potentially big)
> 2) A boolean (telling you whether the matrix CAN be built)
> 3) An integer (telling you how large the matrix can be)
>
> Now, it would be COOL if this function could cache its boolean/integer
> answers (for we ask the same questions very often), but caching the
> matrices is a really (really) bad idea.
>
> The return type is determined by an argument of orthogonal_array,
> existence=True, and we only want to cache answers to that.
>
> 1) Is there a way to obtain this behaviour with @cached_function ?

Similar answer as to your other thread:

Wrap the function, calling a cached function (also just a wrapper) if
existence=True, the bare function otherwise. Here the overhead is
presumably negligible.

The drawback being that checking the existence can be as expensive as
constructing the resulting matrix (modulo perhaps using less memory),
and you might call both functions in sequence, but that's not different
to how it is now. The problem is that orthogonal_array() is not a
member function (nor does it return an object of a specific class),
otherwise you could keep some data in the object to be used on
subsequent calls.


-leif

--
() The ASCII Ribbon Campaign
/\ Help Cure HTML E-Mail

Nathann Cohen

unread,
May 14, 2014, 9:22:59 AM5/14/14
to Sage devel
Yooooooooooo !!

> Similar answer as to your other thread:
>
> Wrap the function, calling a cached function (also just a wrapper) if existence=True, the bare function otherwise. Here the overhead is presumably negligible.

Yesyes, but it really isn't pretty... :-/

It would be much better to have a filter in cached_method, really.

> The drawback being that checking the existence can be as expensive as constructing the resulting matrix (modulo perhaps using less memory),

Well, in our case we first check the existence (which calls
"existence" on other functions, and then others again, recursively)
and when we have identified a path that leads to True we use the same
path again to build the matrix. This way we avoid building useless
matrices.

> otherwise you could keep some data in the object to be used on subsequent calls.

Like caching stuff ?...

Nathann

Volker Braun

unread,
May 14, 2014, 9:51:07 AM5/14/14
to sage-...@googlegroups.com
On Wednesday, May 14, 2014 12:21:24 PM UTC+1, Nathann Cohen wrote:
> But the fact that it interacts badly with decorators should be a hint that my argument is true nevertheless.

Why should it be ? We write functions, we update them, we change them 

Well then go ahead and add a separate function for every distinct functionality. Then you don't need a hack to distinguish she way caching works. 

Nathann Cohen

unread,
May 14, 2014, 9:57:37 AM5/14/14
to Sage devel
> Well then go ahead and add a separate function for every distinct
> functionality. Then you don't need a hack to distinguish she way caching
> works.

Volker, we really have a LOT of places in the code which looks like that :

if case_1:
    if existence:
        return True
    else:
        do_stuff
if case_2:
    if existence:
        return True
    else:
        do_stuff
...

If we split the boolean/matrix answer, we have two functions :

def f1():
    if case_1:
        return True
    if case_2:
        return True

def f2():
    if case_1:
        do_stuff
    if case_2:
        do_stuff

The "if" can be long at times, there are MANY of them, and because one of the two functions would call the other I really don't want any mistake to be made in those "if" --> the two behaviours HAVE to match.

If I split it, we really have to duplicate a lot of tests, and this is not a good design either. I prefer to have one place where the logic is written, there will be less mistakes in the future.

Nathann

Volker Braun

unread,
May 14, 2014, 10:05:18 AM5/14/14
to sage-...@googlegroups.com
Implement the cases as functions:


def work_for_case_1():
    return long_computation_1()

def work_for_case_2():
    return long_computation_2()

def _select_case():
    if case1:
      return work_for_case_1
    elif case2:
      return work_for_case_2
    else:
       raise NotImplementedError

def orthogonal_array():
    return _select_case()()

def is_constructible():
    try:
        _select_case()
       return True
    except NotImplementedError:
        return False

Nathann Cohen

unread,
May 14, 2014, 10:12:16 AM5/14/14
to Sage devel
> Implement the cases as functions:

If I do that as in your example, then I have on one side the tests
corresponding to each construction, and on the other side the
functions corresponding to the tests. And they are really tied
together, same thing here, I don't want to see them miles apart.

At some point I did the tests inside of the function itself : from the
outside it was cool, you just called a lot of functions without doing
the tests yourself, and the function gave the answer or said "I can't
!".

Those functions, of course, returned both integers, booleans and matrices.

Plus Vincent didn't like to see the tests at the beginning of each function.

I'm a bit stuck here, you know ?.. :-P

Nathann

P.S. : Working on the cached_function patch

Volker Braun

unread,
May 14, 2014, 10:26:08 AM5/14/14
to sage-...@googlegroups.com
On Wednesday, May 14, 2014 3:12:16 PM UTC+1, Nathann Cohen wrote:
If I do that as in your example, then I have on one side the tests
corresponding to each construction, and on the other side the
functions corresponding to the tests

No. The tests of the work_for_case_x function go to that function. This ties the tests much better to the implementation than what you currently have. Which is just one giant docstring trying to cover all cases, where it is totally unclear whether everything is covered or not.


leif

unread,
May 14, 2014, 10:43:37 AM5/14/14
to sage-...@googlegroups.com
Nathann Cohen wrote:
>> Similar answer as to your other thread:
>>
>> Wrap the function, calling a cached function (also just a wrapper) if existence=True, the bare function otherwise. Here the overhead is presumably negligible.
>
> Yesyes, but it really isn't pretty... :-/
>
> It would be much better to have a filter in cached_method, really.
>
>> The drawback being that checking the existence can be as expensive as constructing the resulting matrix (modulo perhaps using less memory),
>
> Well, in our case we first check the existence (which calls
> "existence" on other functions, and then others again, recursively)
> and when we have identified a path that leads to True we use the same
> path again to build the matrix. This way we avoid building useless
> matrices.

So you need the caching because of potentially identical calls /in the
recursion/?

If, so, first build a call graph and do CSE on it... ;-)

Or, probably lazily evaluate the recursion tree.

Nathann Cohen

unread,
May 14, 2014, 10:45:06 AM5/14/14
to Sage devel
> No. The tests of the work_for_case_x function go to that function. This ties
> the tests much better to the implementation than what you currently have.

I agree with that and implemented it earlier but it was refused during a review.

Note that it changes nothing to my caching problem, however. I would then have to cache the individual constructions which return either boolean or matrices.

> Which is just one giant docstring trying to cover all cases, where it is
> totally unclear whether everything is covered or not.

All cases are not covered, because the theory does not know how to cover them :-P

Nathann

Nathann Cohen

unread,
May 14, 2014, 10:50:35 AM5/14/14
to Sage devel
> So you need the caching because of potentially identical calls /in the recursion/?

There *will* be double-calls :-P

> If, so, first build a call graph and do CSE on it... ;-)
>
> Or, probably lazily evaluate the recursion tree.

Come ooooooooon. This is exactly what cached functions are made to
solve. Let us not make it more complicated :-P

Nathann

Nathann Cohen

unread,
May 14, 2014, 10:58:45 AM5/14/14
to Sage devel, Daniel Krenn
"A cached function with selective memory"

http://trac.sagemath.org/ticket/16353

Nathann

Volker Braun

unread,
May 14, 2014, 11:07:56 AM5/14/14
to sage-...@googlegroups.com
On Wednesday, May 14, 2014 3:45:06 PM UTC+1, Nathann Cohen wrote:
Note that it changes nothing to my caching problem

No, it does fix your caching problem. You can either cache the individual work_for_case_x or the is_constructible function.

Just to be perfectly clear, the _select_case function does not perform any work! It only selects the function which would do the work if you were to call it. But since the function is returned unevaluated, no work is actually done. Only orthogonal_array evaluates, is_constructible does not evaluate.


Nathann Cohen

unread,
May 14, 2014, 11:12:41 AM5/14/14
to Sage devel
Hellooooo !

> No, it does fix your caching problem. You can either cache the individual
> work_for_case_x or the is_constructible function.
>
> Just to be perfectly clear, the _select_case function does not perform any
> work! It only selects the function which would do the work if you were to
> call it. But since the function is returned unevaluated, no work is actually
> done. Only orthogonal_array evaluates, is_constructible does not evaluate.

You mean this ?

def actual_construction(n):
stuff

@cached_function
def is_construction_possible(n):
if whatever:
return actual_construction #just the function
return False

Two functions for each construction, two docstring, two sets of
doctests, AND a function that returns either "False" or a function ?

Nathann

Volker Braun

unread,
May 14, 2014, 11:48:08 AM5/14/14
to sage-...@googlegroups.com
On Wednesday, May 14, 2014 4:12:41 PM UTC+1, Nathann Cohen wrote:
@cached_function 
def is_construction_possible(n):
    if whatever:
        return  actual_construction #just the  function

No, you haven't read what I wrote. In English, "is possible" asks for a boolean outcome, so is_possible() shall return a boolean.

Here is again what I originally wrote, maybe you'll read it this time:

Nathann Cohen

unread,
May 15, 2014, 3:11:31 AM5/15/14
to sage-...@googlegroups.com
No, you haven't read what I wrote. In English, "is possible" asks for a boolean outcome, so is_possible() shall return a boolean.

Here is again what I originally wrote, maybe you'll read it this time:

Oh. Well it does work, but if I need to introduce a specific wrapper function to be called by the user before the "actual" code I think I would use Nils' trick. If only to not forget it, because it is cool !

On the other hand the best for me would be that you would all come to your senses and accept the idea that filtering what you want to cache is a good idea. As it seems that you hate my design so much you are ready to say that it is useless, just because it helps me do what I want :-P

I would accept the argument of "speed issue", but of course you cannot use that one AND defend the "key" parameter, which has an easier workaround than my problem here and a higher cost. So it is both or none, see ? :-P

Nathann



def work_for_case_1():
    return long_computation_1()

def work_for_case_2():
    return long_computation_2()

def _select_case():
    if case1:
      return work_for_case_1
    elif case2:
      return work_for_case_2
    else:
       raise NotImplementedError

def orthogonal_array():
    return _select_case()()

def is_constructible():
    try:
        _select_case()
       return True
    except NotImplementedError:
        return False

--
You received this message because you are subscribed to a topic in the Google Groups "sage-devel" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/sage-devel/OPe5VJpBiB4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to sage-devel+...@googlegroups.com.
To post to this group, send email to sage-...@googlegroups.com.
Visit this group at http://groups.google.com/group/sage-devel.
For more options, visit https://groups.google.com/d/optout.

Volker Braun

unread,
May 15, 2014, 5:02:47 AM5/15/14
to sage-...@googlegroups.com
On Thursday, May 15, 2014 8:11:31 AM UTC+1, Nathann Cohen wrote:
On the other hand the best for me would be that you would all come to your senses and accept the idea that filtering what you want to cache is a good idea.

It is not. The problem is your 4 pages of if/else branches with interspersed computations. All other problems that you are encountering are a corollary of that one. 

Nathann Cohen

unread,
May 15, 2014, 5:06:34 AM5/15/14
to Sage devel
> It is not. The problem is your 4 pages of if/else branches with interspersed
> computations. All other problems that you are encountering are a corollary
> of that one.

How can you only cache the output of "small" values (the most used in any recursion) and not the larger ones ? That's the use case mentionned by Daniel, on this mailing-list.
And also by Jean-Baptiste on the trac ticket.

Nathann

Volker Braun

unread,
May 15, 2014, 5:44:00 AM5/15/14
to sage-...@googlegroups.com
How do you know what the user considers "small" and "large"? It depends on what the user is trying to do. The user can easily avoid caching by calling the .f attribute. 

And if you really need to decide whether to cache or not based on the arguments, you have to manage your own cache. Using some condititonal that you can pass to the memoization decorator is a spectacularly ugly way of managing your own cache. For starters, it doesn't allow you to make the caching decision on intermediate values of your computation.

Nathann Cohen

unread,
May 15, 2014, 5:51:36 AM5/15/14
to Sage devel
> How do you know what the user considers "small" and "large"?

In my case it is straightforward, but I could want to use this feature
is my own personal code anyway.

> And if you really need to decide whether to cache or not based on the arguments, you have to manage your own cache.

Well, I had to work on graphs with my own code before it was merged
into Sage, too.

> Using some condititonal that you can pass to the memoization decorator is a spectacularly ugly way of managing your own cache. For starters, it doesn't allow you to make the caching decision on intermediate values of your computation.

I don't know how to implement that with the caching mechanism.

Okay guys, it is 100 against one as always, you keep ignoring the fact
that the feature may be useful despite two messages coming from other
persons, you ignored my questions about the current "key" argument, so
I have no choice but to re-implement the feature in my own code, and
those who also need it will do the same.

I would have liked to see you all so fierce when I complained about
the combinatorial_map decorator.

Too bad !

Nathann

Nils Bruin

unread,
May 15, 2014, 11:57:28 AM5/15/14
to sage-...@googlegroups.com
On Thursday, May 15, 2014 12:11:31 AM UTC-7, Nathann Cohen wrote:
I would accept the argument of "speed issue", but of course you cannot use that one AND defend the "key" parameter, which has an easier workaround than my problem here and a higher cost. So it is both or none, see ? :-P

It's a potentially valid point that the introduction of the "key" parameter might have affected normal operation, but I just checked: it does not. At initialization time (i.e., when the decorator creates the wrapping object), the presence of the key argument is checked and if it's not there, the exact old code is executed, so if key is not there, it is guaranteed to not affect runtime performance in any way. *deciding* whether something needs to be cached will always be a runtime decision.

The justification is also a little different: "key" doesn't change the logical operation of the cache. It just provides a way to customize the normalization of the arguments (which always happens and I think people underestimate in how costly it can be: arguments to python function can look awfully complicated and to normalize them in a way that reflects how python binds them is a lot of work). I some doubts about its current implementation. Perhaps it should really just provide an *alternative* to the default argument_fixer.fix_to_pos. Especially the line

return self.key(*args, **dict(kwargs))

worries me a bit. (here `kwargs` is an already processed version of an incoming `**kwargs`!). I suspect that if this ever turns out to be a widely used feature, we'll have to have some changes that are incompatible in edge cases to arrive at a cleaner/more efficient design.

Nathann Cohen

unread,
May 15, 2014, 12:07:52 PM5/15/14
to Sage devel
Yoooooooo !!

> It's a potentially valid point that the introduction of the "key" parameter might have affected normal operation, but I just checked: it does not.

Hmmmm... I first thought that fix_to_pos was part of this "key" stuff,
but you seem to consider it something different. My thoughts were
"Okay, clearly this normalization is applied at every call, and what I
do is only a "if" applied when __call__ is called on a non-cached
value, so it has to be cheaper !".

If you don't count the "if" that triggers this normalization then it
does not stand anymore, and it looks like you have to apply this "if"
anyway because you cannot store a dictionary as a key of another
dictionary. Hmmmm...

> At initialization time (i.e., when the decorator creates the wrapping object), the presence of the key argument is checked and if it's not there, the exact old code is executed, so if key is not there, it is guaranteed to not affect runtime performance in any way. *deciding* whether something needs to be cached will always be a runtime decision.

Well, changing the key that is used to cache it is also a runtime
decision.... But it does not seem to add any "if" in the general case
indeed.

What worries me is that I will have to add a lot of dirty and totally
unrelated stuff to combinat/designs/ to get the behaviour I want T_T

Nathann

Javier López Peña

unread,
May 15, 2014, 12:21:09 PM5/15/14
to sage-...@googlegroups.com
If an method is *extremely* performance critical, one might want to give it its own custom cache to avoid as much overhead as possible. I don't think the utility cached_function and cached_method can be expected to have shining performance for all types of functions and inputs.

Also, if one needs really flexible caching then using the utility functions is probably not the way to go. When I need to cache in a sophisticated manner (eg caching only last N calls, or giving cache elements expiration times) I always resort to dogpile.cache [1] instead on the utility functions.

Cheers,
J

[1] https://bitbucket.org/zzzeek/dogpile.cache

Nathann Cohen

unread,
May 15, 2014, 12:27:54 PM5/15/14
to Sage devel
Helloooooooo !

> If an method is *extremely* performance critical, one might want to give it
> its own custom cache to avoid as much overhead as possible. I don't think
> the utility cached_function and cached_method can be expected to have
> shining performance for all types of functions and inputs.

You are right here, and using cached_methods would not have been an
end to our combinatorial design problems. But this function is called
so often on the same input that it just cannot work anymore without
caching, and so we will just cache everything for the moment.
Even if it means returning tuples instead of returning lists because
it cannot cache non-immutable stuff, which wouldn't have been a
problem with this additional feature as we wouldn't have cached this
output anyway.

It is just more complicated, and less efficient in the meantime.

Plus we have something like three different functions to cache in the same way.

Well. It would have been easier for us to only add this #16353 branch :-/

> Also, if one needs really flexible caching then using the utility functions
> is probably not the way to go. When I need to cache in a sophisticated
> manner (eg caching only last N calls, or giving cache elements expiration
> times) I always resort to dogpile.cache [1] instead on the utility
> functions.

HMmmmmm... Well, if I cannot add this feature to the caching mechanism
I will just write a specific caching for those functions... But
really, design code is not the place to implement caching
mechanisms...

Nathann

leif

unread,
May 15, 2014, 2:32:24 PM5/15/14
to sage-...@googlegroups.com
Nathann Cohen wrote:
> I would have liked to see you all so fierce when I complained about
> the combinatorial_map decorator.

Ah, the @grow_bushy_tail^TM decorator?

IIRC there were a couple of people complaining about that, but were more
or less ignored as well.


-leif

P.S.: Awaiting the @google_analytics decorator, probably already
present on ***
_______________________
(TM) of Dima

Nathann Cohen

unread,
May 15, 2014, 4:22:21 PM5/15/14
to Sage devel
> Ah, the @grow_bushy_tail^TM decorator?

Yes. That.

~/sage$ grep "@combinatorial_map" * -R | wc -l
93

> IIRC there were a couple of people complaining about that, but were more or
> less ignored as well.

Yep........

Nathann

Vincent Delecroix

unread,
May 15, 2014, 4:24:43 PM5/15/14
to sage-...@googlegroups.com
2014-05-15 22:22 UTC+02:00, Nathann Cohen <nathan...@gmail.com>:
Still time to pop it up again...

leif

unread,
May 15, 2014, 4:46:34 PM5/15/14
to sage-...@googlegroups.com
Nathann Cohen wrote:
>> Ah, the @grow_bushy_tail^TM decorator?
>
> Yes. That.
>
> ~/sage$ grep "@combinatorial_map" * -R | wc -l
> 93

To be fair:

$ grep -R "@combinatorial_map" src/sage/ --include \*.py --exclude
combinatorial_map.py | wc -l
84


-leif

Nathann Cohen

unread,
Jun 5, 2014, 11:13:01 AM6/5/14
to Sage devel
Hello guys !

I just pushed another version of the cached method decorator which
filters its input !

It is there and non-subversive, I just add a new class containing next
to nothing.

http://trac.sagemath.org/ticket/16353

Still, it does the job !

Nathann

kcrisman

unread,
Jun 6, 2014, 12:26:29 PM6/6/14
to sage-...@googlegroups.com
I didn't watch this thread before, but now I'm sorry I didn't.
 
> If
> designs.orthogonal_array doesn't return some kind of array then it is poorly
> named. Use three different functions, e.g. orthogonal_array() /
> is_constructible() / size_of_array()
>
> This basic design pattern also fixes your caching problem.

Yes, but it duplicates the code horribly. Because orthogonal_array is a long sequence of tests and answers,and you do not want to find this long collection of tests and answers copied and pasted into three different functions, nor to find out that "for some reason" they are not always consistent with each other.


See below - that is precisely when that code should be abstracted out.
 
More precisely :

> designs.orthogonal_array doesn't return some kind of array then it is poorly named

No. Because if you want to build orthogonal arrays, you *will* find a function named orthogonal_array. You may not find a function named size_of_maximal_orthogonal_array.


This is something fixable.  We deal with this in plotting and symbolics all the time.  Naming things twice (esp. since orthogonal_array isn't in the global namespace, just designs.[tab]) seems very reasonable to me.

designs.orthogonal_array_maximum_size
designs.orthogonal_array_existence

and then take all the code that could be repeated and put it in a 'private' (underscore) function.  Again, this is something that happens a lot in various other parts of Sage.  

(Also, in this particular case there are very few commands to even find by tab-completion, so people in such a specialized research area will likely find them.  But of course this won't always obtain as an argument.)

By the way, I love how nobody has a problem with

sage: designs.OrthogonalArray(4,3).exists()
True
sage: designs.OrthogonalArray(4,3).matrix()
BigMatrix


because these are different methods, so it makes sense.
 
while on the other hand

sage: designs.orthogonal_array(4,3,existence=True)
True
sage: designs.orthogonal_array(4,3)
BigMatrix


Usually, in Sage, we have kept to a convention that such keywords do things like try other algorithms or verify truth (as indeed in orthogonal_array in Sage 6.2, where I though I saw a check keyword of some kind).  Naturally, Python can do this, but it is helpful to keep things 

 
is "clearly bad design". Come on guys, we write Python code ! The point is not only to make our code dead slow, it is ALSO to use the advantages of the language. Stop writing Java and C wrapped in Python ! Do you complain when "wc -l" returns an integer and "wc" returns three integers and a filename ? Do you prefer wc.complete_info() and wc.just_the_number_of_lines() ?


Shell isn't Python.  In fact, you are losing the point of object-orientedness.  Again, I'm sure Sage also does this in some places, but over time it would be good to minimize those places.
++++++
Anyway, since all this new stuff is still in beta, is there a possibility of refactoring this and making a few methods?  (And ones that also don't use "who_asked", a purely internal thing - that should be some internal attribute, not a keyword.)  

I really really really really (!) understand that it would be a very annoying piece of work to do so.  And I know it is, if not orthogonal, at least tangential to the question of cache functions.  But it would be so so so good for Sage in the long term.  I know I am no expert software engineer and make lots of dumb design choices, and of course much of the combinat code is largely not going to be used by me or my students or people I teach Sage... but on the other hand if we have the chance to tighten things up, it seems reasonable to take the opportunity.

- kcrisman

Nathann Cohen

unread,
Jun 6, 2014, 12:47:08 PM6/6/14
to Sage devel
> This is something fixable.  We deal with this in plotting and symbolics all
> the time.  Naming things twice (esp. since orthogonal_array isn't in the
> global namespace, just designs.[tab]) seems very reasonable to me.
>
> designs.orthogonal_array_maximum_size
> designs.orthogonal_array_existence

Okay let's be reasonable. The functions I would need are :

designs.orthogonal_array_maximum_size # note that the terminology is not right
designs.orthogonal_array_existence
designs.orthogonal_array
designs.transversal_design_maximum_size # same comment
designs.transversal_design_existence
designs.transversal_design
designs.mutually_orthogonal_latin_squares_maximum_size # same comment
designs.mutually_orthogonal_existence
designs.mutually_orthogonal

Of course, I would also needs 3 hidden function that would be called by the others

_orthogonal_array_all_at_once
_transversal_design_all_at_once
_mutually_orthogonal_latin_squares_all_at_once

Note that I would also need to triple MANY functions in the graph code, for the very reason that you insist on writing Python code as if it were C code.


> Usually, in Sage, we have kept to a convention that such keywords do things
> like try other algorithms or verify truth (as indeed in orthogonal_array in
> Sage 6.2, where I though I saw a check keyword of some kind).  Naturally,
> Python can do this, but it is helpful to keep things

I have been doing this in the Graph code for years. And I think that I remember how it began :

It was the early days of Linear Programming. I was writing functions to solve NP hard stuff, returning partitions of vertices or list of vertices, and though that perhaps I could save this with a "value_only" flag. When a graph function has such a flag, it means that it will not return the actual list of vertices, but its size instead.

This, because in the optimization world, we deal with existence problems. And existence problems have what is called a "certificate", ie "some information which can be used to check, in polynomial time, that the answer given to the decision problem is right".

Thus in the graph code you will have functions with a "certificate" flag, others with a "value_only" flag. And their output changes according to that.

Python is *NOT* typed. Accept it.

> Shell isn't Python.  In fact, you are losing the point of
> object-orientedness.  Again, I'm sure Sage also does this in some places,
> but over time it would be good to minimize those places.

I think, on the other hand, that people are so used to object-oriented programming that they do not see when it is not a good idea anymore.

I don't want to have to create objects that will be only used to access methods.
If I create those objects, I will need to write a OrthogonalArray class, a TransversalDesign class, and these objects (which, currently, are lists of lists) will HAVE to be immutable.
If these objects are made immutable, I will NEED to copy them again and again and again in all recursive constructions instead of changing them.

Those are all the changes that are triggered if I have to use those cursed objects where they don't belong.

> Anyway, since all this new stuff is still in beta, is there a possibility of
> refactoring this and making a few methods?

I have a lot of modification to do on these files, a lot of nice stuff to add, but there are currently 9 patches in needs_review in the combinatorial_designs section. I cannot add code not improve anything unless the patches end up being reviewed.

>  (And ones that also don't use
> "who_asked", a purely internal thing - that should be some internal
> attribute, not a keyword.)  

This keyword is used to break infinite recursions. If you see a way around I am interested.

> I really really really really (!) understand that it would be a very
> annoying piece of work to do so. 

I don't mind the work if it makes sense. But having to pick a worse data structure and having to generate useless objects just because you cannot stand a syntax that the language allows really is bad work.

> And I know it is, if not orthogonal, at
> least tangential to the question of cache functions.  But it would be so so
> so good for Sage in the long term.  I know I am no expert software engineer
> and make lots of dumb design choices, and of course much of the combinat
> code is largely not going to be used by me or my students or people I teach
> Sage...

Please don't consider sage/combinat/designs/ as part of the combinat/ code :-P

> but on the other hand if we have the chance to tighten things up, it
> seems reasonable to take the opportunity.

I have been walking in circles for days or weeks because the patches are waiting for a review. I cannot write code for as long as there is stuff still left to be reviwed.

Nathann

Nathann Cohen

unread,
Jun 6, 2014, 1:05:53 PM6/6/14
to Sage devel
Okay, I'm tired of this.

As soon as all the current patches are reviewed I will turn all this code into socially acceptable Object-Oriented crap. It will be bad work, bad decisions taken for bad reasons, but this will let me add actual mathematical code to Sage later on while I cannot do this if every single tickets takes days of fights.

You will be glad.

There will be a Parent named "OrthogonalArrays", another named "TransversalDesigns", another named "MutuallyOthogonalLatinSquares". There will be Elements, i.e. the actual orthogonal arrays, transversal designs and mutually orthogonal latin squares.

They will all inherit the following totally useless (and broken) methods :
sage: 
sage: Element.
Element.N                 Element.category          Element.dumps             Element.n                 Element.rename            Element.subs              
Element.base_extend       Element.db                Element.is_zero           Element.numerical_approx  Element.reset_name        Element.substitute        
Element.base_ring         Element.dump              Element.mro               Element.parent            Element.save              Element.version      

There will also be a parent for all "Orthogonal Arrays of parameters k=?,n=?". It will have a .an_element() method to return an actual design. and a is_nonempty() function to know if it contains anything at all.

I don't know yet how to implement the "largest k such that there exists a OA(k,n)" but I am pretty sure that a quite non-obvious OrthogonalArrays(n=4).largest_k() would do the trick.

Of course, the actual orthogonal arrays, transversal designs, will need a data structure. It will be immutable and as predicted above it will require useless copies. 

It will be FANTASTIC. Hundreds if not thousands of lines of code for absolutely NO added feature. Objects creates and deallocated for absolutely no point, possible memory leaks like the "cached method that returns UniqueRepresentation objects" if I can manage it.

But it will be socially acceptable. It will be reviewed immediately because it is what everybody accepts without even discussing the design.
I can't wait to be socially acceptable. 

Nathann

Vincent Delecroix

unread,
Jun 6, 2014, 1:35:01 PM6/6/14
to sage-...@googlegroups.com
Hello,

I think that I was the main reviewer of the module on
orthogonal_array, so I take part in responsability! But on the other
hand I know the code quite well and can answer to some of the design
decisions.

It is still largely possible to refactor the code, especially the
functions that are available from the global namespace. And it can be
done easily as it is just about modifying "design_catalog.py" which is
rather independent from the coming tickets.

For the internals, many constructions in combinatorial designs are
recursive and complicated. So separating the existence from
construction means do an exact copy of the code. The boolean answer is
just intended to cut the actual construction (the object are rather
big). So we can not live with two functions. On the other hand, this
function can be hidden from the global namespace.

Nevertheless, there is an intermediate solution which I proposed
recently in #16347 comment:12. Let me repeat it here. Instead of
returning an OA or a boolean we can write a function
"orthogonal_array_construction(k, n)" whose specification would be
"return a pair (f, args) such that f(*args) is an OA(k,n) or return
None if Sage has no idea how to build it". That way, even the internal
function would be clean.

For the external, as Nathann already said, multiplying the namespace with
designs.orthogonal_array(k, n)
designs.orthogonal_array_existence(k, n)
designs.orthogonal_array_maximum_size(n)
looks ugly to me as there are many combinatorial designs on which you
would have the same. It might also be gathered with no cost as
designs.orthogonal_array.SOMETHING
Perhaps better, I am not sure.

All best,
Vincent

Volker Braun

unread,
Jun 6, 2014, 2:04:07 PM6/6/14
to sage-...@googlegroups.com
On Friday, June 6, 2014 6:35:01 PM UTC+1, vdelecroix wrote:
Nevertheless, there is an intermediate solution which I proposed
recently in #16347 comment:12. Let me repeat it here. Instead of
returning an OA or a boolean we can write a function
"orthogonal_array_construction(k, n)" whose specification would be
"return a pair (f, args) such that f(*args) is an OA(k,n) or return
None if Sage has no idea how to build it". That way, even the internal
function would be clean.
 

You could also wrap the args in a closure, then you don't have to pass them around.

For the external, as Nathann already said, multiplying the namespace with
 designs.orthogonal_array(k, n)
 designs.orthogonal_array_existence(k, n)
 designs.orthogonal_array_maximum_size(n)
looks ugly to me as there are many combinatorial designs on which you
would have the same.

How about is_orthogonal_array_implemented(k,n) instead of the passive. Also, since Nathan said that orthogonal_array_maximum_size is not the correct nomenclature there might be a better name as well.

kcrisman

unread,
Jun 6, 2014, 8:48:17 PM6/6/14
to sage-...@googlegroups.com
Note that I would also need to triple MANY functions in the graph code, for the very reason that you insist on writing Python code as if it were C code.


Interesting, there are things like that in the optimization-based graph code too?
 
I have been doing this in the Graph code for years. And I think that I remember how it began :

It was the early days of Linear Programming. I was writing functions to solve NP hard stuff, returning partitions of vertices or list of vertices, and though that perhaps I could save this with a "value_only" flag. When a graph function has such a flag, it means that it will not return the actual list of vertices, but its size instead.

This, because in the optimization world, we deal with existence problems. And existence problems have what is called a "certificate", ie "some information which can be used to check, in polynomial time, that the answer given to the decision problem is right".

Thus in the graph code you will have functions with a "certificate" flag, others with a "value_only" flag. And their output changes according to that.

Python is *NOT* typed. Accept it.


I am very glad Python is dynamically typed!  For me, it has nothing to do with type, and everything to do with not confusing end users.  Especially because it seems (but I may have misunderstood some comments in the vast trove of emails and tickets) that one could get different output types without the use of special 'certificate' keywords.

I don't want to have to create objects that will be only used to access methods.
If I create those objects, I will need to write a OrthogonalArray class, a TransversalDesign class, and these objects (which, currently, are lists of lists) will HAVE to be immutable.

Actually, I was kind of surprised that they were just lists of lists when I was looking at the code.  Surely one wants more structure than that, or don't they have more structure?  This is exactly the kind of thing we are just talking about with the game theory code - we want a class because you want to *do* something with it.  Tab-completion rules!  But I suppose it's possible that all these designs really are just lists of lists.

Those are all the changes that are triggered if I have to use those cursed objects where they don't belong.


I guess I don't know why they have to be immutable since not every class is immutable (is it?) but I will trust you on this.

>  (And ones that also don't use
> "who_asked", a purely internal thing - that should be some internal
> attribute, not a keyword.)  

This keyword is used to break infinite recursions. If you see a way around I am interested.


So there isn't some attribute it can set, like self._who_asked, to check this, with the same purpose?  Again, just asking, maybe the only way to break them is a keyword.  But it does look quite awkward; such things really shouldn't be exposed in the API.
 
I don't mind the work if it makes sense. But having to pick a worse data structure and having to generate useless objects just because you cannot stand a syntax that the language allows really is bad work.


I don't know that I can't stand it, but just that Sage has typically not done this, so as to have an easier-to-parse API.  Certainly there are always exceptions, since we also (as you say) want to release stuff.
 
Please don't consider sage/combinat/designs/ as part of the combinat/ code :-P


I just went by directory structure :)

> It is still largely possible to refactor the code, especially the 
> functions that are available from the global namespace. And it can be 
> done easily as it is just about modifying "design_catalog.py" which is 
> rather independent from the coming tickets. 


> For the internals, many constructions in combinatorial designs are 
> recursive and complicated. So separating the existence from 
> construction means do an exact copy of the code. The boolean answer is 
> just intended to cut the actual construction (the object are rather 
> big). So we can not live with two functions. On the other hand, this 
> function can be hidden from the global namespace. 

I guess I'm just surprised you need two functions; why can't you have one (underscore?) function that both of them call?  That is what was surprising to me, if it really is an *exact* copy of the code?

> For the external, as Nathann already said, multiplying the namespace with 
> designs.orthogonal_array(k, n) 
> designs.orthogonal_array_existence(k, n) 
> designs.orthogonal_array_maximum_size(n) 
> looks ugly to me as there are many combinatorial designs on which you 
> would have the same. It might also be gathered with no cost as 
> designs.orthogonal_array.SOMETHING 
> Perhaps better, I am not sure. 

Maybe.  I guess I see an awful lot of such extra namespace stuff in many other parts of Sage, so it doesn't frighten me :)

> But it will be socially acceptable. It will be reviewed immediately because it is what everybody accepts without even discussing the design.

Let me put it this way.  There is a lot of stuff I wrote five years ago for a paper that just finally got published (or will in September).  It all looks just like this - very convenient, avoided lots of repetition, returns lists of lists (truly!), etc.  And I know it isn't ready to include in Sage for that very reason.  A thought experiment - what is the mission statement?  

I guess if similar functionality in other software also returns multiple types of things depending on keywords, then maybe that's just how it has to be.  As I hope is clear, I don't really care if this is what happens, personally; but I really don't want somebody coming back ten years from now and complaining about the design now that Sage is a mature project.  I can think of several places where very early decisions were made, to get something off the ground, that still are annoying today for end users; that is okay, because having Sage is way better than not having it :) but we are in a different place now where we can afford to at least think about this.

> I can't wait to be socially acceptable. 
At least for me, it's not about that, though I understand why you feel that way.  And I'm certainly not suggesting you have to inherit from Element, which I don't understand anyway!

Thanks both Nathann and Vincent for explaining some of the more technical details.  

Vincent Delecroix

unread,
Jun 7, 2014, 3:14:41 AM6/7/14
to sage-...@googlegroups.com
> Actually, I was kind of surprised that they were just lists of lists when I
> was looking at the code. Surely one wants more structure than that, or
> don't they have more structure? This is exactly the kind of thing we are
> just talking about with the game theory code - we want a class because you
> want to *do* something with it. Tab-completion rules! But I suppose it's
> possible that all these designs really are just lists of lists.

Yes, a combinatorial design is basically a set of subsets. The current
code of combinatorial design is intended to do lengthy computation for
constructing them. It is already a lot of time to run the code.
Involving some badly designed class would be at the moment the worse
thing to do. As an example

sage: timeit("for s in Subsets(range(5)): pass") # return Set
125 loops, best of 3: 3.97 ms per loop
sage: timeit("for s in subsets(range(5)): pass") # return list
625 loops, best of 3: 53.3 µs per loop

I am not saying that we do not need a proper class (note: there is
already IncidenceStructure). There are many things that one would like
to do with an orthogonal array:
- getting various graphs from it
- computing automorphism groups
- comparing isomorphism with another one
- looking for substructures
- etc

But right now, what Nathann is trying to do is rather orthogonal to
these questions. If somebody comes with a nice class that *does not*
slow down his construction code, everybody will be happy. Sage is open
to contributions and not only to critics ;-P

>> It is still largely possible to refactor the code, especially the
>> functions that are available from the global namespace. And it can be
>> done easily as it is just about modifying "design_catalog.py" which is
>> rather independent from the coming tickets.
>
>
>> For the internals, many constructions in combinatorial designs are
>> recursive and complicated. So separating the existence from
>> construction means do an exact copy of the code. The boolean answer is
>> just intended to cut the actual construction (the object are rather
>> big). So we can not live with two functions. On the other hand, this
>> function can be hidden from the global namespace.
>
> I guess I'm just surprised you need two functions; why can't you have one
> (underscore?) function that both of them call? That is what was surprising
> to me, if it really is an *exact* copy of the code?

more precisely:
{{{
if construction_X_is_possible(k,n):
if existence: return True
else: return build_the_design_with_X(k,n) # long time
}}}
But see Volker proposal about a function which returns None or functions as in
{{{
if construction_X_is_possible(k,n):
return build_the_design_with_X
}}}

> Thanks both Nathann and Vincent for explaining some of the more technical
> details.

Thanks for having a look into this.

Vincent

Volker Braun

unread,
Jun 7, 2014, 7:46:44 AM6/7/14
to sage-...@googlegroups.com
There are many places where Sage provides a nice frontend class while the actual computation is implemented differently. Of course you could just use libsingular directly to compute with polynomial, right? ;-)

In particular, it would be perfectly reasonable to have a "frontend" class that acts as a parent (like Set) and deals with the whole translation of arbitrary symbols into range(n) and non-time-critical constructions. Then a specialized "backend" code just works with lists of lists of integers, say. And once you find that the backend isn't fast enough you can switch to C++ and use std::set, say. Without having to deprecate the user interface.

kcrisman

unread,
Jun 7, 2014, 9:19:21 PM6/7/14
to sage-...@googlegroups.com
But right now, what Nathann is trying to do is rather orthogonal to
these questions. If somebody comes with a nice class that *does not*
slow down his construction code, everybody will be happy. Sage is open
to contributions and not only to critics ;-P
 
Thanks for having a look into this. 


Yeah, on both of those counts wish I knew enough about the actual structures to have even more constructive comments.  Good luck! 
Reply all
Reply to author
Forward
0 new messages