Smarter cache

Denis Akhiyarov

Aug 20, 2015, 12:23:57 AM
to sympy
I once had some issues with the cache in sympy, and recently noticed this nice idea, cachey. It seems like a perfect fit for sympy. In short, the idea is to decide what to cache based on size and computation effort, using a simple formula.

http://matthewrocklin.com/blog/work/2015/08/03/Caching/

The goal is to keep the cache at a limited size, but to pick and keep the cached items wisely.

score += (compute_time / nbytes) * (1 + eps)**t
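
Here nbytes is the size of the cached value, compute_time is how long it took to build, eps is a small constant, and t is a counter that grows with each access, so recently touched entries score higher. A minimal sketch of the idea under those assumptions (my own illustration, not cachey's actual implementation):

class ScoredCache:
    """Toy cache with a byte budget: values that were expensive to compute per
    byte stored, and that were touched recently, get high scores; the lowest
    scoring items are evicted first."""

    def __init__(self, available_bytes, eps=1e-3):
        self.available_bytes = available_bytes
        self.eps = eps
        self.tick = 0
        self.data = {}    # key -> cached value
        self.scores = {}  # key -> accumulated score
        self.sizes = {}   # key -> nbytes of that value
        self.total = 0    # bytes currently held

    def put(self, key, value, compute_time, nbytes):
        self.tick += 1
        # the formula above: expensive-per-byte and recently-touched keys score high
        bump = (compute_time / nbytes) * (1 + self.eps) ** self.tick
        self.scores[key] = self.scores.get(key, 0.0) + bump
        if key not in self.data:
            self.data[key] = value
            self.sizes[key] = nbytes
            self.total += nbytes
        while self.total > self.available_bytes and self.data:
            victim = min(self.data, key=self.scores.get)  # lowest score is evicted
            self.total -= self.sizes.pop(victim)
            del self.data[victim]
            del self.scores[victim]

    def get(self, key, default=None):
        return self.data.get(key, default)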

Peter Brady

Aug 21, 2015, 11:08:19 AM
to sympy
As of 0.7.6, sympy uses an LRU cache by default. cachey looks interesting, but I think the "nbytes" may be challenging to compute for sympy objects and operations.

There was some discussion of adopting a similar caching policy a while ago: https://github.com/sympy/sympy/issues/6321
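
For reference, a small example of working with that bounded cache; this assumes SymPy >= 0.7.6, where the LRU size is read from the SYMPY_CACHE_SIZE environment variable at import time:

import os
os.environ['SYMPY_CACHE_SIZE'] = '2000'  # must be set before sympy is imported

import sympy
from sympy.core.cache import clear_cache

x = sympy.Symbol('x')
# intermediate results pass through the bounded LRU cache
print(sympy.simplify(sympy.cos(x)**2 + sympy.sin(x)**2))  # 1

clear_cache()  # drop everything currently cached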

Denis Akhiyarov

Aug 24, 2015, 11:03:30 PM
to sympy
nbytes is very hard to get in Python, and getsizeof() does not work very well. People have addressed this using github.com/pympler.
Not sure if anyone has tried it on sympy objects, or how costly that calculation is. cachey has a very simple nbytes calculation, mainly intended for numpy and pandas objects.
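
For example, the shallow vs. deep size difference (illustrative):

import sys
from pympler.asizeof import asizeof  # pip install pympler

nested = [list(range(1000)) for _ in range(10)]
print(sys.getsizeof(nested))  # shallow: only the outer list object
print(asizeof(nested))        # deep: outer list + inner lists + the ints they hold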

Denis Akhiyarov

Aug 25, 2015, 12:49:43 AM
to sympy
It looks like pympler works pretty well on sympy symbols; here is my notebook:

https://gist.github.com/denfromufa/4d0e6a94f70fac155b66

Peter Brady

Aug 25, 2015, 12:38:45 PM
to sy...@googlegroups.com
Thanks for trying that out. I had never heard of pympler before. The current caching mechanism is based on hashing. In my tests, pympler.asizeof is 500-1000x slower than hashing. That's a strong deficit for cachey to overcome (as far as sympy objects are concerned).

In [1]: import sympy

In [2]: from sympy.abc import a, b, c, d, e, x, y

In [3]: from pympler.asizeof import asizeof

In [4]: y=a*x**3+b*x**2+c*x+d

In [5]: y1, y2, y3 = sympy.solve(y, x, check=False)

In [6]: %time asizeof(y1)
CPU times: user 9.63 ms, sys: 0 ns, total: 9.63 ms
Wall time: 9.56 ms
Out[6]: 52608

In [7]: %time hash(y1)
CPU times: user 14 µs, sys: 1 µs, total: 15 µs
Wall time: 19.8 µs
Out[7]: 5743556980832125790

In [8]: y=a*x**4+b*x**3+c*x**2+d*x+e

In [9]: y1,y2,y3,y4=sympy.solve(y,x,check=False)

In [10]: %time asizeof(y4)
CPU times: user 16.7 ms, sys: 2.05 ms, total: 18.8 ms
Wall time: 18.6 ms
Out[10]: 85208

In [11]: %time hash(y4)
CPU times: user 14 µs, sys: 1 µs, total: 15 µs
Wall time: 19.8 µs
Out[11]: 4388441583750016728

Denis Akhiyarov

Aug 25, 2015, 7:12:27 PM
to sympy
pympler is very slow; hash is probably pure C, like fastcache.

But it is understandable why it gets slow, given all the information it collects in Python:

asizeof(y1,stats=8)

asizeof(((c/(3*a) - b**2/(9*a**2))/(sqrt((c/(3....) + b**3/(27*a**3))**(1/3) - b/(3*a),), stats=8) ...
 52136 bytes or 50.9 KiB
     8 byte aligned
     8 byte sizeof(void*)
     1 object given
   222 objects sized
  1840 objects seen
    24 recursion depth

    15 profiles:  total (% of grand total), average, and largest flat size:  largest object
    42 class sympy.core.assumptions.StdFactKB objects:  42048 or 41.1 KiB (81%), 1001, 1632 or 1.6 KiB:  {'prime': False, 'infinite': False, 'r....maginary': False, 'irrational': False} leng 32!
    44 class str objects:  2632 or 2.6 KiB (5%), 59, 64:  'infinite' leng 9!
    28 class tuple objects:  1888 or 1.8 KiB (4%), 67, 80:  (sqrt((c/(3*a) - b**2/(9*a**2))**3 + (..../(2*a), b**3/(27*a**3), -b*c/(6*a**2)) leng 4
    49 class int objects:  1848 or 1.8 KiB (4%), 37, 40:  5976377932654160047 leng 2!
    12 class sympy.core.mul.Mul objects:  864 (2%), 72, 72:  -(sqrt((c/(3*a) - b**2/(9*a**2))**3 + .... b*c/(6*a**2) + b**3/(27*a**3))**(1/3)
    10 class sympy.core.power.Pow objects:  720 (1%), 72, 72:  (sqrt((c/(3*a) - b**2/(9*a**2))**3 + (.... b*c/(6*a**2) + b**3/(27*a**3))**(1/3)
     7 class sympy.core.numbers.Rational objects:  560 (1%), 80, 80:  -1/9
     5 class sympy.core.add.Add objects:  360 (1%), 72, 72:  (c/(3*a) - b**2/(9*a**2))/(sqrt((c/(3*....*2) + b**3/(27*a**3))**(1/3) - b/(3*a)
     4 class sympy.core.numbers.Integer objects:  352 (1%), 88, 88:  -2
     4 class pympler.asizeof._Slots objects:  336 (1%), 84, 88:  ('p', 'q', '_assumptions', '_args', '_mhash') leng 4
     4 class sympy.core.symbol.Symbol objects:  288 (1%), 72, 72:  a
     1 class sympy.core.numbers.NegativeOne object:  88 (0%), 88, 88:  -1
     1 class sympy.core.numbers.Half object:  80 (0%), 80, 80:  1/2
     2 class bool objects:  56 (0%), 28, 32:  True
     1 class NoneType object:  16 (0%), 16, 16:  None

    42 static types:  basicsize, itemsize, _len_(), _refs()
       class Exception:  88, 0, n/a, _exc_refs
       class NoneType:  16, 0, n/a, n/a
       class NotImplementedType:  16, 0, n/a, n/a
       class Struct:  56, 1, _len_struct, n/a
       class array.array:  64, 1, _len_array, n/a
       class bool:  32, 4, n/a, n/a
       class bytearray:  56, 1, _len_bytearray, n/a
       class bytearray_iterator:  56, 0, _len_iter, _iter_refs
       class callable_iterator:  56, 0, _len_iter, _iter_refs
       class complex:  32, 0, n/a, n/a
       class dict:  64, 24, _len_dict, _dict_refs
       class dict_itemiterator:  80, 0, _len_iter, _iter_refs
       class dict_keyiterator:  80, 0, _len_iter, _iter_refs
       class dict_valueiterator:  80, 0, _len_iter, _iter_refs
       class ellipsis:  16, 0, n/a, n/a
       class enumerate:  72, 0, n/a, _enum_refs
       class float:  24, 0, n/a, n/a
       class frozenset:  224, 16, _len_set, _seq_refs
       class getset_descriptor:  72, 0, n/a, n/a
       class int:  24, 4, _len_int, n/a
       class list:  64, 8, _len_list, _seq_refs
       class list_iterator:  56, 0, _len_iter, _iter_refs
       class list_reverseiterator:  56, 0, _len_iter, _iter_refs
       class mappingproxy:  48, 24, _len_dict, _dict_refs
       class member_descriptor:  72, 0, n/a, n/a
       class module:  88, 48, _len_module, _module_refs
       class os.stat_result:  48, 8, n/a, _stat_refs
       class property:  80, 0, n/a, _prop_refs
       class pympler.asizeof._Slots:  56, 8, _len_slots, n/a
       class range:  48, 0, n/a, n/a
       class reversed:  56, 0, n/a, _enum_refs
       class set:  224, 16, _len_set, _seq_refs
       class set_iterator:  72, 0, _len_iter, _iter_refs
       class slice:  40, 8, _len_slice, n/a
       class str:  80, 2, _len_unicode, n/a
       class str_iterator:  56, 0, _len_iter, _iter_refs
       class traceback:  64, 0, n/a, _tb_refs
       class tuple:  48, 8, _len, _seq_refs
       class tuple_iterator:  56, 0, _len_iter, _iter_refs
       class weakproxy:  80, 0, n/a, n/a
       class weakref:  80, 0, n/a, _weak_refs
       class weakref.KeyedRef:  88, 0, n/a, _weak_refs

     8 dynamic types:  basicsize, itemsize, _len_(), _refs()
       class sympy.core.add.Add:  72, 0, n/a, _inst_refs
       class sympy.core.mul.Mul:  72, 0, n/a, _inst_refs
       class sympy.core.numbers.Half:  80, 0, n/a, _inst_refs
       class sympy.core.numbers.Integer:  88, 0, n/a, _inst_refs
       class sympy.core.numbers.NegativeOne:  88, 0, n/a, _inst_refs
       class sympy.core.numbers.Rational:  80, 0, n/a, _inst_refs
       class sympy.core.power.Pow:  72, 0, n/a, _inst_refs
       class sympy.core.symbol.Symbol:  72, 0, n/a, _inst_refs

     1 derived type:  basicsize, itemsize, _len_(), _refs()
       class sympy.core.assumptions.StdFactKB:  64, 24, _len_dict, _dict_refs

     4 dict/-like classes:
       UserDict:  (IterableUserDict, UserDict)
       weakref:  (WeakKeyDictionary, WeakValueDictionary)

Aaron Meurer

Aug 25, 2015, 8:50:43 PM
to sy...@googlegroups.com
Hashing in SymPy is done recursively (due to the nature of SymPy objects), but amounts to hashes of tuples of integers and strings, which is done in C. But it's also highly optimized: the hash is memoized and stored in __slots__. 
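
Roughly this pattern (a paraphrase for illustration, not the exact SymPy source):

class Basic:
    __slots__ = ('_args', '_mhash')

    def __init__(self, *args):
        self._args = args
        self._mhash = None  # computed lazily on first hash()

    def _hashable_content(self):
        # recursion happens here: the args are themselves hashed as a tuple
        return self._args

    def __hash__(self):
        h = self._mhash
        if h is None:
            h = hash((type(self).__name__,) + self._hashable_content())
            self._mhash = h  # memoized, so later calls are a cheap attribute lookup
        return h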

If we really cared about sizes of objects, we could probably do a similar thing. And it is probably sufficient to use heuristics rather than a true sizeof.

Aaron Meurer

Peter Brady

Aug 25, 2015, 11:23:51 PM
to sy...@googlegroups.com
That's a good point. I had forgotten that sympy optimizes its hash implementation.

Denis Akhiyarov

Aug 26, 2015, 1:51:54 PM
to sympy
What is the heuristic? The number of Basic sympy objects?

Aaron Meurer

Aug 26, 2015, 2:42:40 PM
to sy...@googlegroups.com
Probably count_ops() would be a close approximation of both how expensive an object is to create and how big it is (SymPy objects really shouldn't be doing much computation at creation time). 
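
For example (count_ops used purely as an illustrative heuristic):

import sympy
from sympy.abc import a, b, c, x

expr = a*x**2 + b*x + c
print(sympy.count_ops(expr))               # one number: total operations in the tree
print(sympy.count_ops(expr, visual=True))  # symbolic breakdown by ADD, MUL, POW, ...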

Aaron Meurer

Denis Akhiyarov

Aug 26, 2015, 3:33:44 PM
to sympy
1. Regarding count_ops, are we now jumping to computation cost? :)

2. If the size of sympy objects is proportional to the computation cost of creating them, then cachey does not make sense for sympy at all.

3. Not sure if computation cost should just be tracked using the time() function?

4. I think it is possible to override __sizeof__, just like the __hash__ function, in sympy objects (see the sketch below).
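
A sketch of what point 4 could look like; the Expr class here is hypothetical, not SymPy's:

import sys

class Expr:
    __slots__ = ('args', '_size')

    def __init__(self, *args):
        self.args = args
        # crude heuristic: own instance size plus the shallow size of each argument
        self._size = object.__sizeof__(self) + sum(sys.getsizeof(a) for a in args)

    def __sizeof__(self):
        return self._size

e = Expr(1, 2, 3)
print(sys.getsizeof(e))  # uses the overridden __sizeof__ (plus GC overhead, if any)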

Aaron Meurer

Aug 26, 2015, 4:57:48 PM
to sy...@googlegroups.com
The whole point of the cache is to speed things up.

Aaron Meurer

Denis Akhiyarov

Aug 26, 2015, 6:18:40 PM
to sympy
I agree, but cachey needs 2 main parameters as input: nbytes and computation cost.

So using count_ops for both parameters is like reducing the formula down to LRU or something similar.

In conclusion:

* count_ops or time() can be used for the computation cost.

* __sizeof__ or the number of Basic sympy objects for the memory cost.
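
For the second point, counting Basic nodes is cheap with a tree traversal (illustrative example):

import sympy
from sympy.abc import a, b, c, x

expr = a*x**2 + b*x + c
print(sum(1 for _ in sympy.preorder_traversal(expr)))  # number of nodes in the expression tree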

Aaron Meurer

Aug 26, 2015, 7:19:14 PM
to sy...@googlegroups.com
On Wed, Aug 26, 2015 at 5:18 PM, Denis Akhiyarov <denis.a...@gmail.com> wrote:

> I agree, but cachey needs 2 main parameters as input: nbytes and computation cost.
>
> So using count_ops for both parameters is like reducing the formula down to LRU or something similar.

I think you're right. cachey would not really be that much of an improvement over LRU, at least with the formula it uses, for the case of caching SymPy objects directly (caching the output of functions that compute SymPy objects may be different).
 
Aaron Meurer
