Reaction, garbage collection, performance and non-CPython implementations

10 views
Skip to first unread message

Sergey Schetinin

unread,
Dec 22, 2009, 12:31:28 PM12/22/09
to better-python
So Reaction uses a dictionary to store snapshot data.
The keys correspond to cells, the values to values of those cells.
The cells get created and abandoned all the time, so their values have
to be DECREF'd as well.

1. One way to do it is to make the snapshot dicts instances WeakKeyDictionary.
The problem is that it's slow.
The snapshot is being used A LOT and WeakKeyDictionary is implemented
in pure Python.
The overhead it adds could seem small, but when compared to how fast
regular dict.__getitem__ is, it's catastrophic.

2. That's why we aren't using WeakKeyDictionary, we use regular dict.
But to make sure garbage collection works I can't store references to
the cells in the snapshot dict.
That is, cells can't be the snaphot keys.
To work around that, every cell that stores any data in a snapshot[1]
has a .key attribute holding the key that cell uses to access its data
in the snapshot.
That key has a special type CellKey and instances are 1:1 mapped to the cells.
The CellKey is stores a weak reference to the cell, so the cell can be
garbage collected.
That weakref has a callback, which is called once the cell is being collected.
That callback makes sure to remove the data stored for that cell from
all snapshots. [2]
That implies we need to have a definitive list of live snapshots. [3]
That list is maintained by SnapshotController baseclass [4] which
needs to do its own refcounting for those snapshots.

[1] Resetting cells don't store anything in snapshot -- they know
their reset value and their transient value does not get into the
snapshot.
[2] http://code.google.com/p/trellis-fork/source/browse/trunk/mext/reaction/celltypes/base.py#20
[3] There could be more than one live snapshot at a time due to
concurrent transactions.
[4] http://code.google.com/p/trellis-fork/source/browse/trunk/mext/reaction/stm/controllers.py#84

That's sounds like a lot of stuff, but I actually implemented with
reasonably little code and it isn't hard to grok once you understand
the rationale.
The speed is alright as well -- way better than WeakKeyDictionary.

There's a problem though and I mentioned it before [5] -- cycles still
cannot be garbage collected.
Cycle detector walks the graph of object references and as far as it's
concerned it's not the cell that references the value, it's the
snapshot.
So I wonder if it's even possible to create an extension that would
make CPython cycle detector realize that the reference belongs to the
cell.
Also, I'd think that if this is possible, it would be a CPython-only
fix and then the issue would be if it could be made in such a way that
Reaction would still work without it and would not end up being
restricted to CPython.

[5] http://groups.google.com/group/better-python/browse_thread/thread/9af8c641589795e9?hl=en#

--
Best Regards,
Sergey Schetinin

http://s3bk.com/ -- S3 Backup
http://word-to-html.com/ -- Word to HTML Converter

Sergey Schetinin

unread,
Jan 30, 2010, 11:54:50 PM1/30/10
to better-python
I've managed to implement cyclic garbage collection without any extensions.
http://code.google.com/p/trellis-fork/source/detail?r=373

To collect cycles you have to call ctrl.collect_cycles(), it is not
done automatically.

Reply all
Reply to author
Forward
0 new messages