[Python-Dev] PEP 454 (tracemalloc) disable ==> clear?

Jim Jewett

Oct 28, 2013, 10:00:34 PM

to Victor Stinner, Python dev

reset() function:

Clear traces of memory blocks allocated by Python.

Does this do anything besides clear? If not, why not just re-use the
'clear' name from dicts?

disable() function:

Stop tracing Python memory allocations and clear traces of
memory blocks allocated by Python.

I would expect disable to stop tracing, but I would not expect it to clear
out the traces it had already captured. If it has to do that, please
put in some sample code showing how to save the current traces before
disabling.

-jJ
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev

Kristján Valur Jónsson

Oct 29, 2013, 6:00:28 AM

to Python dev

> disable() function:
>
> Stop tracing Python memory allocations and clear traces of
> memory blocks allocated by Python.
>
> I would expect disable to stop tracing, but I would not expect it to clear out the
> traces it had already captured. If it has to do that, please put in some sample
> code showing how to save the current traces before disabling.

I was thinking something similar. It would be useful to be able to "pause" and "resume"
if one is doing any analysis work in the live environment. This would reduce the
need to have "Filter" objects.

K

Victor Stinner

Oct 29, 2013, 7:37:52 AM

to Jim Jewett, Python dev

2013/10/29 Jim Jewett <jimjj...@gmail.com>:

> reset() function:
>
> Clear traces of memory blocks allocated by Python.
>
> Does this do anything besides clear? If not, why not just re-use the
> 'clear' name from dicts?

(I like the reset() name. Charles-François suggested this name,
inspired by the OProfile API.)

> disable() function:
>
> Stop tracing Python memory allocations and clear traces of
> memory blocks allocated by Python.
>
> I would expect disable to stop tracing, but I would not expect it to clear
> out the traces it had already captured. If it has to do that, please
> put in some sample code showing how to save the current traces before
> disabling.

For consistency, you cannot keep traces when tracing is disabled. The
free() hook must stay enabled so that freed memory blocks are removed
from the traces; otherwise the next malloc() may return the same
address, which would raise an assertion error (you cannot have two
memory blocks at the same address).

Just call get_traces() to get traces before clearing them. I can
explain it in the doc.
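
A minimal sketch of that ordering, assuming the names that later shipped in Python 3.4's tracemalloc (start()/stop() in place of the enable()/disable() discussed here, and take_snapshot() to copy the traces). The list allocation is only an illustrative workload.

```python
import tracemalloc

tracemalloc.start()

# Illustrative workload: allocate something worth tracing.
data = [bytes(1000) for _ in range(100)]

# Copy the traces out *before* stopping; stop() discards them.
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# The snapshot is an independent copy and stays usable after stop().
saved_traces = len(snapshot.traces)
print("tracing:", tracemalloc.is_tracing(), "saved traces:", saved_traces)
```

The point of the sketch is only the order of the calls: copy first, stop second.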

2013/10/29 Kristján Valur Jónsson <kris...@ccpgames.com>:

> I was thinking something similar. It would be useful to be able to "pause" and "resume"
> if one is doing any analysis work in the live environment. This would reduce the
> need to have "Filter" objects.

For the reason explained above, it's not possible to disable the whole
module temporarily.

Internally, tracemalloc uses a thread-local variable (called the
"reentrant" flag) to temporarily disable tracing of allocations in the
current thread. It only disables tracing of new allocations;
deallocations are still processed.

Victor

Jim J. Jewett

Oct 29, 2013, 11:45:40 PM

to Python dev

(Tue Oct 29 12:37:52 CET 2013) Victor Stinner wrote:

> For consistency, you cannot keep traces when tracing is disabled.
> The free() must be enabled to remove allocated memory blocks, or
> next malloc() may get the same address which would raise an assertion
> error (you cannot have two memory blocks at the same address).

That seems like a quirk of the implementation, particularly since
the actual address is not returned to the user. Nor do I see any way
of knowing when that allocation is freed.

Well, unless I missed it... I don't see how to get anything beyond
the return value of get_traces, which is a (time-ordered?) list
of allocation size with then-current call stack. It doesn't mention
any attribute for indicating that some entries are de-allocations,
let alone the actual address of each allocation.

> For the reason explained above, it's not possible to disable the whole
> module temporarily.

> Internally, tracemalloc uses a thread-local variable (called the
> "reentrant" flag) to temporarily disable tracing of allocations in the
> current thread. It only disables tracing of new allocations;
> deallocations are still processed.

Even assuming the restriction is needed, this just seems to mean that
disabling (or filtering) should not affect de-allocation events, for
fear of corrupting tracemalloc's internal structures.

In that case, I would expect disabling (and filtering) to stop
capturing new allocation events for me, but I would still expect
tracemalloc to do proper internal maintenance.

It would at least explain why you need both disable *and* reset;
reset would empty those internal structures, so that tracemalloc
could shortcut that maintenance. I would NOT assume that I needed
to call reset when changing the filters, nor would I assume that
changing them threw out existing traces.

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them. -jJ

Stephen J. Turnbull

Oct 30, 2013, 12:09:48 AM

to Victor Stinner, Jim Jewett, Python dev

Victor Stinner writes:

> 2013/10/29 Jim Jewett <jimjj...@gmail.com>:
> > reset() function:
> >
> > Clear traces of memory blocks allocated by Python.
> >
> > Does this do anything besides clear? If not, why not just re-use the
> > 'clear' name from dicts?
>
> (I like the reset() name. Charles-François suggested this name,
> inspired by the OProfile API.)

Just "reset" implies to me that you're ready to start over. Not just
traced memory blocks but accumulated statistics and any configuration
(such as Filters) would also be reset. Also tracing would be disabled
until started explicitly.

If you want it to apply just to the traces, reset_traces() would be
more appropriate.

> > disable() function:
> >
> > Stop tracing Python memory allocations and clear traces of
> > memory blocks allocated by Python.
> >
> > I would expect disable to stop tracing, but I would not expect it to clear
> > out the traces it had already captured. If it has to do that, please
> > put in some sample code showing how to save the current traces before
> > disabling.
>
> For consistency, you cannot keep traces when tracing is disabled. The
> free() must be enabled to remove allocated memory blocks, or next
> malloc() may get the same address which would raise an assertion error
> (you cannot have two memory blocks at the same address).

Then I would not call this "disable". disable() should not "destroy" data.

> Just call get_traces() to get traces before clearing them. I can
> explain it in the doc.

Shouldn't disable() do this automatically, perhaps with an optional
discard_traces flag (which would be False by default)?

But I definitely agree with Jim: You *must* provide an example here
showing how to save the traces (even though it's trivial to do so),
because that will make clear that disable() is a destructive
operation. (It is not destructive in any other debugging tool that
I've used.) Even with documentation, be prepared for user complaints.

Victor Stinner

Oct 30, 2013, 6:02:59 AM

to Jim J. Jewett, Python dev

Hi,

2013/10/30 Jim J. Jewett <jimjj...@gmail.com>:

> Well, unless I missed it... I don't see how to get anything beyond
> the return value of get_traces, which is a (time-ordered?) list
> of allocation size with then-current call stack. It doesn't mention
> any attribute for indicating that some entries are de-allocations,
> let alone the actual address of each allocation.

get_traces() does return the traces of the currently allocated memory
blocks. It's not a log of alloc/dealloc calls. The list is not sorted.
If you want a sorted list, use take_snapshot().statistics('lineno') for
example.
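
A minimal sketch of getting sorted statistics, assuming the API as it later shipped in Python 3.4 (start()/stop() rather than the enable()/disable() names of this draft). The bytearray list is only an illustrative workload.

```python
import tracemalloc

tracemalloc.start()
blocks = [bytearray(4096) for _ in range(50)]  # illustrative workload

snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# statistics() groups the traces by source line and sorts the result,
# largest total size first.
top_stats = snapshot.statistics("lineno")
for stat in top_stats[:3]:
    print(stat)
```

Each entry describes memory that was still allocated when the snapshot was taken, grouped by the line that allocated it; it is not a log of events.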

> In that case, I would expect disabling (and filtering) to stop
> capturing new allocation events for me, but I would still expect
> tracemalloc to do proper internal maintenance.

tracemalloc has a significant overhead in terms of performance and
memory. The purpose of disable() is to... disable the module, to
remove the overhead completely.

In practice, enable() installs hooks on memory allocators, and disable()
uninstalls these hooks.

I don't understand why you are so concerned about disable(). Why would
you like to keep traces and disable the module? I never called
disable() in my own tests, the module is automatically disabled at
exit.

Victor

Victor Stinner

Oct 30, 2013, 6:09:30 AM

to Stephen J. Turnbull, Jim Jewett, Python dev

2013/10/30 Stephen J. Turnbull <ste...@xemacs.org>:

> Just "reset" implies to me that you're ready to start over. Not just
> traced memory blocks but accumulated statistics and any configuration
> (such as Filters) would also be reset. Also tracing would be disabled
> until started explicitly.

If the name is really the problem, I propose to restore the previous
name: clear_traces(). It's symmetric with get_traces(), like
add_filter()/get_filters()/clear_filters().

> Shouldn't disable() do this automatically, perhaps with an optional
> discard_traces flag (which would be False by default)?

The pattern is something like this:

enable()
snapshot1 = take_snapshot()
...
snapshot2 = take_snapshot()
disable()

I don't see why disable() would return data.

> But I definitely agree with Jim: You *must* provide an example here
> showing how to save the traces (even though it's trivial to do so),
> because that will make clear that disable() is a destructive
> operation. (It is not destructive in any other debugging tool that
> I've used.) Even with documentation, be prepared for user complaints.

I added "Call get_traces() or take_snapshot() function to get traces
before clearing them." to the doc:

http://www.haypocalc.com/tmp/tracemalloc/library/tracemalloc.html#tracemalloc.disable

Victor

Jim Jewett

Oct 30, 2013, 3:58:20 PM

to Victor Stinner, Python dev

On Wed, Oct 30, 2013 at 6:02 AM, Victor Stinner
<victor....@gmail.com> wrote:
> 2013/10/30 Jim J. Jewett <jimjj...@gmail.com>:
>> Well, unless I missed it... I don't see how to get anything beyond
>> the return value of get_traces, which is a (time-ordered?) list
>> of allocation size with then-current call stack. It doesn't mention
>> any attribute for indicating that some entries are de-allocations,
>> let alone the actual address of each allocation.

> get_traces() does return the traces of the currently allocated memory
> blocks. It's not a log of alloc/dealloc calls. The list is not sorted.
> If you want a sorted list, use take_snapshot().statistics('lineno') for
> example.

Any list is sorted somehow; I had assumed that it was defaulting to
order-of-creation, though if you use a dict internally, that might not
be the case. If you return it as a list instead of a dict, but that list is
NOT in time-order, that is worth documenting.

Also, am I misreading the documentation of get_traces() function?

Get traces of memory blocks allocated by Python.
Return a list of (size: int, traceback: tuple) tuples.
traceback is a tuple of (filename: str, lineno: int) tuples.

So it now sounds like you don't bother to emit de-allocation
events because you just remove the allocation from your
internal data structure.

In other words, you provide a snapshot, but not a history --
except that the snapshot isn't complete either, because it
only shows things that appeared after a certain event
(the most recent enablement).

I still don't see anything here(*) that requires even saving
the address, let alone preventing re-use.

(*) get_object_traceback(obj) might require a stored
address for efficiency, but the base functionality of
getting traces doesn't.

I still wouldn't worry about address re-use though,
because the address should not be re-used until
the object has been deleted -- and is no longer
available to be passed to get_object_traceback.
So the worst that can happen is that an object which
was not traced might return a bogus answer
instead of failing.

>> In that case, I would expect disabling (and filtering) to stop
>> capturing new allocation events for me, but I would still expect
>> tracemalloc to do proper internal maintenance.

> tracemalloc has a significant overhead in terms of performance and
> memory. The purpose of disable() is to... disable the module, to
> remove the overhead completely.
> ... Why would you like to keep traces and disable the module?

Because of that very overhead. I think my typical use case would
be similar to Kristján Valur's, but I'll try to spell it out in more
detail here.

(1) Whoa -- memory hog! How can I fix this?

(2) I know -- track all allocations, with a traceback showing why they
were made. (At a minimum, I would like to be able to subclass your
tool to do this -- preferably without also keeping the full history in
memory.)

(3) Oh, maybe I should skip the ones that really are temporary and
get cleaned up. (You make this easy by handling the de-allocs,
though I'm not sure those events get exposed to anyone working at
the python level, as opposed to modifying and re-compiling.)

(4) hmm... still too big ... I should use filters. (But will changing those
filters while tracing is enabled mess up your current implementation?)

(5) Argh. What I really want is to know what gets allocated at times
like XXX. I can do that if times-like-XXX only ever occur once per
process. I *might* be able to do it with filters. But I would rather
do it by saying "trace on" and "trace off". Maybe even with a context
manager around the suspicious places.

(6) Then, at the end of the run, I would say "give me the info about
how much was allocated when tracing was on." Some of that might be
going away again when tracing is off, but at least I know what is
making the allocations in the first place. And I know that they're
sticking around "long enough".

Under your current proposal, step (5) turns into

    set filters
    trace on
    ...
    get_traces
    serialize to some other storage
    trace off

and step (6) turns into

    read in from that other storage I just made up on the fly, and do
    my own summarizing, because my format is almost by definition
    non-standard.
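
For the "serialize to some other storage" step, the module as it eventually shipped in Python 3.4 provides Snapshot.dump() and Snapshot.load(), so the storage format need not be invented. A minimal sketch, assuming that later API rather than the draft discussed in this thread; the file name and the workload are hypothetical.

```python
import os
import tempfile
import tracemalloc

tracemalloc.start()
payload = [str(i) * 10 for i in range(1000)]  # illustrative workload
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Persist the snapshot, then read it back (possibly in another process).
path = os.path.join(tempfile.mkdtemp(), "traces.snapshot")  # hypothetical name
snapshot.dump(path)
restored = tracemalloc.Snapshot.load(path)

print(len(restored.traces), "traces restored")
```

The restored object is an ordinary Snapshot, so the same statistics methods apply to it.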

This complication isn't intolerable, but neither is it what I expect
from Python. And it certainly isn't what I expect from a binary toggle
like enable/disable. (So yes, changing the name to clear_traces would
help, because I would still be disappointed, but at least I wouldn't
be surprised.)

Also, if you do stick with the current limitations, then why even have
get_traces, as opposed to just take_snapshot? Is there some difference
between them, except that a snapshot has some convenience methods and
some simple metadata?

Later, he wrote:
> I don't see why disable() would return data.

disable is indeed a bad name for something that returns data.

The only reason to return data from "disable" is that (currently)
you're throwing the data away, so either you want the data now or you
should have turned it off earlier.

-jJ

Victor Stinner

Oct 30, 2013, 4:40:39 PM

to Jim J. Jewett, Python dev

On Oct 30, 2013 at 20:58, "Jim Jewett" <jimjj...@gmail.com> wrote:
> though if you use a dict internally, that might not
> be the case.

Tracemalloc uses a {address: trace} dict internally.

> If you return it as a list instead of a dict, but that list is
> NOT in time-order, that is worth documenting.

OK, I will document it.

> Also, am I misreading the documentation of get_traces() function?
>
> Get traces of memory blocks allocated by Python.
> Return a list of (size: int, traceback: tuple) tuples.
> traceback is a tuple of (filename: str, lineno: int) tuples.
>
>
> So it now sounds like you don't bother to emit de-allocation
> events because you just remove the allocation from your
> internal data structure.

I don't understand your question. Tracemalloc does not store events but traces. When a memory block is deallocated, it is removed from the internal dict (and so from the get_traces() list).

> I still don't see anything here(*) that requires even saving
> the address, let alone preventing re-use.

The address must be stored internally to maintain the internal dict. See the C code.

> (1) Whoa -- memory hog! How can I fix this?
>
> (2) I know -- track all allocations, with a traceback showing why they
> were made. (At a minimum, I would like to be able to subclass your
> tool to do this -- preferably without also keeping the full history in
> memory.)

What do you mean by "full history" and "subclass your tool"?

> (3) Oh, maybe I should skip the ones that really are temporary and
> get cleaned up. (You make this easy by handling the de-allocs,
> though I'm not sure those events get exposed to anyone working at
> the python level, as opposed to modifying and re-compiling.)

If your temporary objects are destroyed before you call get_traces(), you will not see them in get_traces(). I don't understand.

> (4) hmm... still too big ... I should use filters. (But will changing those
> filters while tracing is enabled mess up your current implementation?)

If you call add_filter(), new traces will be filtered. Not the old ones, as explained in the doc. What do you mean by "mess up"?

> (5) Argh. What I really want is to know what gets allocated at times
> like XXX.
> I can do that if times-like-XXX only ever occur once per process. I *might* be
> able to do it with filters. But I would rather do it by saying "trace on" and
> "trace off". Maybe even with a context manager around the suspicious
> places.

I don't understand "times like XXX", what is it?

To see what happened between two lines of code, you can compare two snapshots. No need to disable tracing.

> (6) Then, at the end of the run, I would say "give me the info about how much
> was allocated when tracing was on." Some of that might be going away
> again when tracing is off, but at least I know what is making the allocations
> in the first place. And I know that they're sticking around "long enough".

I think you misunderstood how tracemalloc works. You should compile it and play with it. In my opinion, you already have everything in tracemalloc for your scenario.

> Under your current proposal, step (5) turns into
>
> set filters
> trace on
> ...
> get_traces
> serialize to some other storage
> trace off

s1=take_snapshot()
...
s2=take_snapshot()
...
diff=s2.statistics("lines", compare_to=s1)
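
The statistics(..., compare_to=...) call above is the draft API. In the module as it later shipped in Python 3.4, the same comparison is spelled Snapshot.compare_to(). A minimal sketch under that assumption; the list allocation stands in for the code under suspicion.

```python
import tracemalloc

tracemalloc.start()

s1 = tracemalloc.take_snapshot()
suspect = [bytes(2048) for _ in range(200)]  # code under suspicion (illustrative)
s2 = tracemalloc.take_snapshot()

tracemalloc.stop()

# Per-line differences between the two snapshots, biggest change first.
diff = s2.compare_to(s1, "lineno")
for stat in diff[:3]:
    print(stat)
```

Tracing stays on between the two snapshots; only the difference isolates what happened in between.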

> why even have
> get_traces,
> as opposed to just take_snapshot? Is there some difference between them,
> except that a snapshot has some convenience methods and some simple
> metadata?

See the doc: Snapshot.traces is the result of get_traces().

get_traces() is there if you want to write your own tool without Snapshot.

Victor

Stephen J. Turnbull

Oct 31, 2013, 1:08:42 AM

to Jim Jewett, Python dev

Jim Jewett writes:

> Later, he wrote:
> > I don't see why disable() would return data.
>
> disable is indeed a bad name for something that returns data.

Note that I never proposed that disable() *return* anything, only that
it *get* the trace. It could store it in some specified object, or a
file, rather than return it, for example. I deliberately left what it
does with the retrieved data unspecified. The important thing to me
is that it not be dropped on the floor by something named "disable".

Victor Stinner

Oct 31, 2013, 6:41:20 AM

to Jim Jewett, Python dev

2013/10/29 Victor Stinner <victor....@gmail.com>:

> 2013/10/29 Kristján Valur Jónsson <kris...@ccpgames.com>:
>> I was thinking something similar. It would be useful to be able to "pause" and "resume"
>> if one is doing any analysis work in the live environment. This would reduce the
>> need to have "Filter" objects.
>
> Internally, tracemalloc uses a thread-local variable (called the
> "reentrant" flag) to temporarily disable tracing of allocations in the
> current thread. It only disables tracing of new allocations;
> deallocations are still processed.

If I give access to this flag, it would be possible to temporarily
disable tracing in the current thread, but tracing would still be
enabled in other threads. Would it fit your requirement?

Example:
---------------
tracemalloc.enable()
# start your application
...
# spawn many threads
...
# oh no, I don't want to trace this ugly function
tracemalloc.disable_local()
ugly_function()
tracemalloc.enable_local()
...
snapshot = take_snapshot()
---------------

You can imagine a context manager based on these two functions:
---------------
with disable_tracing_temporarily_in_current_thread():
    ugly_function()
---------------
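
Such a context manager could be sketched as follows. disable_local() and enable_local() are only proposed in this message and are not part of any shipped tracemalloc, so the sketch takes the two callables as parameters and uses stand-ins to show the call order; tracing_paused is a hypothetical name.

```python
import contextlib


@contextlib.contextmanager
def tracing_paused(pause, resume):
    """Call pause() on entry and resume() on exit, even if the body raises."""
    pause()
    try:
        yield
    finally:
        resume()


# Stand-ins for the proposed (hypothetical) disable_local()/enable_local().
events = []
with tracing_paused(lambda: events.append("disable_local"),
                    lambda: events.append("enable_local")):
    events.append("ugly_function")

print(events)
```

The try/finally matters: tracing is re-enabled even when the wrapped code raises.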

I still don't understand why you would need to stop tracing
temporarily. When I use tracemalloc, I never disable it.

Victor Stinner

Oct 31, 2013, 8:20:48 AM

to Jim Jewett, Python dev

2013/10/31 Victor Stinner <victor....@gmail.com>:

> If I give access to this flag, it would be possible to temporarily
> disable tracing in the current thread, but tracing would still be
> enabled in other threads. Would it fit your requirement?

It's probably not what you are looking for :-)

As I wrote in the PEP, the API of tracemalloc was inspired by the
faulthandler module. enable() / disable() makes sense in faulthandler
because faulthandler is passive: it only does something on a trigger
(synchronous signals like SIGFPE or SIGSEGV). I realized that
tracemalloc is different: as written in the documentation, enable()
*starts* tracing. After enable() has been called, tracemalloc becomes
active. So tracemalloc should use names start() / stop() rather than
enable() / disable().

I did another experiment. I replaced enable/disable/is_enabled with
start/stop/is_tracing, and added enable/disable/is_enabled functions
to temporarily disable tracing.

API:

- clear_traces(): clear traces
- start(): start tracing (the old "enable")
- stop(): stop tracing and clear traces (the old "disable")
- disable(): temporarily disable tracing
- enable(): reenable tracing
- is_tracing(): True if tracemalloc is tracing, False otherwise (the
old "is_enabled")
- is_enabled(): True if tracemalloc is enabled, False otherwise

All these functions are process-wide (affect all threads).

tracemalloc is only tracing new allocations if is_tracing() and
is_enabled() are True.

If is_tracing() is True and is_enabled() is False, deallocations still
remove traces (otherwise, the internal dictionary of traces would
become inconsistent).
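
The interaction of the two flags with the {address: trace} dictionary can be illustrated with a toy model. This is a sketch under stated assumptions, not tracemalloc's implementation: ToyTracer, on_malloc() and on_free() are hypothetical names, and the addresses are made up.

```python
class ToyTracer:
    """Toy model of the proposed flags; not tracemalloc's real implementation."""

    def __init__(self):
        self.tracing = False   # toggled by start() / stop()
        self.enabled = True    # toggled by enable() / disable()
        self.traces = {}       # {address: trace}

    def start(self):
        self.tracing = True

    def stop(self):
        self.tracing = False
        self.traces.clear()

    def on_malloc(self, address, trace):
        # New allocations are recorded only when both flags are set.
        if self.tracing and self.enabled:
            assert address not in self.traces, "two blocks at one address"
            self.traces[address] = trace

    def on_free(self, address):
        # Deallocations are always processed while tracing, so a reused
        # address can never collide with a stale entry.
        if self.tracing:
            self.traces.pop(address, None)


tracer = ToyTracer()
tracer.start()
tracer.on_malloc(0x1000, "huge")
tracer.enabled = False              # disable()
tracer.on_malloc(0x2000, "ugly")    # not recorded
tracer.on_free(0x1000)              # still removes the trace of "huge"
tracer.enabled = True               # enable()
tracer.on_malloc(0x1000, "reused")  # same address, no collision
print(tracer.traces)
```

If on_free() were skipped while disabled, the last on_malloc() would hit the assertion, which is the inconsistency described above.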

Example:
---------------
tracemalloc.start()

# start your application
...

useful = UsefulObject()
huge = HugeObject()
...
snapshot1 = take_snapshot()
...
# oh no, I don't want to trace this ugly object, but please don't
# trash old traces
tracemalloc.disable()
ugly = ugly_object()
...
# release memory of the huge object
huge = None
...
# restart tracing (ugly is still alive)
tracemalloc.enable()
...
snapshot2 = take_snapshot()
tracemalloc.stop()
---------------

snapshot1 contains traces of objects:
- useful
- huge

snapshot2 contains traces of objects:
- useful

huge is missing from snapshot2 even though tracing was disabled when it
was released. ugly is missing from snapshot2 because tracing was
disabled when it was allocated.

Does it look better? I don't see the use case for disable() / enable()
yet, but it's cheap (it just adds a flag).

Ethan Furman

Oct 31, 2013, 10:32:42 AM

to pytho...@python.org

On 10/31/2013 05:20 AM, Victor Stinner wrote:
> I did another experiment. I replaced enable/disable/is_enabled with
> start/stop/is_tracing, and added enable/disable/is_enabled functions
> to temporarily disable tracing.
>
> API:
>
> - clear_traces(): clear traces
> - start(): start tracing (the old "enable")
> - stop(): stop tracing and clear traces (the old "disable")
> - disable(): temporarily disable tracing
> - enable(): reenable tracing
> - is_tracing(): True if tracemalloc is tracing, False otherwise (the
> old "is_enabled")
> - is_enabled(): True if tracemalloc is enabled, False otherwise

These names make more sense. However, `stop` is still misleading as it
both stops and destroys data. An easy fix for that is for stop to save
the data somewhere so get_traces (or whatever) can still retrieve it.

If `stop` really must destroy the data, perhaps it should be called
`close` instead; StringIO has a similar close method that when called
destroys any stored data, and getvalue must be called first if that
data is wanted.
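
The StringIO behaviour referred to here can be sketched as follows, using io.StringIO from the standard library (the method is spelled getvalue()):

```python
import io

buffer = io.StringIO()
buffer.write("traces")

saved = buffer.getvalue()   # copy the data out first
buffer.close()              # discards the buffer

try:
    buffer.getvalue()
except ValueError as exc:
    error = str(exc)        # reading after close() fails

print(saved, "|", error)
```

The analogy is the ordering requirement: whatever is wanted must be read before the destructive call.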

--
~Ethan~
