My problem is that I have written a procedure that reads a very large
file into RAM, processes it, and writes the results to an output
file. The net RAM usage should be zero after the procedure returns.
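In outline, the procedure is shaped like this (transform is just a stand-in
for the real processing step; the details don't matter here):

proc process_file {infile outfile} {
    set f [open $infile r]
    set data [read $f]               ;# whole file pulled into RAM
    close $f
    set result [transform $data]     ;# placeholder for the real work
    set f [open $outfile w]
    puts -nonewline $f $result
    close $f
    unset data result                ;# everything explicitly released
}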
I would like to call this procedure several times, but after the first
time my computer's RAM is almost depleted, and during the second
procedure call there's not enough RAM to read the second file into
memory. Tcl, instead of releasing unneeded memory to allow the file
read, crashes with a memory allocation error.
At first I thought it was a memory leak, but apparently not, according
to feedback I've gotten. I would submit a bug report, but is this
even considered a bug? Or am I just out of luck due to how Tcl
manages memory? Also, it's hard to attach a 250MB file to a bug
ticket. If this is not a bug, is there any way to force Tcl to purge
its "high-water-mark memory" so that multiple high-memory-demand
procedure calls can be made?
I'm using ActiveTcl 8.5.8 on Ubuntu Intrepid.
Hi,
Tcl doesn't release the memory held in variables automatically, as far as I
know, but I may be wrong here. However, you can use unset and array
unset to explicitly clear a scalar variable or a whole array, respectively.
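For example (a minimal illustration):

proc work {} {
    set big [string repeat x 1000000]   ;# about 1 MB of string data
    array set table {a 1 b 2}
    # ... use the data ...
    unset big            ;# release the scalar variable
    array unset table    ;# release the whole array
}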
Ruediger
The feedback you've gotten in 2960042, from Donal, explicitly says
that it is not a memory leak IF running it twice doesn't extend the
memory claim, which is what Donal (and everybody else) normally
observes.
Since your initial report didn't mention this extra, worrying piece of
evidence you've just described, Donal set the bug status to Invalid
+Pending, meaning you have two weeks to provide counter evidence
before it gets automatically closed (which is reversible too).
So instead of switching to another communication channel
(comp.lang.tcl), just deposit this evidence as a comment to the bug.
We'll take it seriously there.
-Alex
Well, you have to be very careful with what you complain about. In
particular, the script you supplied to describe the problem (in Bug
#2960042) did not leak when tried many times over (100k) with somewhat
smaller amounts of data. Even a single byte of leakage would have
shown, but memory usage was stable and constant. There's nothing in
the code that switches from one memory management model to another when
moving from a few hundred kB of string to a few hundred MB of string; we just
don't bother with that level of sophistication. :-)
Whatever is wrong, you've not yet exhibited it, so the rest of us can't
help debug the real problem. At a guess, in your real code you're
doing something that causes the data to be kept around; that's not a
leak, that's just a program that processes a lot of data. :-)
If you anticipate lots of identical strings, you can use techniques
like manual splitting and interning to cut memory usage. The [split]
command only does that sort of memory consumption reduction when
splitting into characters, as that's the only case where it is a
predictable major win.
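For example, a hand-rolled interning table might look like this (the names
are illustrative; whether it helps depends on how much duplication your
data actually contains):

proc intern {s} {
    global internTable
    if {![info exists internTable($s)]} {
        set internTable($s) $s
    }
    return $internTable($s)
}

# Split on whitespace, then replace each field with its interned copy,
# so duplicate values share storage instead of each line owning its own.
proc splitInterned {line} {
    set out {}
    foreach field [split $line] {
        lappend out [intern $field]
    }
    return $out
}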
Donal.
By the way: on Linux you will not see the memory consumption reported
by ps decrease after all the allocated memory has been freed.
See this little test.
The reason is that on Linux the memory allocator grows the heap for
small allocations, but the heap is never shrunk after the memory is
freed. For large allocations, on the other hand, the allocator obtains
the memory directly from the system, and when that is freed it is given
back to the system.
#include <stdlib.h>
#include <unistd.h>

#ifndef MALLOC_CNT
# define MALLOC_CNT (10 * 1000 * 1000)
#endif
#ifndef MALLOC_SIZE
# define MALLOC_SIZE 32
#endif

void *ptr[MALLOC_CNT];

int main(int argc, char **argv)
{
    int i;

    /* Allocate many small blocks; these are satisfied from the heap. */
    for (i = 0; i < MALLOC_CNT; i++) {
        ptr[i] = malloc(MALLOC_SIZE);
    }
    /* Free them all; the heap is not shrunk, so ps still shows the memory. */
    for (i = 0; i < MALLOC_CNT; i++) {
        free(ptr[i]);
    }
    /* Sleep forever so the resident size can be inspected with ps/top. */
    while (1) {
        sleep(1);
    }
    return 0;
}
I'm aware that my initial attempt at a bug report was inadequate. The
feedback made clear to me that my understanding of the issue is not
sufficient to frame the problem much better in another bug report.
Rather than waste more of the maintainers' time with additional
flailing in the bug database, I was hoping to get a bit of education
on the topic here.
>
> Since your initial report didn't mention this extra, worrying piece of
> evidence you've just described, Donal set the bug status to Invalid
> +Pending, meaning you have two weeks to provide counter evidence
> before it gets automatically closed (which is reversible too).
As I (inadequately) stated the issue there, Donal's action was
perfectly appropriate. Before I take another stab in the dark and
waste more of his time, I'm hoping to get a little additional insight
from charitable, knowledgeable souls here.
>
> So instead of switching to another communication channel
> (comp.lang.tcl), just deposit this evidence as a comment to the bug.
> We'll take it seriously there.
>
The evidence is difficult to condense and parametrize to bug report
dimensions, since manifestation of the behavior depends strongly on
the configuration of the individual computer. I will try; however,
it would make it a bit easier if I could get an answer to a general
question: if a Tcl procedure takes up almost all of a computer's
available RAM in variables within the procedure's scope, and after all
variables are explicitly unset and the procedure returns Tcl still
holds on to all the previously used RAM so that another invocation of
the procedure causes the interpreter to crash due to a memory
allocation error, would that be considered by the maintainers to be a
bug?
Yes, it means some hard work on your side, for example tweaking your
setup so that it still exhibits the bug with a [string repeat A
250000000] instead of a 250-meg file. But don't think that work is
wasted: it is crucial either for your understanding of where the leak
lies in your app, or for our understanding of a real case of a
Tcl-core leak.
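For instance, something along these lines (a bare-bones sketch, not your
actual processing):

proc chew {} {
    set data [string repeat A 250000000]   ;# stands in for the 250-meg file
    # ... whatever processing you do ...
    unset data
}
# Call it repeatedly and watch the process size between calls.
for {set i 0} {$i < 10} {incr i} {
    chew
}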
> it would make it a bit easier if I could get an answer to a general
> question: if a Tcl procedure takes up almost all of a computer's
> available RAM in variables within the procedure's scope, and after all
> variables are explicitly unset and the procedure returns Tcl still
> holds on to all the previously used RAM so that another invocation of
> the procedure causes the interpreter to crash due to a memory
> allocation error, would that be considered by the maintainers to be a
> bug?
Definitely yes. The high-water mark method means that a repeated,
identical allocation should reuse the same resources without a single
byte of overhead. So again, when you manage to circumvent the need for
a large specific input file, we'll have no rest until Tcl's honor is
washed :)
-Alex
I don't want to derail your issue. Have you considered not reading the
whole file into memory at once? Is it a text file you're processing?
Can you read the file line-by-line, or by chunks?
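For example (the file names and the handleLine/handleChunk procs are just
placeholders for your own input, output, and processing):

# Line by line: only one line is held in memory at a time.
set in  [open big-input.txt r]
set out [open results.txt w]
while {[gets $in line] >= 0} {
    puts $out [handleLine $line]
}
close $in
close $out

# Or in fixed-size chunks:
set in [open big-input.bin rb]
while {![eof $in]} {
    set chunk [read $in 1048576]   ;# 1 MB at a time
    handleChunk $chunk
}
close $in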
--
Glenn Jackman
Write a wise saying and your name will live forever. -- Anonymous
If you're interested in trying a different solution post a description
of your problem and ask for suggestions.
tomk
You are right, but ... we had files generated by a system describing
the movement of points in 3D, using a definition file for a kind of
network of points describing a geometry.
The definition file was already around 20 MB, but the movement file was
nearly 1 GB; both files were plain text.
And ... it was not really acceptable to load only a (perhaps
configurable) number of movement sets step by step, to be shown inside
a 3D simulation environment driven by a Tcl kernel in a C++/OpenGL
environment.
Loading the data for 2 seconds of simulation at a normal display update
rate was already too much to keep the system usable for "normal" users
with limited RAM.
But ... we never had the software crash because of reading the same
data twice, neither on Unix OSes nor on MS Windows, from Tcl 8.0 onwards.
Best regards,
Martin Lemburg
The core issue for me is not to get a specific piece of code to work,
but to understand a certain behavior and clarify if it classes as a
bug. I've made just such optimizations in my original program. But
the core issue is that under certain circumstances Tcl seems never to
free memory used in data processing, even after all variables used
have been unset. Thus when processing large data sets over a long
time period, Tcl eventually starves the computer of usable RAM.
Therefore no matter how small the chunks I try to read a file in,
eventually those chunks will be bigger than the available memory, and
the interpreter will crash.
The question is, is this a bug? Is it a performance optimization used
too enthusiastically without due thought given to endgame? Is it
behavior forced on the interpreter process by the operating system?
Whatever it is, the upshot for me is that it prevents me from using
Tcl for long-running programs that process large amounts of data,
tasks that Tcl is otherwise perfect for.
It's a bug. Whose bug, well, we don't know that yet. :-)
It's most certainly possible to write scripts to completely consume
all memory. However, I've had scripts that dealt with very large
datasets for a long time, so it's also possible to have things be OK.
The details really matter. (For example, while Tcl is reckoned to be
leak-free, Tk is *known* to have significant issues, though mitigation
steps can be taken if you know what to look out for.)
Donal.
In some cases, the right thing can be to move to working with the data
in packed form with C code. It's good to try to avoid such a step
(it's very inflexible) but sometimes it's the only alternative. Critcl
makes it a much simpler step than it would otherwise be. :-)
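Just to show the flavour of it, here's the classic trivial example (a
sketch only; real packed-data code would of course be more involved):

package require critcl

# A tiny C function compiled on the fly and exposed as a Tcl command.
critcl::cproc add {int a int b} int {
    return a + b;
}

puts [add 2 3]   ;# prints 5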
Donal.
proc mem_test {} {
    set str [string repeat 0 [expr {1000*1000}]] ;# 1 MB
}
mem_test
If I understand the issue, the only way to release memory is to use
rename mem_test {}
Am I right ?
My understanding is the only way to release the memory is [exit]
No. See bug 2960042 for an explanation of the two kinds of allocations
in Tcl and the consequences.
In the specific case you're mentioning, it is a vanilla malloc that is
freed as soon as the proc returns.
-Alex
http://www.ruby-forum.com/topic/137642
Bottom line: memory management/garbage collection is always hard, but
the desire to do it right seems to get stronger as a tool gets put to
more use.
Heh. QOTW candidate ?
-Alex