I am sending this a bit earlier than I normally would, as I am still trying to figure out the contribution of the Guile garbage collector to performance and am not quite done. I have also not yet measured a test over significant data for a before-and-after comparison, for example.

But since Ben sent Linas an email about this problem, I'll send my work in progress, so Linas has a bit more information to answer Ben's question.

So, the following represents a partial analysis of the SQL issues I found. I removed the code for fetch-atom and store-atom, and while this is considerably faster, there still seem to be many-second pauses in processing that are not related to the number of words in the sentence in any way I can easily correlate.
-----------------------

My 6,000 sentence test file finished over the weekend. It took about 48 hours.
For some context, the input corpus, Pride and Prejudice, has approximately:

120 K words
700 K bytes
6 K sentences
2 K paragraphs
25 average words per sentence

So it is a fair-sized novel, but only a tiny fraction of what we'd like to feed the language learning system.

In looking at the Postgres statistics for updates on a freshly initialized database, I found:
SELECT relname, idx_scan, idx_tup_fetch, n_tup_ins, n_tup_upd, n_tup_del FROM pg_stat_user_tables;
relname | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
------------+-----------+---------------+-----------+-----------+-----------
atoms | 95789119 | 90072278 | 3288742 | 0 | 0
values | 4 | 0 | 0 | 0 | 0
typecodes | 0 | 0 | 228 | 0 | 0
valuations | 131581719 | 84520666 | 44519326 | 0 | 41926158
spaces | 3288742 | 3288742 | 2 | 0 | 0
(5 rows)

So roughly 90M fetches for Atoms and 85M for Valuations while parsing a single file of 6K sentences.
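To put those totals in perspective, a quick back-of-the-envelope check in Python, using the round figures from the table above:

```python
# Back-of-the-envelope arithmetic from the pg_stat_user_tables output above.
atom_fetches = 90_072_278       # idx_tup_fetch, atoms table
valuation_fetches = 84_520_666  # idx_tup_fetch, valuations table
sentences = 6_000

per_sentence = (atom_fetches + valuation_fetches) / sentences
print(f"~{per_sentence:,.0f} index tuple fetches per sentence")
```

That is roughly 29 thousand index tuple fetches for every sentence parsed.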
We have 3.3M atoms inserted and 90.0M index tuple fetches for the atoms table.

In digging into the code, I found we have an explicit delete and insert for every single increment of every atom count maintained by the system. The count increment code is also recursive over the trees defined by each pair, which in practice means many different atoms: even though we only increment the count for one atom in the tree, the store-atom and fetch-atom calls update the truth values for every atom in the tree that has a truth value.
The pair counting code has this comment:
; ---------------------------------------------------------------------
; update-pair-counts -- count occurances of word-pairs in a parse.
;
; Note that this might throw an exception...
;
; The structures that get created and incremented are of the form
;
; EvaluationLink
; PredicateNode "*-Sentence Word Pair-*"
; ListLink
; WordNode "lefty" -- or whatever words these are.
; WordNode "righty"
;
; ExecutionLink
; SchemaNode "*-Pair Distance-*"
; ListLink
; WordNode "lefty"
; WordNode "righty"
; NumberNode 3
So every pair has 11 atoms.
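Tallying the nodes and links in the two structures as written (counting the two WordNodes once per structure, as the text does):

```python
# Tally of atoms in the two structures quoted above.
evaluation_link = ["EvaluationLink", "PredicateNode", "ListLink",
                   "WordNode lefty", "WordNode righty"]
execution_link = ["ExecutionLink", "SchemaNode", "ListLink",
                  "WordNode lefty", "WordNode righty", "NumberNode"]
print(len(evaluation_link) + len(execution_link))  # 11
```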
The truth value gets set only at the top level but the fetch and store end up retrieving and storing the valuations for the entire tree where there are truth values.
Further, the counting code is storing and fetching the truth values for each unique count value each time it gets incremented. So we'll have (1: an insert for 1), then (2: a delete of the 1, and an insert for 2), then (3: a delete of the 2, and an insert for 3), then (4: a delete of the 3, and an insert for 4), etc.
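The delete-then-insert pattern described above can be simulated in a few lines (hypothetical names, purely a sketch of the behavior, not the real AtomSpace backend):

```python
# Minimal sketch of the delete+insert pattern described above: every
# increment of a stored count deletes the old row and inserts the new
# one, instead of issuing a single UPDATE.
class NaiveCountStore:
    def __init__(self):
        self.rows = {}        # atom -> stored count
        self.statements = []  # log of issued "SQL" statements

    def increment(self, atom):
        old = self.rows.get(atom)
        if old is not None:
            self.statements.append(("DELETE", atom, old))
        new = (old or 0) + 1
        self.rows[atom] = new
        self.statements.append(("INSERT", atom, new))

store = NaiveCountStore()
for _ in range(4):
    store.increment("pair:lefty-righty")

# Counting one pair four times costs 4 INSERTs plus 3 DELETEs:
print(len(store.statements))  # 7
```

So the statement count per atom grows linearly with its count, where a single UPDATE per increment (or one write at the end) would do.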
And this is done for the EvaluationLink, the ExecutionLink, and the four WordNodes in the above example.

I wanted to verify this assumption, so I created a fresh database and turned on statement logging in Postgres. It resulted in the attached log file for a test of a four-word sentence:

(observe-text "This is a test.")

We got 5419 lines in the log file (not counting the 333 lines of startup creation of types etc.), each of which represents a single SQL statement. There are:

813 BEGIN,
813 COMMIT,
2127 SELECT,
720 DELETE, and
946 INSERT

statements for a four-word sentence. (See the attached text log for details.)

There are 133 new atoms created and 93 valuations after the call to observe-text is finished.
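As a sanity check, the per-statement-type counts do add up to the logged total:

```python
# The per-statement-type counts from the Postgres log sum to the
# 5419 logged statements reported above.
counts = {"BEGIN": 813, "COMMIT": 813, "SELECT": 2127,
          "DELETE": 720, "INSERT": 946}
print(sum(counts.values()))  # 5419
```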
My 6,000 sentence test file finished over the weekend. It took about 48 hours.
That's insanely slow. My best guess is that you failed to tune postgres according to the instructions in the README.
So my intuition about the Guile GC was right after all.
I came into the office this morning, after leaving last night with a test of the 6,000 sentence file running. That file took two days last week, which started me on this path before working on AWS.

For the new test last night, I reintroduced the (gc) call in observe-text but did not put back the fetch-atom and store-atom calls. I figured it would either finish quickly overnight (if the SQL issue was the only major problem) or run into the same problem I encountered when I ran it last week (if something else, like the GC, was causing the slowdown). Either way, I would be able to separate the GC performance issues I suspected from the SQL performance issues I already noted.

When I arrived, the cogserver was using 100% CPU but was still running, processing sentence #590. I suspected the Guile GC, since one process is pegging a CPU at 100% on a 6-CPU machine that usually shows 1150% for the cogserver when running. This matches the profile of one thread blocking all the others, which seemed like the GC. I fired up gdb, attached to the cogserver, and did a backtrace:

(gdb) bt
#0 0x00007f562938cfb2 in do_sigsuspend (set=0x7f56244ea2c0)
at ../sysdeps/unix/sysv/linux/sigsuspend.c:31
#1 __GI___sigsuspend (set=0x7f56244ea2c0)
at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#2 0x00007f56242ce832 in GC_suspend_handler_inner ()
from /usr/lib/x86_64-linux-gnu/libgc.so.1
#3 0x00007f56242ce8a5 in GC_suspend_handler ()
from /usr/lib/x86_64-linux-gnu/libgc.so.1
#4 <signal handler called>
#5 0x00007f562941a93d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#6 0x00007f562944bbc4 in usleep (useconds=<optimized out>)
at ../sysdeps/unix/sysv/linux/usleep.c:32
#7 0x00007f562a2d7fb7 in opencog::CogServer::serverLoop (this=0x19727f0)
at /home/inflector/opencog/opencog/opencog/cogserver/server/CogServer.cc:211
#8 0x0000000000406b05 in main (argc=3, argv=0x7ffec4633ae8)
at /home/inflector/opencog/opencog/opencog/cogserver/server/CogServerMain.cc:232
(gdb)

This is the code that the garbage collector invokes to suspend threads while running. I've been digging into the code for the collector and found several potential interactions which would result in what I've been seeing. Essentially, as the number of objects in the cogserver heap grows, the conservative GC has to search a larger and larger space, until eventually this search takes a single core's entire processing capability. As the search space grows, this blocking begins to dominate all execution.

You can see this in my notes in the attached observe-text.log file, where I show the GC time compared with total time: it is 320 seconds out of 463 seconds for my 5-chapter test file. So 69% of the processing, when processing only 303 sentences. And this percentage grows with successive processing of the same file. These successive runs create no new atoms, but show a leak of 0.130 GB per execution with the (gc) call in observe-text, and 0.400 GB per processing of the 5 chapters without the (gc) call inside observe-text.

Currently the job is sitting at 12 hours for one single sentence while the garbage collector runs repeatedly, attempting to complete its job. Since I encountered this last week and just let it run over the weekend, and when I came back on Monday it was finished, I suspect that the GC is still churning away, being repeatedly invoked, and at some point it will finish enough to process the sentence. Further, the completion will trigger another internal heap allocation, which the collector always makes at 200% of the previous size; this will cause the GC to be invoked much less for some period of time, which will make the rest of the sentences process much faster. But at some size the GC stops working, as the processing time required dominates a single CPU core.

All the above means Guile's current GC won't work for the AtomSpace. Not without some significant work.
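The arithmetic behind the 69% figure, for reference:

```python
# GC share of the 5-chapter run, from the numbers reported above.
gc_seconds = 320
total_seconds = 463
gc_share = gc_seconds / total_seconds
print(f"GC share: {gc_share:.0%}")  # GC share: 69%
```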
That's the bad news.

The good news is that I believe we can implement a fix for this, for our special case of Guile running in a separate process in the CogServer, in a few days. My idea is a bit of a hack, but it will result in extremely fast allocations, zero leaks, and speed up the code by at least 2X over the small-file case where we're getting sub-second processing of lines. I'd not be surprised if we can hit 0.5 seconds in the no-SQL-saving case.

I have attached a screenshot of top where you can see 100% CPU for the cogserver, and terminal windows where you can see that there were no errors starting the cogserver and that it is on sentence 590, which took 1049 seconds to reach, with an average per-sentence time at that point of 1.78 seconds. You can also see that there are no atoms in the atomspace (except for the default one that gets added to every atomspace).

So my plan is: as a first step, I'm going to implement a quick-and-dirty replacement for the GC which just uses malloc, to see if that gets through the file. If this approach works, it will confirm the above, but it will not be suitable for the long term.

Assuming the malloc-based allocations complete in reasonable time, I've got some ideas for altering the GC code to work with a separate per-thread heap for new allocations, which we can then dispose of en masse when the thread terminates. Assuming we can get a decent bound on the usage of Guile-internal objects during statement execution, this will be much, much faster and won't leak. Everyone wins.

Let me know if you have any questions or suggestions.

- Curtis

On Thu, Jun 1, 2017 at 1:42 AM, Ben Goertzel <b...@goertzel.org> wrote:

Thanks a lot Curtis!
I totally grok the desire to complete an analysis before announcing
results... OTOH if we were all sitting in the same office we'd
probably be communicating lots of interim results. So generally I'm
biased toward fostering communication! ...
See you Monday...
ben
On Wed, May 31, 2017 at 7:58 AM, Curtis Faith <curtis....@gmail.com> wrote:
> I should have been more explicit and said that removing the store and fetch
> was just a quick and dirty way of measuring the performance impact of this
> issue, not a suggested solution to it.
>
> On Wed, May 31, 2017 at 10:13 PM, Curtis Faith <curtis....@gmail.com>
> wrote:
>>
>> I am sending this a bit earlier than I would normally as I am still trying
>> to figure out the contribution of the Guile garbage collector on performance
>> and am not quite done. I have also not yet measured a test over significant
>> data for a before and after comparison, for example.
>>
>> But since Ben sent Linas an email about this problem, I'll send my work in
>> progress, so Linas has a bit more information to answer Ben's question.
>>
>> So, the following represents a partial analysis of the SQL issues I found.
>> I removed the code for fetch-atom and store-atom and while this is
>> considerably faster, there still seems to be many-second pauses in
>> processing that are not related to the number of words in the sentence in
>> any way I can easily correlate. Guile garbage collector? This might be due
>> to a memory leak which is resulting in the garbage collector having to
>> search over an ever increasing memory space looking for object-ish
>> references using its conservative scheme.
>>
>> - Curtis
>>
>> -----------------------
>>
>> My 6,000 sentence test file finished over the weekend. It took about 48
>> hours.
>>
>> We have 3.3M atoms inserted and 90.0M index tuple fetches for the atom
>> table.
>>
>> In digging into the code, I found we have an explicit delete and insert
>> for every single increment of every atom count maintained by the system. The
>> count increment code is also recursive for the trees defined by each pair
>> which will in practice be many different atoms because even though we only
>> increment the count for one atom in the tree, the store-atom and fetch-atom
>> calls update the truth values for every atom in the tree that has a truth
>> value. The pair counting code has this comment:
>>
>> ; ---------------------------------------------------------------------
>> ; update-pair-counts -- count occurances of word-pairs in a parse.
>> ;
>> ; Note that this might throw an exception...
>> ;
>> ; The structures that get created and incremented are of the form
>> ;
>> ; EvaluationLink
>> ; PredicateNode "*-Sentence Word Pair-*"
>> ; ListLink
>> ; WordNode "lefty" -- or whatever words these are.
>> ; WordNode "righty"
>> ;
>> ; ExecutionLink
>> ; SchemaNode "*-Pair Distance-*"
>> ; ListLink
>> ; WordNode "lefty"
>> ; WordNode "righty"
>> ; NumberNode 3
>>
>>
>> So every pair has 11 atoms.
>>
>> The truth value gets set only at the top level but the fetch and store end
>> up retrieving and storing the valuations for the entire tree where there are
>> truth values.
>>
>> Further, the counting code is storing and fetching the truth values
>> for each unique count value each time it gets incremented. So we'll have (1:
>> an insert for 1), then (2: a delete of the 1, and an insert for 2), then (3:
>> a delete of the 2, and an insert for 3), (4: a delete for 3, and an insert
>> for 4), etc. And this is done for the EvaluationLink, ExecutionLink and four
>> Word Nodes in the above example.
>>
>> I wanted to verify this assumption, so I created a fresh database and
>> turned on statement logging in Postgres. It resulted in the attached log
>> file for a test of a four-word sentence:
>>
>> (observe-text "This is a test.")
>>
>> We got 5419 lines in the log file (not counting the 333 lines of startup
>> creations of types etc.), each of which represents a single SQL statement.
>> There are:
>>
>> 813 BEGIN,
>> 813 COMMIT,
>> 2127 SELECT,
>> 720 DELETE, and
>> 946 INSERT
>>
>> statements for a four word sentence. (See the attached text log for
>> details)
>>
>> There are 133 new atoms created and 93 valuations after the call to
>> observe-text is finished.
>>
>>
>
One further thing: with respect to the sql fetch and store, one relatively simple change might be new explicit calls that save and fetch only a single atom's values, i.e. that do not recursively save atoms and all attached values. Something like sql-store-values, sql-fetch-values, sql-store-one-atom, and sql-fetch-one-atom, perhaps.
Then we could loop over newly created atoms at the end of the process and save them, while saving only the final values for atoms that have already been persisted and that have updated values. This could be done in C++, as an AtomSpace call made once at the end of processing a chunk of text, so there would be some ability to restart long processes in the event of errors, but the overhead would be greatly reduced while running large batches.

When starting the CogServer, there could be a mark on the AtomSpace that gets cleared when sql-load is called once; this wouldn't have to be done more than once per initialisation of a CogServer instance. Subsequent SQL would then only be writes, batched together at the end as files get processed by observe-text or other similar steps in the pipeline.
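The write-back batching proposed above could be sketched roughly like this (hypothetical names, not the real AtomSpace API): mark atoms dirty during processing, then persist each dirty atom once per chunk of text.

```python
# Hedged sketch of the proposed write-back batching (hypothetical
# names): mark atoms dirty during processing, then persist each
# dirty atom exactly once per chunk of text.
class BatchedBackend:
    def __init__(self):
        self.dirty = set()
        self.writes = 0

    def update_count(self, atom):
        self.dirty.add(atom)   # no SQL here; just remember the atom

    def flush(self):
        # One write per dirty atom, issued once at the end of a chunk.
        self.writes += len(self.dirty)
        self.dirty.clear()

backend = BatchedBackend()
for pair in ["a-b", "a-b", "b-c", "a-b"]:  # repeated increments
    backend.update_count(pair)
backend.flush()
print(backend.writes)  # 2 writes, versus one delete+insert per increment
```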
Linas wrote:

The idea that you are going to use a special pool for guile which you then clear out every so often is just... a proposal to take a sophisticated GC algorithm and replace it with a truly sophomoric, ahem, freshman concept of GC. It's a waste of time.

GC is needed when you have long-running processes or threads that can't just leak with impunity. It is wholly unnecessary for short-duration tasks with moderate memory requirements. In the special case of processing batch requests, with web servers like nginx or Apache, or the CogServer running observe-text, there are not many objects that can't immediately be destroyed when the request (or set of those requests) finishes. That makes the overhead of GC unnecessary in these batch processing cycles. It also makes the overhead of finalizing anything unnecessary, if there is enough RAM to service the requests without any cleanup during a single request's processing. You need to release system resources, and that's about it.

A pool-based, clean-up-once-at-the-end approach will make things much faster, whether the problem I am seeing ends up being a bug or not.
Performance is clearly an issue for unsupervised learning and for general AI broadly. It is also an issue for OpenCog right now, since getting the data into the AtomSpace in the right form is taking far too long.
No matter what we do performance will always be an issue because of the sheer size of the datasets researchers want to work with.
That is why Ben first asked me to look into using AWS to spawn parallel processes to cut down on the calendar time required to input large corpora.
OK. So let's think this through. The only place where GC is being used is in guile; GC is not being used in the atomspace itself. So you could accomplish exactly the same thing by periodically shutting down guile completely. This would release all that memory, and then you are done.
But I don't see any way of implementing a pool, without fully shutting down guile; and if one fully shuts down guile, then one doesn't need a pool.
The problem with this proposal is that pretty much everything runs through guile. All the atoms go in through it, and come out through it. So shutting down guile and restarting it is tantamount to shutting down the system, and restarting it. Which is OK, if you saved all the atoms you care about to the database.
1) I fixed the hang. :-) Or figured out how to avoid it. It turns out that, due to a stoopid bug/oversight, quotation marks in the input text were not being escaped. Thus, guile would see a begin-quote, an end-quote, followed by garbage. A backtrace would be generated and passed back to the witless user, who is just netcat, couldn't give a damn, and so it was silently ignored. Escape the quotation marks correctly, and the problem goes away.
2) How are you measuring GC time? I also get 70%, but I also get 500% CPU time for the other 181 active threads, so 70/500 seems acceptable to me. Again, GC halts only guile; it does not halt the atomspace.
Linas wrote:

OK. So let's think this through. The only place where GC is being used is in guile; GC is not being used in the atomspace itself. So you could accomplish exactly the same thing by periodically shutting down guile, completely. This would release all that memory, and then you are done.

<snip>

But I don't see any way of implementing a pool without fully shutting down guile; and if one fully shuts down guile, then one doesn't need a pool.

Okay. So here's what I think is happening now in the CogServer; please correct me if this is wrong. The CogServer receives 3 requests, then creates three new threads:

'scm (observe-text "Test sentence one.")' -> Thread one
'scm (observe-text "Test sentence two.")' -> Thread two
'scm (observe-text "Test sentence three.")' -> Thread three

If Guile's GC, the bdwgc, were altered to check a thread-local flag, and when that flag was set it allocated new objects out of a thread-local pool, then when these types of threads completed there would be no need to garbage collect any allocations made from that pool. One could just reset the pool.
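As a toy illustration of that thread-local pool idea (all names here are invented for this sketch; the real change would live in C inside bdwgc):

```python
# Toy illustration of the thread-local pool idea; all names are
# invented for this sketch. The real change would be inside bdwgc.
import threading

_local = threading.local()

def enter_pooled_region():
    _local.pool = []          # allocations made while the flag is set

def pooled_alloc(obj):
    pool = getattr(_local, "pool", None)
    if pool is not None:
        pool.append(obj)      # tracked in the pool, not individually collected
    return obj

def exit_pooled_region():
    reclaimed = len(_local.pool)
    _local.pool = None        # drop the whole pool at once, en masse
    return reclaimed

enter_pooled_region()
for _ in range(3):
    pooled_alloc(object())
print(exit_pooled_region())  # 3
```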
But this is way more work than just restarting Guile every once in a while, I agree. There may also be other problems with this approach. Still, it is a very general solution to what is likely to be a common issue in the future, so it may be worth looking at later.

So, on to your objection to restarting Guile. You wrote:

The problem with this proposal is that pretty much everything runs through guile. All the atoms go in through it, and come out through it. So shutting down guile and restarting it is tantamount to shutting down the system, and restarting it. Which is OK, if you saved all the atoms you care about to the database.

I don't see why shutting down Guile is tantamount to shutting down the system and restarting it. Right now, the atom space is created first,
then the SchemeEval object is created in CogServer::CogServer.
So I don't see why we couldn't have a function that shut down Guile, releasing all its memory back to the OS,
and then restarted a new Guile interpreter without destroying the AtomSpace; upon restart you'd pass in the same atom space that existed before the shutdown.

Now, due to some of the quirks of the GC, and perhaps Guile's interaction with it, shutting down and restarting is a bit tricky. You've got the issue with the infinitely sleeping initialization thread noted in https://github.com/opencog/atomspace/blob/master/opencog/guile/SchemeEval.cc#L241-L285

You've also got a few SCM static globals scattered about that would have to be cleared and reloaded. It's a bit of work, sure, but doable.
For the first stages of the language learning process, as I understand
it, what we really need to do is just
***
1. Split input into sentences (which can be done with lots of
sentence-splitters, including ours)
2. For each sentence S:
-- do some stemming (again that can be done with lots of stemmers,
including our own), so that each word is associated with a stem
-- identify each pair of words (W1, W2) so that W1 occurs before W2 in S
-- create and update some Atoms based on the pair (W1, W2)
***
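Steps 1 and 2 above can be sketched in a few lines (naive regex sentence splitting, no stemming, no link parse; illustrative only):

```python
# Naive sketch of steps 1-2: split the input into sentences, then
# count every ordered pair (W1, W2) with W1 before W2 in the
# sentence. No stemming and no link parse; illustrative only.
import re
from collections import Counter

def word_pairs(text):
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = sentence.split()
        for i, w1 in enumerate(words):
            for w2 in words[i + 1:]:   # W1 occurs before W2 in S
                counts[(w1, w2)] += 1
    return counts

pairs = word_pairs("This is a test.")
print(len(pairs))  # 6 ordered pairs from a four-word sentence
```

Replacing the inner loop with "pairs linked by a random link parse" would recover the current pipeline's behavior, per the note below about replicating it.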
The Atoms to be created are something like
-- an EvaluationLink indicating a link-parser AnyLink between W1 and W2
-- updating a couple counts based on Linas's recent code
I'm in too much of a hurry to look up the format of these links right now, but Ruiting probably remembers, or I could look it up this afternoon...
That would do it right?
The current approach replaces the "identify each pair of words" step
with "do a random link parse", but it is not clear that doing a random
link parse is actually better; my own feeling is that it probably
isn't, but we debated this extensively before without resolution....
To replicate the current pipeline better one would replace
-- identify each pair of words (W1, W2) so that W1 occurs before W2 in S
with
-- do a random link parse, then identify each pair of words (W1,W2)
that are linked in the random link parse
...
Am I missing something?
The above could all be done in C++ perfectly well; it doesn't require
Guile because it doesn't require any of the fancy stuff in the current
NLP pipeline...
Linas wrote:1) I fixed the hang. :-) Or figured out how to avoid it. It turns out that, due to a stoopid bug/oversight, quotation marks in the input text were not being escaped. Thus, guile would see a begin-quote, end-quote, followed by garbage. A backtrace would be generated, and passed back to the witless user, who is just netcat, can couldn't give a damn, and so was silently ignored. Escape the quotation marks correctly, the problem goes away.Cool.
It's puzzling how this bug might result in the fp in the garbage collector getting screwed up. Did you find the specific bug in the GC, or in Guile's use of it, that causes the fp to get corrupted, so we can help the developers get that fixed?
A problem in scheme source syntax, or variables not being spelled correctly, shouldn't result in an infinite loop in the GC in any case. Seems like they must be missing some sort of exception handler somewhere.

2) how are you measuring GC time? I also get 70% but i also get 500% cpu time for the other 181 active threads, so 70/500 seems acceptable to me. Again, GC halts only guile, it does not halt the atomspace.

I was simply using the overall duration of the test as measured by the perl script, and the result of (gc-run-time) as measured via a telnet session running in another bash terminal. My measurements may be in error if my assumption about the GC is incorrect, as I've hinted a couple of times in prior emails. If anything, however, I'm overcounting the total time and undercounting the percentage, since I'm not getting CPU duration measurements from the CogServer; I'm counting the time until the test is finished as determined by the perl script.
My base assumption is that gc-run-time is time when the other Guile threads are blocked. I make this assumption because the gc_time_taken statistic in Guile is generated via the GC's before_gc and after_gc hooks, which get called before and after each stop-the-world collection. So my assumption is that since the stop-the-world code suspends all the threads that can be garbage collected, and since all the CogServer-generated threads created in response to an 'scm hush (observe-text "Some sentence")' are threads that get stopped, because they allocate objects in the GC via Guile, the GC time reported is non-overlapping with the other processing time.

If a test takes 100 seconds, and 50 seconds are spent in the GC, then during those 50 seconds there is no work going on in any non-GC threads, because they are suspended for the duration of the stop-the-world collection. I have looked at the code to verify these assumptions, but it is certainly possible that I am missing something.
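The gap between my 69% figure and Linas's 70/500 is largely wall-clock share versus CPU-time share. A sketch with illustrative numbers (not measurements), assuming worker threads are suspended during collection:

```python
# Wall-clock GC share vs CPU-time GC share, with illustrative
# numbers (not measurements). Even under stop-the-world collection,
# the CPU-time share is much smaller than the wall-clock share.
gc_time = 70          # seconds attributed to GC
wall_time = 100       # elapsed wall-clock seconds for the test
busy_threads = 5      # hypothetical threads at ~100% CPU outside GC

cpu_time = busy_threads * (wall_time - gc_time) + gc_time
print(f"wall-clock share: {gc_time / wall_time:.0%}")  # 70%
print(f"CPU-time share: {gc_time / cpu_time:.0%}")     # 32%
```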
How many CPUs are on your test machine?
Does it have hyper-threaded Intel chips?
If so, you don't tend to get accurate measures of real-time CPU performance on those chips.
I don't see why shutting down Guile is tantamount to shutting down the system and restarting it. Right now, the atom space is created first,
Ehh? Where? Right now, guile is started first, then the opencog module is loaded, and this module creates the atomspace. If you leave guile, you get the bash prompt; there is no atomspace any more; there's no running executable any more after you leave guile.
then the SchemeEval object is created in CogServer::CogServer.
That's incorrect. The SchemeEval object is created when you (use-modules (opencog exec)). I typically create this before running the cogserver.
There's no bug in the GC. I don't understand why you keep saying that, or what evidence you have for that.
How many CPUs are on your test machine?
24
Sounds like you're counting wall-clock time, not CPU time. So that is misleading you.