Re: Language Learning - Progress and Performance


Linas Vepstas

Jun 3, 2017, 12:00:49 PM
to Curtis Faith, Ben Goertzel, Ruiting Lian, opencog


On Wed, May 31, 2017 at 9:13 AM, Curtis Faith <curtis....@gmail.com> wrote:
I am sending this a bit earlier than I would normally as I am still trying to figure out the contribution of the Guile garbage collector on performance and am not quite done. I have also not yet measured a test over significant data for a before and after comparison, for example.

But since Ben sent Linas an email about this problem, I'll send my work in progress, so Linas has a bit more information to answer Ben's question.

So, the following represents a partial analysis of the SQL issues I found. I removed the code for fetch-atom and store-atom, and while this is considerably faster, there still seem to be many-second pauses in processing that I cannot easily correlate with the number of words in the sentence.

The atomspace SQL backend has 8 write-back queues, which, by default, are allowed to be quite large. They normally drain in under a second, but if there are a lot of atomspace threads putting atoms into them, the drains can take tens of seconds, very rarely minutes or longer, e.g. if threads are dumping atoms in at the same rate that the writers are draining the queues.
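A toy model of that drain behavior (the rates below are illustrative assumptions, not measurements of the real backend): when producers are idle a backlog drains quickly, but when producers run near the writers' drain rate, the same backlog takes far longer to empty.

```python
# Toy model of one write-back queue: atoms arrive while a writer drains.
# All rates are assumed for illustration, not measured.

def drain_time(queue_len, arrival_rate, drain_rate, dt=0.1):
    """Simulated seconds until the queue empties; inf if it never does."""
    t = 0.0
    while queue_len > 0:
        queue_len += (arrival_rate - drain_rate) * dt
        t += dt
        if t > 600:                 # give up after 10 simulated minutes
            return float("inf")
    return t

# Idle producers: a 5,000-atom backlog drains in ~0.5 s at 10k atoms/s.
print(drain_time(5000, arrival_rate=0, drain_rate=10_000))
# Producers at 90% of drain speed: the same backlog takes ~10x longer.
print(drain_time(5000, arrival_rate=9_000, drain_rate=10_000))
```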

Also: postgres on SSDs is about 10x faster than postgres on spinning disks, for me.

-----------------------

My 6,000 sentence test file finished over the weekend. It took about 48 hours.
 
That's insanely slow.  My best guess is that you failed to tune postgres according to the instructions in the README.  Without this tuning, you will get one or two disk writes for every atom, which, for spinning media, takes about one disk rotation for each atom, i.e. 15 milliseconds on a 7200 RPM disk drive. That works out to maybe 100 atoms/second.

for 6000 sentences x 16 parses/sentence x 20 words/parse x 3.5 atom writes/word = 6.7M atoms

at 100 atoms/sec, this would be 18 hours.  After postgres tuning, this should be, I dunno, 3x or 5x faster??  I'm getting less than an hour for Pride and Prejudice, I believe. Not sure.
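Spelling out the back-of-the-envelope above, using the message's own assumed rates:

```python
# Linas's estimate, using the numbers from this message as assumptions.
sentences = 6_000
parses_per_sentence = 16
words_per_parse = 20
atom_writes_per_word = 3.5

atom_writes = sentences * parses_per_sentence * words_per_parse * atom_writes_per_word
print(f"{atom_writes / 1e6:.2f}M atom writes")   # 6.72M

# At ~100 atoms/sec for untuned postgres on spinning media:
hours = atom_writes / 100 / 3600
print(f"{hours:.1f} hours")                      # ~18.7 hours
```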

 

For some context, the input corpus, Pride and Prejudice, has approximately:

120 K words
700 K bytes
6 K sentences
2 K paragraphs
25 average words per sentence

So it is a fair sized novel but only a tiny tiny fraction of what we'd like to feed the language learning system. 

In looking at the Postgres statistics for updates on a freshly initialized database, I found:

SELECT relname, idx_scan, idx_tup_fetch, n_tup_ins, n_tup_upd, n_tup_del FROM pg_stat_user_tables;


  relname   | idx_scan  | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
------------+-----------+---------------+-----------+-----------+-----------
 atoms      |  95789119 |      90072278 |   3288742 |         0 |         0
 values     |         4 |             0 |         0 |         0 |         0
 typecodes  |         0 |             0 |       228 |         0 |         0
 valuations | 131581719 |      84520666 |  44519326 |         0 |  41926158
 spaces     |   3288742 |       3288742 |         2 |         0 |         0
(5 rows)

So roughly 90M fetches for Atoms and 85M for Valuations while parsing a single file of 6K sentences.

There are 16 parses per sentence. Besides observing each word once per parse, there are also observations of word-pairs, with the number of word-pairs being about equal to the number of words. (per parse)

My back-of-the-envelope above suggests that there should have been about 7M updates. You are seeing 10x as many.  I don't understand this at all.

Oh wait. I see the problem. You are using a version that also does "clique counting". I had a goofy argument with Ben about this in HK, and later added clique counting. This increases the number of atom writes by 10x or 20x, and creates absolutely immense databases, so I later disabled this code.  If you pull the latest and greatest, this is disabled.



We have 3.3M atoms inserted and 90.0M index tuple fetches for the atom table.

In digging into the code, I found we have an explicit delete and insert for every single increment of every atom count maintained by the system. The count-increment code is also recursive for the trees defined by each pair, which in practice will touch many different atoms: even though we only increment the count for one atom in the tree, the store-atom and fetch-atom calls update the truth values for every atom in the tree that has a truth value.

That's not correct; or it shouldn't be. The fetch only fetches the values for the fetched atom, and not for the others. It cannot and must not update the others, as that would leave the atomspace contents garbled and incorrect.  That code is conservative, with the goal of not corrupting any truth values when fetching atoms.

 
The pair counting code has this comment:

; ---------------------------------------------------------------------
; update-pair-counts -- count occurances of word-pairs in a parse.
;
; Note that this might throw an exception...
;
; The structures that get created and incremented are of the form
;
;     EvaluationLink
;         PredicateNode "*-Sentence Word Pair-*"
;         ListLink
;             WordNode "lefty"  -- or whatever words these are.
;             WordNode "righty"
;
;     ExecutionLink
;         SchemaNode "*-Pair Distance-*"
;         ListLink
;             WordNode "lefty"
;             WordNode "righty"
;         NumberNode 3



So every pair has 11 atoms.
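Spot-checking that count, with a sketch that treats each node and link in the comment's two trees as its own atom (ignoring that the atomspace would deduplicate the shared ListLink and WordNodes):

```python
# The two structures from the scheme comment above, as nested tuples.
eval_link = ("EvaluationLink",
             ("PredicateNode", "*-Sentence Word Pair-*"),
             ("ListLink",
              ("WordNode", "lefty"),
              ("WordNode", "righty")))

exec_link = ("ExecutionLink",
             ("SchemaNode", "*-Pair Distance-*"),
             ("ListLink",
              ("WordNode", "lefty"),
              ("WordNode", "righty")),
             ("NumberNode", "3"))

def count_atoms(atom):
    """Each tuple is one atom; its tuple children are its outgoing set."""
    return 1 + sum(count_atoms(c) for c in atom[1:] if isinstance(c, tuple))

print(count_atoms(eval_link) + count_atoms(exec_link))  # 11
```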

Yes, I disabled this later on, maybe a few weeks ago.
 

The truth value gets set only at the top level but the fetch and store end up retrieving and storing the valuations for the entire tree where there are truth values.

That should not be happening, at all. If it is, then it's a bug. Can you point out where this is happening?
 

Further, the counting code is storing and fetching the truth values for each unique count value each time it gets incremented. So we'll have (1: an insert for 1), then (2: a delete of the 1, and an insert for 2), then (3: a delete of the 2, and an insert for 3), (4: a delete of the 3, and an insert for 4), etc.

If you want to change this to an SQL UPDATE, that's fine.  UPDATEs add a fair amount of code complexity. Perhaps the UPDATE will run a little faster than the DELETE/INSERT, but I'm not convinced that's a bottleneck.
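To make the bookkeeping concrete, a toy tally (hypothetical helpers, not atomspace code) of the statements issued when a single counter is incremented N times under each scheme:

```python
def delete_insert_statements(n):
    """DELETE/INSERT scheme: every new count value is a fresh row, and
    every value after the first also deletes its predecessor row."""
    return n + (n - 1)

def update_statements(n):
    """UPDATE scheme: one initial INSERT, then an UPDATE per increment."""
    return 1 + (n - 1)

print(delete_insert_statements(100))  # 199 statements
print(update_statements(100))         # 100 statements
```

So switching to UPDATE roughly halves the statement count for a hot counter, consistent with the point above that it helps but may not be the dominant cost.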

To summarize:
1) configure postgres as described in the README
2) get the newer relex, which stubs out the stupid clique-counting idea.

--linas

p.s. You can configure postgres to not commit to disk, by saying fsync=off. This gives a 10x performance improvement, or more, but if postgres crashes, you will have a corrupted database that's not recoverable. This happened to me last month, when I accidentally ran out of RAM (while clique-counting), so I am now older, wiser, and $500 poorer after buying some SSD disks.
 
 
And this is done for the EvaluationLink, ExecutionLink and four Word Nodes in the above example.

I wanted to verify this assumption, so I created a fresh database and turned on statement logging in Postgres. It resulted in the attached log file for a test of a four-word sentence:

(observe-text "This is a test.")

We got 5419 lines in the log file (not counting the 333 lines of startup creations of types etc.), each of which represents a single SQL statement. There are:

813 BEGIN,
813 COMMIT,
2127 SELECT,
720 DELETE, and
946 INSERT

statements for a four word sentence. (See the attached text log for details)

There are 133 new atoms created and 93 valuations after the call to observe-text is finished.
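The per-statement tallies above are internally consistent with the reported log size:

```python
# Statement counts reported for observing one four-word sentence.
statements = {"BEGIN": 813, "COMMIT": 813, "SELECT": 2127,
              "DELETE": 720, "INSERT": 946}
total = sum(statements.values())
print(total)        # 5419, matching the log line count reported above
print(total // 4)   # over 1300 SQL statements per word of input
```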



Ben Goertzel

Jun 3, 2017, 12:07:03 PM
to Linas Vepstas, Curtis Faith, Ruiting Lian, opencog
Thanks Linas... Ruiting was getting stuck on this stuff so I asked
Curtis for help, but he's been stuck too...

Curtis, I guess you should try your best to follow Linas's recent
instructions before trying anything more radical...

thanks all !
--
Ben Goertzel, PhD
http://goertzel.org

"I am God! I am nothing, I'm play, I am freedom, I am life. I am the
boundary, I am the peak." -- Alexander Scriabin

Curtis Faith

Jun 3, 2017, 12:11:13 PM
to Linas Vepstas, Ben Goertzel, Ruiting Lian, opencog
My 6,000 sentence test file finished over the weekend. It took about 48 hours.
 
That's insanely slow.  My best guess is that you failed to tune postgres according to the instructions in the README. 

I tuned Postgres with fsync turned off, etc., for the first test, and actually turned it off in the last iteration of my testing and still ran into the same problem. Absolutely no SQL calls whatsoever, and it still took more than two days (at which point I shut down the test).

My email of two days ago (June 2) in this thread, which begins with "So my intuition about the Guile GC was right after all.", shows what happened when I removed the calls to sql-fetch and sql-store altogether.


On Sun, Jun 4, 2017 at 12:00 AM, Linas Vepstas <linasv...@gmail.com> wrote:

Linas Vepstas

Jun 3, 2017, 12:13:39 PM
to Curtis Faith, Ben Goertzel, Ruiting Lian, Amen Belayneh, opencog
Which version of guile are you using? You should be using guile-2.2.  Sooner or later, you also need to build bdwgc with ./configure --enable-large-config, or you will get the dreaded `Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS` error.

Re: the guile infinite loop. Yes, there is a bug in guile; I plan to open a bug report "real soon now".  I think you are describing this bug below. What I see is that:
* the guile prompt stops responding
* cpu spins at exactly 100% cpu usage.
* GDB shows a very small, very tight spin loop when I single-step; I get this:

single-stepping:

scm_i_vm_mark_stack (vp=0xbdca3120, mark_stack_ptr=0x7f361fe85ef0,
    mark_stack_limit=0x7f361fe95eb0) at ../../libguile/vm.c:975
975            fp = SCM_FRAME_DYNAMIC_LINK (fp))
(gdb) print fp
$1 = (union scm_vm_stack_element *) 0x7f36183d5e38
(gdb) step
973       for (fp = vp->fp, sp = vp->sp;
(gdb)
979           for (slot = nlocals - 1; sp < fp; sp++, slot--)
(gdb)
1011          slot_map = find_slot_map (SCM_FRAME_RETURN_ADDRESS (fp),
&cache);
(gdb)
1006          sp = SCM_FRAME_PREVIOUS_SP (fp);
(gdb) print fp
$2 = (union scm_vm_stack_element *) 0x7f36183d5e38
(gdb) print sp
$3 = (union scm_vm_stack_element *) 0x7f36183d5e48
(gdb) step
1011          slot_map = find_slot_map (SCM_FRAME_RETURN_ADDRESS (fp),
&cache);
(gdb)
find_slot_map (cache=0x7f361fe85ab0, ip=0x1) at ../../libguile/vm.c:932
932       size_t slot = (((scm_t_uintptr) ip) >> 2) %
SLOT_MAP_CACHE_SIZE;
(gdb)
935       if (cache->entries[slot].ip == ip)
(gdb)
scm_i_vm_mark_stack (vp=0xbdca3120, mark_stack_ptr=0x7f361fe85ef0,
    mark_stack_limit=0x7f361fe95eb0) at ../../libguile/vm.c:1011
1011          slot_map = find_slot_map (SCM_FRAME_RETURN_ADDRESS (fp),
&cache);
(gdb)
find_slot_map (cache=0x7f361fe85ab0, ip=0x1) at ../../libguile/vm.c:936
936         map = cache->entries[slot].map;
(gdb)
scm_i_vm_mark_stack (vp=0xbdca3120, mark_stack_ptr=0x7f361fe85ef0,
    mark_stack_limit=0x7f361fe95eb0) at ../../libguile/vm.c:975
975            fp = SCM_FRAME_DYNAMIC_LINK (fp))
(gdb) print fp
$4 = (union scm_vm_stack_element *) 0x7f36183d5e38

This also happens in older versions, e.g. guile-2.1.8.  It's a real bug; I have not had time to tell the guile people about it.

The weird thing is, everything was fine until about 2-3 weeks ago, when I started seeing this bug. I was churning through huge quantities of text before this, and it was not an issue. At all. And now it is. I can't figure out why... as I say, it still hangs with older versions, and I can't figure out what triggered this.

--linas

On Thu, Jun 1, 2017 at 11:09 PM, Curtis Faith <curtis....@gmail.com> wrote:
So my intuition about the Guile GC was right after all.

I came into the office this morning after leaving last night with a test of the 6,000 sentence file which took two days last week and started me on this path before working on AWS. 

For the new test last night, I reintroduced the (gc) call in observe-text but did not put back the fetch-atom and store-atom calls.

I figured it would either finish quickly last night (if the SQL issue was the only major problem) or run into the same problem I encountered last time I ran it last week (if something else like the GC was causing the slowdown). Either way I would be able to separate the GC performance issues I suspected from the SQL performance issues I already noted.

When I arrived, the cogserver was using 100% CPU but was still running, processing sentence #590. I suspected the Guile GC, since one process was pegged at 100% on a 6-CPU machine where the cogserver usually runs at 1150% when busy. This matches the profile of one thread blocking all the others, which seemed like the GC. I fired up gdb, attached to the cogserver, and did a backtrace:

(gdb) bt
#0  0x00007f562938cfb2 in do_sigsuspend (set=0x7f56244ea2c0)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:31
#1  __GI___sigsuspend (set=0x7f56244ea2c0)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#2  0x00007f56242ce832 in GC_suspend_handler_inner ()
   from /usr/lib/x86_64-linux-gnu/libgc.so.1
#3  0x00007f56242ce8a5 in GC_suspend_handler ()
   from /usr/lib/x86_64-linux-gnu/libgc.so.1
#4  <signal handler called>
#5  0x00007f562941a93d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#6  0x00007f562944bbc4 in usleep (useconds=<optimized out>)
    at ../sysdeps/unix/sysv/linux/usleep.c:32
#7  0x00007f562a2d7fb7 in opencog::CogServer::serverLoop (this=0x19727f0)
    at /home/inflector/opencog/opencog/opencog/cogserver/server/CogServer.cc:211
#8  0x0000000000406b05 in main (argc=3, argv=0x7ffec4633ae8)
    at /home/inflector/opencog/opencog/opencog/cogserver/server/CogServerMain.cc:232
(gdb)


This is the code that the garbage collector invokes to suspend threads while running. I've been digging into the code for the collector and found several potential interactions which would result in what I've been seeing. Essentially, as the number of objects in the cogserver heap grows, the conservative GC has to search a larger and larger space, until eventually this search takes a single core's entire processing capability. As the search space grows, this blocking begins to dominate all execution.

You can see this in my notes in the attached observe-text.log file, where I show the GC time compared with total time: it is 320 seconds of 463 seconds for my 5-chapter test file, i.e. 69% of the processing for only 303 sentences. And this percentage grows with successive processing of the same file. These successive runs create no new atoms, but show a leak of 0.130 GB per execution with the (gc) call in observe-text, and 0.400 GB per processing of the 5 chapters without the (gc) call inside observe-text.

Currently the job has been sitting for 12 hours on one single sentence while the garbage collector runs repeatedly, attempting to complete its job. Since I encountered this last week, just let it run over the weekend, and found it finished when I came back on Monday, I suspect that the GC is still churning away, being repeatedly invoked, and at some point it will free enough to process the sentence. Further, the completion will trigger another internal allocation, which the collector always makes at 200% of the previous size; this will cause the GC to be invoked much less for some period of time, which will make the rest of the sentences process much faster. But at some heap size the GC stops keeping up, as the processing time required dominates a single CPU core.
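A toy model of this dynamic (arbitrary cost units and assumed constants, not bdwgc's actual algorithm): if every collection scans the whole heap, and the heap doubles whenever leaked live data catches up with it, the GC's share of total time climbs toward 100%.

```python
# Assumed cost constants, in arbitrary units; chosen only for illustration.
MARK_COST_PER_MB = 0.1    # cost to scan 1 MB of heap during a collection
ALLOC_COST_PER_MB = 0.2   # cost to allocate 1 MB of new objects

def gc_fraction(alloc_mb_per_cycle, leak_mb_per_cycle, cycles):
    """Fraction of total time spent in GC after `cycles` runs, if the heap
    doubles whenever live (leaked) data catches up with it."""
    gc_time = work_time = 0.0
    live, heap = 100.0, 100.0                # assumed starting sizes, MB
    for _ in range(cycles):
        work_time += ALLOC_COST_PER_MB * alloc_mb_per_cycle
        live += leak_mb_per_cycle            # leaked objects stay reachable
        heap = max(heap, 2 * live)           # grow-by-doubling policy
        gc_time += MARK_COST_PER_MB * heap   # every cycle scans the whole heap
    return gc_time / (gc_time + work_time)

print(f"{gc_fraction(400, 0, 100):.0%}")     # no leak: GC share stays ~20%
print(f"{gc_fraction(400, 400, 100):.0%}")   # leaking: GC share climbs to ~98%
```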

All the above means Guile's current GC won't work for AtomSpace. Not without some significant work. That's the bad news. 

The good news is that I believe we can implement a fix for this, for our special case of Guile running in a separate process in the CogServer, in a few days. My idea is a bit of a hack, but it will result in extremely fast allocations, zero leaks, and will speed up the code by at least 2x over the small-file performance case where we're getting sub-second processing of lines. I'd not be surprised if we can hit 0.5 seconds in the no-SQL-saving case.

I have attached a screenshot of top where you can see 100% CPU for the cogserver, and terminal windows where you can see that there were no errors starting the cogserver and that it is on sentence 590, which took 1049 seconds to reach, with an average per-sentence time at that point of 1.78 seconds. You can also see that there are no atoms in the atomspace (except for the default one that gets added to every atomspace).

So my plan is: as a first step, I'm going to implement a quick-and-dirty replacement for the GC which just uses malloc, to see if that gets through the file. If this approach works, it will confirm the above, but it won't be suitable for the long term.

Assuming the malloc-based allocations complete in reasonable time, I've got some ideas for altering the GC code to work with a separate per-thread heap for new allocations, which we can then dispose of en masse when the thread terminates. Assuming we can get a decent bound on the usage of guile-internal objects during statement execution, this will be much, much faster and won't leak. Everyone wins.

Let me know if you have any questions or suggestions.

- Curtis

On Thu, Jun 1, 2017 at 1:42 AM, Ben Goertzel <b...@goertzel.org> wrote:
Thanks a lot Curtis!

I totally grok the desire to complete an analysis before announcing
results... OTOH if we were all sitting in the same office we'd
probably be communicating lots of interim results.  So generally I'm
biased toward fostering communication! ...

See you Monday...

ben

On Wed, May 31, 2017 at 7:58 AM, Curtis Faith <curtis....@gmail.com> wrote:
> I should have been more explicit and said that removing the store and fetch
> was just a quick and dirty way of measuring the performance impact of this
> issue, not a suggested solution to it.
>
> On Wed, May 31, 2017 at 10:13 PM, Curtis Faith <curtis....@gmail.com> wrote:
>>
>> I am sending this a bit earlier than I would normally as I am still trying
>> to figure out the contribution of the Guile garbage collector on performance
>> and am not quite done. I have also not yet measured a test over significant
>> data for a before and after comparison, for example.
>>
>> But since Ben sent Linas an email about this problem, I'll send my work in
>> progress, so Linas has a bit more information to answer Ben's question.
>>
>> So, the following represents a partial analysis of the SQL issues I found.
>> I removed the code for fetch-atom and store-atom and while this is
>> considerably faster, there still seems to be many-second pauses in
>> processing that are not related to the number of words in the sentence in
>> any way I can easily correlate. Guile garbage collector? This might be due
>> to a memory leak which is resulting in the garbage collector having to
>> search over an ever increasing memory space looking for object-ish
>> references using its conservative scheme.
>>
>> - Curtis




Linas Vepstas

Jun 3, 2017, 12:19:40 PM
to Curtis Faith, Ben Goertzel, Ruiting Lian, Amen Belayneh, opencog


On Thu, Jun 1, 2017 at 11:27 PM, Curtis Faith <curtis....@gmail.com> wrote:
One further thing, with respect to the sql-fetch and store, one relatively simple change might be a new explicit call to save and fetch only a single atom's values, i.e. not recursively save atoms and all attached values. Something like: sql-store-values and sql-fetch-values and sql-store-one-atom and sql-fetch-one-atom, perhaps.

I think you are mis-reading the code, because that is not what it does.  So:

1) The first time ever, it MUST fetch/store the entire atom, because you can't put it in the atomspace otherwise; or, in the other direction, you don't know what the UUIDs are, and thus cannot access it.

2) After this "first time ever" step, only the values are updated. They are only updated for the current atom, and not for any other.  If other atoms were updated, this would cause existing truth values in the atomspace to be corrupted.  That would be very wrong, since atomspace users should have the right to assume that no one is messing with the values attached to atoms.  Users have the right to have total control over values, and so the SQL backend is careful NOT to update the values of other atoms!  If it does, that's a bug.

--linas

Then we could loop over newly created atoms at the end of the process and save them, while saving only the final values for atoms that have already been persisted and whose values have changed. This could be done in C++ as an AtomSpace call, once at the end of processing a chunk of text, so there would be some ability to restart long processes in the event of errors, but the overhead would be greatly reduced while running large batches.

When starting the CogServer, there could be a mark on the Atomspace that gets cleared when sql-load is called once; this wouldn't have to be done more than once per initialisation of a CogServer instance. Subsequent SQL need only be writes, batched together at the end, as files get processed by observe-text or other similar steps in the pipeline.

Curtis Faith

Jun 3, 2017, 12:43:04 PM
to Linas Vepstas, Ben Goertzel, Ruiting Lian, Amen Belayneh, opencog
No, I'm using Guile 2.2.2. Indeed, I did get that hanging bug earlier, because the install was originally into /usr/local/lib/ instead of /usr/lib/, and I reported an issue on it two weeks ago. All my tests were done with Guile 2.2.2 after I moved the library into /usr/lib/ and got the configuration sorted out.

The hang I found wasn't an infinite loop. It actually finished. The CogServer was spending all its time in the Guile GC, and when I came back on Monday after letting it run all weekend, it was finished.

Linas Vepstas

Jun 3, 2017, 12:58:00 PM
to Curtis Faith, Ben Goertzel, Ruiting Lian, Amen Belayneh, opencog
Hmm.

In my case, my "infinite loop" did not clear up overnight, while I slept, so I assumed it was infinite. 

For those who do threaded programming: there is a situation where there can be a very small window to unlock a stuck loop. So, for example, something might be stuck in a loop, spinning, with a lock held 99.99% of the time, but there is a slim chance that some other thread sneaks in during the 0.01% window, fixes whatever the condition is, and then the loop gets unstuck, and the program eventually finishes without errors.

It's possible that that is the case here, i.e. it eventually finishes, if you wait long enough. It's still a bug. Seems like this is moving to the top of my priority list, now.

But first, I have to plow through my emails.

--linas

Linas Vepstas

Jun 5, 2017, 3:10:46 PM
to Curtis Faith, Ben Goertzel, Ruiting Lian, Amen Belayneh, opencog
Long message...

On Sat, Jun 3, 2017 at 10:07 PM, Curtis Faith <curtis....@gmail.com> wrote:
Linas wrote:
 
The idea that you are going to use a special pool for guile, which you then clear out every so often, is just... a proposal to take a sophisticated GC algorithm and replace it with a truly sophomoric... ahem, freshman concept of GC. It's a waste of time.

GC is needed when you have long-running processes or threads that can't just leak with impunity. It is wholly unnecessary for short-duration tasks with moderate memory requirements. In the special case of processing batch requests, with web servers like nginx or Apache, or the CogServer running observe-text, there are not many objects that can't immediately be destroyed when the request (or set of requests) finishes. That makes the overhead of GC unnecessary in these batch-processing cycles. It also makes the overhead of finalizing anything unnecessary, if there is enough RAM to service the requests without any cleanup during a single request's processing. You need to release system resources, and that's about it.

A pool-based, clean-up-once-at-the-end approach will make things much faster, whether the problem I am seeing ends up being from a bug or not.

OK. So let's think this through. The only place where GC is being used is in guile; GC is not being used in the atomspace itself. So you could accomplish exactly the same thing by periodically shutting down guile completely. This would release all that memory, and then you are done.

The problem with this proposal is that pretty much everything runs through guile.  All the atoms go in through it, and come out through it.  So shutting down guile and restarting it is tantamount to shutting down the system, and restarting it.  Which is OK, if you saved all the atoms you care about to the database.

But I don't see any way of implementing a pool, without fully shutting down guile; and if one fully shuts down guile, then one doesn't need a pool.

The only alternatives are to use python, but python is single-threaded, so this is a non-starter. The 3rd alternative is haskell, but it's also garbage-collected.  Can't use C++, because it has no interpreted command-line. (Using C++ is tantamount to shutting everything down, recompiling, and restarting everything, which is clearly the worst-possible scenario.)  A 5th alternative is to invent a custom vocabulary of words to control C++ objects from an interactive command line, but this is clearly a design disaster. It was 1988 when the folly of this was realized, which caused tcl to be integrated into C apps. The inadequacy of TCL led to the invention of guile... and so here we are, full-circle.  We could switch to swig+perl, or to javascript... but javascript is garbage-collected, and I think perl is too, not sure. I just don't see any way of implementing what you are talking about.

Performance is clearly an issue for unsupervised learning
 
Really? Sorry, but in what way? What's the  problem?
 
and general AI in general. It is also an issue for OpenCog right now since getting the data into the AtomSpace in the right form is taking far too long right now.

Really? In what way? What is the problem?
 
No matter what we do performance will always be an issue because of the sheer size of the datasets researchers want to work with.
 
Well, if those folks at Intel and AMD weren't so lazy, we'd have great performance by now.
 
That is why Ben first asked me to look into using AWS to spawn parallel processes to cut down on the calendar time required to input large corpora.

Well, we know Ben is crazy. This is not where the problem lies.  It's easy to get large corpora pumped through. I can give you a dozen dumps of datasets so large they won't fit in the RAM of your computer. Do you want large datasets? Cause I got them.

The problem is that I don't have tools to analyze those datasets. That's where 95% of my personal bottleneck lies.  Simply crunching a lot of data is just so totally not at all the hard part.
 
> I'm seeing 50% to 70% of the time spent in the GC.

Are you using the tool I sent you? because I am seeing less than 20%.

 

Sorry for some of the sarcasm. Sure, more performance would always be nice, but GC time is a complete red herring.  Also, technically, I think GC is not a solvable problem. The alternative is reference counting, and that is also a total CPU hog.

I do have some proposals, but first: 1) I have large datasets, 2) creating large datasets is totally not an issue. 3) creating tools to analyze them is almost 100% of the issue.

But if you wanted to get atoms into the atomspace faster, like 10x faster or 20x faster: you could run the link-grammar parser in the same address space as the atomspace.  Just take what it spits out, convert it into atoms, shove the atoms into the atomspace. This would completely bypass guile, and bypass all GC.  So GC would totally not be an issue in this case.

To be clear: currently, LG parses text, then bloody java code turns it into strings, which are sent over a socket to guile, which evaluates the strings, and creates atoms. About 80% or more of this process is the cost of having guile evaluate strings that specify atoms, in string format.  Eliminate this, and you get an instant 3x, 4x speedup.
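That 3x-4x figure squares with a simple Amdahl-style estimate. The 80% is the message's own guess; the other fractions below are only there to show the sensitivity:

```python
def speedup_if_eliminated(fraction):
    """Overall speedup if `fraction` of total time is removed entirely
    (Amdahl's law with the eliminated part taken to zero time)."""
    return 1.0 / (1.0 - fraction)

print(f"{speedup_if_eliminated(0.80):.1f}x")  # 5.0x upper bound at exactly 80%
print(f"{speedup_if_eliminated(0.75):.1f}x")  # 4.0x
print(f"{speedup_if_eliminated(0.67):.1f}x")  # 3.0x
```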

Once you did this, you'd discover two other bottlenecks: shoving atoms into the atomspace is slowwwww. And pushing atoms out to the database is slowwww.  These are much harder, but more important bottlenecks to overcome.

Re: running LG in the same address space as the atomspace: this has already been done; the sureal code does this. In a day or 2 or 3 you could write the needed wrapper code to have LG live directly inside of opencog, generating the correct atoms, thus totally bypassing guile and garbage collection.  And this would be a very easy way to get a 3x speedup, if that's really your end-goal.  It's a lot easier than all the other crazy schemes discussed.

In the very long term, I plan to do this anyway, because I want to apply the LG algorithms to generic atomspace data, not just to natural language.  However, currently LG is totally focused only on language, and it's too much work to re-implement it as a generic data parser.  Baby steps, for now.

--linas


Ben Goertzel

Jun 5, 2017, 11:06:30 PM
to Linas Vepstas, Curtis Faith, Ruiting Lian, Amen Belayneh, opencog
On Tue, Jun 6, 2017 at 3:10 AM, Linas Vepstas <linasv...@gmail.com> wrote:
>
> Well, we know Ben is crazy. This is not where the problem lies. Its easy to
> get large corpora pumped through. I can give you a dozen dumps of datasets
> so large they won't fit in the RAM of your computer. Do you want large
> datasets? Cause I got them.
>
> The problem is that I don't have tools to analyze those datasets. That's
> where 95% of my personal bottleneck lies. Simply crunching a lot of data is
> just so totally not at all the hard part.


Well, crazy or not, I have done a lot of statistical computational
linguistics in my life... and I have generally found that there is
NEVER enough data... and the amount of data processed in the course
of the work reported in most published papers on unsupervised grammar
learning is much larger than what we're working with now.

Of course scaling up to deal with more text is not the "hard part" ..
and of course data-file-munging takes up 75% of your time when doing
computational linguistics research work ... but nevertheless my strong
suspicion is that to solve unsupervised language learning we're gonna
need a bunch of OpenCogs operating in parallel parsing a lot of text
...

Exactly what the text volume requirements are for our specific
algorithms, we obviously don't know yet, so I'm going on intuition
here, but so are you...

Ruiting and I are in Shanghai for Wed-Fri, and then she'll be back on
the language learning task on Monday ... hopefully by then the system
will be stably runnable by her and Curtis here in HK, so that next
week we can start exporting some feature vectors and playing with
clustering-ish algorithms etc. (in parallel with your own
experimentation with different clustering-ish algorithms).... Our
hope had been to start experimenting with clustering-ish algorithms on
the output of your MST parsing a couple weeks ago, but obviously these
bugs were making the system too slow for us to use for this
purpose.... Awesome if the bugs causing the objectionable slowness
have been fixed ;) thanks!!

ben

Ben Goertzel

Jun 5, 2017, 11:07:39 PM
to Linas Vepstas, Curtis Faith, Ruiting Lian, Amen Belayneh, opencog
On Tue, Jun 6, 2017 at 3:10 AM, Linas Vepstas <linasv...@gmail.com> wrote:
> Re: running LG in the same adress space as the atomspace: this has already
> been done; the surreal code does this. In a day or 2 or 3 you could write
> the needed wrapper code to have LG live directly inside of opencog,
> generating the correct atoms, thus totally bypassing guile and garbage
> collection. And this would be a very easy way to get a 3x speedup, if
> that's really your end-goal. Its a lot wasier than all the other crazy
> schemes discussed.


yeah, we were discussing this yesterday... I think we may do something
like this... we will discuss again this afternoon...

Curtis Faith

Jun 5, 2017, 11:22:57 PM
to Linas Vepstas, Ben Goertzel, Ruiting Lian, Amen Belayneh, opencog
Linas wrote:
 
OK. So let's think this through. The only place where GC is being used is in guile; GC is not being used in the atomspace itself. So you could accomplish exactly the same thing by periodically shutting down guile, completely. This would release all that memory, and then you are done.
 
 <snip>
 
But I don't see any way of implementing a pool, without fully shutting down guile; and if one fully shuts down guile, then one doesn't need a pool.

Okay. So here's what I think is happening now in the CogServer; please correct me if this is wrong:

CogServer receives 3 requests, then CogServer creates three new threads:

    'scm (observe-text "Test sentence one.")'
        Thread one

    'scm (observe-text "Test sentence two.")'
        Thread two

    'scm (observe-text "Test sentence three.")'
        Thread three
        Thread three

If Guile's GC, the bdwgc, were to be altered to check a thread-local flag, and if it was set it would allocate new objects out of a thread-local pool, then when these types of threads completed there would be no need to garbage collect any allocations made out of that pool. One could just restart the pool.
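To make the pool idea concrete, here is a hypothetical sketch; `ThreadArena` and its interface are invented for illustration and are not part of Guile or bdwgc. A worker thread allocates scratch objects from its own arena, and when the request finishes the whole arena is reset in O(1), leaving nothing for a collector to trace:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread pool: allocations bump a pointer within a
// pre-sized buffer; reset() discards everything at once instead of
// tracing and collecting.
class ThreadArena {
    std::vector<char> buf_;
    std::size_t used_ = 0;
public:
    explicit ThreadArena(std::size_t bytes) : buf_(bytes) {}
    void* alloc(std::size_t n) {
        if (used_ + n > buf_.size()) return nullptr;  // pool exhausted
        void* p = buf_.data() + used_;
        used_ += n;
        return p;
    }
    void reset() { used_ = 0; }           // "just restart the pool"
    std::size_t used() const { return used_; }
};

// The thread-local flag/pool the altered GC would consult in this scheme:
thread_local ThreadArena* g_arena = nullptr;
```

As noted below, the hard part is not the arena itself but the sharing of SCM structures between threads, which this sketch deliberately ignores.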

But this is way more work than just restarting Guile every once in a while, I agree. There may also be other problems with this approach. It is a very general solution to what is likely to be a very common issue in the future, so it may be worth looking at in the future.

So on to your objection to restarting Guile.

The problem with this proposal is that pretty much everything runs through guile.  All the atoms go in through it, and come out through it.  So shutting down guile and restarting it is tantamount to shutting down the system, and restarting it.  Which is OK, if you saved all the atoms you care about to the database.

I don't see why shutting down Guile is tantamount to shutting down the system and restarting it. Right now, the atom space is created first, then the SchemeEval object is created in CogServer::CogServer. So I don't see why we couldn't have a function that shut down Guile, releasing all its memory back to the OS, and then restarted a new Guile interpreter without destroying the AtomSpace; upon restart you'd pass in the same atom space that existed before the shutdown.
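Schematically, the lifetime split being proposed looks like this; `MockAtomSpace`, `MockInterpreter`, and `Server` are stand-ins invented for the sketch, not the real OpenCog classes. The point is that the server owns the atomspace while the interpreter only borrows a pointer to it, so the interpreter can be torn down and rebuilt around the surviving atoms:

```cpp
#include <cassert>
#include <memory>
#include <string>

struct MockAtomSpace {
    int atom_count = 0;
    void add_atom(const std::string&) { ++atom_count; }
};

struct MockInterpreter {
    MockAtomSpace* space;                 // borrowed, not owned
    explicit MockInterpreter(MockAtomSpace* s) : space(s) {}
};

struct Server {
    MockAtomSpace space;                  // created first, outlives any interpreter
    std::unique_ptr<MockInterpreter> interp;
    Server() : interp(std::make_unique<MockInterpreter>(&space)) {}
    void restart_interpreter() {
        interp.reset();                   // "shut down guile", free its memory
        interp = std::make_unique<MockInterpreter>(&space);  // same atomspace
    }
};
```

Whether the real SchemeEval/libguile teardown can be made this clean is exactly the open question discussed below.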

Now, due to some of the quirks of the GC, and perhaps Guile's interaction with it, shutting down and restarting is a bit tricky. You've got the issue with the infinite sleeping initialization thread noted in https://github.com/opencog/atomspace/blob/master/opencog/guile/SchemeEval.cc#L241-L285

You've also got a few SCM static globals scattered about that would have to be cleared and reloaded. It's a bit of work, sure, but doable.

The cleanest way would be to call the appropriate destructors to free up the memory allocated by Guile and the GC. Then unload both libguile.so  and libgc.so and reload them, and then redo the initialization. 

Still, I'm not sure any of this is what we should be doing now since the issue at hand can be most easily handled through other means.


Ben Goertzel

Jun 5, 2017, 11:32:20 PM
to Linas Vepstas, Curtis Faith, Ruiting Lian, Amen Belayneh, opencog
For the first stages of the language learning process, as I understand
it, what we really need to do is just


***
1. Split input into sentences (which can be done with lots of
sentence-splitters, including ours)

2. For each sentence S:

-- do some stemming (again that can be done with lots of stemmers,
including our own), so that each word is associated with a stem
-- identify each pair of words (W1, W2) so that W1 occurs before W2 in S
-- create and update some Atoms based on the pair (W1, W2)
***

The Atoms to be created are something like

-- an EvaluationLink indicating a link-parser AnyLink between W1 and W2
-- updating a couple counts based on Linas's recent code

I'm in too much of a hurry to look up the format of these links right
now but Ruiting probably remembers or I could look it up this
afternoon..

That would do it right?

The current approach replaces the "identify each pair of words" step
with "do a random link parse", but it is not clear that doing a random
link parse is actually better; my own feeling is that it probably
isn't, but we debated this extensively before without resolution....
To replicate the current pipeline better one would replace

-- identify each pair of words (W1, W2) so that W1 occurs before W2 in S

with

-- do a random link parse, then identify each pair of words (W1,W2)
that are linked in the random link parse

...

Am I missing something?

The above could all be done in C++ perfectly well; it doesn't require
Guile because it doesn't require any of the fancy stuff in the current
NLP pipeline...
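Assuming sentences are already split, the loop in steps 1-2 could be sketched like this; `count_pairs` and the whitespace tokenizer are illustrative stand-ins, and in the real pipeline the counts would back the EvaluationLink updates rather than a map:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// For each sentence, emit every ordered pair (W1, W2) with W1 before W2,
// within an optional distance limit, and bump a count.
using PairCounts = std::map<std::pair<std::string, std::string>, long>;

void count_pairs(const std::string& sentence, PairCounts& counts,
                 std::size_t max_dist) {
    std::vector<std::string> words;
    std::istringstream iss(sentence);
    for (std::string w; iss >> w; ) words.push_back(w);
    for (std::size_t i = 0; i < words.size(); ++i)
        for (std::size_t j = i + 1; j < words.size() && j - i <= max_dist; ++j)
            ++counts[{words[i], words[j]}];
}
```

The `max_dist` parameter is the length limit Linas raises below; setting it to the sentence length recovers plain clique counting.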

-- Ben

Curtis Faith

Jun 6, 2017, 3:03:01 AM
to Ben Goertzel, Linas Vepstas, Ruiting Lian, Amen Belayneh, opencog
Linas wrote: 
 
1) I fixed the hang. :-) Or figured out how to avoid it. It turns out that, due to a stoopid bug/oversight, quotation marks in the input text were not being escaped. Thus, guile would see a begin-quote, end-quote, followed by garbage. A backtrace would be generated and passed back to the witless user, who is just netcat, and couldn't give a damn, and so it was silently ignored.   Escape the quotation marks correctly, and the problem goes away.

Cool. 

Puzzling how this bug might result in the fp in the garbage collector getting screwed up. Did you find the specific bug in the GC, or in Guile's use of it, that causes the fp to get screwed up, so we can help the developers get that fixed? A problem in scheme source syntax, or variables not being spelled correctly, shouldn't result in an infinite loop in the GC in any case. Seems like they must be missing some sort of exception handler somewhere.

2) how are you measuring GC time? I also get 70% but i also get 500% cpu time for the other 181 active threads, so 70/500 seems acceptable to me.   Again, GC halts only guile, it does not halt the atomspace.

I was simply using the overall duration of the test as measured by the perl script and the result of (gc-run-time) as measured via a telnet session running in another bash terminal.  My measurements may be in error if my assumption about the GC is incorrect, as I've hinted a couple of times in prior emails. If anything, however, I'm overcounting the total time and undercounting the percentage since I'm not getting CPU measurements for duration on the CogServer; I'm counting the time until the test is finished as determined by the perl script.

My base assumption is that 'gc-run-time' is time when the other Guile threads are blocked. I make this assumption because of the way that the 'gc_time_taken' statistic in Guile is generated via the GC's before_gc and after_gc hooks, which get called before and after each stop-the-world collection. So my assumption is that, since the stop-the-world code suspends all the threads that can be garbage collected, and since all the CogServer-generated threads created in response to an "scm hush (observe-text 'Some sentence')" are threads that are stopped (since they allocate objects in the GC via Guile), the GC time reported is non-overlapping with the other processing time.

If a test takes 100 seconds, and 50 seconds are spent in the GC, during those 50 seconds, there is no work going on in any non-GC threads because they are suspended during the duration of the stop-the-world collection. I have looked at the code to verify these assumptions but it is certainly possible that I am missing something.

How many CPUs are on your test machine? Does it have hyper-threaded Intel chips? If so, you don't tend to get accurate measures of real-time CPU performance on those chips.

You have to reduce the multi-threading process CPU time by a factor of 0.6 to 0.65 to more accurately reflect the 120% to 130% of a CPU core that is available for a hyper-threaded core pair; it's not 200% even though it reports that way. So if you've got 70 units of time in a single thread blocking all the others, and a reported 500 units total, the non-blocking multi-threading time is 500-70 or 430 units. Multiply by 0.6 to 0.65 to account for over-reporting on hyper-threaded chips and you get 258 to 280. Now divide 258 to 280 by the number of CPUs as reported by the OS (all hyper-threading units) and you should be close to 30% of the elapsed time. NOTE: even the 70% reported for the gc time may be overstated by some factor, though since there are plenty of empty cores for OS and other tasks while the GC is running, it is likely that the GC gets full use of a core for most of its duration on a machine that isn't already taxed with other processes (like Postgres asynchronous dumps into the AtomSpace). My tests include no such additional work as I've turned off the store-atom and fetch-atom tests.
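That arithmetic, written out as a checkable helper; the 0.6-0.65 derating factor is the estimate in the paragraph above, not a measured constant, and the function name is mine:

```cpp
#include <cassert>
#include <cmath>

// Subtract the single-threaded blocking time from the reported total,
// then derate the remainder for hyper-threading over-reporting.
double adjusted_parallel_cpu(double total_cpu, double blocking_cpu,
                             double ht_factor) {
    return (total_cpu - blocking_cpu) * ht_factor;
}
```

For the 70-of-500 example: adjusted_parallel_cpu(500, 70, 0.6) gives 258 units, and 0.65 gives 279.5, matching the 258-280 range above.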

See: http://perfdynamics.blogspot.hk/2014/01/monitoring-cpu-utilization-under-hyper.html for a bit more on why the CPU times are wrong on Intel chips and linux.

Curtis Faith

Jun 6, 2017, 5:13:14 AM
to Ben Goertzel, Linas Vepstas, Ruiting Lian, Amen Belayneh, opencog
1) I fixed the hang. :-) Or figured out how to avoid it. It turns out that, due to a stoopid bug/oversight, quotation marks in the input text were not being escaped. Thus, guile would see a begin-quote, end-quote, followed by garbage. A backtrace would be generated, and passed back to the witless user, who is just netcat, can couldn't give a damn, and so was silently ignored.   Escape the quotation marks correctly, the problem goes away.

Hmmm, I looked and this is NOT the problem for my testing of Pride and Prejudice. 

There is not a single normal quote " in all the text. All quotes in Pride and Prejudice are actual left-right pairs, i.e. unicode “” not ASCII ""

Linas Vepstas

Jun 7, 2017, 1:11:49 AM
to Curtis Faith, Ben Goertzel, Ruiting Lian, Amen Belayneh, opencog
On Mon, Jun 5, 2017 at 10:22 PM, Curtis Faith <curtis....@gmail.com> wrote:
Linas wrote:
 
OK. So let's think this through. The only place where GC is being used is in guile; GC is not being used in the atomspace itself. So you could accomplish exactly the same thing by periodically shutting down guile, completely. This would release all that memory, and then you are done.
 
 <snip>
 
But I don't see any way of implementing a pool, without fully shutting down guile; and if one fully shuts down guile, then one doesn't need a pool.

Okay. So here's what I think is happening now in the CogServer; please correct me if this is wrong:

CogServer receives 3 requests, then CogServer creates three new threads:

    'scm (observe-text "Test sentence one.")'
        Thread one

    'scm (observe-text "Test sentence two.")'
        Thread two

    'scm (observe-text "Test sentence three.")'
        Thread three
        Thread three

If Guile's GC, the bdwgc, were to be altered to check a thread-local flag, and if it was set it would allocate new objects out of a thread-local pool, then when these types of threads completed there would be no need to garbage collect any allocations made out of that pool. One could just restart the pool.

But you'd totally corrupt memory, because all three threads are sharing zillions of unknown, opaque SCM structures, in utterly unknown ways. For example, all three share the code for "observe-text".   And observe-text is updating god-knows-what closures where ... I'm pretty sure I'm using closures to collect and report progress statistics.   And then there are all the SCMs that are embedded directly in the C++ code, holding things like the current output port, the current return value, etc. 

But this is way more work than just restarting Guile every once in a while, I agree. There may also be other problems with this approach. It is a very general solution to what is likely to be a very common issue in the future, so it may be worth looking at in the future.

So on to your objection to restarting Guile.

The problem with this proposal is that pretty much everything runs through guile.  All the atoms go in through it, and come out through it.  So shutting down guile and restarting it is tantamount to shutting down the system, and restarting it.  Which is OK, if you saved all the atoms you care about to the database.

I don't see why shutting down Guile is tantamount to shutting down the system and restarting it. Right now, the atom space is created first,

Ehh? Where?  Right now, guile is started first, then the opencog module is loaded, and this module creates the atomspace.  If you leave guile, you get the bash prompt; there is no atomspace any more; there's no running executable any more, after you leave guile.
 
then the SchemeEval object is created in CogServer::CogServer.

That's incorrect. The SchemeEval object is created when you (use-modules (opencog exec)).  I typically create this before running the cogserver.  In many of my scripts, I don't bother to start the cogserver, as there is no real need for it. It's kind of a legacy.  I've been vaguely planning on getting rid of it, since it mostly doesn't do anything.
 
So I don't see why we couldn't have a function that shut down Guile, releasing all its memory back to the OS,

You can shut down the cogserver -- there's a function for that .. it's called (stop-cogserver), but you execute that function inside of guile.  It leaves the atomspace intact.  It also leaves SchemeEval intact. 
 
and then restarted a new Guile interpreter without destroying the AtomSpace, upon restart you'd pass in the same atom space that existed before the shutdown.

Now, due to some of the quirks of the GC, and perhaps Guile's interaction with it, shutting down and restarting is a bit tricky. You've got the issue with the infinite sleeping initialization thread noted in https://github.com/opencog/atomspace/blob/master/opencog/guile/SchemeEval.cc#L241-L285

You've also got a few SCM static globals scattered about that would have to be cleared and reloaded. It's a bit of work, sure, but doable.

Ask Ben about this. I spent a miserable week in NYC doing this "doable" thing, and I hope I never have to go there again. That there is some truly nasty, complicated code.

Linas Vepstas

Jun 7, 2017, 1:37:58 AM
to Ben Goertzel, Curtis Faith, Ruiting Lian, Amen Belayneh, opencog
On Mon, Jun 5, 2017 at 10:32 PM, Ben Goertzel <b...@goertzel.org> wrote:
For the first stages of the language learning process, as I understand
it, what we really need to do is just


***
1. Split input into sentences (which can be done with lots of
sentence-splitters, including ours)

Except that they all suck, and when you look around, there aren't really that many of them. The one we have is not great, but it's the simplest, smallest, best I could find. 

Ben, you have this habit of trivializing everything. Theoretically, yes, sentence-splitting is one of those trivializable problems, but when you have to actually do it, and make things work, it takes time and effort.  Over the last ten years, I've worked with 5 or 6 different sentence splitters, and I've devoted maybe a month of my time to screwing with them: downloading them, installing, testing, plugging them into the infrastructure, fixing the inevitable bugs that manifest, then having to re-run things after fixing the bugs.  And then sitting there, with a blank brain, staring at the screen, waiting to see if it crashes again. And when it doesn't, you go pee, come back, check the logs, cross your fingers, and check again in a few hours.

A *month* is a long time to devote to sentence splitting.  But that's just how it is.  To you it seems all very easy. And this is the classic programmer pitfall: imagining something takes only one afternoon.  But it never ever does.

The actual practice is both time-consuming and tediously boring. So I kind of resent it when you talk about how trivial this is and how it can be done anywhere by anything.  
 

2. For each sentence S:

-- do some stemming (again that can be done with lots of stemmers,
including our own), so that each word is associated with a stem

There is absolutely no stemming done.  It's fundamentally incorrect to stem.
 
-- identify each pair of words (W1, W2) so that W1 occurs before W2 in S

Only if you do clique counting. We argued about clique counting already. It sucks.  Anyway, it's pointless, because the goal is to parse.
 
-- create and update some Atoms based on the pair (W1, W2)

Yes ...
 
***

The Atoms to be created are something like

-- an EvaluationLink indicating a link-parser AnyLink between W1 and W2
Yes.
-- updating a couple counts based on Linas's recent code

Yes.

I'm in too much of a hurry to look up the format of these links right
now but Ruiting probably remembers or I could look it up this
afternoon..

That would do it right?

Do what? 

The current approach replaces the "identify each pair of words" step
with "do a random link parse", but it is not clear that doing a random
link parse is actually better; my own feeling is that it probably
isn't, but we debated this extensively before without resolution....
To replicate the current pipeline better one would replace

-- identify each pair of words (W1, W2) so that W1 occurs before W2 in S

Well, you've got to add some kind of length limits.  How far apart do you want the words to be?
 

with

-- do a random link parse, then identify each pair of words (W1,W2)
that are linked in the random link parse

Well, the idea is that the parses get less and less random, over time. So doing length-limited clique counting is ... pointless.  It's just make-work for someone to write the code.  Don't create make-work for no reason.

Again: see my tirade about how something is theoretically trivial, but can be excruciatingly time-consuming in practice.

...

Am I missing something?

The above could all be done in C++ perfectly well; it doesn't require
Guile because it doesn't require any of the fancy stuff in the current
NLP pipeline...

What's the point? Why bother? why would you want to do this? what does it accomplish? what does it solve?

--linas

Linas Vepstas

Jun 7, 2017, 2:02:27 AM
to Curtis Faith, Ben Goertzel, Ruiting Lian, Amen Belayneh, opencog
On Tue, Jun 6, 2017 at 2:02 AM, Curtis Faith <curtis....@gmail.com> wrote:
Linas wrote: 
 
1) I fixed the hang. :-) Or figured out how to avoid it. It turns out that, due to a stoopid bug/oversight, quotation marks in the input text were not being escaped. Thus, guile would see a begin-quote, end-quote, followed by garbage. A backtrace would be generated and passed back to the witless user, who is just netcat, and couldn't give a damn, and so it was silently ignored.   Escape the quotation marks correctly, and the problem goes away.

Cool. 

I was wrong.  It still happens, and it happens just as often as before.  So this wasn't it.
 

Puzzling how this but might result in the fp in the garbage collector getting screwed up. Did you find the specific bug in the GC or Guile's use of it that causes the fp to get screwed up, so we can help the developers get that fixed?

I have been fed up with this conversation for a long time. There's no bug in the GC. I don't understand why you keep saying that, or what evidence you have for that.

Clearly, there are bugs in guile.  I guess. Maybe. Probably. Not sure. All I know is that memory gets corrupted.  Don't know where, why, who is doing the corruption. Could be anyone, anything. Could be the stupid boost C++ library that handles socket I/O. Who knows.
 
A problem in scheme source syntax or variables not being spelled correctly shouldn't result in an infinite loop in the GC in any case. Seems like they must be missing some sort of exception handler somewhere.

2) how are you measuring GC time? I also get 70% but i also get 500% cpu time for the other 181 active threads, so 70/500 seems acceptable to me.   Again, GC halts only guile, it does not halt the atomspace.

I was simply using the overall duration of the test as measured by the perl script and the result of (gc-run-time) as measured via a telnet session running in another bash terminal.  My measurements may be in error if my assumption about the GC is incorrect, as I've hinted a couple of times in prior emails. If anything, however, I'm overcounting the total time and undercounting the percentage since I'm not getting CPU measurements for duration on the CogServer; I'm counting the time until the test is finished as determined by the perl script.

Sounds like you're counting wall-clock time, not cpu-time.  So that is misleading you.
 

My base assumption is that 'gc-run-time' is time when the other Guile threads are blocked. I make this assumption because of the way that the 'gc_time_taken' statistic in Guile is generated via the GC's before_gc and after_gc hooks, which get called before and after each stop-the-world collection. So my assumption is that, since the stop-the-world code suspends all the threads that can be garbage collected, and since all the CogServer-generated threads created in response to an "scm hush (observe-text 'Some sentence')" are threads that are stopped (since they allocate objects in the GC via Guile), the GC time reported is non-overlapping with the other processing time.

If a test takes 100 seconds, and 50 seconds are spent in the GC, during those 50 seconds, there is no work going on in any non-GC threads because they are suspended during the duration of the stop-the-world collection. I have looked at the code to verify these assumptions but it is certainly possible that I am missing something.

You are looking at wall-clock time, not cpu time.
 

How many CPUs are on your test machine?
24
 
Does it have hyper-threaded Intel chips?
No.
 
If so, you don't tend to get accurate measures of real-time CPU performance on those chips.
 
What? Is this some bug with, what, Xeons? I don't believe that; that is a very serious bug.  I don't believe Intel would ship a chip that would do that. I mean, you'd have trouble booting an OS on it, since, well, at least Linux turns off the clock very early in the boot process, and expects the CPU to do the right thing.   This got done to save power (battery on laptops, but also wasted cpu cycles on virtualized mainframes).

Below it sounds like you describe a serious bug. I expect Intel chips to report correct time, even if they are hyper-threaded. Is this a Windows bug, maybe?

hyper-threading is a cheap stunt to try to keep the cpu busy while waiting for data to arrive from RAM.  So part of what you say below is correct, but I would expect the chip not to lie, and claim it spent 100% of clock cycles hyperthreading when it only spent 10% of its time there.

--linas

Ben Goertzel

Jun 7, 2017, 2:43:31 AM
to Linas Vepstas, Curtis Faith, Ruiting Lian, Amen Belayneh, opencog
On Wed, Jun 7, 2017 at 1:37 PM, Linas Vepstas <linasv...@gmail.com> wrote:
>
> Well, you've got to add some kind of length limits. How far apart do you
> want the words to be?

That's a parameter one can set, it could be n=10 or n=20 ... a while
ago we passed around some papers indicating the length of an average
dependency link in English, Chinese and other languages...

>> The above could all be done in C++ perfectly well; it doesn't require
>> Guile because it doesn't require any of the fancy stuff in the current
>> NLP pipeline...
>
>
> What's the point? Why bother? why would you want to do this? what does it
> accomplish? what does it solve?

It would get us a way of doing Step 0 of the language learning pipeline that

-- works smoothly without hitting weird hard-to-solve Guile bugs and such

-- grabs possible dependencies in a way that seems less wasteful to me
than generating a bunch of random parses (i.e. just building links btw
words that are reasonably near each other)

Obviously this is not the interesting part of the language learning
algorithm ... but we need to get this first part working reliably and
reasonably rapidly to get to the other parts, right? So far Ruiting
hasn't been able to get to the point of experimenting with clustering
and disambiguation algorithms because the Step 0 code hits these Guile
bugs. But there's no need to have complex code or Guile code for Step
0, all we need is simpler stuff...

-- Ben

Linas Vepstas

Jun 7, 2017, 4:59:03 AM
to Ben Goertzel, Curtis Faith, Ruiting Lian, Amen Belayneh, opencog
I've got databases so large that they don't fit into RAM, and I've crashed my system several times because of this. I've had to regenerate them over and over, as I've discovered and patched various bugs and/or experienced data loss in several different ways, e.g. the wikipedia vs. gutenberg issue, several thunderstorms that scrambled SQL because I hadn't configured it for safety :-( stuff like that.  And I've done this for multiple languages... So I find it hard to believe that you're stuck on step zero.  All of my issues are elsewhere.

I can send you a medium-sized dataset. It's got some issues, still, but it's usable.  It's ready to go: it's got 8M distinct word pairs observed 270M times, and 7M distinct connector sets observed 260M times.   It's small enough that it loads fairly fast, and you can load all of it if you wish; it probably fits within 64 GB RAM, maybe.

--linas

Curtis Faith

Jun 7, 2017, 11:09:10 PM
to Linas Vepstas, Ben Goertzel, Ruiting Lian, Amen Belayneh, opencog
I wrote:

I don't see why shutting down Guile is tantamount to shutting down the system and restarting it. Right now, the atom space is created first, 

Linas wrote:
Ehh? Where?  Right now, guile is started first, then the opencog module is loaded, and this module creates the atomspace.  If you leave guile, you get the bash prompt; there is no atomspace any more; there's no running executable any more, after you leave guile.
 
I wrote:
then the SchemeEval object is created in CogServer::CogServer.

Linas wrote:
That's incorrect. The SchemeEval object is created when you (use-modules (opencog exec)).  I typically create this before running the cogserver. 

Okay, so you're starting the Guile shell first and using it to launch the CogServer. Obviously that makes it much harder to restart Guile without losing context. What I said above is correct when one starts the CogServer via the linux command line.

Starting the CogServer first is the much easier way to go if one is looking to restart Guile itself.

There's no bug in the GC. I don't understand why you keep saying that, or what evidence you have for that.

What I said was: "Did you find the specific bug in the GC or Guile's use of it that causes the fp to get screwed up?"

I don't know what the bug is. I know it is showing up for me during garbage collection, so I've been referring to it as a GC bug for shorthand; if you don't like that, I'll stop. 

You've found the symptom which causes the hang, as have I: an infinite loop in Guile's vm.c code, but for me this loop is always happening during garbage collection. I don't have any evidence that it is a bug in the bdwgc code; indeed it seems much more likely to be in Guile, but it does seem to be related to the intersection of Guile and its use of the GC, as perhaps some portion of the code relies on Guile not hitting an infinite loop while holding a GC lock. I don't know. I do know that Guile is doing lots of tricky things with frames for garbage collection purposes, since many if not most of the references to objects the GC will be scanning will be in the various stacks; indeed there are two fields in the frame element union specifically allocated to GC purposes. 

So that's why I have been calling this a GC bug. If you prefer, I can refer to it as the Frame Pointer Infinite Loop Bug from this point forward.

Still, in my experience, the fact that my hangs are always happening during GC means something. Perhaps it is a red herring, and some random overwrite that has nothing to do with Guile's interactions with bwdgc. It's just a clue, not the answer.

The real question is what is the root cause for the frame pointer of a Guile VM stack being incorrect? 


How many CPUs are on your test machine?
24

Sounds like you're counting wall-clock time, not cpu-time.  So that is misleading you.

The GC time I measured is CPU time as reported by Guile's (gc-run-time). The elapsed time is wall-clock time, sure, I know that. In fact, I'm going to argue that it is the most relevant measure.

The perl script you or someone else wrote to send text to the CogServer, send-one.pl, measures elapsed clock time. I've been using that since that is the number I care about, i.e. how long does it take to parse and put this group of sentences into the AtomSpace?

Since the GC time is time when the other threads are suspended, there is a one-to-one correspondence between changes in time spent in the GC and changes to elapsed time as measured by the wall clock.

There is no other OpenCog analysis or AtomSpace work going on while the GC is running. If you measure 500 seconds of total CPU time and you have 24 CPUs, that means you've got 24 potential worker CPUs while the GC is not running a stop-the-world collection, and only one while it is. 

Consider a simplified case:

If total work takes 100 CPU-seconds and you have 10 CPUs and GC time is zero, then you get 100 CPU-seconds / 10 CPUs for 10 seconds of elapsed time.

If GC time to cleanup the objects created by the 10 CPUs is 1 CPU second, then you have 1 second GC time PLUS 10 seconds for work or 11 seconds elapsed time, and the GC is contributing 1 / 11 or 9% to the elapsed time, not the 1% you'd get by dividing by the total CPU time 1 / 101.

If GC time is 5 seconds then you have 5 seconds of GC time plus 10 seconds for work or 15 seconds elapsed time and the GC is contributing 5 / 15 or 33.3% to the elapsed time, not the 5 / 105 or 4.8% time spent in GC which seems acceptable.

If GC time is 15 seconds, you get 15 / (15 + 10) or 60% time in GC, but the CPU time looks like 15 / 115 or 13% which seems like a high but acceptable number.
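The simplified cases above can be checked with a few lines of arithmetic. A minimal Python sketch (the helper names are mine, not from any OpenCog tooling; it assumes the non-GC work parallelizes perfectly across the CPUs):

```python
def gc_elapsed_fraction(work_cpu_s, gc_s, ncpus):
    """Fraction of wall-clock time spent in a stop-the-world GC,
    assuming the non-GC work spreads evenly over ncpus."""
    elapsed = gc_s + work_cpu_s / ncpus
    return gc_s / elapsed

def gc_cpu_fraction(work_cpu_s, gc_s):
    """The (misleadingly small) fraction of total CPU time spent in GC."""
    return gc_s / (work_cpu_s + gc_s)

# 100 CPU-seconds of work on 10 CPUs, with varying GC time:
for gc_s in (1, 5, 15):
    print(gc_s,
          round(gc_elapsed_fraction(100, gc_s, 10), 3),  # 0.091, 0.333, 0.6
          round(gc_cpu_fraction(100, gc_s), 3))          # 0.01, 0.048, 0.13
```

The wall-clock fractions (9%, 33%, 60%) match the worked cases above, while the CPU-time fractions stay deceptively small.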

In your case, the fact that 23 CPUs are standing idle and one is working while the GC is running may mean that the GC is only spending 70/500 of the CPU time in your measurements, but the percentage of GC contribution to the elapsed time is: 

GC Time / (GC Time + ((Total CPU time - GC Time) / number of CPUs working) + other out-of-process work latencies) 

which in the worst case could be something approaching: 70 / (70 + (430 / 24)) or 79.6%.
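Plugging the numbers from this thread into that formula (70 CPU-seconds of GC out of 500 total, 24 CPUs, and ignoring the out-of-process latencies) gives the worst case directly:

```python
gc_s = 70.0       # CPU-seconds reported in GC
total_s = 500.0   # total CPU-seconds measured
ncpus = 24

# GC fraction of elapsed time, assuming perfect parallelism otherwise
worst_case = gc_s / (gc_s + (total_s - gc_s) / ncpus)
print(round(worst_case * 100, 1))  # → 79.6
```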

Keep in mind that I'm not doing any calls to Postgres at all for these tests, so my percentages above are probably high for that reason alone. Since you do fetch-atom and store-atom calls, it is probable that your actual percentage contribution is lower, perhaps much lower, than this 79.6%. I don't know how much time any synchronous reads and writes contribute to overall elapsed time in your case.

Finally, my point is that if you want to know how much the GC code is impacting performance, then GC time / elapsed wall-clock time is the number you actually care about since the GC suspends other working threads and any work that reduces time spent in stop-the-world garbage collection will result in a corresponding one-to-one reduction in elapsed wall-clock time.

Linas Vepstas

unread,
Jun 8, 2017, 12:00:33 AM6/8/17
to Curtis Faith, Ben Goertzel, Ruiting Lian, Amen Belayneh, opencog
Look,

I don't think we are making forward progress.

1) if you start the cogserver first, then all you have is that crappy telnet interface to guile, which sucks. I tried to make it nice, but having to use it still sucks.  So I mostly avoid it.

2) the frame-pointer loop is the *symptom*; you don't know what the bug is. It's like calling "pain" an illness: it's not, it's the symptom of an illness.

3) You're obsessing with GC because you can measure it. If instead you had high-quality measurements for the amount of cpu-time wasted on reference counting, you'd obsess over that. Or the amount of time wasted waiting on a lock to insert an atom into the atomspace, you could obsess about that. Or the amount of time wasted waiting to write an atom to the SQL server. I suspect all of those times are worse than GC.
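The lock-wait time mentioned here is easy to instrument in principle. A minimal sketch, in Python for brevity (the real atomspace lock is a C++ mutex, and `TimedLock` is a hypothetical name, not an OpenCog API):

```python
import threading
import time

class TimedLock:
    """Wraps a lock and accumulates total time spent waiting to acquire it."""
    def __init__(self):
        self._lock = threading.Lock()
        self.wait_seconds = 0.0

    def __enter__(self):
        t0 = time.monotonic()
        self._lock.acquire()  # blocks while another thread holds the lock
        self.wait_seconds += time.monotonic() - t0
        return self

    def __exit__(self, *exc):
        self._lock.release()

# Usage: guard the shared structure with the timed lock, then compare the
# accumulated wait_seconds against elapsed wall-clock time to see how much
# of the run was spent blocked on contention.
lock = TimedLock()
with lock:
    pass
print(lock.wait_seconds >= 0.0)  # → True
```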

In particular, I suspect that the reason the computation does not run on all 24 cpus is that most of the time is wasted waiting to get the atomspace lock. But I don't have hard data for that.  Getting hard data for that has not been a priority.

One of my computations gets slower and slower the bigger the atomspace gets, and it's so disastrous that I've never gotten past about 40% done with the computation.  The performance drops to a crawl when the atomspace gets more than about 2M atoms in it.  I just have no clue why.  The amount of time spent in GC is stable, at about 1/10th of the time spent in the computation.  The amount of cpu time consumed is stable, so I don't think it's stalled on a lock. The amount of sql i/o drops and drops as the amount of computation drops. I have no clue where the bottleneck is, and it's driving me crazy.

The point is, there are bottlenecks all over the place, just everywhere.  Maybe they're in guile; it's hard to tell.

Anyway, there is a way of avoiding guile entirely: perform all of the parsing in the same address space as the atomspace; that way, guile never gets involved. Done right, this wouldn't be a bad thing, but the way Ben talks about it, it makes me nervous that it will be a horrid hack job that I'll get to fix, instead of doing something fun.  I get tired of taking out other people's garbage, and I'm just very nervous that we'll generate a lot of garbage here.

--linas

Ben Goertzel

unread,
Jun 8, 2017, 12:05:19 AM6/8/17
to Linas Vepstas, Curtis Faith, Ruiting Lian, Amen Belayneh, opencog
On Thu, Jun 8, 2017 at 12:00 PM, Linas Vepstas <linasv...@gmail.com> wrote:
> Anyway, there is a way of avoiding guile entirely: perform all of the
> parsing in the same address space as the atomspace; that way, guile never
> gets involved.

Yes, that is what I have suggested to Curtis and Rui Ting...

> Done right, this wouldn't be a bad thing, but the way Ben
> talks about it, it makes me nervous that it will be a horrid hack job that
> I'll get to fix, instead of doing something fun. I get tired of taking out
> other people's garbage, and I'm just very nervous that we'll generate a lot
> of garbage here.

Well, apart from code-quality issues, the semi-substantive issue is
whether -- for the zero stage -- to use random link parses or just
build AnyLinks between words that are reasonably nearby each other in
the input sentence.... You prefer the "random link parse" approach,
Ruiting and I prefer the "words that are reasonably nearby in the
sentence" approach... but I guess we both agree this is a small matter
in the context of the overall language learning pipeline...

...

These weird Guile bugs may well have implications for other use-cases
besides this language learning pipeline though, that's another
issue...

ben