unusual link generator behavior (MSVC and language=en)


Dennis Grubb

Jan 13, 2022, 1:50:58 AM
to link-grammar

13 January 2022

Hi Amir/All

On 6 Jan 2022 I compiled the most recent solutions, including LinkGenerator, in MS Visual Studio.

I get results with --language=de, but NONE with --language=en.

I may have the syntax wrong, but below are the command-line outputs for the two variations, followed by a verbose run.

D:\link-grammar-master_5Jan_Gen\msvc\Win32\Release>LinkGenerator.exe --language=de --count=6 --length=6 --random
#
# Corpus for language: "de"
# Sentence length: 6
# Requested corpus size: 6
link-grammar: Info: Dictionary found at D:\link-grammar-master_5Jan_Gen\msvc\Win32\Release\..\..\..\data/de/4.0.dict
# Dictionary version 5.9.0
# Number of categories: 177
# Linkages found: 299818
# Linkages generated: 6
# Number of unused disjuncts: 1567
#
LEFT-WALL und ein Fenster helfe etwas Kaese RIGHT-WALL
LEFT-WALL laufe ich morgen sehr schnell auf RIGHT-WALL
LEFT-WALL sie hatten gesprochen , sprechen ? RIGHT-WALL
LEFT-WALL versuchte ich heute , kommen . RIGHT-WALL
LEFT-WALL glaubten einige als du schnell sagst RIGHT-WALL
LEFT-WALL gestern hatte Kaese gesagt , laufen RIGHT-WALL
# Bye.

AND

D:\link-grammar-master_5Jan_Gen\msvc\Win32\Release>LinkGenerator.exe --language=en --count=6 --length=6 --random
#
# Corpus for language: "en"
# Sentence length: 6
# Requested corpus size: 6
link-grammar: Info: Dictionary found at D:\link-grammar-master_5Jan_Gen\msvc\Win32\Release\..\..\..\data/en/4.0.dict
# Dictionary version 5.10.2
# Number of categories: 1716
... minutes go by, then it returns to the prompt
D:\link-grammar-master_5Jan_Gen\msvc\Win32\Release>


If I compile for x64 and run the same command, it drives my CPU to maximum and memory usage to near maximum (16 GB), and takes about 2 minutes to finish, but still produces NO output.

I have also tried a sentence template using "\*" as the wildcard, but still get no output.

Below are the results using --verbosity=5 and the sentence template "This \* a \* .":

D:\link-grammar-master_5Jan_Gen\msvc\Win32\Release>LinkGenerator.exe --language=en  --count=6 --length=0 --verbosity=5
#
# Corpus for language: "en"
# Requested corpus size: 6
link-grammar: Info: Dictionary found at D:\link-grammar-master_5Jan_Gen\msvc\Win32\Release\..\..\..\data/en/4.0.dict
# Dictionary version 5.10.2
# Number of categories: 1716
linkgenerator> This \* a \* .
# Sentence template: This \* a \* .

#### Finished tokenizing (7 tokens)
++++ Finished expression pruning                 0.02 seconds
Debug: After expanding expressions into disjuncts:
LEFT-WALL(8325) this(9) \*(117064) a(13) \*(305275) .(5) RIGHT-WALL(3)
Total: 430694 disjuncts, 1810911 (493146+/1317765-) connectors

++++ Built disjuncts                             0.11 seconds
Trace: eliminate_duplicate_disjuncts: w0: Killed 636 duplicates
Trace: eliminate_duplicate_disjuncts: w2: Killed 7387 duplicates
Trace: eliminate_duplicate_disjuncts: w2: Killed 83020 duplicates (different word-strings)
#### Creating a wild-card word disjunct list
#### Finished tokenizing (3 tokens)

... minutes go by, then it returns to the prompt again with NO output

Has anybody else seen these results?

Dennis in Tasmania


ami...@gmail.com

Jan 13, 2022, 6:13:04 AM
to link-grammar
Hello Dennis,

For me, it returns a result in a short time.

Currently, the LG library doesn't check for memory allocation failures, so perhaps your virtual memory has been exhausted.
In general, the OS is then supposed to signal a fault when the program tries to dereference the resulting NULL pointer.
But here you don't get an error message.
In principle, the stack can also be exhausted (although this is not expected during the duplicate-disjunct elimination step), but I tested your link-generator command line on Linux with a 200 KB stack and it succeeded (the default on Windows is 1 MB).

In any case, you can see how much virtual memory you have using this cmd command:

systeminfo | find/i "virtual memory"

(In my Windows I get: Virtual Memory: Available: 15,747 MB.)
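If you want to script that check, here is a minimal Python sketch (not part of link-grammar) that runs `systeminfo` and extracts the "Virtual Memory: Available" figure. The line format is assumed from the sample output quoted above, and the helper name `parse_available_mb` is hypothetical; localized Windows versions print different labels, so the pattern may need adjusting.

```python
import re
import subprocess
import sys
from typing import Optional

# Assumed line format, based on the sample output quoted above:
#   "Virtual Memory: Available: 15,747 MB"
# Localized Windows installations may use different labels.
_AVAILABLE_RE = re.compile(
    r"Virtual Memory:\s*Available:\s*([\d,.]+)\s*MB", re.IGNORECASE
)


def parse_available_mb(systeminfo_text: str) -> Optional[int]:
    """Return available virtual memory in MB, or None if no matching line is found."""
    match = _AVAILABLE_RE.search(systeminfo_text)
    if match is None:
        return None
    # Drop thousands separators ("15,747" or "15.747") before converting.
    return int(re.sub(r"\D", "", match.group(1)))


def main() -> None:
    if sys.platform != "win32":
        print("systeminfo is only available on Windows")
        return
    # Same information as `systeminfo | find/i "virtual memory"` in cmd.
    output = subprocess.run(
        ["systeminfo"], capture_output=True, text=True, check=True
    ).stdout
    available = parse_available_mb(output)
    if available is None:
        print("Could not find a 'Virtual Memory: Available' line")
    else:
        print(f"Available virtual memory: {available} MB")


if __name__ == "__main__":
    main()
```

Running it on the machine while LinkGenerator is working (or just before) gives a quick indication of whether virtual memory is close to running out.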

Amir

Linas Vepstas

Jan 15, 2022, 6:18:02 PM
to link-grammar
Having lots of browser windows open can eat up a lot of RAM. That said, I would be disappointed to learn that link-generator used more than 2-4 GB of RAM during generation, and even that much seems fairly extreme. Controlling combinatorial explosion is, as always, a fundamental challenge.
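To put a rough number on the combinatorial explosion in the verbose run above, here is a small Python sketch (not part of link-grammar; `disjunct_counts` is a hypothetical helper) that parses the per-word counts printed after "After expanding expressions into disjuncts:". Since each word in a linkage uses exactly one of its disjuncts, the product of these counts is the number of ways to choose one disjunct per word. This is only a naive upper bound: the parser prunes and counts rather than enumerating these combinations one by one, and the counts here are taken before duplicate elimination.

```python
import math
import re

# Per-word disjunct counts copied from the --verbosity=5 trace above
# ("word(count)" pairs, printed before duplicate elimination).
TRACE_LINE = r"LEFT-WALL(8325) this(9) \*(117064) a(13) \*(305275) .(5) RIGHT-WALL(3)"

_WORD_COUNT_RE = re.compile(r"(\S+)\((\d+)\)")


def disjunct_counts(trace_line: str) -> list[tuple[str, int]]:
    """Split a trace line into (word, disjunct count) pairs."""
    return [(word, int(n)) for word, n in _WORD_COUNT_RE.findall(trace_line)]


def main() -> None:
    counts = disjunct_counts(TRACE_LINE)
    # The sum should reproduce the "Total: ... disjuncts" line of the trace.
    total = sum(n for _, n in counts)
    # One disjunct per word: the number of combinations is the product.
    combinations = math.prod(n for _, n in counts)
    print(f"total disjuncts: {total}")
    print(f"one-disjunct-per-word combinations: {combinations:.3e}")


if __name__ == "__main__":
    main()
```

For this trace the sum reproduces the reported total of 430694 disjuncts, and the product is about 5.2e17, with the two wildcard positions alone contributing a factor of about 3.6e10.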



--
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.
 
