WinLinkGrammar now supports Corpus Statistics


Bill Hayes

6 Jul 2011, 21:38:02
to link-g...@googlegroups.com
FYI ...

I have configured a site at https://LaunchPad.net/WinLinkGrammar to host a
Windows executable of the link parser code as well as the required regex2.dll.
The code I released last week did not include the /corpus code.

The code I released today does now support Corpus Statistics by including the
/corpus code as well as embedding the SQLite3 db engine (an external dll is
not needed).  The 'disjuncts.db' file is still a separate download.

However, Windows users can now install and run Link Grammar by doing just three
downloads - no compiling necessary.

All of the source code and project files are also available at the site.
I annotated every line that I had to modify to get it to compile under MSVC++ 2010.

When I parse the example sentence from the documentation, 'this is a test', I get
essentially the same output that the documentation lists:

linkparser> this is a test
Found 1 linkage (1 had no P.P. violations)
        Unique linkage, cost vector = (CORP=4.4257 UNUSED=0 DIS=0 FAT=0 AND=0 LEN=5)

         +--Ost--+
   +-Ss*b+  +-Ds-+
   |     |  |    |
this.p is.v a test.n

2 is.v dj=Ss*b- Ost+  sense=be%2:40:00:: score=7.746697
2 is.v dj=Ss*b- Ost+  sense=be%2:42:08:: score=7.667323
2 is.v dj=Ss*b- Ost+  sense=be%2:42:13:: score=7.613157
2 is.v dj=Ss*b- Ost+  sense=be%2:42:09:: score=7.188450
2 is.v dj=Ss*b- Ost+  sense=be%2:42:06:: score=7.092098
2 is.v dj=Ss*b- Ost+  sense=be%2:42:04:: score=6.781194
2 is.v dj=Ss*b- Ost+  sense=be%2:42:01:: score=5.976789
2 is.v dj=Ss*b- Ost+  sense=be%2:42:07:: score=5.963879
2 is.v dj=Ss*b- Ost+  sense=be%2:42:00:: score=4.690056
2 is.v dj=Ss*b- Ost+  sense=be%2:41:00:: score=2.632383
2 is.v dj=Ss*b- Ost+  sense=be%2:42:02:: score=2.351568
2 is.v dj=Ss*b- Ost+  sense=be%2:42:05:: score=2.143989
2 is.v dj=Ss*b- Ost+  sense=be%2:42:03:: score=1.699292
4 test.n dj=Ost- Ds-  sense=test%1:04:00:: score=0.000000
               this.p      0.0  0.695 Wd- Ss*b+
                 is.v      0.0  7.355 Ss*b- Ost+
                    a      0.0  0.502 Ds+
               test.n      0.0  9.151 Ost- Ds-
linkparser>

Regards,
Bill Hayes

Stuti Ajmani

7 Jul 2011, 03:33:49
to link-g...@googlegroups.com
Hi

I added sqlite3.dll, corpus.c, sqlite.c and sqlite3.h, and made a few changes in corpus.c to get rid of the errors. I have also put the database in the correct location. Now, if I want Link Grammar to use corpus statistics, it has to call the functions in corpus.c.
So from where in the code do I pass control to corpus.c? Secondly, I will not need to call the function sentence_parse() (it is present in link-parser.c), right? Do I just initialize the corpus statistics subsystem at this point instead?

I tried running your exe, but it always gives a weight of 17. I take this to mean that it is unable to access the corpus statistics database correctly.

PS: Please help with the solution. Where do I need to change the code? The exe is actually not helpful for me: I am using Link Grammar for many sentences, and in that scenario it is not a good idea to run the exe for every new sentence. I have to do it using the DLL only.

Thank You 
Stuti Ajmani


--
You received this message because you are subscribed to the Google Groups "link-grammar" group.
To view this discussion on the web visit https://groups.google.com/d/msg/link-grammar/-/beLMdfdJe8oJ.
To post to this group, send email to link-g...@googlegroups.com.
To unsubscribe from this group, send email to link-grammar...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/link-grammar?hl=en.

Bill Hayes

7 Jul 2011, 15:53:59
to link-g...@googlegroups.com
Hi Stuti,
Some parses of some sentences return weights of 17 even when the db
was found; two of the three parses of 'Bill saw Bob', for example.
Please try 'this is a test', which should return the weights shown in my original post.

If 'this is a test' also shows weights of 17, then
look for this line after Link Grammar starts:
  "link-grammar: Warning: Can't open database: File not found"
The two lines before it will read "object_open() trying --path--disjuncts.db".
My guess (hope) is that you didn't put the 'disjuncts.db' file
where it's looking (check the --path-- it reports), or you didn't
rename the downloaded file to 'disjuncts.db',
or you didn't put it in an 'sql' folder.

Re your other questions, it sounds like you want link-grammar to:
(a) take a file of sentences instead of one at a time (batch mode)
(b) not graph the sentence linkage
(c) dump the output to a file
Link Grammar has all of these options.

Put your sentences in a file, for example 's.txt', in the same folder 
where link-grammar474.exe is located.
Start a DOS prompt, cd to the directory and run it via:
  link-grammar474.exe -!disjuncts -!senses -!graphics < s.txt > out.txt
It will put sense and disjunct info for all of your sentences (but no linkage graphs)
into the file 'out.txt'

If you start the program by double-clicking it, you can still turn off graphics with the !graphics command.
The !batch command turns on batch mode, but I can't figure out how to then pipe a file to it,
so you may have to run it from a DOS prompt to get batch mode.

Good luck,
Bill

Bill Hayes

1 Jul 2012, 20:51:10
to link-g...@googlegroups.com
Hi Jim,
I've never tried getting source code from Bazaar until today.
Non-obvious and non-trivial, to say the least.
Here are the instructions, and below them my notes
about what modifications I made to the original source code.
Regards,
Bill Hayes

In summary you have to:
- download and install the Bazaar source code control for Windows
- Get a Branch of WinLinkGrammar
- Click the '/link-grammar/link-grammar.vcproj' icon
or open MS Visual Studio 2010 and select that file.

I just pulled a branch and loaded it with MS VS 2010
without errors.  Pretty fast actually.

Bazaar is the source code control system for LaunchPad.net
(similar to Git or Subversion or CVS).
You don't have to log in or have an account to get a branch.


In detail ...
Click one of the stable download links like 2.5.1
Run the setup.exe file to install Bazaar.

Run the Bazaar Explorer (usually the icon on the desktop).
Click the 'Get project source from elsewhere' bar.
Click the 'Branch' icon (blue folder).
In the Locations 'From' box enter:  lp:winlinkgrammar
In 'To' enter or browse to a dir where you want the code to go.
Click OK.
This will give you a warning that
 'destination is outside a shared repository. Would you like to
 initialize one now?'   
Click Yes.  Then click OK to initialize.
Click 'Close' when it's done.
Now you're back at the Branch window:  
 click 'OK' to get the code.

It takes about 10 seconds before it starts downloading
and a green status bar will appear.
The download is 85MB so it may take a few minutes
(50MB of this is a couple MS SQL .sdf files).
Click 'Close' when it's done.
Close the Bazaar Explorer.

Open the VC++ project by clicking on the
trunk/link-grammar/link-grammar.vcproj file.
(Or open it directly from Visual Studio 2010.)


Pulling a branch resets all of the file
modified dates, but here's what I modified.
(Just do a search for 'BHayes' in the files -
I believe I marked every edit that I made.)

According to my notes I only modified 
utilities.c
to get the basic Link Grammar to work (no Corpus Stats).

To get Corpus Statistics to work I modified:
api.c
api-structures.h
cluster.c
cluster.h
corpus.c
corpus.h
expand.c
link-parser.c
print.c

FYI, here are my notes as I debugged the Corpus Stats problems.
-- corpus.h,  .c --
I  downloaded and moved to /corpus  sqlite3.h and sqlite3.c
corpus.h   I added  #define USE_CORPUS   b/c seems like what we want and w/o it the selected code does not compile
corpus.c   changed #include <sqlite3.h>  to  #include "sqlite3.h"
-- sqlite3.c --
added it to project  (did NOT edit .c or .h)
It compiled.
-- cluster.h,  .c --
cluster.h   I added  #define USE_CORPUS
cluster.c   changed #include <sqlite3.h>  to  #include "sqlite3.h"
Huge number of errors due to var types declared throughout code instead of at top of blocks
Also a lot of 'char' instead of 'const char'
Changed '#if USE_CORPUS'  to  '#ifdef USE_CORPUS'
in api.c, print.c, 
Compiled!
#ifdef USE_CORPUS  is only used in:
api.c, api-structures.h, corpus.h, cluster.h
Add  #define USE_CORPUS to 'api-structures.h' and re-compiled.
Project with sqlite3 compiles!
db didn't open (but linkparser still runs and prints some of the lines)

Change   #define DBNAME "sql/disjuncts.db"  in 'corpus.c'
to have backslash  #define DBNAME "sql\disjuncts.db"
recompile,  ugh -   it looked for "sqldisjuncts.db"
because it used '\' as an escape
Change  "sql\disjuncts.db"  to  "sql\\disjuncts.db"  
(double '\' so it does not escape it)

Compiles w/o error.
Runs link-grammar example!
 - - - - - - - - - - - - - - - - 



On Sat, Jun 30, 2012 at 8:26 PM, Jim Adams <jmad...@cox.net> wrote:
>
> On Wednesday, July 6, 2011 6:38:02 PM UTC-7, Bill Hayes wrote:
> > FYI ...
> >
> > I have configured a site at https://LaunchPad.net/WinLinkGrammar to host a
> > Windows executable of the link parser code as well as the required regex2.dll.
> > ...
> >
> > All of the source code and project files are also available at the site.
> > I annotated every line that I had to modify to get it to compile under MSVC++ 2010.
> > ...
> >
> > Regards,
> > Bill Hayes
>
> Howdy,
>
> I downloaded WinLinkGrammar, and it is great.
>
> Now I am ready to move on to my own "Hello World!" application
> using the Link Grammar.  I visited the LaunchPad site, but
> could not figure out how to download either the MSVC++ project files
> or the source code.
>
> Is there a good tutorial somewhere that would tell me how to
> download the rest of your excellent work?
>
> Jim Adams


Jim Adams

1 Jul 2012, 21:12:29
to link-g...@googlegroups.com

Howdy,

Thanks for the info.
I'll give it a try tomorrow.

Jim

Jim Adams

3 Jul 2012, 23:01:03
to link-g...@googlegroups.com
Howdy,

All went well with the download and compile,
except that I don't have a copy of regex.
Which of the many versions that are
available did you use?


Jim


Bill Hayes

3 Jul 2012, 23:33:37
to link-g...@googlegroups.com
Hi Jim,

The WinLinkGrammar download includes a copy of regex2.dll
which is regex 2.7 from the GnuWin32 set of tools.

Bill Hayes

Brett Nieland

10 Nov 2013, 10:24:54
to link-g...@googlegroups.com
Hi Bill,

I just compiled your source from Bazaar in MS VS2012 on a 64bit Windows 8.1 machine.

I am returning to C after a 15+ year hiatus, so please excuse the inanities. I am also including a lot of detail in hopes of helping others.

Here are the steps I took to get it to compile.

I included the regex.h and regex.lib from
http://gnuwin32.sourceforge.net/packages/regex.htm
in the link-grammar directory
and the regex2.dll in the debug folder (where my "link-grammar474.exe" file would appear after compilation).

I then modified line 16 from regex-morph.c from
#include <regex.h>
to
#include "regex.h"
I am assuming that if I knew how and where to install the regex2.dll in the VS IDE, this would have been unnecessary

In response to a compiler error:
type "const char *" cannot be assigned to an entity of type "char *"
On line 472 of read-dict.c I changed
char *ds, *dt;
to
const char *ds, *dt;

This yielded a clean compile in debug release mode.

When I run the exe (after moving it to the proper data dir), I now get a debug assertion in a file I cannot find!

Debug Assertion Failed!
Program:
......link-grammar474.exe
File: f:\dd\vctools\crt_bld\self_x86\crt\src\isctype.c
Expression: c > -1 && c <= 255

Sorry for my ignorance, but is this a problem?

After I ignore the debug assertion, I run your test and get 4 linkages!

Is this a bad sign? Your output from above only had one (which seems to match my linkage #3).

Here is the output:

linkparser> this is a test
Found 4 linkages (4 had no P.P. violations)
        Linkage 1, cost vector = (UNUSED=0 DIS=0 FAT=0 AND=0 LEN=6)

    +------WV------+--Osm--+
    +---Wd---+-Ss*b+  +-Ds-+
    |        |     |  |    |
LEFT-WALL this.p is.v a test.n

Press RETURN for the next linkage.
linkparser>
        Linkage 2, cost vector = (UNUSED=0 DIS=0 FAT=0 AND=0 LEN=6)

    +------WV------+--Ost--+
    +---Wd---+-Ss*b+  +-Ds-+
    |        |     |  |    |
LEFT-WALL this.p is.v a test.n

Press RETURN for the next linkage.
linkparser>
        Linkage 3, cost vector = (UNUSED=0 DIS=2 FAT=0 AND=0 LEN=5)

         +--Ost--+
   +-Ss*b+  +-Ds-+
   |     |  |    |
this.p is.v a test.n

Press RETURN for the next linkage.
linkparser>
        Linkage 4, cost vector = (UNUSED=0 DIS=2 FAT=0 AND=0 LEN=5)

         +--Osm--+
   +-Ss*b+  +-Ds-+
   |     |  |    |
this.p is.v a test.n

linkparser> ^C
C:\Users\Brett\Desktop\Pinball\Grammar\link-grammar-4.8.0\data>s


the output from
Thanks for your time,

Brett

Linas Vepstas

10 Nov 2013, 11:24:12
to link-grammar
Hi Brett,

On 10 November 2013 09:24, Brett Nieland <nie...@gmail.com> wrote:
> Hi Bill,
>
> I just compiled your source from Bazaar

Except that link-grammar lives on SVN, not BZR.

> in MS VS2012 on a 64bit Windows 8.1 machine.
>
> I am returning to C after a 15+ year hiatus, so please excuse the inanities.  I am also including a lot of detail in hopes of helping others.
>
> Here are the steps I took to get it to compile.
>
> I included the regex.h and regex.lib from
> http://gnuwin32.sourceforge.net/packages/regex.htm
> in the link-grammar directory
> and the regex2.dll in the debug folder (where my "link-grammar474.exe" file would appear after compilation).

The current version is 4.8.0, not 4.7.4, which is almost three years old.

> I then modified line 16 from regex-morph.c from
> #include <regex.h>
> to
> #include "regex.h"

This should have no effect.

> I am assuming that if I knew how and where to install the regex2.dll in the VS IDE, this would have been unnecessary

OK, I'll grant that. Doesn't the thing come with some install script?

> In response to a compiler error:
>   type "const char *" cannot be assigned to an entity of type "char *"
> On line 472 of read-dict.c I changed
> char *ds, *dt;
> to
> const char *ds, *dt;

Clearly an old version: in the current version, this line number is in the middle of a comment!

> This yielded a clean compile in debug release mode.
>
> When I run the exe (after moving it to the proper data dir), I now get a debug assertion in a file I cannot find!
>
>   Debug Assertion Failed!
>   Program:
>   ......link-grammar474.exe
>   File: f:\dd\vctools\crt_bld\self_x86\crt\src\isctype.c
>   Expression: c > -1 && c <= 255

If you were to look at the stack trace, you would see who was calling it. I'm guessing the regex lib is, which would mean the regex lib is not UTF-8 safe.  You might want to try a different regex lib.

> Sorry for my ignorance, but is this a problem?

For English, probably not. For Russian, maybe.

> After I ignore the debug assertion, I run your test and get 4 linkages!
>
> Is this a bad sign?

That's normal.

> Your output from above only had one (which seems to match my linkage #3).
>
> Here is the output:
>
> linkparser> this is a test
> Found 4 linkages (4 had no P.P. violations)
>         Linkage 1, cost vector = (UNUSED=0 DIS=0 FAT=0 AND=0 LEN=6)
>
>     +------WV------+--Osm--+
>     +---Wd---+-Ss*b+  +-Ds-+
>     |        |     |  |    |
> LEFT-WALL this.p is.v a test.n


Clearly, you are using the 4.8.0 dicts so the talk of the 474 source code is .. bizarre. A lot of stuff has changed in the C source code over the years: bug fixes, performance improvements.

There are 4 parses because the new WV link causes a roughly doubled number of possible linkages. We are working to remove the duplicates (3 and 4 below), but that will take a while.

I really suggest working with the official sources.  I have never heard of a bazaar repo for link-grammar before. Where is this?


> Press RETURN for the next linkage.
> linkparser>
>         Linkage 2, cost vector = (UNUSED=0 DIS=0 FAT=0 AND=0 LEN=6)
>
>     +------WV------+--Ost--+
>     +---Wd---+-Ss*b+  +-Ds-+
>     |        |     |  |    |
> LEFT-WALL this.p is.v a test.n
>
> Press RETURN for the next linkage.
> linkparser>
>         Linkage 3, cost vector = (UNUSED=0 DIS=2 FAT=0 AND=0 LEN=5)
>
>          +--Ost--+
>    +-Ss*b+  +-Ds-+
>    |     |  |    |
> this.p is.v a test.n
>
> Press RETURN for the next linkage.
> linkparser>
>         Linkage 4, cost vector = (UNUSED=0 DIS=2 FAT=0 AND=0 LEN=5)
>
>          +--Osm--+
>    +-Ss*b+  +-Ds-+
>    |     |  |    |
> this.p is.v a test.n
>
> linkparser> ^C
> C:\Users\Brett\Desktop\Pinball\Grammar\link-grammar-4.8.0\data>s
>
>
> the output from




On Sunday, July 1, 2012 8:51:10 PM UTC-4, Bill Hayes wrote:
> Hi Jim,
> I've never tried getting source code from Bazaar until today.

I don't know what the heck this talk of bazaar is about.  This email is more than a year old ...!?

>       corpus.h   I added  #define USE_CORPUS   b/c seems like what we want and w/o it the selected code does not compile

The correct way to do this is to tell the compiler to do it with -DUSE_CORPUS, which will cause it to be turned on for ALL files.
>
>       corpus.c   changed #include <sqlite3.h>  to  #include "sqlite3.h"
>
>
>       -- sqlite.c --
>       added it to project  (did NOT edit .c or .h)
>       It compiled.
>
>
>       -- cluster.h,  .c --
>       cluster.h   I added  #define USE_CORPUS
>
>       cluster.c   changed #include <sqlite3.h>  to  #include "sqlite3.h"
>       Huge number of errors due to var types declared throughout code instead of at top of blocks

I think this has been fixed.
 

Brett Nieland

10 Nov 2013, 11:56:47
to link-g...@googlegroups.com, linasv...@gmail.com
Linas,

Thank you for taking the time to reply.

As for getting the source from Bazaar, I was following Bill Hayes's instructions in this thread (7/1/12) for his MSVC code.

It sounded like he had to make a lot of changes to the official GNU version to get it working in MSVC. As I have never used the GNU tool chain, I thought it best to start there.

I am writing a game in which the player assembles sentences out of a fixed "bag" of 20 words. The sentence they are constructing can only be 6 words long. Thus, I need to be able to test if the sentence they construct is a valid sentence. As this app is written in Unity (C#) my plan is to pre-compile a list of valid sentences given the constraints above. I am betting this will be easier than porting link-grammar to c#!

So, what I need is to determine what output from link-grammar indicates a "valid" sentence.

Can anyone suggest a good place to start?

Thanks again,

Brett

Linas Vepstas

10 Nov 2013, 14:46:15
to Bill Hayes, link-grammar, Brett Nieland



On 10 November 2013 13:27, Bill Hayes <bhay...@gmail.com> wrote:
> Hi Brett,
>
> a) LG tries to provide the most likely valid interpretations of a sentence, not determine what is and is not a valid sentence.

Almost all changes to the dictionary in the last 5 years have been to broaden coverage, which means that it now parses more ungrammatical sentences.   Try the very old dictionaries:  they are far more strict (at the cost of failing to parse some valid sentences).

> Many grammatically correct sentences are nonsensical.

"colorless green ideas sleep furiously".

--linas 