result format


Bruce Clay

Feb 3, 2021, 7:30:22 AM
to link-g...@googlegroups.com
Has there been any thought to saving the grammar results in a machine readable format such as xml or JSON?

Bruce

Anton Kolonin @ Gmail

Feb 3, 2021, 8:32:06 AM
to link-g...@googlegroups.com, Bruce Clay

Hi Bruce,

Do you mean the parse graph or the grammar dictionary?

If you mean the parse graph, then XML and JSON are not very handy for dealing with graphs, and with subgraphs in particular.

In our "Unsupervised Language Learning (ULL)" project we have used the "ull" format for type-less linkages to pass them along the NLP pipeline discussed in the papers at http://langlearn.singularitynet.io/data/docs/, for example:

http://langlearn.singularitynet.io/data/parses/English/Gutenberg-Children-Books/LG5.6.2/lower/parses/11-0.txt.ull

That is, the parse for "tuna is a fish" would be:

tuna is a fish
1 tuna 2 is 
2 is 4 fish
3 a 4 fish 

You can extend this format with link types, like:
tuna is a fish
1 tuna 2 is Ss
2 is 4 fish Ost
3 a 4 fish Dsu

Referring to:

https://www.link.cs.cmu.edu/link/submit-sentence-4.html

You can wrap the above in XML/JSON, but then it won't be readable by human eyes.
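
For anyone who does want the JSON wrapping, here is a minimal Python sketch. It assumes each block is a sentence line followed by lines of the form "left-index left-word right-index right-word [link-type]", as in the "tuna is a fish" example. The function name parse_ull and the JSON field names are made up for illustration and are not part of the ULL tooling.

```python
import json


def parse_ull(block: str) -> dict:
    """Parse one ULL-style parse block into a plain dict."""
    lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
    sentence, link_lines = lines[0], lines[1:]
    links = []
    for ln in link_lines:
        fields = ln.split()
        # Fields: left index, left word, right index, right word, [link type]
        link = {
            "left": int(fields[0]),
            "left_word": fields[1],
            "right": int(fields[2]),
            "right_word": fields[3],
        }
        if len(fields) > 4:  # the typed variant carries a fifth column
            link["type"] = fields[4]
        links.append(link)
    return {"sentence": sentence, "links": links}


sample = """tuna is a fish
1 tuna 2 is Ss
2 is 4 fish Ost
3 a 4 fish Dsu
"""
parsed = parse_ull(sample)
print(json.dumps(parsed, indent=2))
```

The same function handles the type-less variant, since the fifth column is optional.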

Please keep us posted if you end up with something :-)

Best regards,

-Anton

 

--
You received this message because you are subscribed to the Google Groups "link-grammar" group.
To unsubscribe from this group and stop receiving emails from it, send an email to link-grammar...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/link-grammar/CAHemx50gwypig4mjnS%2BMhyw1qgpSiJ4Y3%2BUDJ_6NvNhihJs-cg%40mail.gmail.com.
-- 
-Anton Kolonin

Bruce Clay

Feb 3, 2021, 3:56:51 PM
to Anton Kolonin @ Gmail, link-g...@googlegroups.com
Anton:
  Thank you for your reply.

I am looking for a machine readable way to extract parts of speech (POS) from text.  The concept appears to be built into Link Grammar but I am not seeing where the POS are defined.

Bruce

Arthur Wolf

Feb 3, 2021, 4:03:00 PM
to link-g...@googlegroups.com, Anton Kolonin @ Gmail
Yes, this is definitely something that would be very helpful for anyone trying to integrate link-grammar into a wider project (and I don't think it'd be too difficult to implement in the project itself).
Some years back I did that, and had to go through the painful work of writing a parser for the link-grammar output that would give me object/JSON-like data to work with (both the tree structure and the information on each link, and how it linked back to the original sentence). I really wish link-grammar just had an option to output that by default.
I looked and can't find the parser code unfortunately, it was lost to the sands of time...
I guess if I ran into this issue today, I'd just work this into link-grammar's source code.
So that's one thumbs up from me on how this would be a good feature to have.




--
勇気とユーモア

Anton Kolonin @ Gmail

Feb 3, 2021, 4:54:13 PM
to Arthur Wolf, link-grammar
We should have such an LG output parser somewhere in the dark depths of https://github.com/singnet/language-learning.

Regarding POS, I'm not sure whether LG supports such a concept, but I guess one can infer it from the parsed link types.

Explicit introduction of the POS concept would need some extension of the design and of the dictionary structure. I wonder whether Linas and Amir would consider this possibility...

-Anton

On Thu, Feb 4, 2021 at 4:02, Arthur Wolf <wolf....@gmail.com> wrote:

Linas Vepstas

Feb 4, 2021, 2:46:28 PM
to link-grammar
On Wed, Feb 3, 2021 at 6:30 AM Bruce Clay <bcla...@gmail.com> wrote:
> Has there been any thought to saving the grammar results in a machine readable format such as xml or JSON?

Yes. LG comes with a JSON server built in. It lives here: https://github.com/opencog/link-grammar/tree/master/bindings/java (see the link-grammar-server.sh file). All of this comes with the standard link-grammar tarball, and I believe that it is built automatically if you have Java installed. The server even has all the bells and whistles, such as 4 or 8 threads (configurable) to do parsing in parallel.

The fact that you (and everyone else who replied) don't know about this means that it's under-documented. I'm not sure where I should put that documentation; where would you look for it? In the README file?

> a machine readable way to extract parts of speech (POS) from text.  The concept appears to be built into Link Grammar but I am not seeing where the POS are defined.

Ahh, if you are not careful, you will step on a land-mine. The LG English dictionary does use "subscripts", such as "saw.v" and "saw.n", which roughly correspond to verbs and nouns. These are documented in the ... documentation. Here: chapter 3.3 "word subscripts"


However, these subscripts are NOT accurate POS labels, and they were never meant to be. They are instead a convenient device to allow the dictionary maintainer to manage the structure of the dict, and due to an unfortunate historical blunder, they were exposed in the API, to the user. This is tragicomic. 

LG DOES generate high-accuracy, extremely detailed POS tags. These are called "disjuncts".  Here's an example:

    +----->WV----->+
    +-->Wd--+-Ss*s-+----Ou---+
    |       |      |         |
LEFT-WALL Jim.m plays.v guitar.n-u

            LEFT-WALL     0.000  hWd+ hWV+ RW+
                Jim.m     0.000  Wd- Ss*s+
              plays.v     0.000  Ss- dWV- O+
           guitar.n-u     0.000  Ou-
           RIGHT-WALL     0.000  RW-


The "plays.v" hints that "plays" is verb-like. However, "Ss- dWV- O+" is much more accurate. The S- says "it's a verb", and anything that has S- on it is a verb. The O+ says "it's a transitive verb", and anything with "S- & O+" is a transitive verb. The Ss- says that the subject is singular. (Compare to "we play guitar".) The Ou says that the object is indeterminate, so Ss- & Ou+ says "it's a transitive verb taking a singular subject and a mass noun object". The WV says that this is the head-verb of the sentence. This info is far more detailed than simply knowing "it's a verb".

That is, disjuncts are fine-grained, hyper-detailed POS tags. They are guaranteed to be accurate. The word subscripts are a dict maintainer convenience device that kind-of correlates OK with POS, but they are not guaranteed to be accurate or complete. I mean, they're not "wrong", it's just that they aren't really POS tags.
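
To make the "read the POS off the disjunct" idea concrete, here is a rough Python sketch. It only encodes the two rules stated above (S- means verb; S- together with O+ means transitive verb). The connector regex is my own simplified reading of the disjunct strings shown in this post (optional "@", optional h/d head-dependent mark, upper-case link name, optional subscript, direction), and the function names are hypothetical. It is not the Link Grammar API and it ignores every other link type.

```python
import re

# Optional "@" (multi-connector), optional head/dependent mark (h or d),
# upper-case link name, optional subscript, then direction (+ right, - left).
CONNECTOR = re.compile(r"^@?[hd]?([A-Z]+)([a-z0-9*]*)([+-])$")


def connectors(disjunct: str):
    """Split a disjunct such as 'Ss- dWV- O+' into (name, subscript, direction)."""
    out = []
    for token in disjunct.split():
        m = CONNECTOR.match(token)
        if m:
            out.append(m.groups())
    return out


def describe(disjunct: str) -> str:
    """Coarse label using only the rules from the post above."""
    conns = connectors(disjunct)
    has_subject = any(name == "S" and d == "-" for name, _, d in conns)
    has_object = any(name == "O" and d == "+" for name, _, d in conns)
    if has_subject and has_object:
        return "transitive verb"
    if has_subject:
        return "verb"
    return "other"


print(describe("Ss- dWV- O+"))  # disjunct of plays.v
print(describe("Wd- Ss*s+"))    # disjunct of Jim.m
```

Anything finer (number agreement from Ss, mass nouns from Ou, and so on) would mean looking at the subscripts as well, which this sketch collects but does not interpret.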

--linas

Arthur Wolf

Feb 4, 2021, 2:51:09 PM
to link-g...@googlegroups.com
On Thu, Feb 4, 2021 at 8:46 PM Linas Vepstas <linasv...@gmail.com> wrote:
> On Wed, Feb 3, 2021 at 6:30 AM Bruce Clay <bcla...@gmail.com> wrote:
> > Has there been any thought to saving the grammar results in a machine readable format such as xml or JSON?
>
> Yes. LG comes with a json server built-in.

Oh, man that makes me so sad, I paddled completely uselessly coding my own after-the-fact parsing :/
Well, I'll know next time, but yes this should probably be more prominently documented...


Linas Vepstas

Feb 4, 2021, 2:56:23 PM
to link-grammar
Here's a demo of the json server.

$ cd  src/link-grammar-git/bindings/java
$ ./link-grammar-server.sh
Starting Link Grammar Server at port 9000, with 1 available processing threads and  with default dictionary location.
link-grammar: Info: JNI: dictionary language 'en' version 5.8.1


$ telnet localhost 9000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Jim plays guitar
Connection closed by foreign host.


Oooooops!

java.lang.RuntimeException: Malformed message:Jim plays guitar
Did you forget to say "text:" at the start of the message?
at org.linkgrammar.JSONUtils.readMsg(JSONUtils.java:163)

Try again:

$ telnet localhost 9000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
text: Jim plays guitar
3062
{"numSkippedWords":0,"linkages":[{"words":["LEFT-WALL","Jim.m","plays.v","guitar.n-u","RIGHT-WALL"], "disjuncts":["hWd+ hWV+ RW+","Wd- Ss*s+","Ss- dWV- O+","Ou-","RW-"], "disjunctCost":0.0, "linkageCost":4.0, "numViolations":0, "links":[{"label":"RW","left":0,"right":4,"leftLabel":"RW","rightLabel":"RW"},{"label":"WV","left":0,"right":2,"leftLabel":"hWV","rightLabel":"dWV"},{"label":"Wd","left":0,"right":1,"leftLabel":"hWd","rightLabel":"Wd"},{"label":"Ss*s","left":1,"right":2,"leftLabel":"Ss*s","rightLabel":"Ss"},{"label":"Ou","left":2,"right":3,"leftLabel":"O","rightLabel":"Ou"}]},{"words":["LEFT-WALL","Jim.m","plays.n","guitar.n-u","RIGHT-WALL"], "disjuncts":["hWa+ RW+","AN+","AN+","@AN- Wa-","RW-"], "disjunctCost":2.1200000010430813, "linkageCost":6.0, "numViolations":0, "links":[{"label":"RW","left":0,"right":4,"leftLabel":"RW","rightLabel":"RW"},{"label":"Wa","left":0,"right":3,"leftLabel":"hWa","rightLabel":"Wa"},{"label":"AN","left":1,"right":3,"leftLabel":"AN","rightLabel":"AN"},{"label":"AN","left":2,"right":3,"leftLabel":"AN","rightLabel":"AN"}]},{"words":["LEFT-WALL","Jim.m","plays.n","guitar.s","RIGHT-WALL"], "disjuncts":["hWa+ RW+","AN+","AN+","@AN- Wa-","RW-"], "disjunctCost":2.150000002235174, "linkageCost":6.0, "numViolations":0, "links":[{"label":"RW","left":0,"right":4,"leftLabel":"RW","rightLabel":"RW"},{"label":"Wa","left":0,"right":3,"leftLabel":"hWa","rightLabel":"Wa"},{"label":"AN","left":1,"right":3,"leftLabel":"AN","rightLabel":"AN"},{"label":"AN","left":2,"right":3,"leftLabel":"AN","rightLabel":"AN"}]},{"words":["LEFT-WALL","Jim.m","plays.n","guitar.n-u","RIGHT-WALL"], "disjuncts":["hWa+ RW+","AN+","@AN- AN+","@AN- Wa-","RW-"], "disjunctCost":2.2200000025331974, "linkageCost":5.0, "numViolations":0, 
"links":[{"label":"RW","left":0,"right":4,"leftLabel":"RW","rightLabel":"RW"},{"label":"Wa","left":0,"right":3,"leftLabel":"hWa","rightLabel":"Wa"},{"label":"AN","left":2,"right":3,"leftLabel":"AN","rightLabel":"AN"},{"label":"AN","left":1,"right":2,"leftLabel":"AN","rightLabel":"AN"}]},{"words":["LEFT-WALL","Jim.m","plays.n","guitar.s","RIGHT-WALL"], "disjuncts":["hWa+ RW+","AN+","@AN- AN+","@AN- Wa-","RW-"], "disjunctCost":2.2500000037252903, "linkageCost":5.0, "numViolations":0, "links":[{"label":"RW","left":0,"right":4,"leftLabel":"RW","rightLabel":"RW"},{"label":"Wa","left":0,"right":3,"leftLabel":"hWa","rightLabel":"Wa"},{"label":"AN","left":2,"right":3,"leftLabel":"AN","rightLabel":"AN"},{"label":"AN","left":1,"right":2,"leftLabel":"AN","rightLabel":"AN"}]},{"words":["LEFT-WALL","Jim.m","plays.v","guitar.n-u","RIGHT-WALL"], "disjuncts":["hWd+ RW+","Wd- Ss*s+","Ss- O+","Ou-","RW-"], "disjunctCost":3.0, "linkageCost":3.0, "numViolations":0, "links":[{"label":"RW","left":0,"right":4,"leftLabel":"RW","rightLabel":"RW"},{"label":"Wd","left":0,"right":1,"leftLabel":"hWd","rightLabel":"Wd"},{"label":"Ss*s","left":1,"right":2,"leftLabel":"Ss*s","rightLabel":"Ss"},{"label":"Ou","left":2,"right":3,"leftLabel":"O","rightLabel":"Ou"}]}],"version":"link-grammar-5.8.0","dictVersion":"5.8.1"}
Connection closed by foreign host.


More info than you can shake a stick at.
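
For scripting instead of telnet, a small Python client along these lines should do. It is a sketch based only on the session above: it assumes the server accepts one "text: ..." line, answers with a number line (3062 above, presumably a length) followed by the JSON, and then closes the connection. The function names are mine, and the canned reply at the bottom is a trimmed copy of the real one so the parsing part can be tried without a running server.

```python
import json
import socket


def parse_reply(raw: bytes) -> dict:
    """Decode a server reply; anything before the first '{' (e.g. '3062') is skipped."""
    text = raw.decode("utf-8")
    return json.loads(text[text.index("{"):])


def query(sentence: str, host: str = "localhost", port: int = 9000) -> dict:
    """Send one 'text:' request and read until the server closes the connection."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(f"text: {sentence}\n".encode("utf-8"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server has closed the connection
                break
            chunks.append(data)
    return parse_reply(b"".join(chunks))


def word_tags(reply: dict, linkage: int = 0):
    """Pair each word of one linkage with its disjunct."""
    lk = reply["linkages"][linkage]
    return list(zip(lk["words"], lk["disjuncts"]))


# Trimmed copy of the reply shown above, for trying the parser offline.
canned = (b'3062\n{"numSkippedWords":0,"linkages":[{"words":["LEFT-WALL","Jim.m"],'
          b'"disjuncts":["hWd+ hWV+ RW+","Wd- Ss*s+"]}]}')
print(word_tags(parse_reply(canned)))

# With the server running:
#   for word, disjunct in word_tags(query("Jim plays guitar")):
#       print(word, disjunct)
```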

--linas
--
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.
 

Linas Vepstas

Feb 4, 2021, 3:27:16 PM
to link-grammar
On Thu, Feb 4, 2021 at 1:51 PM Arthur Wolf <wolf....@gmail.com> wrote:
> Oh, man that makes me so sad, I paddled completely uselessly coding my own after-the-fact parsing :/
> Well, I'll know next time, but yes this should probably be more prominently documented...

Oh I'm so sorry!  I sometimes get the impression that there is "too much documentation", people get overwhelmed by the flood of details, and miss things like this :-)

It seems unlikely you'll need this, but I should mention that link-grammar is built into the OpenCog AtomSpace. You can just say `(cog-execute! (LgParse (Phrase "Jim plays guitar") (LgDict "en")))` and the parsed sentence will materialize inside of the AtomSpace.  You can then walk the graph edges however you may wish and extract whatever graph info you might want.  Now, perhaps you might complain that the AtomSpace does not support jQuery or tinkerpop or grakn.ai, but that is a different conversation :-)

Let me quickly demo RelEx. Now, as discussed in another email chain, RelEx is inadequate for "serious" linguistics work, but for casual projects, it is ... fun. Below is a cut-n-paste sample output (you can get this as JSON, too). Note that it has POS tags, and all those other things that Anton Kolonin mused about in his earlier email. (Hi Anton!)

; SENTENCE: [Jim plays guitar.]
Jim plays guitar.

====

Parse 1 of 2


    +--------------Xp--------------+
    +----->WV----->+               |
    +-->Wd--+-Ss*s-+----Ou---+     +--RW--+
    |       |      |         |     |      |
LEFT-WALL Jim.m plays.v guitar.n-u . RIGHT-WALL


Parse confidence: 0.9880
cost vector = (UNUSED=0.0 DIS=0.0 LEN=4.0)

======

Dependency relations:

    _obj(play, guitar)
    _subj(play, Jim)

Attributes:

    pos(play, verb)
    tense(play, present)
    pos(., punctuation)
    noun_number(guitar, uncountable)
    pos(guitar, noun)
    noun_number(Jim, singular)
    definite-FLAG(Jim, T)
    gender(Jim, masculine)
    pos(Jim, noun)
    person-FLAG(Jim, T)
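
If you only have this plain-text output to work with, the relation and attribute lines are regular enough to pick apart with a regex. A small Python sketch, assuming every line of interest has the shape name(first, second) as in the listing above; the names RELATION and parse_relations are made up for illustration, and this is not part of RelEx.

```python
import re

# name(arg1, arg2), e.g. "_subj(play, Jim)" or "pos(., punctuation)"
RELATION = re.compile(r"^\s*([\w-]+)\((.+?),\s*(.+?)\)\s*$")


def parse_relations(text: str):
    """Return (name, first, second) for every 'name(a, b)' line in text."""
    triples = []
    for line in text.splitlines():
        m = RELATION.match(line)
        if m:
            triples.append(m.groups())
    return triples


sample = """
Parse confidence: 0.9880
    _obj(play, guitar)
    _subj(play, Jim)
    pos(play, verb)
    pos(., punctuation)
    definite-FLAG(Jim, T)
"""
triples = parse_relations(sample)
pos_tags = {word: tag for name, word, tag in triples if name == "pos"}
print(pos_tags)
```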



Note that the verb has been normalized. It even knows that Jim is a guy! There is also a server for this, which also dumps the above into OpenCog. (I find this more readable than JSON, but perhaps that is just me.) The 128-bit UUIDs are used to distinguish this specific instance of the word "Jim" from the instances that might appear in thousands of other sentences. The output looks like this:

(ReferenceLink (stv 1.0 1.0)
   (WordInstanceNode "Jim.m@5222d878-96c6-47ff-a6b2-fb45b93d0826")
   (WordNode "Jim")
)
(WordInstanceLink (stv 1.0 1.0)
   (WordInstanceNode "Jim.m@5222d878-96c6-47ff-a6b2-fb45b93d0826")
   (ParseNode "sentence@2447aacb-3ef3-4096-a676-8ffeb03cf17e_parse_0")
)
(WordSequenceLink (stv 1.0 1.0)
   (WordInstanceNode "Jim.m@5222d878-96c6-47ff-a6b2-fb45b93d0826")
   (NumberNode "2")
)
(EvaluationLink (stv 1.0 1.0)
   (LinkGrammarRelationshipNode "Ss*s")
   (ListLink
      (WordInstanceNode "Jim.m@5222d878-96c6-47ff-a6b2-fb45b93d0826")
      (WordInstanceNode "plays.v@60d8db84-a741-485c-a757-0ec085ca0f48")
   )
)
; _subj (<<play>>, <<Jim>>) -- this is a comment card, for debugging.
(EvaluationLink (stv 1.0 1.0)
   (DefinedLinguisticRelationshipNode "_subj")
   (ListLink
      (WordInstanceNode "plays.v@60d8db84-a741-485c-a757-0ec085ca0f48")
      (WordInstanceNode "Jim.m@5222d878-96c6-47ff-a6b2-fb45b93d0826")
   )
)

; pos (play, verb)
(PartOfSpeechLink (stv 1.0 1.0)
   (WordInstanceNode "plays.v@60d8db84-a741-485c-a757-0ec085ca0f48")
   (DefinedLinguisticConceptNode "verb")
)
; tense (play, present)
(TenseLink (stv 1.0 1.0)
   (WordInstanceNode "plays.v@60d8db84-a741-485c-a757-0ec085ca0f48")
   (DefinedLinguisticConceptNode "present")
)
; noun_number (Jim, singular)
(InheritanceLink (stv 1.0 1.0)
   (WordInstanceNode "Jim.m@5222d878-96c6-47ff-a6b2-fb45b93d0826")
   (DefinedLinguisticConceptNode "singular")
)
; definite-FLAG (Jim, T)
(InheritanceLink (stv 1.0 1.0)
   (WordInstanceNode "Jim.m@5222d878-96c6-47ff-a6b2-fb45b93d0826")
   (DefinedLinguisticConceptNode "definite")
)
; gender (Jim, masculine)
(InheritanceLink (stv 1.0 1.0)
   (WordInstanceNode "Jim.m@5222d878-96c6-47ff-a6b2-fb45b93d0826")
   (DefinedLinguisticConceptNode "masculine")
)
; pos (Jim, noun)
(PartOfSpeechLink (stv 1.0 1.0)
   (WordInstanceNode "Jim.m@5222d878-96c6-47ff-a6b2-fb45b93d0826")
   (DefinedLinguisticConceptNode "noun")
)
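
If you want to pull the POS tags out of this s-expression dump without going through the AtomSpace, a tiny reader is enough. The following Python sketch assumes the output looks exactly like the listing above: balanced parentheses, double-quoted names, and ";" used only for comments (a ";" inside a quoted name would break it). The function names are hypothetical and none of this is OpenCog API.

```python
import re

# A quoted string, a single parenthesis, or a bare atom such as stv or 1.0
TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+')


def read_sexprs(text: str):
    """Parse s-expressions into nested lists; ';' comments are dropped."""
    code = "\n".join(line.split(";", 1)[0] for line in text.splitlines())
    stack, top = [], []
    for tok in TOKEN.findall(code):
        if tok == "(":
            stack.append(top)
            top = []
        elif tok == ")":
            done, top = top, stack.pop()
            top.append(done)
        else:
            top.append(tok.strip('"'))
    return top


def pos_links(exprs):
    """Yield (word instance, part of speech) from PartOfSpeechLink expressions."""
    for e in exprs:
        if e and e[0] == "PartOfSpeechLink":
            nodes = [x for x in e[1:] if isinstance(x, list)]
            word = next(n[1] for n in nodes if n[0] == "WordInstanceNode")
            tag = next(n[1] for n in nodes if n[0] == "DefinedLinguisticConceptNode")
            yield word, tag


sample = '''
; pos (play, verb)
(PartOfSpeechLink (stv 1.0 1.0)
   (WordInstanceNode "plays.v@60d8db84-a741-485c-a757-0ec085ca0f48")
   (DefinedLinguisticConceptNode "verb")
)
'''
for word, tag in pos_links(read_sexprs(sample)):
    # Drop the "@uuid" suffix to get back the subscripted word.
    print(word.split("@")[0], tag)
```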

-- Linas




