Differences in MOR output using macOS and Linux

Leandro Garber

Dec 28, 2021, 11:47:16 AM
to chibolts
Hi everyone,

We are processing some CHA files with MOR and we are finding differences between the same files processed on Linux and on MacOS.

Both use the latest version of MOR for English, downloaded from https://talkbank.org/morgrams/

Some examples:
-------------------
*FA1:        well it ended up there in your pile

MacOS:
%mor:        co|well pro:per|it v|end-PAST prep|up n|there prep|in det:poss|your
        n|pile .
Linux:
%mor:        co|well pro:per|it v|end-PAST adv|up adv|there prep|in
        det:poss|your n|pile .
------------------------------
*MA1:        Morgan doesn't bite .

MacOS:
%mor:        n:prop|Morgan mod|do&3S~neg|not v|bite .
Linux:
%mor:        n:prop|Morgan mod|do&3S~neg|not n|bite .
-------------------------------

We are using CLAN from the UI in macOS and on Linux we are using:
mor +L/path/to/lib/eng [chaFilePath]
post +d/path/to/lib/eng/post.db [chaFilePath]
postmortem +L/path/to/lib/eng $1
megrasp +L/path/to/lib/eng $1

We found that on Linux, POST, POSTMORTEM and MEGRASP don't run automatically, so we run them manually.

Am I missing something? Shouldn't the output be the same?

Thanks in advance,
Leandro.
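
A comparison like the one in this post can be automated. The following is a minimal Python sketch, not part of CLAN, that pulls the %mor tiers out of two transcripts and reports the ones that differ. It assumes every tier starts with "%mor:" and that wrapped continuation lines start with a space or a tab; the names mor_tiers and differing_tiers are made up for illustration.

```python
from itertools import zip_longest


def mor_tiers(chat_text):
    """Return the %mor tiers of a CHAT transcript, one string per tier.

    Continuation lines (lines starting with a space or tab) are folded into
    the tier they belong to and whitespace runs are collapsed, so that
    differences in line wrapping are ignored.
    """
    tiers = []
    current = None  # pieces of the %mor tier being collected, if any
    for line in chat_text.splitlines():
        if line[:1] in (" ", "\t"):
            if current is not None:
                current.append(line)
            continue
        if current is not None:
            tiers.append(" ".join(" ".join(current).split()))
            current = None
        if line.startswith("%mor:"):
            current = [line[len("%mor:"):]]
    if current is not None:
        tiers.append(" ".join(" ".join(current).split()))
    return tiers


def differing_tiers(text_a, text_b):
    """Yield (index, tier_a, tier_b) for each %mor tier that differs."""
    pairs = zip_longest(mor_tiers(text_a), mor_tiers(text_b), fillvalue="")
    for index, (a, b) in enumerate(pairs):
        if a != b:
            yield index, a, b


# Second example from the post, as produced on each platform.
mac = """*MA1:\tMorgan doesn't bite .
%mor:\tn:prop|Morgan mod|do&3S~neg|not v|bite .
"""
linux = """*MA1:\tMorgan doesn't bite .
%mor:\tn:prop|Morgan mod|do&3S~neg|not n|bite .
"""

for index, a, b in differing_tiers(mac, linux):
    print(f"tier {index}:\n  A: {a}\n  B: {b}")
```

Tiers are paired by position, so the sketch only makes sense when both files come from the same source transcript.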

Brian Macwhinney

Dec 28, 2021, 11:57:54 AM
to ChiBolts
Dear Leandro,
It appears that on Linux, you are not running the PREPOST program, which is a step in the MOR-PREPOST-POST-POSTMORTEM-MEGRASP chain on Mac. For your two examples, I think that the Mac version is better. So, I would recommend either relying on that or else running PREPOST on Linux also.

— Brian MacWhinney
Teresa Heinz Professor of Cognitive Psychology,
Language Technologies and Modern Languages, CMU

Leandro Garber

Dec 28, 2021, 12:34:01 PM
to chib...@googlegroups.com
Dear Brian,

Thanks for your quick response. I can't find a "prepost" binary in my bin folder. Is it named something different?

Best Regards,
Leandro.



Leandro Garber

Dec 28, 2021, 12:57:26 PM
to chib...@googlegroups.com
Dear Brian,

I'm reading the manual and now I understand there is no prepost binary.

I did find a prepost.cut file, which I understand is a set of rules that should be applied before POST. I'm trying to learn how to run these rules. Any help would be much appreciated.

Best regards,
Leandro.

Brian Macwhinney

Dec 28, 2021, 1:17:56 PM
to ChiBolts
I guess Leonid didn’t include PREPOST in the Unix distribution. We are all on vacation this week and next. Any chance you could rely on Mac OS instead?

— Brian

Leonid Spektor

Dec 28, 2021, 1:21:42 PM
to chib...@googlegroups.com
Apparently there is a bug in the Unix MOR code: it fails to read the pre-post rules file. I am working on this right now and I will let everyone know when it is fixed.


Leonid.

Leandro Garber

Dec 28, 2021, 1:39:36 PM
to chib...@googlegroups.com
Thanks, Brian and Leonid, for your time. I hope I didn't spoil your holidays!

Best,
Leandro.


Leonid Spektor

Dec 28, 2021, 1:47:05 PM
to chib...@googlegroups.com
The Unix bug has been fixed. The new Unix code is on the web.


Leonid.

Leandro Garber

Dec 28, 2021, 2:29:28 PM
to chib...@googlegroups.com
That was fast.

I've just downloaded it, compiled it, and yes, it works like a charm. Files are identical now.

Thanks for your help and happy new year.


Leandro Garber

Jan 22, 2025, 12:21:39 PM
to chibolts
Hi everyone, I hope you are starting 2025 great.

Just wanted to report that I've just downloaded the latest Linux version and I'm getting different outputs on Linux compared to Windows (not Mac this time, as the title suggests). I was able to run the Windows version under the Wine emulator in order to do the comparison.

On Linux I'm running:

/path/to/mor -L/path/to/lib/spa my_cha_file.cha
/path/to/post +d/path/to/lib/spa/post.db my_cha_file.cha
/path/to/postmortem -L/path/to/lib/spa my_cha_file.cha
/path/to/megrasp -L/path/to/lib/spa my_cha_file.cha

Am I missing something?

Thanks in advance
L.
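
For batch processing, the four commands in this post can be wrapped in a small script. This is only a sketch in Python: it reuses the program names and options exactly as written above (-L for the library folder, +d for post.db), and the function names pipeline_commands and run_pipeline are hypothetical. Whether the CLAN programs report failure through their exit status is an assumption here and has not been verified.

```python
import subprocess


def pipeline_commands(bin_dir, lib_dir, cha_file):
    """Build the four commands from the post, in the order they are run."""
    return [
        [f"{bin_dir}/mor", f"-L{lib_dir}", cha_file],
        [f"{bin_dir}/post", f"+d{lib_dir}/post.db", cha_file],
        [f"{bin_dir}/postmortem", f"-L{lib_dir}", cha_file],
        [f"{bin_dir}/megrasp", f"-L{lib_dir}", cha_file],
    ]


def run_pipeline(bin_dir, lib_dir, cha_file):
    """Run each step in turn; stop if a step exits with a non-zero status.

    Assumption: a failing CLAN program returns a non-zero exit status.
    """
    for command in pipeline_commands(bin_dir, lib_dir, cha_file):
        subprocess.run(command, check=True)


# Print the commands instead of running them, using the paths from the post.
for command in pipeline_commands("/path/to", "/path/to/lib/spa", "my_cha_file.cha"):
    print(" ".join(command))
```

Keeping the command list separate from the code that runs it makes it easy to check what would be executed before touching any transcript.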

Leonid Spektor

Jan 23, 2025, 4:22:42 PM
to chib...@googlegroups.com
Hi Leandro,

I am still trying to figure out why the Mac/Windows and Unix outputs are different. I noticed that on Mac/Windows, words with accents like "móntale" are stripped of accents and converted to the stem "monta". The word "móntale" is not in the lexicon, so the accent stripping occurs before the word is looked up in the lexicon list. On Unix the accents are not stripped, so MOR can't find the word "móntale" anywhere in the Spanish lexicon. I did not create the Spanish grammar, so it takes a long time for me to figure out where the conversion occurs.
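
The effect described here can be reproduced with a toy example. The Python sketch below is not MOR's code and does not use the real Spanish lexicon: the one-word stem set, the clitic list and the function names strip_accents and analyze are all assumptions made for illustration. It only shows why a lookup that skips accent stripping fails on "móntale" while one that strips first reaches the stem "monta".

```python
import unicodedata


def strip_accents(word):
    """Remove combining diacritics, e.g. 'móntale' -> 'montale'.

    Note: this also turns 'ñ' into 'n', which a real Spanish analyzer
    would probably not want.
    """
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(word, stems, clitics=("le", "lo", "la"), strip=True):
    """Return the stem found for `word`, or None if nothing matches.

    Hypothetical logic: optionally strip accents, then try the form as is,
    then try it with one trailing clitic removed.
    """
    form = strip_accents(word) if strip else word
    if form in stems:
        return form
    for clitic in clitics:
        if form.endswith(clitic) and form[: -len(clitic)] in stems:
            return form[: -len(clitic)]
    return None


toy_stems = {"monta"}  # hypothetical one-entry lexicon

print(analyze("móntale", toy_stems))               # accents stripped first
print(analyze("móntale", toy_stems, strip=False))  # accents kept
```

With stripping enabled the first call finds "monta"; with stripping disabled the candidate stem is "mónta", which is not in the toy set, so the second call finds nothing.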

Is this the difference between Windows and Linux that you are noticing? Or can you give me an example of the difference in outputs that you are referring to in your email?

I just wanted to let you know that I am still working on this.


Leonid.


Leandro Garber

Jan 23, 2025, 4:32:20 PM
to chib...@googlegroups.com
Hi Leonid, thank you very much for your reply.

I think you are on the right track; I recall the differences could be caused by something like that. I don't know the technical details of the lexicon implementation, but I can look into it as well and see if I discover something.
I can send you examples next Tuesday, when I'll be at my office computer.

Right now I just remember there were problems with some (not all) diminutives, e.g. perro (dog) -> perrito (little dog).

I'm glad you have the time to work on this.
Best,
L.



Leonid Spektor

Jan 24, 2025, 5:01:10 AM
to chib...@googlegroups.com
Hi Leandro,

I have finally found the problem and fixed it. I have tested it on Mac Unix. I assume it will also work on Linux. Let me know if you have any problems with Unix CLAN. New UnixCLAN is on the web.


Leonid.

Leandro Garber

Jan 24, 2025, 9:18:15 AM
to chib...@googlegroups.com

That's great, Leonid, thanks! I'll test it on Tuesday and let you know!

Cheers


Leandro Garber

Jan 29, 2025, 9:41:17 AM
to chib...@googlegroups.com
Hi Leonid,

I've just tried the latest Linux version.

I've found some differences from Windows, specifically with the words cosa/coso/cosos (thing/things).

Windows almost always makes a mistake and parses cosa as v|cose-2S&SUB&PRES=sew
Linux parses it correctly: n|cosa&f-PL=thing^n|coso-f-PL=thing (though I think it should disambiguate for the second one)

Here are some examples:

//////////////

*SIS: recién [=! alarga] va a poner sus cosas en su mochila !

Windows:
%mor: adv|recién=recently v|i-3S&PRES=go prepart|a=to inf|pone-INF=put
co|sus v|cose-2S&SUB&PRES=sew prep|en=in det:poss|su&3S=his
n|mochila&f=backpack !

Linux:
%mor: adv|recién=recently v|i-3S&PRES=go prepart|a=to inf|pone-INF=put
co|sus n|cosa&f-PL=thing^n|coso-f-PL=thing prep|en=in
det:poss|su&3S=his n|mochila&f=backpack !

//////////////

*MOT: dale guarda las cosa(s) !

Windows:

%mor: imp|da-2S&IMP~pro:clit|3S=give v|guarda-3S&PRES=guard
det:art|el&f-PL=the v|cose-2S&SUB&PRES=sew !

Linux:

%mor: imp|da-2S&IMP~pro:clit|3S=give v|guarda-3S&PRES=guard
det:art|el&f-PL=the n|cosa&f-PL=thing^n|coso-f-PL=thing !

///////////////

*GMO: cualquier cosa vienen para casa .

Windows:

%mor: qn|cualquier&m=whichever imp|cose-3S&IMP=sew v|veni-3P&PRES=come
prep|para=for n|casa&f=house .

Linux:

%mor: qn|cualquier&m=whichever n|cosa&f=thing^n|coso-f=thing
v|veni-3P&PRES=come prep|para=for n|casa&f=house .

///////////////

*MOT: las dos cosas mamá las dos cosas . [+ CHI]

Windows:

%mor: det:art|el&f-PL=the det:num|dos=two v|cose-2S&SUB&PRES=sew
n|mamá&f=mommy det:art|el&f-PL=the det:num|dos=two
v|cose-2S&SUB&PRES=sew .

Linux:

%mor: det:art|el&f-PL=the det:num|dos=two
n|cosa&f-PL=thing^n|coso-f-PL=thing n|mamá&f=mommy
det:art|el&f-PL=the det:num|dos=two
n|cosa&f-PL=thing^n|coso-f-PL=thing .


Leonid Spektor

Jan 29, 2025, 12:58:06 PM
to chib...@googlegroups.com, Leandro Garber
Hi Leandro,

I have created three test files from the examples in your last email. I am attaching those three files to this email. I don't know whether chibolts allows email attachments, so I am also copying this email to your email account directly.

The spa-l.cha file was created by the MOR command on Mac Unix. The spa-w.cha file was created by MOR on Windows 11. The spa-m.cha file was created by MOR on Mac. All three files have identical %mor tiers, and in all of them MOR produced: n|cosa&f-PL=thing.

The %mor tiers were created during my test. The %xwr: tiers are the output of your Windows, taken from your email, and the %xlm: tiers are the output of your Linux, taken from your email. I don't know why your Linux creates "n|cosa&f-PL=thing^n|coso-f-PL=thing". This usually happens if you do not run the POST command. POST disambiguates words.
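
One quick way to check whether a file still contains undisambiguated words is to look for %mor tokens that list alternatives separated by "^", as in the Linux output quoted in this thread. The Python sketch below is a hypothetical helper, not a CLAN tool; it assumes the tier has already been joined into a single line and that tokens are separated by whitespace.

```python
def ambiguous_tokens(mor_tier):
    """Return the tokens of a %mor tier that still list alternatives with '^'."""
    prefix = "%mor:"
    body = mor_tier[len(prefix):] if mor_tier.startswith(prefix) else mor_tier
    return [token for token in body.split() if "^" in token]


# Tier taken from the Linux output quoted earlier in the thread, joined to one line.
tier = (
    "%mor: det:art|el&f-PL=the det:num|dos=two "
    "n|cosa&f-PL=thing^n|coso-f-PL=thing n|mamá&f=mommy ."
)

for token in ambiguous_tokens(tier):
    print(token.split("^"))
```

An empty result for every tier suggests disambiguation has been applied; any remaining token is worth checking against the grammar version and the POST step.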

My best guess for your Windows result is that there is something wrong in the Windows system, or that the Spanish grammar on your Windows is not the same as on your Linux. You can get the Spanish grammar from the web page https://talkbank.org/morgrams/.

We are trying to move away from the MOR command and grammar to the Universal Dependencies system. On the web page https://talkbank.org/ look at the section "ASR and Morphosyntax" and the Batchalign2 command.


Leonid.
spa-l.cha
spa-w.cha
spa-m.cha

Leandro Garber

Jan 29, 2025, 1:27:51 PM
to Leonid Spektor, chib...@googlegroups.com
Thanks Leonid, I'll check them all out as well as the website.


Leandro Garber

Feb 5, 2025, 10:03:16 AM
to Leonid Spektor, chib...@googlegroups.com
Hi Leonid!

My bad, I thought my Spanish grammar was up to date, but there was a newer version. Updating it did the trick, thanks.

I also tried out batchalign morphosyntax; nice work, thanks. I got it working and it seems to have done a good job, with some minor errors.

Best regards,
Leandro.

