list of tags used by MOR


Rui Huang

Apr 24, 2014, 6:08:03 PM
to chib...@googlegroups.com
Hello everyone, 

  I have a question that Erin asked before but that never got a reply (https://groups.google.com/forum/#!searchin/chibolts/erin/chibolts/5N8m43WrCZs/jSkHAm6aSY8J).
  Is there a comprehensive list of the tags used by MOR? And how can I get CLAN to list all the part-of-speech tags it used in a particular file?

Thank you.
Rui


Leonid Spektor

Apr 24, 2014, 7:49:02 PM
to chib...@googlegroups.com
Hi Rui,

You can try the command:

freq +s@|*,o% filenames
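
For anyone who wants to cross-check that output outside CLAN, here is a minimal Python sketch that counts the tags found on the %mor tiers of a transcript. It is not how FREQ works internally. The function name `mor_tags` is made up for illustration, and the sketch assumes that clitics are joined with "~" or "$", that prefixes are marked with "#", and that only the overall tag of a compound word is wanted.

```python
import re
from collections import Counter


def mor_tags(chat_text):
    """Count the part-of-speech tags on every %mor tier of a CHAT transcript."""
    counts = Counter()
    in_mor = False
    for line in chat_text.splitlines():
        if line.startswith("%mor:"):
            in_mor = True
            body = line[len("%mor:"):]
        elif in_mor and line[:1] in ("\t", " "):
            body = line  # continuation line of the same %mor tier
        else:
            in_mor = False
            continue
        for token in body.split():
            # Assumption: "~" and "$" join clitics to their host word.
            for part in re.split(r"[~$]", token):
                if "|" not in part:
                    continue  # punctuation such as "." or "!"
                tag = part.split("|", 1)[0]  # text before the first "|"
                tag = tag.rsplit("#", 1)[-1]  # drop a prefix such as "un#"
                counts[tag] += 1
    return counts
```

For a tier such as `%mor: n:prop|Child~poss|s !`, this counts `n:prop` and `poss` once each.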

If you are interested in all the tags that a particular MOR grammar has, then you need to download that grammar and set CLAN's "working" directory to its "<grammar name>/lex" folder (for the English grammar, that would be "eng/lex"), then run the command:

freq -y +s"[scat *]" +u *.cut
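
The same search can be approximated outside CLAN. The sketch below is only an illustration under assumptions: `scat_tags` is a hypothetical helper, and it assumes each lexicon entry carries its category as `[scat TAG]` somewhere in the .cut file, which is the pattern the FREQ search string above looks for.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed entry shape: "... [scat TAG] ..." anywhere in the file.
SCAT = re.compile(r"\[scat\s+([^\]\s]+)\]")


def scat_tags(lex_dir):
    """Count every [scat TAG] value found in the *.cut files of one lex folder."""
    counts = Counter()
    for path in sorted(Path(lex_dir).glob("*.cut")):
        text = path.read_text(encoding="utf-8", errors="replace")
        counts.update(SCAT.findall(text))
    return counts
```

Calling `scat_tags("eng/lex")` would then give a tag-to-count mapping for the English lexicon, provided the folder has been downloaded and the assumption about the entry format holds.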


Leonid.

--
You received this message because you are subscribed to the Google Groups "chibolts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+u...@googlegroups.com.
To post to this group, send email to chib...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/407a8ece-ff8d-4603-a934-63b32abc9e01%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Miyata Susanne

Apr 24, 2014, 10:48:41 PM
to chib...@googlegroups.com
Hi Rui, Hi Leonid,
the attached list of tags is based on a list in an earlier version of the CHILDES manual. 
Maybe this is what you are looking for.
Susanne
00morcats.cut

Rui Huang

Apr 29, 2014, 12:01:28 PM
to chib...@googlegroups.com
Hi Leonid,

Thank you for answering my question. The first command pulls out all the tags in a file, and it works very well.
But the second command does not pull out all the tags that the eng MOR grammar has, and that is what I need to find.
For example, in the Valian corpus:

*MOT: Child's !
%mor: n:prop|Child~poss|s !
%gra: 1|2|MOD 2|0|ROOT 3|2|PUNCT

The tag 'n:prop' should appear in the output, but it does not. I hope you can take a look at it.
Thank you again!

Rui

Rui Huang

Apr 29, 2014, 12:08:29 PM
to chib...@googlegroups.com
Hi Susanne,
Thank you for sharing this file. It is part of what I need. What I am really looking for is the complete tag set used by the MOR grammar.
For instance:

*MOT: Child's !
%mor: n:prop|Child~poss|s !
%gra: 1|2|MOD 2|0|ROOT 3|2|PUNCT

I mean things like 'n:prop' and 'poss' in the English MOR grammar.

Regards,
Rui

Leonid Spektor

Apr 29, 2014, 5:46:37 PM
to chib...@googlegroups.com
Hi Rui,

"n:prop" is the ONLY tag in all MOR grammars that is hardwired into CLAN itself. If MOR sees a capitalized word, it tags it with "n:prop". There is also a way in CHAT to specify any tag directly on the speaker tier. For example, the tier:

*CHI: word$foo .

will result in the %mor tier:

%mor: foo|word .
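
As a rough illustration of that rewrite, here is a small Python sketch. `forced_mor_entry` is a hypothetical helper, not CLAN code, and it only handles a single word with a single "$tag" marker.

```python
def forced_mor_entry(main_tier_word):
    """Turn a main-tier word with an explicit tag, e.g. 'word$foo', into 'foo|word'.

    Returns None when the word carries no usable '$tag' marker.
    """
    word, marker, tag = main_tier_word.partition("$")
    if not marker or not word or not tag:
        return None
    return f"{tag}|{word}"
```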

Apart from the above exceptions, the command "freq -y +s"[scat *]" +u *.cut" will list all the tags.

Leonid.

Rui Huang

May 6, 2014, 11:41:49 AM
to chib...@googlegroups.com
Hi Leonid,

I am sorry to say that the command 'freq -y +s"[scat *]" +u *.cut' does not list all the tags. I ran 'freq +s@|*,o% filenames' on the Eve and Valian corpora and collected all the tags that appear in the output, and I found that some of them are not in the 'freq -y +s"[scat *]" +u *.cut' output. (Attached is my output. Tags shown in orange are not in the 'freq -y +s"[scat *]" +u *.cut' output.)

  In addition, I do not know the meaning of some tags that are linked by the '+' symbol, like 'adv|+adj+n', 'adj+adj+adj', 'n+n+n', 'n+n+adj', and so on. Do you know what they mean?
Thank you. 
tagsinEve&Valian.rtf

Rui Huang

May 6, 2014, 12:01:08 PM
to chib...@googlegroups.com
I just found out that tags linked by the '+' symbol are compound words.

Leonid Spektor

May 6, 2014, 1:08:32 PM
to chib...@googlegroups.com
Rui,

I forgot to mention one more exception for tags. Any word that ends with an "@..." marker is converted to a MOR tag using the "sf.cut" file located in the grammar's root folder. For English, for example, this file will be in the eng folder.

Tags that are separated by '+' characters are compound words, that is, words made up of two or more different words to create a new word. For example, the English word "hopscotch" consists of the two words "hop" and "scotch", and it is listed in the lexicon file "n+v+n.cut". The MOR command tags this word as "n|+v|hop+n|scotch": the first tag, "n|", is a noun and is the overall tag for the word; the second tag, "v|", indicates that "hop" is a verb; and the third tag, "n|", indicates that "scotch" is a noun. Thus in the FREQ output you would see the tag "n|+v+n".

The other compound tags are more complicated. For example, the word "iceskating" consists of the two words "ice" and "skating". "ice" is a noun, "n|", and "skating" consists of the parts "skate" and "ing", i.e. "n|" and "n:gerund|". Thus the resulting tag for "iceskating" is "n:gerund|+n+n". Compound words can either be listed literally in a lex file, like "hopscotch", or can consist of tags representing their components, like "iceskating".
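
To make the "hopscotch" case concrete, the following Python sketch reduces a compound %mor word to the tag pattern described above. It is only an illustration: `compound_tag` is a hypothetical name, and the sketch assumes a compound always has the shape `overall|+tag|stem+tag|stem...`, as in the "hopscotch" example.

```python
def compound_tag(mor_word):
    """Reduce a %mor word to its tag pattern.

    'n|+v|hop+n|scotch' -> 'n|+v+n'; a non-compound such as
    'n:prop|Child' is reduced to its plain tag, 'n:prop'.
    """
    overall, marker, rest = mor_word.partition("|+")
    if not marker:
        # Not a compound: keep only the text before the first "|".
        return mor_word.split("|", 1)[0]
    # Each component looks like "tag|stem"; keep only the tag.
    component_tags = [piece.split("|", 1)[0] for piece in rest.split("+")]
    return overall + "|+" + "+".join(component_tags)
```

Applied to "n|+v|hop+n|scotch" this yields "n|+v+n", the form FREQ reports.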

So all the tags in MOR come from the lex files, the sf.cut file, and the $part-of-speech tag on the main speaker tier. But how those tags are arranged together in the end is a function of the MOR command.

For more information I strongly encourage you to read chapter "11 MOR – Morphosyntactic Analysis" in CLAN's manual located at URL:


Leonid.


