I'm trying to run basic freq commands on a bilingual conversation marked up
with the current CLAN default (i.e. with precodes). What I'm trying to do is
get figures for the total number of words in each language. This would be:
eng: words marked @s:eng, and unmarked words where the precode is [- eng];
spa: unmarked words, and words marked @s:spa where the precode is [- eng];
indeterminate: words marked @s:eng&spa.
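As a cross-check on the counting rule above, here is a minimal sketch in shell and awk. It is not CLAN and does not use any CLAN option. It assumes a much simplified transcript: one utterance per line on a main tier starting with `*`, Spanish as the default language, `[- eng]` as the only precode, and no continuation lines or other CHAT codes. The function name `count_langs` and the sample utterances are hypothetical.

```shell
# count_langs: hypothetical helper, a simplified stand-in for the rule above.
# Explicit @s: markers win; otherwise the utterance default applies
# (spa, or eng when the utterance starts with the [- eng] precode).
count_langs() {
  awk '
    /^\*/ {                                  # main speaker tiers only
      sub(/^\*[^:]*:[ \t]*/, "")             # drop the "*SPK:" prefix
      dflt = "spa"
      if (sub(/^\[- eng\][ \t]*/, "")) dflt = "eng"
      n = split($0, w, /[ \t]+/)
      for (i = 1; i <= n; i++) {
        if (w[i] == "" || w[i] ~ /^[.?!,;:]+$/) continue   # skip punctuation
        if (w[i] ~ /@s:eng&spa$/)   lang = "indeterminate"
        else if (w[i] ~ /@s:eng$/)  lang = "eng"
        else if (w[i] ~ /@s:spa$/)  lang = "spa"
        else                        lang = dflt
        c[lang]++
      }
    }
    END { printf "eng=%d spa=%d indeterminate=%d\n", c["eng"], c["spa"], c["indeterminate"] }
  ' "$@"
}

# Two made-up utterances, fed on standard input:
printf '*ABC:\thola how@s:eng are@s:eng you@s:eng .\n*DEF:\t[- eng] I like tortilla@s:spa okay@s:eng&spa .\n' | count_langs
# prints: eng=5 spa=2 indeterminate=1
```

Totals from a sketch like this could be compared against the freq output on a small test file, bearing in mind that real CHAT files contain many constructs it ignores.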
The command:
clan/unix/bin/freq +s"@s:eng" clan/chats/myfile.cha
gets the ones marked @s:eng, but also includes the ones marked @s:eng&spa.
Using:
clan/unix/bin/freq +s"@s:eng&spa" clan/chats/myfile.cha
produces no results. I assume & has to be escaped, but \& doesn't work.
Using
clan/unix/bin/freq +s"@s:eng" +s"[- eng]" clan/chats/myfile.cha
(to try and get all the English words, including the ones with precodes) also
produces no results.
I'd be grateful if someone could tell me the magic switches here. I suppose
in more general terms the question is, how far can standard regular
expressions be used in the CLAN command line - is there a special syntax, or
are they not really expected to be used there?
Thanks.
--
Pob hwyl / Best wishes
Kevin Donnelly
kevindonnelly.org.uk
The +s"@s:eng&spa" option needs a star character to match the actual word. So, the right command is +s"*@s:eng&spa".
A better command would be "freq +l myfile.cha +s@s&eng" for English words and
"freq +l myfile.cha +s@s&spa" for Spanish words.
For more information about the +s@s option, type "freq +s@s" in the Commands window. The +l option assigns an explicit language tag to every word, thus making the +s"[- eng]" option unnecessary.
Leonid.
> --
> You received this message because you are subscribed to the Google Groups "chibolts" group.
> To post to this group, send email to chib...@googlegroups.com.
> To unsubscribe from this group, send email to chibolts+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/chibolts?hl=en.
>
>
::::On Thursday 23 June 2011 Leonid Spektor said::::
> The +s"@s:eng&spa" option needs a star character to match the actual
> word.
> So, the right command is +s"*@s:eng&spa".
Great - this works fine.
> A better command would be "freq +l myfile.cha +s@s&eng" for English words
> and command "freq +l myfile.cha +s@s&spa" for Spanish words.
> For more information about the +s@s option type "freq +s@s" in commands
> window. The +l option assigns explicit language tag to every word, thus
> making the use of +s"[- eng]" option unnecessary.
This would indeed be useful, but unfortunately it doesn't work here - I get:
=====
clan/unix/bin/freq +l clan/chats/myfile.cha +s@s&spa
[1] 17700
+s@s Followed by search pattern
r word
& stem language marker
+ suffix language marker
$ part-of-speech marker
o all other elements not specified by user
followed by - or + and/or the following
* find any match
% erase any match
word -find "word"
For example:
+s"@r-*,&-it"
find all words with Italian stems
+s"@r*,&it,$n"
find all words with Italian stems and part of speech tag "n"
+s"@r-*,&-en,o-%"
find all words with English stems and erase all other markers
+s"@r*,&it,+en"
find all words with Italian stems and English suffix
No command 'spa' found, did you mean:
<snip>
=====
I've tried various permutations of +s@s&spa, but no luck. :-(
You must be using a Unix system. In that case the +s option in the second command needs to be surrounded with quotes. So, the command is:
clan/unix/bin/freq +l +s"@s&eng" clan/chats/myfile.cha
Leonid.
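To illustrate the quoting point at the shell level only: an unquoted & ends the command and runs it in the background, which is why the earlier attempt printed a job number and then tried to run "spa" as a separate command. The sketch below uses a hypothetical helper, `show_args`, in place of freq, simply to show which arguments a program would receive. It says nothing about how freq itself interprets them.

```shell
# show_args: hypothetical helper that prints each argument it receives
# on its own line, wrapped in angle brackets.
show_args() { printf '<%s>\n' "$@"; }

# Quoted: & and * are passed through literally inside a single argument.
show_args +l myfile.cha +s"@s&spa"
# prints:
# <+l>
# <myfile.cha>
# <+s@s&spa>

show_args +s"*@s:eng&spa"
# prints:
# <+s*@s:eng&spa>

# Unquoted (not run here): the shell would split
#   show_args +l myfile.cha +s@s&spa
# into "show_args +l myfile.cha +s@s" (sent to the background)
# followed by a separate command "spa".
```

Single quotes work the same way for this purpose, so '+s@s&spa' reaches the program as the same single argument.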
::::On Friday 24 June 2011 Leonid Spektor said::::
> You must be using unix system.
Of course. :-)
> In this case the second command needs to
> have +s option surrounded with quotes. So, the command is:
> clan/unix/bin/freq +l +s"@s&eng" clan/chats/myfile.cha
No, I'd already tried:
clan/unix/bin/freq +l clan/chats/myfile.cha +s"@s&eng"
and it gives me a printout of most of the lines in the .cha file. Same with
your variant above.
freq +l +s"<- spa>" *.cha
freq +l *.cha
freq +l +s"<- zho>" *.cha
freq +l +s"[- zho]" *.cha