I have some questions about importing CLAN output into Excel files.
With the help of the CLAN manual, I learned how to use INSERT, FREQ and
STATFREQ, and the switches +d2 and +d3, in order to import output on word
frequencies and/or types, tokens, and type-token ratios into an Excel file.
Section 8.22 (STATFREQ) says that it is also possible to produce code
frequencies, but I don’t seem to be able to find more details about this in
the manual. So, at first, I tried a COMBO command like this:
combo +s"*|S-DA:T-*" +t%cod +d3 +f +t@ID="*target_child*" @
(where S-DA:T is the code combination searched for: S for subject, DA:T for
bare demonstrative, and S-DA:T together for bare demonstrative in subject
position)
CLAN generated ins.cmb.cex files, but not a stat.out.cex file, as is
the case with FREQ. It seems that the +d2 and +d3 switches work only with
FREQ? So, I used the same command line above, but with the command FREQ,
and then STATFREQ, and I can import the stat.out.sat.cex file to Excel,
everything as expected, but, of course, what I have is a table with columns
labelled 'types', 'tokens', and 'TTR'. Of course, in the process of
converting the file into Excel and also later, I can make changes in the
table such that it corresponds to my purposes, but I was wondering whether
I’m missing some switch or something that tells CLAN to generate in the
stat.out.sat.cex a column labelled with the searched string (in the case
above, *|S-DA:T-*). Or should I perhaps use a command other than FREQ?
I was also wondering whether it is possible to have a stat.out.sat.cex file
such that it summarises a series of search operations. For instance, if I
want to make a pivot table with children’s frequencies of several
linguistic units, such as subjects (*|S-*), objects (*|O-*), bare
demonstratives (*|*-DA:T-*), personal pronouns (*|*-PRO:PP-*), bare
demonstratives in subject (*|S-DA:T-*) and object position (*|O-DA:T-*),
personal pronouns in subject (*|S-PRO:PP-*) and object position
(*|O-PRO:PP-*), etc.: is there a procedure for getting a single output file
including all of these frequencies, e.g.:
… speaker … *|S-*  *|O-*  *|*-DA:T-*  *|*-PRO:PP-*  *|S-DA:T-*  *|O-DA:T-*  *|S-PRO:PP-*  *|O-PRO:PP-* …
… 001     … n      n      n           n             n           n           n            n
… 002     … n      n      n           n             n           n           n            n
…
Finally, when using the +d2 switch, the stat.out.sat.cex (and the Excel
file) has a column between ‘situation’ and the first word of the
concordance. It is the 10th column – what does this column mean?
I’d be most grateful for help in clarifying these questions, and I
apologise in advance if the answers to these basic questions are
easily found in the manual.
Best wishes,
Susanna
--
*****************************************************************
Susanna Bartsch
bar...@zas.gwz-berlin.de
http://www.zas.gwz-berlin.de/mitarb/homepage/bartsch
Zentrum fuer Allgemeine Sprachwissenschaft (ZAS)
Centre for General Linguistics
Schuetzenstr. 18
10117 Berlin
Germany
Tel. +49 (0)30 20 192 503
Fax +49 (0)30 20 192 402
*****************************************************************
Try this command:
freq +d2 +t@ID="*target_child*" -t* +t%cod +s"%|S-%" +s"%|O-%"
+s"%|%-DA:T-%" +s"%|%-PRO:PP-%" +s"%|S-DA:T-%" +s"%|O-DA:T-%"
+s"%|S-PRO:PP-%" +s"%|O-PRO:PP-%"
Note that I have replaced all the '*' characters with '%' characters. The above
example should be on one command line. If this is too much to type, then you can
use the file "codes.cut" that I am attaching to this email with this command:
freq +d2 +t@ID="*target_child*" -t* +t%cod +...@codes.cut
If this doesn't help you, then please send me a sample of your data file and
further description of what exactly didn't work for you.
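
If it helps, the long command line can also be assembled from a list of search patterns. The following is only a minimal sketch in Python, not part of CLAN: the function names `to_freq_pattern` and `build_freq_command` are hypothetical, and the switches are simply copied from the command above. It replaces each '*' in a pattern with '%' and joins everything into one line. As in the command above, no file argument is appended.

```python
def to_freq_pattern(pattern: str) -> str:
    """Replace every '*' in a search pattern with '%', as done in the command above."""
    return pattern.replace("*", "%")


def build_freq_command(patterns, id_filter="*target_child*", tier="%cod"):
    """Assemble a single-line FREQ command (hypothetical helper, not a CLAN tool).

    Only the +s search strings are converted; the @ID filter keeps its '*'.
    """
    parts = ["freq", "+d2", f'+t@ID="{id_filter}"', "-t*", f"+t{tier}"]
    parts += [f'+s"{to_freq_pattern(p)}"' for p in patterns]
    return " ".join(parts)


# The eight code patterns from the original question.
patterns = [
    "*|S-*", "*|O-*", "*|*-DA:T-*", "*|*-PRO:PP-*",
    "*|S-DA:T-*", "*|O-DA:T-*", "*|S-PRO:PP-*", "*|O-PRO:PP-*",
]

command = build_freq_command(patterns)
print(command)
```

The printed line can be pasted into the CLAN Commands window; add the file argument you normally use at the end.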
In my tests I did not see an extra column between ‘situation’ and the
first word of the concordance. Perhaps the @ID header tiers in your data
files have an extra element at the end. You can see the correct output by
running these commands on "sample.cha":
freq +d2 +s"pro:%|%" +s"pro|%" sample.cha +t@ID="*mother*" -t* +t%mor
statfreq stat.out.cex +f +d
The "sample.cha" file is located in the clan/lib/sample/ folder.
Leonid.
I suspect PHONFREQ doesn't work anymore: I have installed the latest
version of CLAN on my Windows NT computer, and launching this command, even on
a very simple file, produces a Windows error message that causes CLAN to close...
Any suggestions?
Thanks,
Florence
*********************************
Florence CHENU
Laboratoire Dynamique du Langage
Institut des Sciences de l'Homme
14 avenue Berthelot
F-69363 LYON Cedex 07
Tel: +33 4 72 72 65 25
Fax: +33 4 72 72 65 90