Doubt regarding tagging on AntConc


Sabrinamina

Apr 13, 2011, 8:13:54 AM
to AntConc-discussion
I am an MA student at the University of Birmingham and I am
researching the use of Phrasal Verbs by Brazilian Learners of English
through the combination of a learner corpus called CoMAprend (from the
University of São Paulo) and your wonderful tool called AntConc.
I have tried finding info on how to use tags in AntConc, but I
haven't been able to figure it out. I'd like to use a tag to look for
the occurrence of Verb + Adverbial Verb Particles (Phrasal Verbs), and
I was wondering if you could help me with a tip or two on how to do
that. Is the tagging system the same as the one used in the BNC?

Thank you! Looking forward to your reply!

Best regards,

Sabrina B. Fadanelli

Caxias do Sul - RS - Brazil

Laurence Anthony

Apr 13, 2011, 8:19:59 AM
to ant...@googlegroups.com, Sabrinamina
Hi Sabrina,

AntConc doesn't do the tagging that you need. You will need to tag
your corpus with a tagger tool. (Google "POS tagger" for some
suggestions).
Also see http://en.wikipedia.org/wiki/Part-of-speech_tagging

I have used the freeware Qtag with success (but you'll need to install
the Java runtime to run it):
http://phrasys.net/uob/om/software

After you tag your corpus, AntConc can be used to search for the data
that is tagged. The tag settings are in the global settings menu.
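
As a rough illustration of what a search over tagged data can look like, here is a minimal Python sketch. It assumes the tagger writes each token as word_TAG and uses BNC-style CLAWS C5 tags (VV plus one letter for lexical verbs, AVP for adverb particles). The tags your tagger produces may well differ, and the function name and sample sentence are made up for the example.

```python
import re

# Assumed format: word_TAG tokens separated by spaces, CLAWS C5-style tags.
# VVB, VVD, VVG, ... = lexical verb forms; AVP = adverb particle.
PHRASAL_VERB = re.compile(r"\b([A-Za-z]+)_VV[A-Z]\s+([A-Za-z]+)_AVP\b")

def find_phrasal_verbs(tagged_text):
    """Return (verb, particle) pairs where a verb is directly followed by a particle."""
    return [(m.group(1).lower(), m.group(2).lower())
            for m in PHRASAL_VERB.finditer(tagged_text)]

# Hypothetical tagged sentence, for illustration only.
sample = "She_PNP gave_VVD up_AVP smoking_VVG and_CJC went_VVD to_PRP school_NN1"
print(find_phrasal_verbs(sample))
```

This only catches a particle that immediately follows the verb, so split forms such as "gave it up" would need a looser pattern.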

I hope that helps.
Laurence.


###############################################################
Laurence Anthony, Ph.D.
Professor
Center for English Language Education in Science and Engineering (CELESE)
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.antlab.sci.waseda.ac.jp/
###############################################################

> --
> You received this message because you are subscribed to the Google Groups "AntConc-discussion" group.
> To post to this group, send email to ant...@googlegroups.com.
> To unsubscribe from this group, send email to antconc+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/antconc?hl=en.
>
>

Sabrina Mina

Apr 14, 2011, 8:48:21 AM
to ant...@googlegroups.com
Thank you very much! It was really helpful!
 
Best regards,
 
Sabrina Fadanelli


Doug Huffer

May 9, 2011, 10:18:34 PM
to AntConc-discussion
I'm using Qtag with XML tagging. That gives me tags like this:

<w pos='OD'>Last</w>

But when I do a word list, it counts the "w" and "pos" values. Should
I change the qtag output? Or is there something in AntConc to
change?

Also, I want to do a word list and n-grams of only the tags. It seems
like reversing the tag settings (i.e. '<' to '>' and '>' to '<') works
(although I still get the "w" and "pos" values). Is there another
solution?


Much thanks,
Doug



Laurence Anthony

May 15, 2011, 10:24:05 PM
to ant...@googlegroups.com
Hi Doug,

Sorry for the delay.


>I'm using Qtag with XML tagging. That gives me tags like this:
><w pos='OD'>Last</w>
>But when I do a word list, it counts the "w" and "pos" values. Should
>I change the qtag output? Or is there something in AntConc to
>change?

In the Global settings of AntConc, you can adjust the tag settings to hide tags of this format. If you do this, you can then get a word list of just the tagged words, and not the tags themselves.
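
To show the effect of hiding tags, here is a minimal Python sketch (not AntConc's own code). It assumes everything between '<' and '>' counts as a tag, removes it, and builds a word list from what is left. The NN tag and the sample text are invented for the example.

```python
import re
from collections import Counter

# Anything between '<' and '>' is treated as a tag and dropped,
# roughly what hiding tags does before a word list is built.
TAG = re.compile(r"<[^>]*>")

def word_list(xml_tagged_text):
    """Count words after removing <...> tags (case-insensitive)."""
    plain = TAG.sub(" ", xml_tagged_text)
    return Counter(re.findall(r"[A-Za-z]+", plain.lower()))

# Hypothetical sample in the <w pos='..'>word</w> format quoted above.
sample = "<w pos='OD'>Last</w> <w pos='NN'>night</w> <w pos='OD'>last</w>"
print(word_list(sample).most_common())
```

With the tags removed first, "w" and "pos" never reach the counter, so only the tagged words are counted.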


>Also, I want to do a word list and n-grams of only the tags.  It seems
>like reversing the tag settings (i.e. '<' to '>' and '>' to '<') works
>(although I still get the "w" and "pos" values).  Is there another
>solution?

Hmm, I haven't thought about this. In the word list preferences, you can set a filter to only count particular words. So, if you set up the token definition (in global settings) to treat the tag as a word, then you can specify the complete tag set as words in the filter and AntConc will then count them.

I would imagine that the QTag underscore tag option would probably be easier to work with. By default underscores are not treated as word tokens in AntConc but the tags themselves are just normal tokens. So, if you add the complete tag set to the word list filter, and assuming they don't overlap with regular "words", then you can get the count that you need.
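
To make the filter idea concrete, here is a small Python sketch, again only an approximation of what AntConc does. It assumes the underscore output looks like word_TAG, splits on whitespace and underscores, and counts only the tokens listed in a filter. The tag set below is a tiny illustrative subset, not the complete set.

```python
import re
from collections import Counter

# Illustrative subset of a tag set; a real filter would list every tag.
TAG_SET = {"OD", "BE", "DO", "TO"}

def count_filtered(tagged_text, allowed=TAG_SET):
    """Split on whitespace and underscores, then count only tokens in the filter."""
    tokens = re.split(r"[\s_]+", tagged_text.strip())
    return Counter(t for t in tokens if t in allowed)

# Hypothetical word_TAG sample, for illustration only.
sample = "Last_OD to_TO be_BE"
print(count_filtered(sample))
```

The comparison here is case-sensitive, so the word "to" and the tag TO stay apart. If case is folded, the two are merged, which is the overlap caveat mentioned above.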

Doug Huffer

May 18, 2011, 8:51:48 AM
to AntConc-discussion


On May 16, 11:24 am, Laurence Anthony <anthony0...@gmail.com> wrote:
> Hi Doug,
>
> Sorry for the delay.

No problem. Any and all assistance is much appreciated.

> I would imagine that the QTag underscore tag option would probably be easier
> to work with. By default underscores are not treated as word tokens in
> AntConc but the tags themselves are just normal tokens. So, if you add the
> complete tag set to the word list filter, and assuming they don't overlap
> with regular "words", then you can get the count that you need.

The tag set includes BE, DO, and TO, so there is some overlap.

I'm just stripping out the "w pos=" and "/w" parts of the XML tag,
reversing the tag settings and I can run POS n-grams. Maybe not the
most elegant solution, but it works.
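
For anyone who would rather script this step, here is a minimal Python sketch of the same idea done outside AntConc: pull the pos value out of each <w> tag and count n-grams over the tag sequence. It assumes the tags look exactly like <w pos='OD'>; the function name and sample text are made up.

```python
import re
from collections import Counter

# Pull the value of pos='...' out of each <w> tag, in document order.
POS_ATTR = re.compile(r"<w pos='([^']+)'>")

def pos_ngrams(xml_tagged_text, n=2):
    """Count n-grams over the sequence of POS tags only."""
    tags = POS_ATTR.findall(xml_tagged_text)
    grams = zip(*(tags[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

# Hypothetical sample, for illustration only.
sample = "<w pos='TO'>to</w> <w pos='BE'>be</w> <w pos='TO'>to</w> <w pos='BE'>be</w>"
print(pos_ngrams(sample, n=2))
```

Because the words themselves are never tokenized, a tag like TO cannot be confused with the word "to".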


cheers,
Doug

Laurence Anthony

May 18, 2011, 9:32:15 AM
to ant...@googlegroups.com
Hi again,

Why not just specify the entire tag in the word filter? e.g.

<w pos=xxx>
<w pos=yyy>
...

or some variation of it, e.g.
pos=xxx
pos=yyy
...

I would doubt these overlap with normal words.

It seems a more reliable solution, assuming you know the entire tag set.

Laurence.

Doug Huffer

May 19, 2011, 11:01:46 AM
to AntConc-discussion
Laurence -

I must be doing something wrong, and I can't figure it out.

I've created a file with the word list:

<w pos='BE'>
<w pos='BED'>
<w pos='BEDZ'>
<w pos='BEG'>
<w pos='BEM'>
<w pos='BEN'>

When I load it into AntConc, it just comes out as

BE
BED
BEDZ
BEG
BEM
BEN

Which ends up double counting words.

No need to spend much time trying to explain it to me. The manual
stripping of the tags is fairly easy.

cheers,
Doug

Laurence Anthony

May 19, 2011, 11:11:31 AM
to ant...@googlegroups.com
Hi,

> I must be doing something wrong, and I can't figure it out.
>
> I've created a file with the word list:
>
> <w pos='BE'>
> <w pos='BED'>
> <w pos='BEDZ'>
> <w pos='BEG'>
> <w pos='BEM'>
> <w pos='BEN'>
>
> When I load it into AntConc, it just comes out as
>
> BE
> BED
> BEDZ
> BEG
> BEM
> BEN

The key to counting just the tags is to make sure the token definition
will include them. So, the token definition needs to include:
p, o, s, B, E, D, Z, G, M, N, which it already will if you use the
"user definition" option.

Then you need to add:
=, ' (equals sign and apostrophe)

Then "words" like: pos='BE' will be counted like any other word.
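
The effect of that token definition can be sketched in Python. This is only an approximation of the idea, not AntConc's implementation: letters plus '=' and the apostrophe are treated as token characters, so pos='BE' stays together as one "word". The function name and sample are hypothetical.

```python
import re
from collections import Counter

# Token characters: letters plus '=' and the apostrophe, so that
# pos='BE' survives as one token while '<', '>', '/' and spaces split.
TOKEN = re.compile(r"[A-Za-z=']+")

def count_tag_tokens(xml_tagged_text):
    """Tokenize with the widened definition and keep only the pos='..' tokens."""
    tokens = TOKEN.findall(xml_tagged_text)
    return Counter(t for t in tokens if t.startswith("pos='"))

# Hypothetical sample, for illustration only.
sample = "<w pos='BE'>be</w> <w pos='BED'>were</w> <w pos='BE'>be</w>"
print(count_tag_tokens(sample))
```

The tag token pos='BE' and the plain word "be" are now different strings, so they are no longer double counted.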

Laurence.
