Hi Alex,
> AntClaws requires prior installation of CLAWS from UCREL, is that right? But
> this is a paying service, as far as I know. Is there any way to PoS tag a
> corpus (about 800K words) free for easy use with AntConc? (Which is what
> makes AntConc so wonderful for teachers and students as well as corpus
> linguists -- free and easy as well as high quality!)
You are correct that AntClaws requires the paid CLAWS engine from
UCREL. If you do not have this, you might try using GoTagger, which is
a free and very simple tagging tool that I think is based on the Brill
tagger:
Unfortunately, the site link seems to be broken at the moment. I can
send you the program if you want.
> Related to this, I've used Yasumasa Someya's lemma list successfully before
> with AntConc, but how would it work in conjunction with a tagged corpus?
Someya's lemma list has some odd cases that need to be processed
carefully to be used successfully in AntConc. In particular, it
includes words with apostrophes and hyphens that in the default
setting of AntConc will be split in an inappropriate way. On my site
is an edited version of his lemma list with the problematic items
removed.
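If you would rather filter a lemma list yourself, here is a minimal Python sketch of the idea. It assumes each line looks like "lemma -> form1, form2", which may not match the file you have, and the function name is just illustrative. It drops any headword or form containing an apostrophe or hyphen; it is not the exact procedure used for the edited list on my site.

```python
import re

# Characters that the default token definition would split on
# (straight apostrophe, curly apostrophe, hyphen).
PROBLEM_CHARS = re.compile(r"['\u2019-]")

def clean_lemma_lines(lines):
    """Drop headwords and forms that contain apostrophes or hyphens.

    Assumes the hypothetical line format "lemma -> form1, form2".
    """
    cleaned = []
    for line in lines:
        if "->" not in line:
            continue  # skip comments, blank lines, anything unexpected
        lemma, forms = line.split("->", 1)
        lemma = lemma.strip()
        if PROBLEM_CHARS.search(lemma):
            continue  # the headword itself is problematic
        kept = [f.strip() for f in forms.split(",")
                if f.strip() and not PROBLEM_CHARS.search(f)]
        if kept:
            cleaned.append(f"{lemma} -> {', '.join(kept)}")
    return cleaned
```

An alternative to removing these items is to change the token definition in AntConc so that apostrophes and hyphens count as part of a word, but that affects everything else you do with the corpus as well.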
Using Someya's lemma list with a tagged corpus would be problematic
(unless, of course, you simply ignore all the tags and treat the
corpus as a plain text corpus - via the AntConc global settings). What
you would need to do is tag the words in the lemma list with tags that
match those in the corpus. Then, AntConc would work fine.
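As a rough sketch of what that tagging step could look like, the Python below collects the tagged forms that actually occur in your corpus and groups them under the headword given by a plain lemma list. The "word_TAG" token format, the tags in the usage example, and the function name are all assumptions for illustration; adjust them to whatever your tagger produces.

```python
from collections import defaultdict

def tagged_lemma_entries(tagged_tokens, form_to_lemma, sep="_"):
    """Group tagged corpus forms under their plain-text headword.

    tagged_tokens: iterable of strings such as "went_VVD" (assumed format)
    form_to_lemma: dict mapping a lower-cased plain form to its headword
    """
    entries = defaultdict(set)
    for token in tagged_tokens:
        word, found, tag = token.rpartition(sep)
        if not found:
            continue  # token carries no tag, so skip it
        lemma = form_to_lemma.get(word.lower())
        if lemma is None:
            continue  # form is not covered by the lemma list
        entries[lemma].add(f"{word.lower()}{sep}{tag}")
    # Sort for stable, readable output
    return {lemma: sorted(forms) for lemma, forms in sorted(entries.items())}
```

For example, with the tokens "went_VVD" and "goes_VVZ" and a plain list mapping both forms to "go", this returns a single entry for "go" holding the two tagged forms. Note that a plain lemma list cannot tell homographs apart, so a form will be assigned to the same headword whatever its tag; those cases would need checking by hand.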
What I would recommend is that you create a completely new lemma list
with the tag information incorporated from the beginning. Actually,
I'll be making one very shortly as part of a different project,
using the lemma information in the BNC. I'll upload it to
my site when it's finished. (Still, the tags will have to match your
corpus tags.)
> May your 2013 continue as unmayanly as 2012
> alex
Happy new year to you, too!
(I'll see you at AACL 2013 - I'm presenting immediately after you!)
Laurence.