Searching for collocations with xml files


Andrew Drummond

Dec 1, 2015, 12:39:06 AM
to AntConc-discussion
Hi Laurence,

Thanks for AntConc, TagAnt and all the tuition videos. I'm a beginner at this, so I may well have missed something obvious. Here's the query...

With a txt file tagged with TagAnt I can search for strings like 'this *_NN' to bring up collocations like 'this study' and 'this analysis'. It works like a dream. TagAnt tagged over 6 million words from the BNC for me. Amazing!
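For readers who want to see what a search like 'this *_NN' amounts to outside AntConc, here is a rough Python sketch. It assumes the tagged text uses a word_TAG format with underscores and that noun tags begin with NN; the sample sentence, its tags other than NN, and the function name are made up for illustration, and this is not how AntConc itself implements the search.

```python
import re

# "this" (with or without its own tag), then whitespace, then a token
# whose tag starts with NN. [^\W_]+ matches word characters except "_",
# so the captured word stops at the underscore before the tag.
PATTERN = re.compile(r"\bthis(?:_\w+)?\s+([^\W_]+)_NN\w*", re.IGNORECASE)

def find_this_noun(tagged_text: str) -> list[str]:
    """Return the nouns that directly follow 'this' in word_TAG text."""
    return [match.group(1) for match in PATTERN.finditer(tagged_text)]

# Hypothetical tagged sentence, for illustration only.
sample = "this_DT study_NN shows_VVZ that_IN this_DT analysis_NN is_VBZ useful_JJ"
print(find_this_noun(sample))  # ['study', 'analysis']
```

Hyphenated or apostrophised nouns would need a wider character class than the one used here.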

When I look at the raw text of an XML file, there are too many HTML-type tags (as opposed to part-of-speech tags) for me to see how to search for similar collocations.

Do I need to make a list of all the tags in the XML doc that provide non-linguistic information in order to produce a similar list of (this + noun) collocations? Or is there a way of instructing AntConc to read only 'words' and POS tags?

Any help much appreciated!

Kind regards, Andrew

Laurence Anthony

Jan 17, 2016, 2:56:46 AM
to ant...@googlegroups.com
Hi Andrew,

I seem to have missed this post from you. Sorry!

From what you say, you have tagged the XML file with the POS tagger, while still keeping the original XML tags. I assume then that these have also been tagged. A much easier way to proceed is to first remove all the XML tags from your data prior to tagging the corpus with TagAnt. If you do this, then the processed files can be used within AntConc very easily, as I explain in my YouTube video on working with tagged data.

(Note that AntConc itself can generate the XML files with the tags hidden for the first part of this process).
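If you prefer to do the tag-removal step outside AntConc, a minimal Python sketch might look like the following. It simply treats anything between angle brackets as markup, which is an assumption that holds for simple files but can fail on comments or CDATA sections containing ">". The function name and the sample line are made up for illustration.

```python
import re
from html import unescape

# Anything of the form <...> is treated as markup (a simplifying assumption).
TAG_RE = re.compile(r"<[^>]+>")

def strip_xml_tags(text: str) -> str:
    """Remove <...> markup, decode entities such as &amp;, tidy whitespace."""
    no_tags = TAG_RE.sub(" ", text)
    no_entities = unescape(no_tags)
    return re.sub(r"\s+", " ", no_entities).strip()

# Hypothetical word-level markup, for illustration only.
sample = '<s n="1"><w c5="DT0">this </w><w c5="NN1">study </w></s>'
print(strip_xml_tags(sample))  # this study
```

The cleaned text can then be saved as a plain txt file and run through TagAnt as usual. For large or irregular files, a proper XML parser would be a safer choice than a regular expression.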

However, if you want to keep the XML tags in place (e.g. because you want to search for these, too), then I'm afraid that AntConc will not work so well. You might try using one of the hide tag options in the tag settings menu, but I'm not sure what the result would be.

Perhaps you can explore the settings and report back here on what you find. I expect others would be interested, too.

Laurence.

