Using stoplist with tagged data


Rudy Loock

Jun 3, 2024, 9:19:14 AM
to AntConc-Discussion
Dear Laurence,

I hope you are well.
I have been trying to combine the use of tagged data (with TagAnt, for both POS and lemmas) with the use of a stoplist, and this does not seem to work. The stoplist independently works fine on non-tagged data, the tagged data can also be used in AntConc without any problem, but if I try to apply a stoplist on tagged data, it seems to have no effect. Am I doing something wrong or is this an impossible task to do?

Best,
Rudy

Laurence Anthony

Jun 3, 2024, 12:40:49 PM
to ant...@googlegroups.com
Hi Rudy,

I just tried to use a stoplist here, and it worked without any problems. Can you try loading the demo corpus, which is tagged, and then applying the attached stop list? You should find that the word list tool removes the words in the stop list as expected.

Laurence.

###############################################################
Laurence ANTHONY, Ph.D.
Professor of Applied Linguistics
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.laurenceanthony.net/
###############################################################


--
You received this message because you are subscribed to the Google Groups "AntConc-Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antconc+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/antconc/72027c84-65f2-4f45-8c59-3f90961fcaban%40googlegroups.com.
stop_list.txt

Rudy Loock

Jun 4, 2024, 3:28:29 AM
to ant...@googlegroups.com
Dear Laurence,
Thank you very much for your help. I did as you suggested, and this works fine. However, it does not work if I want to display Headword + [Type] in the Word tab: the words in the stoplist are back. Below is a screenshot with the demo corpus and the stoplist file you sent me. As you can see, 'the' is listed, although it is in the stoplist.
By the way, what I am trying to do (so that you get the general picture) is to build a lemmatized glossary that does not show words in a stoplist.

image.png
Best regards,

Rudy


Laurence Anthony

Jun 5, 2024, 4:05:12 AM
to ant...@googlegroups.com
Hi Rudy,

When you choose the Headword+[Type] option, if you look at the "Type" column, you'll see that it groups all the family members of the headword into a special "type" entry. The stop word 'the', for example, is compared against this grouped entry and does not find a match. My guess is that if your stop list included "the(1630)", you'd see that entry filtered out.

It's basically a limitation with the way that the filter works. What exactly are you wanting to find out, e.g. "all the lemma headwords (but not including headwords in the stop list)"?
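
The mismatch described above can be illustrated with a minimal sketch. This is not AntConc's actual code; the function name `apply_stoplist` and the assumption that the filter does an exact string comparison against the "Type" column entry are hypothetical, based only on the guess in this message.

```python
def apply_stoplist(entries, stoplist):
    """Drop entries that exactly match a stop-list item (assumed behavior)."""
    stop = set(stoplist)
    return [entry for entry in entries if entry not in stop]

# In Headword+[Type] mode the "Type" column holds a grouped entry
# such as "the(1630)", not the bare word "the".
type_column = ["the(1630)"]

# The bare stop word does not match the grouped entry, so nothing is removed.
kept_with_bare_word = apply_stoplist(type_column, ["the"])

# A stop-list item that spells out the grouped entry does match.
kept_with_grouped_entry = apply_stoplist(type_column, ["the(1630)"])

print(kept_with_bare_word)
print(kept_with_grouped_entry)
```

Under that assumption, the first call leaves the entry in place and the second removes it.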

Laurence.


Rudy Loock

Jun 5, 2024, 5:21:06 AM
to ant...@googlegroups.com
Dear Laurence,
Thank you. I tried with "the(1630)" in the stoplist as you said, and it does get rid of 'the' in the word list.
Basically, what I am trying to do is to compile a glossary from a specialized corpus. So I want to get rid of grammatical words (hence the stoplist), and I want the glossary to be lemmatized to avoid having "bitcoin" and "bitcoins" listed as two separate words (hence the Headword+[Type] option). If I understand your message correctly, this is not feasible, which means the word list needs to be dealt with manually, in Excel for instance, to merge examples like "bitcoin"/"bitcoins".

Thank you for your help!

Best regards,

Rudy

Laurence Anthony

Jun 5, 2024, 5:31:43 AM
to ant...@googlegroups.com
Hi Rudy,

Why not use the "Type + Headword" option? You'll then get a list of filtered types (with the stop list working correctly), and then you can take the results list into Excel and simply use the "Remove Duplicates" feature on the Headword column to leave you with a clean lemma list.

Would that work?
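
For anyone who prefers a script to Excel, the "Remove Duplicates" step can be sketched in Python. This is only a sketch under assumptions: the export is assumed to be tab-separated with columns named `Type`, `Headword`, and `Freq` (the real AntConc export may use different names or layout), `lemma_list` is a hypothetical helper, and the frequencies for "bitcoin" and "bitcoins" are invented sample values. It also sums the frequencies per headword, which Excel's "Remove Duplicates" does not do.

```python
import csv
import io

def lemma_list(tsv_text, stoplist):
    """Return (headword, total frequency) pairs, skipping stop-listed types."""
    stop = {word.lower() for word in stoplist}
    totals = {}  # a dict keeps headwords in first-seen order
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if row["Type"].lower() in stop:
            continue  # the stop list acts on the individual type
        headword = row["Headword"]
        totals[headword] = totals.get(headword, 0) + int(row["Freq"])
    return list(totals.items())

# Invented sample rows in the assumed export layout.
sample = (
    "Type\tHeadword\tFreq\n"
    "the\tthe\t1630\n"
    "bitcoin\tbitcoin\t40\n"
    "bitcoins\tbitcoin\t12\n"
)

print(lemma_list(sample, ["the"]))  # prints [('bitcoin', 52)]
```

To run it on a real export, read the file contents into a string and pass them in place of `sample`, adjusting the column names to match the file.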

Laurence.


Rudy Loock

Jun 5, 2024, 5:41:08 AM
to ant...@googlegroups.com
Yes, I could do that. I was just trying to have everything done in AntConc directly ;o) I just wanted to make sure I had not missed something. So thank you very much for your help.
By the way, you might be interested in what I am currently doing: from a specialised corpus, I am working with the students on using different tools for terminological extraction with the aim of compiling a glossary on a specialized topic. So we are comparing the use of AntConc with Sketch Engine's OneClick Terms, memoQ's LiveDocs feature, and ChatGPT. The students are meant to compare the results and give the pros and cons of the use of each tool for the same task.

Best regards, and looking forward to TALC in Manchester next month!

Rudy
