unable to use a list of words for KWIC


Maurizio Lana

Nov 27, 2025, 7:46:02 AM
to AntConc-Discussion
Hi Laurence,
I collected a list of words that I want to use to generate KWIC results.
I open Advanced Search, put the words into the search query list, click Add, then Apply;
then, back in KWIC, I select Adv Search and click Start, but I get "no hits found".
What am I doing wrong?
best
Maurizio

Laurence Anthony

Nov 27, 2025, 7:47:54 AM
to ant...@googlegroups.com
Hi, 

My guess is that you have pasted the words into the interface in the wrong format. Can you first check the help page and if it still doesn't work, can you upload a screenshot of the Advanced Search box.

Laurence.

###############################################################
Laurence ANTHONY, Ph.D.
Professor of Applied Linguistics
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.laurenceanthony.net/
###############################################################


--
You received this message because you are subscribed to the Google Groups "AntConc-Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antconc+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/antconc/30523dd0-fe1e-4466-91e0-30061f429b30n%40googlegroups.com.

maurizio lana

Nov 27, 2025, 10:34:58 AM
to ant...@googlegroups.com
I pasted (on macOS) from TextEdit into the field.
Nevertheless, I checked as you suggested:
I copied from TextEdit into Sublime, which recognizes it as UTF-8; I then copied from Sublime, pasted into AntConc, and it worked.

I am a bit perplexed that writing in and copying from TextEdit gave an error,
also because after the error with the list of words, even a search for a single word did not work.

best
Maurizio


Maurizio Lana
Università del Piemonte Orientale
Dipartimento di Studi Umanistici
Piazza Roma 36 - 13100 Vercelli

Laurence Anthony

Nov 27, 2025, 10:37:15 AM
to ant...@googlegroups.com
I think TextEdit on Mac uses a rich text format. So, the text has probably got some formatting attached to it when you copy.

Does anybody else have any ideas why it doesn't work with TextEdit?

Laurence.




Message has been deleted

Laurence Anthony

Nov 27, 2025, 11:27:27 AM
to ant...@googlegroups.com
What a fantastic answer, Ali! Thank you!


On Fri, 28 Nov 2025 at 01:20, ali duman <alidum...@gmail.com> wrote:

Here is a clearer, more academic English version of the “What should you do?” part:


Recommended Practical Measures

  1. Enforce plain-text mode in TextEdit (if you continue using it).

    Before copying any material into AntConc, convert the document in TextEdit to plain text (e.g. via Format → Make Plain Text). This step removes hidden formatting (RTF/HTML markers) and reduces the risk of incompatible metadata being transferred together with the text.

  2. Standardise the character encoding to UTF-8.

    In TextEdit’s preferences, explicitly set the default encoding for both opening and saving files to UTF-8. Ensuring that the source file and AntConc share the same encoding eliminates many problems related to unreadable characters and failed searches.

  3. Disable “smart” typographical substitutions.

    Features such as smart quotes, smart dashes, and automatic ligatures should be turned off. These substitutions can introduce characters that differ from the basic ASCII/Unicode forms expected by corpus tools, leading to mismatches during concordance or frequency searches.

  4. Prefer a code editor for corpus preparation.

    For greater reliability, it is advisable to prepare and clean corpus files in a dedicated text/code editor such as Sublime Text or VS Code. These editors treat all content strictly as plain text, make the active encoding (UTF-8) explicit, and avoid injecting hidden formatting.

  5. Save and import as UTF-8 .txt files instead of copy–paste where possible.

    Instead of pasting directly from TextEdit into AntConc, save the corpus as a UTF-8 encoded .txt file and then load that file from within AntConc. This workflow minimises the intermediate transformations that can corrupt encoding or normalization.

  6. Adopt a consistent Unicode normalization strategy.

    If your data contain accented or non-Latin characters, consider applying a uniform Unicode normalization form (e.g. NFC) to all corpus files before analysis. Consistent normalization ensures that visually identical characters are also identical at the code-point level, which is crucial for reliable tokenisation and search.


Note: This text was prepared with the assistance of an AI tool and finalized after my own (Dr.Ali DUMAN) review and edits.
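
Several of the measures listed above (plain text, UTF-8, no smart substitutions, NFC normalization) can be applied to a word list in one pass. The following is a minimal Python sketch, not part of AntConc; the function names `clean_word_list` and `save_word_list` and the replacement table are assumptions made for illustration, and the table is not exhaustive.

```python
import unicodedata
from typing import List

# Typographic substitutions that some editors insert automatically.
# Illustrative only: extend this table for your own data.
SMART_CHARS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
}


def clean_word_list(raw: str) -> List[str]:
    """Return one cleaned word per non-empty line of `raw`."""
    # Measure 6: make visually identical characters identical code points.
    text = unicodedata.normalize("NFC", raw)
    # Measure 3: undo smart quotes and dashes.
    text = text.translate(str.maketrans(SMART_CHARS))
    # Drop a leading byte-order mark if one was copied along.
    text = text.lstrip("\ufeff")
    # splitlines() handles \n, \r\n and \r; strip() removes surrounding
    # whitespace, including no-break spaces.
    return [line.strip() for line in text.splitlines() if line.strip()]


def save_word_list(words: List[str], path: str) -> None:
    """Measures 2 and 5: write the list as a UTF-8 .txt file, one word per line."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(words) + "\n")
```

A typical use would be to read the original list with `open(path, encoding="utf-8").read()`, pass it through `clean_word_list`, and save the result under a new (hypothetical) file name with `save_word_list`. Whether this resolves the "no hits found" problem discussed in this thread is not established; it only removes the classes of characters named in the list above.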



ali duman

Nov 27, 2025, 1:47:35 PM
to ant...@googlegroups.com
Thank you very much for your kind words. I truly appreciate your feedback.


Nicholas Groom

Nov 27, 2025, 3:51:33 PM
to ant...@googlegroups.com
Hi Laurence,

TextEdit saves files as RTF by default, but you can change this default setting to Unicode UTF-8 by going to Settings > Format and selecting the radio button ‘Plain text’. Once you have done this, all subsequent files will be saved as UTF-8 by default. You can also reformat existing RTF files as UTF-8 files in TextEdit by selecting Format > Make Plain Text.

Hope this helps.

Best wishes,

Nick
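
The difference between an RTF file and a plain-text UTF-8 file, as described in the message above, can be checked from outside TextEdit. This is a minimal Python sketch under two assumptions: an RTF file begins with the header `{\rtf`, and a plain-text file is acceptable if its bytes decode as UTF-8. The function names are hypothetical and nothing here uses AntConc itself.

```python
def looks_like_rtf(data: bytes) -> bool:
    # RTF documents start with the control word "{\rtf" (usually "{\rtf1").
    return data.lstrip().startswith(b"{\\rtf")


def is_valid_utf8(data: bytes) -> bool:
    # Decoding raises UnicodeDecodeError on byte sequences that are not UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def describe_file(path: str) -> str:
    # Read raw bytes so that no encoding is assumed in advance.
    with open(path, "rb") as handle:
        data = handle.read()
    if looks_like_rtf(data):
        return "RTF document (convert to plain text first)"
    if is_valid_utf8(data):
        return "plain text, valid UTF-8"
    return "plain text, but not valid UTF-8"
```

Note that pure ASCII text is also valid UTF-8, so a list of plain ASCII words will pass this check whichever editor wrote it.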



Maurizio Lana

Nov 27, 2025, 6:45:05 PM
to ant...@googlegroups.com
Thank you for this explanation, but:
  • my TextEdit was already set to use UTF-8;
  • the text I was trying to use in AntConc was a list (one word per line) of plain words made of strictly basic ASCII characters, so to speak.
So for me the question of why AntConc didn't run that search remains unsolved...

What I found is that if, in Adv Search, search list, I enter the words one by one, clicking "Add", that list of words is searched;
while if I try to repeat the "paste from Sublime", I again get "no hits found".

The fact is that if one builds a list of, say, 70 words defining/describing a semantic field,
one would prefer to have AntConc open the file rather than copy/paste those words from the text file into AntConc.
Is this possible?

Maurizio






Laurence Anthony

Nov 27, 2025, 6:45:19 PM
to ant...@googlegroups.com
Thanks, Nick!

Very useful information.


Laurence Anthony

Nov 27, 2025, 6:49:09 PM
to ant...@googlegroups.com
Hi Maurizio,

Yes, I could certainly add a file upload feature in the Adv. Search. Let me add it to the Beta version of 4.4.0, which I'm working on now.

Laurence.


Maurizio Lana

Dec 1, 2025, 4:23:41 AM
to ant...@googlegroups.com
Hmmm...
When I paste an old MS-DOS text file into Sublime, it reads it as Western (Windows 1252).

When I paste words from TextEdit into Sublime, it reads them as UTF-8; BBEdit reads them the same way.

This seems to show that TextEdit is a clean text editor...
Maurizio




Laurence Anthony

Dec 1, 2025, 11:03:39 PM
to ant...@googlegroups.com
Hi Maurizio,

TextEdit is certainly not a "clean" text editor. This is revealed in the very first image on the instruction page:


As the blurb says "Open documents in many formats. Create and edit plain text, rich text (.rtfd), and HTML documents, or open and edit documents created in other word processing apps, including Microsoft Word and OpenOffice. You can also save your documents in a different format so they’re compatible with other apps."

So "TextEdit" is more like a general purpose document processing tool like Word. Apple's use of the word "text" is much closer to "document". When dealing with true "text", you're going to have lots of issues. I would recommend using something like VSCode.
 
Laurence.


maurizio lana

Dec 2, 2025, 6:30:11 PM
to ant...@googlegroups.com
hi Laurence,


On 02/12/25 05:02, Laurence Anthony wrote:

> TextEdit is certainly not a "clean" text editor. This is revealed in the very first image on the instruction page:
>
> As the blurb says "Open documents in many formats. Create and edit plain text, rich text (.rtfd), and HTML documents, or open and edit documents created in other word processing apps, including Microsoft Word and OpenOffice. You can also save your documents in a different format so they’re compatible with other apps."

Yes: "Open documents in many formats. Create and edit plain text".
The settings of my TextEdit are, precisely,



so I think it's quite clear that in TextEdit one can work in "plain text".

> So "TextEdit" is more like a general purpose document processing tool like Word. Apple's use of the word "text" is much closer to "document". When dealing with true "text", you're going to have lots of issues. I would recommend using something like VSCode.

Well, nevertheless, now that I'm trying to reproduce what I described before (copying and pasting words from Sublime Text, where the text is UTF-8),


AntConc again gives me "no hits found".
The same happens when copying and pasting text from BBEdit.

any idea?
Maurizio


Laurence Anthony

Dec 2, 2025, 6:56:31 PM
to ant...@googlegroups.com
Hi again,

It's not clear to me that TextEdit can operate in a strict 'plain text' mode. If you open a text file and then apply a bold style, does the text change to a bold look? If so, you are no longer in text mode. As I wrote before, TextEdit is a general purpose document editing tool. When dealing with true "text", you're going to have lots of issues.

Laurence.


maurizio lana

Dec 3, 2025, 9:27:24 PM
to ant...@googlegroups.com
Hi Laurence,
I don't want to start a ping-pong, but this is probably useful to know about TextEdit.
When TextEdit is set to open text files in UTF-8 mode, there is no formatting menu in TextEdit
(the formatting menu appears only if you set TextEdit to open text files as RTF).

Whatever the selection (nothing, a character, a string, everything), if I type Ctrl-B all the text turns bold.
But when I save with the "save in UTF-8" setting and reopen the saved text in another programmer's editor such as BBEdit, the text is recognized as UTF-8 encoded.

Anyway, let's abandon TextEdit and use only Sublime for these experiments.
The real question is:
I create a list of words from scratch in Sublime with UTF-8 encoding,
copy the words, one per line,
paste them into AntConc / word / adv search,
click Start, and get a "no hits found" message
(and pasting from BBEdit gives the same result).

best
Maurizio
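
One way to investigate a pasted list like the one described above is to look for characters that are invisible in an editor but still present in the text, such as carriage returns, no-break spaces, or zero-width characters. The thread does not establish that such characters are the cause here, so the following Python sketch is only a diagnostic under that assumption; the function name `find_hidden_characters` is hypothetical.

```python
import unicodedata


def find_hidden_characters(text):
    """List (line, column, code point, name) for suspicious characters."""
    findings = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            category = unicodedata.category(ch)
            # Cc: control characters (e.g. carriage return, tab)
            # Cf: format characters (e.g. zero-width space, byte-order mark)
            # Zl/Zp: line and paragraph separators
            # Zs: space separators other than the ordinary space
            suspicious = category in {"Cc", "Cf", "Zl", "Zp"} or (
                category == "Zs" and ch != " "
            )
            if suspicious:
                # Control characters have no Unicode name, hence the default.
                name = unicodedata.name(ch, "CONTROL CHARACTER")
                findings.append((line_no, col, f"U+{ord(ch):04X}", name))
    return findings


if __name__ == "__main__":
    sample = "apple\r\nba\u200bnana\npear\u00a0"
    for finding in find_hidden_characters(sample):
        print(finding)
```

An empty result for the saved word list would suggest that hidden characters are not the problem and that the cause lies elsewhere, for example in the format the search list expects.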



Laurence Anthony

Dec 3, 2025, 9:35:08 PM
to ant...@googlegroups.com
Hi,

I just tried pasting a list of words that I wrote in the email and it worked fine. When you paste any text from one application into AntConc, it attempts to only paste the text element. So, it's probably less about the application and more about the format. Below is a screenshot of the process using the demo corpus in AntConc. Try to see if you can replicate this.

image.png

I hope that helps!

Laurence.
