--
You received this message because you are subscribed to the Google Groups "AntConc-Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antconc+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/antconc/30523dd0-fe1e-4466-91e0-30061f429b30n%40googlegroups.com.
the foreigner does not speak or understand our language, which is no longer ours, because our true language becomes translation, exchange. Luca Ferrieri, Dalla public library all’open library
Here is a clearer, more academic English version of the “What should you do?” part:
Recommended Practical Measures
Enforce plain-text mode in TextEdit (if you continue using it).
Before copying any material into AntConc, convert the document in TextEdit to plain text (e.g. via Format → Make Plain Text). This step removes hidden formatting (RTF/HTML markers) and reduces the risk of incompatible metadata being transferred together with the text.
Standardise the character encoding to UTF-8.
In TextEdit’s preferences, explicitly set the default encoding for both opening and saving files to UTF-8. Ensuring that the source file and AntConc share the same encoding eliminates many problems related to unreadable characters and failed searches.
Disable “smart” typographical substitutions.
Features such as smart quotes, smart dashes, and automatic ligatures should be turned off. These substitutions replace plain ASCII characters (for example, a straight quotation mark or a hyphen) with typographic variants that have different Unicode code points, so a search for the plain form may fail to match them in concordance or frequency searches.
Prefer a code editor for corpus preparation.
For greater reliability, it is advisable to prepare and clean corpus files in a dedicated text/code editor such as Sublime Text or VS Code. These editors treat all content strictly as plain text, make the active encoding (UTF-8) explicit, and avoid injecting hidden formatting.
Save and import as UTF-8 .txt files instead of copy–paste where possible.
Instead of pasting directly from TextEdit into AntConc, save the corpus as a UTF-8 encoded .txt file and then load that file from within AntConc. This workflow minimises the intermediate transformations that can corrupt encoding or normalization.
Adopt a consistent Unicode normalization strategy.
If your data contain accented or non-Latin characters, consider applying a uniform Unicode normalization form (e.g. NFC) to all corpus files before analysis. Consistent normalization ensures that visually identical characters are also identical at the code-point level, which is crucial for reliable tokenisation and search.
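The clean-up steps above can also be scripted. What follows is only a minimal sketch in Python, not part of AntConc or TextEdit: the names `SMART_PUNCTUATION`, `clean_text` and `clean_file` are hypothetical, and the decision to map typographic quotes and dashes back to ASCII is an assumption that may not suit every corpus.

```python
import unicodedata
from pathlib import Path

# Hypothetical mapping (an assumption, adjust to your corpus):
# typographic characters often produced by "smart" substitutions.
SMART_PUNCTUATION = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
}
_TABLE = str.maketrans(SMART_PUNCTUATION)


def clean_text(text: str) -> str:
    """Apply NFC normalization, then replace smart punctuation."""
    text = unicodedata.normalize("NFC", text)
    return text.translate(_TABLE)


def clean_file(src: Path, dst: Path) -> None:
    """Read a UTF-8 text file, clean it, and write it back as UTF-8.

    Reading with an explicit encoding raises UnicodeDecodeError if the
    source is not valid UTF-8, which makes encoding problems visible.
    """
    raw = src.read_text(encoding="utf-8")
    dst.write_text(clean_text(raw), encoding="utf-8")
```

A call such as `clean_file(Path("raw.txt"), Path("clean.txt"))` (both file names are placeholders) would produce a file that can then be loaded from within AntConc rather than pasted.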
Note: This text was prepared with the assistance of an AI tool and finalized after my own (Dr. Ali DUMAN) review and edits.
On Thu, 27 Nov 2025 at 18:37, Laurence Anthony <antho...@gmail.com> wrote:
I think TextEdit on Mac is a rich text format. So, it's probably got some formatting as part of the text when you copy.
when I became a father I understood that parents have two fundamental tasks: the first is to defend their child from the wickedness of the world; the second is to help him recognise it
I think TextEdit on Mac is a rich text format. So, it's probably got some formatting as part of the text when you copy. Does anybody else have any ideas why it doesn't work with TextEdit?
for those who did not manage to find shelter in time. Film: Questi giorni
TextEdit is certainly not a "clean" text editor. This is revealed in the very first image on the instruction page:
As the blurb says: "Open documents in many formats. Create and edit plain text, rich text (.rtfd), and HTML documents, or open and edit documents created in other word processing apps, including Microsoft Word and OpenOffice. You can also save your documents in a different format so they’re compatible with other apps."


So "TextEdit" is more of a general-purpose document processing tool, like Word. Apple's use of the word "text" is much closer to "document", so when you need true plain text you are likely to run into lots of issues. I would recommend using something like VS Code.
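If you are unsure whether a file saved from TextEdit is really plain text, a quick check can help. The sketch below is only an illustration in Python: the function name `looks_like_markup` is hypothetical, and it assumes that RTF content begins with `{\rtf` and that HTML content begins with a doctype or `<html` tag, so it will not catch every case.

```python
from typing import Optional


def looks_like_markup(text: str) -> Optional[str]:
    """Return "rtf" or "html" if the text starts like markup, else None.

    This is a rough heuristic on the first characters only (an
    assumption), not a full format detector.
    """
    head = text.lstrip()[:200].lower()
    if head.startswith("{\\rtf"):
        return "rtf"
    if head.startswith("<!doctype html") or head.startswith("<html"):
        return "html"
    return None
```

If the function returns "rtf" or "html" for the contents of a corpus file, the file most likely needs to be converted to plain text before it is loaded into AntConc.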

many of us believe the EU remains the most extraordinary, ambitious, liberal political alliance in recorded history. where it needs reform, where it needs to evolve, we should be there to help turn that heavy wheel Ian McEwan, The Guardian, 2/6/2017