Dear Professor Anthony,
I am working on a Windows 11 PC.
I have created a database from 73 healthcare documents relating to the English NHS and used AntFileConverter to convert these documents from PDFs to text files. At this early stage, my analysis looks at the frequencies and collocates/accumulated collocates of three terms:
admin*
manag*
lead*
I've generated frequency spreadsheets from AntConc and started to develop a series of graphics. However, when examining the wordform outputs from these searches (using the Word tool in AntConc), I noticed that for admin* and manag* the wordforms include a large number of misspelt words (see below). When I looked more closely at the documents, these seem to have been created by the conversion to text. Some are genuine misspellings in the original documents, but the majority appear to be conversion errors. Here is an example from one report, where not only the key term but many of the other words have been changed as well:
This has left me at a loss as to what to do, as going through each of the 73 text files and correcting them by hand is a huge task, and one I don't really have time for. What can I do to correct this?
Many thanks
Deborah