I haven't found a thread on this, but apologies if one exists!
I am new to BBEdit and am using it to clean .txt files prior to text mining. I am converting files from PDF to .txt to ensure R reads them in correctly (I've had issues with the R PDF reader). The conversion often leaves duplicated words, such as "to to" or "finally finally", throughout the text. These get flagged for grammar in TextEdit and Word, but fixing them requires going through the entire document manually. I have thousands of pages to go through, so if I ever want to finish my dissertation, I can't do that.
I tried the Process Duplicate Lines command in BBEdit, but it did not remove duplicates of words within lines. Does anyone know if there is a way to get BBEdit to identify duplicate words, then automatically delete one of them?
(or if not BBEdit, then Word or TextEdit?)
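To make the goal concrete, here is a minimal sketch in Python (not BBEdit) of the cleanup I am after. It assumes a regular expression with a backreference is a reasonable way to spot a word immediately followed by itself; the function name `remove_duplicate_words` is made up for illustration. A regex-capable find-and-replace might accept a similar pattern, but I have not verified that.

```python
import re

# Assumed pattern: a whole word, then one or more repeats of that same word,
# each separated by whitespace. \1 refers back to the first captured word.
DUPLICATE_WORD = re.compile(r"\b(\w+)(?:\s+\1\b)+", flags=re.IGNORECASE)


def remove_duplicate_words(text: str) -> str:
    """Collapse runs like 'to to' or 'finally finally' into a single word."""
    # Keep only the first occurrence of the repeated word.
    return DUPLICATE_WORD.sub(r"\1", text)


if __name__ == "__main__":
    sample = "I went to to the store and finally finally left."
    print(remove_duplicate_words(sample))
```

One caveat with this approach: it would also collapse legitimate repeats such as "had had" or "that that", so the results would probably need a spot check.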
Thanks!