Download Parser


Viola

Jun 4, 2021, 7:42:32 AM
to WordSmith Tools
Hello Mike,
thank you so much for putting together the Download Parser to help with processing texts from newspaper archives.

Unfortunately, I can't seem to get it to work. I am using a file of newspaper texts that I have converted with the WST text converter to Unicode text files.
In the "First parse" tab, it finds my file (just one for testing) and recognises that there are 3 texts in it: "3 texts found". But nothing else happens: when I click done, a popup states "No data found", and I cannot find the file with the tags around the headers anywhere. I have added the fields that my data has (because the capitalisation is different: for example, it's "Byline:" and "Length:" rather than "BYLINE:" and "LENGTH:"), and I have unselected the fields that are not in my data.
In the parsed files folder, the HEADLINES etc. folders remain empty. The author file also includes no data.
Can you point me to what is going wrong, by any chance?
Thank you very much!

All the best,
Viola

Mike Scott

Jun 4, 2021, 12:25:59 PM
to WordSmith Tools
Thanks for this, Viola.

Looks as if LexisNexis may have changed format. The Download Parser parses .TXT format downloads, which LexisNexis allowed one to get, up to 500 at a time. The Word format you downloaded doesn't have quite the same form. I think it'd be perfectly possible to adapt the Download Parser to find the mark-up present in the Word .docx download -- but I would need to have spare time for that. At present I'm working hard on the 64-bit WordSmith. Maybe in a few months?

Sorry! -- Mike

Viola

Jun 11, 2021, 9:05:18 AM
to WordSmith Tools
Thank you Mike, just saw this! Yes, they keep changing everything all the time, and it's so difficult to keep up (and I'm not sure it's worth the effort).

If anybody else stumbles across this, I found a package that can "clean" the articles, importing them into R data frames, for anybody who is happy to use R: https://github.com/JBGruber/LexisNexisTools 
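
For anybody not using R, the capitalisation problem mentioned earlier in this thread ("Byline:" vs "BYLINE:") can in principle be sidestepped by matching the field labels case-insensitively. What follows is only a rough Python sketch and not how the Download Parser or LexisNexisTools works: the field list, the `tag_headers` function and the tag format are all assumptions made up for illustration, and it expects plain-text input with one "Label: value" header per line.

```python
import re

# Hypothetical field labels; adjust to whatever your download actually contains.
FIELDS = ["Byline", "Length", "Section"]

# Match a label at the start of a line, ignoring case ("Byline:" or "BYLINE:").
# [ \t]* is used instead of \s* so the match never runs over a line break.
FIELD_RE = re.compile(
    r"^(?P<label>" + "|".join(map(re.escape, FIELDS)) + r"):[ \t]*(?P<value>.*)$",
    re.IGNORECASE | re.MULTILINE,
)


def tag_headers(text: str) -> str:
    """Wrap each recognised header line in <LABEL>...</LABEL> tags (made-up format)."""

    def wrap(match: re.Match) -> str:
        label = match.group("label").upper()  # normalise the label's capitalisation
        value = match.group("value").strip()
        return f"<{label}>{value}</{label}>"

    return FIELD_RE.sub(wrap, text)


sample = "Some Headline\nByline: Jane Doe\nLength: 512 words\n\nBody text here."
print(tag_headers(sample))
```

Lines that do not start with one of the listed labels are left untouched, so it is easy to check on a small test file which headers were actually recognised.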

Viola
