Do I need BeautifulSoup to read this file type?


James Sedlacek

Dec 8, 2017, 2:57:06 PM
to beautifulsoup
I have some code that I once used for reading XML text files. Basically, the script would open these files and copy their text into a string. bs4 would convert the string (from XML) into something I could manipulate with Python and nltk (it was probably UTF-8). Then I would run a sentence tokenizer, word tokenizer, lemmatizer, and morphological tagger on the data, and print the results to a UTF-8 encoded .txt file. Now I have a corpus of untagged .txt files that are already UTF-8 encoded. Do I need to parse them with BeautifulSoup, or could I use a plain-text parser from nltk?
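
A minimal sketch of the plain-text route, assuming the files really contain no markup: BeautifulSoup is an HTML/XML parser, so there would be nothing for it to strip, and the standard library can decode the files directly. The `read_corpus` helper and the sample file are hypothetical, for illustration only.

```python
import tempfile
from pathlib import Path


def read_corpus(corpus_dir):
    """Return {filename: text} for every .txt file in corpus_dir, decoded as UTF-8."""
    texts = {}
    for path in sorted(Path(corpus_dir).glob("*.txt")):
        # Untagged text has no markup to remove, so a plain decode is enough.
        texts[path.name] = path.read_text(encoding="utf-8")
    return texts


# Demo: a throwaway directory stands in for the real corpus (assumption).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "sample.txt").write_text(
        "Première phrase. Second sentence.", encoding="utf-8"
    )
    corpus = read_corpus(tmp)

print(corpus["sample.txt"])
```

Each string in `corpus` could then be passed to the same nltk steps as before (for example `nltk.sent_tokenize` and `nltk.word_tokenize`). nltk's `PlaintextCorpusReader` may be another option for loading a directory of plain-text files.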