Dear professors and fellow academics,
I am using TAALED on Windows to measure lexical diversity in a learner spoken corpus. The user manual recommends several preprocessing steps via the pylats package for ideal use. So far I have tokenized all of the texts, but I am not sure whether tokenization alone is sufficient. Could you kindly clarify which further preprocessing steps, if any, are necessary?
Best,
Mahtab Kolahi
PhD candidate, Ferdowsi University of Mashhad, Iran