Hi,
For a small project I am working with Twitter data. The corpus contains around 600,000 tweets. My main problem is that this is a large number of texts that are independent of one another. When I combine them into a handful of files (one per month, for instance), the collocates are “contaminated” by adjacent tweets, because the collocation window extends across tweet boundaries.
My first impulse was to use one file per tweet. But given the number of tweets, that would likely overwhelm even the new, super fast AntConc 4.0. :)
Is there a way to include multiple independent texts in one .txt file? Or, in other words, is there a way to separate texts so that collocates cannot extend beyond the separator?
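To illustrate the behaviour I am after, here is a minimal Python sketch (this is not something AntConc does, as far as I know). It assumes one tweet per line and plain whitespace tokenization; the function name `collocates` and the sample tweets are made up for illustration. The window around the node word is cut off at the line break, so words from neighbouring tweets are never counted.

```python
from collections import Counter


def collocates(lines, node, span=2):
    """Count words within `span` tokens of `node`, never crossing a line break."""
    counts = Counter()
    node = node.lower()
    for line in lines:
        # Each line is treated as one independent text (one tweet).
        tokens = line.lower().split()
        for i, tok in enumerate(tokens):
            if tok != node:
                continue
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts


# Hypothetical sample data: two unrelated tweets, one per line.
tweets = [
    "coffee is great today",
    "rain again in Berlin",
]

print(collocates(tweets, "today"))
```

Here "today" is the last word of the first tweet, so only "is" and "great" are counted; "rain" from the next tweet is not.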
I hope I have made my problem reasonably clear.
David