I have processed a text file to create trigrams and written the output to a file. Here is a sample of that output file:
earnings,evening,earnings,65303.132245
earnings,growth,2016,65161.018628
earnings,140,earnings,65130.507383
lower,earnings,2015,65096.017731
2015,base,earnings,65022.922881
earnings,reconcile,earnings,64946.317262
2015,earnings,interest,64848.114762
income,growth,earnings,64787.065685
terms,earnings,2015,64782.769760
adjusted,earnings,growth,64488.788412
Some of the lines include a year as one of the trigram terms, for example 2015,base,earnings,65022.922881. Other lines include a term that is a number but not a year, for example earnings,140,earnings,65130.507383. I don’t care about the trigrams with these small numbers and would like to delete them, but I do want to keep trigrams where the number/year is 2000 or greater. So the logic that I propose is:
1. Read the line.
2. If any of the first three terms is a number less than 2000, delete the line.
3. Otherwise, write the line to a file.
I would appreciate suggestions on how to code this logic in Python. I have searched and found examples of how to delete a line based on certain keywords, but did not find any code similar to what I want to do. My Python skills are very basic and I need some help with this.
import fileinput
import sys

# With inplace=True, anything written to stdout replaces the contents of data.txt
for line in fileinput.input("data.txt", inplace=True):
    fields = line.strip('\n').split(',')
    delete_line = False
    if len(fields) > 2:
        # Check first 3 terms
        for i in range(0, 3):
            if fields[i].isdigit() and int(fields[i]) < 2000:
                delete_line = True
                break
    if not delete_line:
        # line still ends with its own newline, so write it unchanged
        sys.stdout.write(line)
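If you would rather leave data.txt untouched and write the kept lines to a second file, something along these lines might work. It is only a sketch of the same rule (drop a line when any of its first three terms is a whole number below 2000). The names keep_trigram and filter_trigrams, the threshold parameter, and the output filename filtered.txt are made up for illustration and are not part of the original post.

```python
import io


def keep_trigram(line, threshold=2000):
    """Return False if any of the first three fields is a whole number below threshold."""
    fields = [f.strip() for f in line.rstrip("\n").split(",")]
    return not any(f.isdigit() and int(f) < threshold for f in fields[:3])


def filter_trigrams(src, dst, threshold=2000):
    """Copy lines from src to dst, skipping the ones keep_trigram rejects."""
    for line in src:
        if keep_trigram(line, threshold):
            dst.write(line)


# Small in-memory demonstration using lines from the sample output above.
sample = io.StringIO(
    "earnings,140,earnings,65130.507383\n"
    "2015,base,earnings,65022.922881\n"
    "adjusted,earnings,growth,64488.788412\n"
)
out = io.StringIO()
filter_trigrams(sample, out)
print(out.getvalue(), end="")

# With real files (filtered.txt is a hypothetical output name):
# with open("data.txt") as src, open("filtered.txt", "w") as dst:
#     filter_trigrams(src, dst)
```

Only the first three fields are checked, so the score in the fourth column never causes a line to be dropped. Writing to a separate file also means a mistake in the rule cannot damage the original data.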