Hi all,
I have collected a data set built from the following sources:
1-Dehkhoda Dictionary (only items)
2-Moin Dictionary (only items)
3-Farsi spell checker of OpenOffice
4-fa.wikipedia page titles (only titles that contain no spaces), for modern words that are not in sources 1-3, such as گوگل (Google)
I did not change the content of the original data; I only cleaned it.
Cleaning Process:
For all of the data, I removed Persian digits, parentheses, and duplicate words, and replaced the Arabic letters (ي ك) with their Persian equivalents (ی ک).
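As a rough illustration of the cleaning steps above (not the exact script used for this data set), here is a minimal Python sketch. The function names are hypothetical, and it assumes that "Persian digits" means U+06F0 to U+06F9 and that only the parenthesis characters themselves are dropped, not the text inside them.

```python
import re

# Arabic yeh (U+064A) and kaf (U+0643) mapped to Persian yeh (U+06CC) and kaf (U+06A9).
ARABIC_TO_PERSIAN = str.maketrans({"\u064A": "\u06CC", "\u0643": "\u06A9"})

# Assumption: strip Persian digits (U+06F0..U+06F9) and the parenthesis characters only.
DIGITS_AND_PARENS = re.compile(r"[\u06F0-\u06F9()]")


def clean_word(word):
    """Normalize one entry: unify yeh/kaf, drop digits and parentheses, trim."""
    word = word.translate(ARABIC_TO_PERSIAN)
    word = DIGITS_AND_PARENS.sub("", word)
    return word.strip()


def clean_words(words):
    """Clean every entry and drop empties and duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for raw in words:
        word = clean_word(raw)
        if word and word not in seen:
            seen.add(word)
            result.append(word)
    return result
```

Normalizing the letters before removing duplicates matters here: two spellings that differ only in Arabic versus Persian yeh or kaf collapse into a single entry.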
I also made a combined column that merges the Moin Dictionary, the Farsi spell checker of OpenOffice, and the fa.wikipedia page titles, with duplicate words removed; it contains 382K words. I left Dehkhoda's data out of this column because most of its words are not common in today's Persian writing.
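The combined column described above could be built along these lines. This is only a sketch under assumptions: the helper names are made up for illustration, each source is assumed to be an already cleaned list of strings, and the Wikipedia titles are filtered to single words by rejecting any title that contains whitespace.

```python
def single_word_titles(titles):
    """Keep only non-empty titles that contain no whitespace characters.

    Titles joined with a zero-width non-joiner (U+200C) are kept, because
    Python's str.isspace() does not treat that character as whitespace.
    """
    return [t for t in titles if t and not any(ch.isspace() for ch in t)]


def merge_word_lists(*sources):
    """Union several word lists, dropping duplicates and keeping first-seen order."""
    merged = {}  # dict keys preserve insertion order
    for source in sources:
        for word in source:
            merged.setdefault(word, None)
    return list(merged)
```

For example, `merge_word_lists(moin, openoffice, single_word_titles(wiki_titles))` would give one de-duplicated list, where `moin`, `openoffice`, and `wiki_titles` are placeholder variable names for the three sources.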
Download:
I collected them in an Excel file and an equivalent text file.
Licence:
The data set is released under the GNU General Public License (version 2).
Favour:
I hope it will be useful for improving the spell checkers of Chrome, Firefox, and OpenOffice.
Unfortunately, I don't have access to their repositories. If someone does, please update their spell checkers.
Any feedback is appreciated.
Yours,
REZA MOSHKSAR