Persian word list database


reza moshksar

Nov 16, 2012, 11:21:04 AM
to Persian Computing
Hi all,
I have collected a data set that contains the following data:
1-Dehkhoda Dictionary (only items)
2-Moin Dictionary (only items)
3-Farsi spell checker of OpenOffice
4-fa.wikipedia's page titles (only titles that contain no space), for modern words that are not in data 1-2-3, like گوگل
I didn't change the original data, but I cleaned it.

Cleaning Process:
For all of the data, I removed Persian numbers, parentheses, and duplicated words, and replaced the Arabic letters (ي ك) with their Persian equivalents (ی ک).
I made a column that combines Moin Dictionary + the Farsi spell checker of OpenOffice + fa.wikipedia's page titles, with duplicate words removed; it has 382K words! I left out Dehkhoda's data because most of its words are not common in today's Persian writing.
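For readers who want to reproduce a similar cleanup, the steps described above could be sketched roughly as follows. This is not the script used for the data set: the function name `clean_words` is hypothetical, and it assumes that "removing parentheses" means dropping the parenthesis characters themselves rather than any text inside them.

```python
import re

# Arabic yeh (U+064A) and kaf (U+0643) mapped to Persian yeh (U+06CC) and keheh (U+06A9).
ARABIC_TO_PERSIAN = str.maketrans({"\u064A": "\u06CC", "\u0643": "\u06A9"})

# Persian digits (U+06F0..U+06F9) and ASCII parentheses.
# Assumption: only these characters are stripped, not the text between parentheses.
STRIP_PATTERN = re.compile("[\u06F0-\u06F9()]")


def clean_words(raw_words):
    """Return the cleaned words in first-seen order, without duplicates."""
    seen = set()
    cleaned = []
    for word in raw_words:
        word = STRIP_PATTERN.sub("", word).translate(ARABIC_TO_PERSIAN).strip()
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    return cleaned
```

Normalizing the letters before deduplicating matters here: two spellings that differ only in Arabic versus Persian yeh or kaf collapse into a single entry.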

Download:
I collected them in an Excel file and an equivalent text file.
License:
Its license is the GNU General Public License (Version 2).

Favour:
I hope it will be useful for improving the spell checkers of Chrome, Firefox, and OpenOffice.
Unfortunately I don't have access to their repositories; if someone has access, please update their spell checkers.
Any feedback is appreciated.
Yours,

REZA MOSHKSAR 


Ebrahim Byagowi

Nov 16, 2012, 2:10:59 PM
to Persian Computing
Thanks for your great efforts.

Actually, Chrome does not support Persian spell-checking. Is someone able to file a bug similar to this one for Persian? Chrome supports Afrikaans (with 15–23 million native speakers, ref), so I guess Persian is also eligible for Google Chrome support.

Nasrollah Noori

Nov 18, 2012, 1:53:36 PM
to persian-...@googlegroups.com
Hi, Reza!

I have a suggestion: wouldn't it be better to correct some words, namely those that lack the "no break space" (zero-width non-joiner), such as آبادسازیها to آبادسازی‌ها?

I also have another request. In projects translated into Persian, like GNOME, everybody uses their own translation, e.g. "file" becomes either فایل or پرونده. Can anybody help create a dictionary or weblog of the best, consistent translations for words frequently used in computer applications?

reza moshksar

Nov 18, 2012, 2:42:43 PM
to Nasrollah Noori, persian-...@googlegroups.com
Hi,
On Sun, Nov 18, 2012 at 10:53 AM, Nasrollah Noori <nan...@gmail.com> wrote:
Hi, Reza!

I have a suggestion: wouldn't it be better to correct some words, namely those that lack the "no break space" (zero-width non-joiner), such as آبادسازیها to آبادسازی‌ها?
Unfortunately, we don't have any evidence on whether ها should be attached directly or joined with ZWNJ; it depends on the user!
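Since both spellings occur in practice, one option is to accept both instead of choosing one. The sketch below generates the attached and the ZWNJ-joined form of a word ending in ها. The function name `plural_spellings` is hypothetical, and the check is a plain suffix heuristic: it cannot tell a plural from a word that simply ends in ها (such as تنها), so its output would need review before being added to a word list.

```python
ZWNJ = "\u200C"                 # zero-width non-joiner
PLURAL_SUFFIX = "\u0647\u0627"  # the two letters of the suffix "ها"


def plural_spellings(word):
    """Return the set of spellings of `word` with and without ZWNJ before the suffix.

    Words that do not end in the suffix are returned unchanged.
    """
    if not word.endswith(PLURAL_SUFFIX):
        return {word}
    stem = word[: -len(PLURAL_SUFFIX)].rstrip(ZWNJ)
    if not stem:
        return {word}
    return {stem + PLURAL_SUFFIX, stem + ZWNJ + PLURAL_SUFFIX}
```

Given either spelling as input, the function returns the same pair, so it can be run over an existing list without first deciding which convention the list follows.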


I also have another request. In projects translated into Persian, like GNOME, everybody uses their own translation, e.g. "file" becomes either فایل or پرونده. Can anybody help create a dictionary or weblog of the best, consistent translations for words frequently used in computer applications?

We have many translations here, where Wikipedia's users have tried to find the best equivalents according to Farhangestan-e Zaban va Adab-e Farsi (فرهنگستان زبان و ادب فارسی).



--
REZA MOSHKSAR
PhD candidate in Building, Environment, Science and Technology
B.E.S.T Department - Politecnico di Milano
Via Bonardi 9 20133 Milano Italy


Omid Mottaghi

Dec 14, 2012, 1:26:39 AM
to reza moshksar, Nasrollah Noori, Persian Computing
Thanks for the list.

How did you find the Wikipedia words?

Could you find their occurrences in Wikipedia and count them?
"Word occurrence" can be used in word suggestion.
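A minimal sketch of that idea, assuming the Wikipedia text is already available as a plain string and that splitting on whitespace is good enough for tokenizing (real Persian text would also need punctuation and ZWNJ handling). The names `count_occurrences` and `rank_suggestions` are hypothetical.

```python
from collections import Counter


def count_occurrences(text, vocabulary):
    """Count how often each vocabulary word appears in whitespace-split text."""
    return Counter(token for token in text.split() if token in vocabulary)


def rank_suggestions(candidates, counts):
    """Order candidate corrections by descending frequency, then alphabetically."""
    # A Counter returns 0 for words it has never seen, so unseen candidates sort last.
    return sorted(candidates, key=lambda word: (-counts[word], word))
```

With counts like these, a spell checker that has several candidate corrections for a misspelled word can offer the most frequent one first.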

:)

noch...@gmail.com

Apr 6, 2020, 1:15:54 AM
to Persian Computing
Thanks for your great efforts and for letting others make use of it.