Open Source Tamil Stemmer


Sundar

Mar 23, 2013, 12:18:07 PM
to ThamiZha! - Free Tamil Computing(FTC), Damodharan Rajalingam
Damodharan has shared the source code of his Tamil stemmer (under a free licence, I suppose) at https://github.com/rdamodharan/tamil-stemmer. It will be a great complement to solthiruthi. We should contribute to it and develop it further.

- Sundar

Shrinivasan T

Mar 27, 2013, 3:51:22 AM
to freetamil...@googlegroups.com, Damodharan Rajalingam
Awesome effort.

I installed it and it works great.

See the test results (each input word is followed by its stem):

./stemwords -l ta
கண்கள்
கண்
பங்களிப்போர்
பங்களிப்போர்
கணியம்
கணியம்
வாழ்த்துகளுடன்
வாழ்
தெய்வங்களும்
தெய்வம்
சேகரிப்புகளும்
சேகரிப்பு
நூல்கள்
நூல்
பதிப்பிக்கப்பட்ட
பதிப்பி
தமிழும்
தமிழ்
ஆங்கிலமும்
ஆங்கிலம்
காலத்தே
கால
தன்மையிழந்து
தன்மையிழ
முதலிடத்தையும்
முதலிடம்
மதச்சார்பற்ற
மதச்சார்பற்ற
முதலிடத்தைப்
முதலிடம்
இடத்தையும்
இடம்
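
For anyone who wants to repeat this test from a script rather than by typing words at the prompt, here is a minimal sketch in Python. It assumes the `stemwords` binary from the repository has been built and sits in the current directory, and it uses only the `-l ta` option shown above. The function names `stem_words` and `pair_transcript` are made up for this illustration and are not part of the stemmer.

```python
import subprocess


def pair_transcript(lines):
    """Pair alternating input/output lines of a pasted session: [(word, stem), ...]."""
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    if len(cleaned) % 2:
        raise ValueError("expected an even number of non-empty lines")
    return list(zip(cleaned[0::2], cleaned[1::2]))


def stem_words(words, binary="./stemwords", language="ta"):
    """Send words to stemwords on stdin, one per line, and return (word, stem) pairs.

    Assumption: the binary path and a UTF-8 terminal/encoding match your setup.
    """
    result = subprocess.run(
        [binary, "-l", language],
        input="\n".join(words) + "\n",
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    stems = [s.strip() for s in result.stdout.splitlines()]
    return list(zip(words, stems))
```

`pair_transcript` works on a pasted session like the one above without needing the binary; `stem_words(["கண்கள்", "நூல்கள்"])` needs the compiled stemmer.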


Best wishes to the developer.

--
You received this message because you are subscribed to the Google Groups "ThamiZha! - Free Tamil Computing(FTC)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to freetamilcomput...@googlegroups.com.
To post to this group, send an email to freetamil...@googlegroups.com.
Visit this group at http://groups.google.com/group/freetamilcomputing?hl=en-GB.
For more options, visit https://groups.google.com/groups/opt_out.
 
 



--
Regards,
T.Shrinivasan


My Life with GNU/Linux : http://goinggnu.wordpress.com
Free/Open Source Jobs : http://fossjobs.in

Get CollabNet Subversion Edge :     http://www.collab.net/svnedge

Sundar

Mar 27, 2013, 3:57:16 AM
to freetamil...@googlegroups.com, Damodharan Rajalingam
Pretty good for what Damu claims to be a start!

- Sundar

Muguntharaj Subramanian

Mar 27, 2013, 9:31:52 AM
to freetamil...@googlegroups.com, Damodharan Rajalingam
Great work, Damodharan.

Thanks, Sundar, for introducing Damodharan's work to us.

This will be very useful for our spellchecker project.

Our hunspell-based spellchecker project (https://github.com/thamizha/solthiruthi) needs an efficient affix file and root word list.
The word list contains the root words (or stems), and the affix file contains the rules to expand them into the various derived words.
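
To make the root-plus-rules idea concrete, here is a small Python sketch. It is only a toy model of suffix expansion, not hunspell's actual file syntax, and the two rules are hypothetical ones read off the test output earlier in this thread (கண் / கண்கள் and இடம் / இடத்தையும்). The names `SuffixRule` and `expand` are invented for the example.

```python
from typing import NamedTuple


class SuffixRule(NamedTuple):
    strip: str  # characters removed from the end of the root ("" for none)
    add: str    # characters appended after stripping


def expand(root, rules):
    """Return the root plus every form produced by a rule that applies to it."""
    forms = [root]
    for rule in rules:
        # A rule applies only when the root ends with the part to strip.
        if root.endswith(rule.strip):
            base = root[: len(root) - len(rule.strip)]
            forms.append(base + rule.add)
    return forms


# Hypothetical rules, inferred from the sample stemmer output in this thread.
RULE_KAL = SuffixRule(strip="", add="கள்")
RULE_THTHAIYUM = SuffixRule(strip="ம்", add="த்தையும்")

print(expand("கண்", [RULE_KAL]))
print(expand("இடம்", [RULE_THTHAIYUM]))
```

A real affix file would also need conditions that restrict each rule to the roots where it is valid; this sketch only checks the ending that is stripped.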

Currently we create the affix file and word list manually. Damodharan's stemmer can automate the creation of these files and thereby improve the efficiency of our spellchecker.
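
One possible way to do that automation, sketched in Python under some assumptions: we already have (word, stem) pairs from the stemmer, and we treat the differing tails of each pair as a candidate suffix rule. The helper names `infer_rule` and `summarize` are hypothetical, and the sample pairs are taken from the test output posted earlier in this thread. Candidate rules produced this way would still need manual review before going into the affix file.

```python
from collections import Counter
from os.path import commonprefix


def infer_rule(word, stem):
    """Return (strip, add): the tail to remove from the stem and the tail to append."""
    prefix = commonprefix([word, stem])  # longest shared prefix, compared per code point
    return stem[len(prefix):], word[len(prefix):]


def summarize(pairs):
    """Collect the distinct stems and count how often each candidate rule occurs."""
    roots = sorted({stem for _, stem in pairs})
    rules = Counter(infer_rule(word, stem) for word, stem in pairs if word != stem)
    return roots, rules


# Sample (word, stem) pairs copied from the stemmer output in this thread.
PAIRS = [
    ("கண்கள்", "கண்"),
    ("நூல்கள்", "நூல்"),
    ("இடத்தையும்", "இடம்"),
    ("முதலிடத்தையும்", "முதலிடம்"),
    ("மதச்சார்பற்ற", "மதச்சார்பற்ற"),
]

roots, rules = summarize(PAIRS)
print(len(roots), "candidate roots")
for (strip, add), count in rules.most_common():
    print(repr(strip), "->", repr(add), "seen", count, "times")
```

Because the comparison is per code point, a split can land between a consonant and its vowel sign or pulli (for example தமிழும் / தமிழ்), so the strip and add parts are not always whole letters.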

I request Damodharan to join this list (if he hasn't already) and help update the word list and affix files for our spellchecker using his stemmer.
It would be good if the stemmer were developed as a Thamizha project.

Regards,
Mugunth

தங்கமணி அருண் || Thangamani Arun

Mar 28, 2013, 1:01:58 AM
to freetamil...@googlegroups.com, Damodharan Rajalingam
Dear Damodharan and Others,

Good and useful work.
I tested your code too.
As Shrinivasan mentioned, I am getting similar output.

Thanks a lot for your good effort.
Please keep it up.
--
With regards,
Arun
[ Let us make technology flourish in our language ]
My scribbles : http://thangamaniarun.wordpress.com
