Developing regional language


aditya dash

Apr 29, 2019, 2:57:48 PM
to indi...@googlegroups.com
Hi Team

What you are doing is great. I would like to contribute and be part of the journey too. I want to develop a library for my regional language (Odia), so I need your help and support. If you can guide me step by step on how to accomplish this, I would be really grateful to everyone. Waiting for your positive feedback.

Regards 
Aditya Bikram Dash

Arijit Patra

Apr 30, 2019, 5:47:18 AM
to aditya dash, indi...@googlegroups.com
I am keen on an Odia NLP lib too. Would love to learn from others who have done this for other regional languages.

regards,

Arijit Patra,
DPhil Candidate | Engineering Science,
BioMedIA, Institute of Biomedical Engineering,
Rhodes Scholar (India and Exeter, 2016),
The University of Oxford.




--
You received this message because you are subscribed to the Google Groups "indicnlp" group.
To unsubscribe from this group and stop receiving emails from it, send an email to indicnlp+u...@googlegroups.com.
To post to this group, send email to indi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/indicnlp/CAKWCwgT7E3Wg9KVogSnr1wOuoMnBLBfLw4d59BV3ezCNHRTmDw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Shrinivasan T

Apr 30, 2019, 5:52:26 AM
to indi...@googlegroups.com
Start with a Python library that handles Unicode text well for Odia.

For Tamil, we have the open-tamil Python library:
https://github.com/Ezhil-Language-Foundation/open-tamil

Create an Odia version of this file:
https://github.com/Ezhil-Language-Foundation/open-tamil/blob/master/tamil/utf8.py
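
As a rough illustration of what such a Unicode helper might look like for Odia, here is a minimal sketch. It is not taken from open-tamil; the function names `is_odia` and `get_letters` are hypothetical, and the clustering rule (attach combining marks, and join consonants across a virama) is a simplifying assumption rather than a full grapheme segmentation.

```python
import unicodedata

# The Odia (Oriya) Unicode block spans U+0B00..U+0B7F.
ODIA_START, ODIA_END = 0x0B00, 0x0B7F
VIRAMA = "\u0B4D"  # ODIA SIGN VIRAMA (halanta), joins consonants into conjuncts


def is_odia(ch):
    """Return True if the character lies in the Odia Unicode block."""
    return ODIA_START <= ord(ch) <= ODIA_END


def get_letters(text):
    """Split text into approximate user-perceived letters.

    Assumption: a letter is a base character plus any following combining
    marks (vowel signs, nukta, anusvara, virama), and a consonant that
    follows a virama belongs to the same conjunct.
    """
    letters = []
    for ch in unicodedata.normalize("NFC", text):
        attach = False
        if letters and is_odia(ch):
            if unicodedata.category(ch) in ("Mn", "Mc"):
                attach = True  # combining mark: stays with the previous base
            elif letters[-1].endswith(VIRAMA):
                attach = True  # consonant after virama: part of a conjunct
        if attach:
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


# The word "Odia" written in Odia script, given as code points.
word = "\u0B13\u0B21\u0B3C\u0B3F\u0B06"
print(len(word), "code points,", len(get_letters(word)), "letters")
```

Counting letters rather than code points is the kind of basic operation that later steps (word lists, tokenizers) tend to depend on, which is why it is a sensible first file to write.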




--
Regards,
T.Shrinivasan


My Life with GNU/Linux : http://goinggnu.wordpress.com
Free E-Magazine on Free Open Source Software in Tamil : http://kaniyam.com

Get Free Tamil Ebooks for Android, iOS, Kindle, Computer :
http://FreeTamilEbooks.com

vanangamudi

May 3, 2019, 1:32:42 AM
to indicnlp
Adding to what Shrini has already said: if you want to work on the ML side of NLP, the first step would be to build a corpus. The bare essential would be a language modelling dataset, which you can build by scraping newspapers. A slightly more complex one is sentiment analysis; you can build this by scraping content from cinema review sites where each movie review has both text and a rating. Usually, reviews rated 3/5 or above are treated as positive, and those below 3/5 can be deemed negative.
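
The rating rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `label_review`, the 5-point scale, and the placeholder review texts are hypothetical, and the choice to count exactly 3/5 as positive is one reading of the rule.

```python
def label_review(rating, threshold=3.0):
    """Map a rating on a 5-point scale to a coarse sentiment label.

    Assumption: ratings at or above the threshold count as positive.
    """
    return "positive" if rating >= threshold else "negative"


# Hypothetical scraped (text, rating) pairs; real data would come from a review site.
scraped = [
    ("review text A", 4.5),
    ("review text B", 3.0),
    ("review text C", 2.0),
]

# Turn the scraped pairs into a (text, label) sentiment dataset.
dataset = [(text, label_review(rating)) for text, rating in scraped]
print(dataset)
```

Borderline ratings are noisy, so it may be worth experimenting with dropping reviews close to the threshold.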

You can find example code for scraping and language modelling here.


Ravi Annaswamy

May 3, 2019, 5:27:11 AM
to vanangamudi, indicnlp
Adding to Shrini and Vanangamudi: I built the repo nlp-for-tamil based on example code from Gaurav.

It has, for Tamil:

- a Wikipedia dataset downloader
- a tokenizer
- a language model
- a classifier
- a word vector browser

that you can build upon.
For instance, to extract all Odia Wikipedia articles, you can use this notebook, possibly modifying just one line:

The notebooks are documented and self-contained, so you can take a look.
Depending on your coding familiarity, pick any one of our suggestions and conquer it before
moving further. Depending on which task you are interested in, you can respond here to any of us.
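
To give a sense of why switching a Wikipedia extraction from Tamil to Odia can be close to a one-line change, here is a small sketch. It is not the notebook's code; the helper name `wikipedia_dump_url` is hypothetical. It relies on the public Wikimedia dump naming pattern and on the Wikipedia language codes `ta` (Tamil) and `or` (Odia).

```python
def wikipedia_dump_url(lang_code):
    """Build the URL of the latest article dump for one Wikipedia language edition."""
    return (
        f"https://dumps.wikimedia.org/{lang_code}wiki/latest/"
        f"{lang_code}wiki-latest-pages-articles.xml.bz2"
    )


# Switching languages only changes the language code.
tamil_url = wikipedia_dump_url("ta")
odia_url = wikipedia_dump_url("or")
print(odia_url)
```

The downloaded file is a compressed XML dump, so a separate extraction step is still needed to get plain article text.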

If you want to go step by step (slowly but surely), you can start with a newspaper crawler and the Unicode processor.
You can then build a unique wordlist for Odia, a concordance browser for the articles, etc.
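
A unique wordlist and a simple concordance (keyword-in-context) view could start out roughly like this. This is a sketch under assumptions: `build_wordlist` and `concordance` are hypothetical names, and a "word" is taken to be any run of characters from the Odia Unicode block, which also picks up Odia digits and ignores joiner characters.

```python
import re
from collections import Counter

# Assumption: a word is a maximal run of characters in the Odia block (U+0B00..U+0B7F).
ODIA_WORD = re.compile(r"[\u0B00-\u0B7F]+")


def build_wordlist(articles):
    """Count every distinct Odia word across a list of article strings."""
    counts = Counter()
    for text in articles:
        counts.update(ODIA_WORD.findall(text))
    return counts


def concordance(articles, keyword, width=20):
    """Return each occurrence of keyword with `width` characters of context on both sides."""
    lines = []
    for text in articles:
        for match in re.finditer(re.escape(keyword), text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            lines.append(f"{left}[{keyword}]{right}")
    return lines


# Tiny stand-in corpus: the word "Odia", another word, a danda, then "Odia" again.
odia = "\u0B13\u0B21\u0B3C\u0B3F\u0B06"
articles = [odia + " \u0B2D\u0B3E\u0B37\u0B3E\u0964 " + odia]

wordlist = build_wordlist(articles)
for line in concordance(articles, odia, width=10):
    print(line)
```

Sorting the wordlist by frequency (`wordlist.most_common()`) gives a quick first look at the vocabulary of the crawled articles.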

The tokenizer/language model stuff can come later.

Thanks
Ravi

