Creating an Assamese Dictionary


অমিতাক্ষ ফুকন

Feb 3, 2009, 6:32:05 AM
to শব্দ সংঘ - XOBDO
Hello,


First of all, hearty congratulations to all members of XOBDO on
reaching the 20K-word landmark!!
That's quite an achievement in itself.

I am writing this post with a project in mind.

I am extremely sorry for making such a long post.

I am currently working on coming up with an Open Source Assamese
Desktop. For this, I have done translation work for GNOME, KDE, Fedora,
Firefox etc. The rendering, keyboard display issues etc. are being
taken care of and the issues have been resolved successfully to a
great extent. However, much work lies ahead.

One area where the Assamese Desktop lags behind other Indian
languages is that there are not many "tools" that can be
used in a standard Assamese Desktop. My aim is to come up with a
"Dictionary" that can be installed into the OS and used. I am
presently working on and maintaining packages for Fedora related to
the Assamese language. I was thinking of creating ASPELL and HUNSPELL
dictionaries for Assamese. As many of you might know, these dictionaries
are widely used in applications like OpenOffice, Firefox etc. Almost
all the Indian languages that Fedora supports have dictionaries
shipped with them. They may not be full-fledged dictionaries, but they
do exist. Only Assamese does not have one. I have the tools and the
know-how for making the dictionaries, but not the word list. So, I was
wondering whether it is possible for me to use XOBDO's database to create
the dictionaries.
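
As an illustration of why the word list is the missing piece: a Hunspell dictionary is a pair of text files, a .dic file (a word count on the first line, then one word per line) and an .aff file (encoding and optional affix rules). The sketch below builds a minimal pair from a plain word list. The file name as_IN, the helper names and the sample words are assumptions made for illustration only; a real Assamese dictionary would also need affix rules, which are not attempted here.

```python
# Sketch: turn a plain word list into a minimal Hunspell dictionary pair.
# The name "as_IN", the helper names and the sample words are illustrative
# assumptions, not part of the original proposal.
from pathlib import Path


def build_dic(words):
    """Return .dic text: a word-count line followed by one word per line."""
    unique = sorted({w.strip() for w in words if w.strip()})
    return "\n".join([str(len(unique)), *unique]) + "\n"


def build_aff():
    """Return a minimal .aff text that only declares the encoding."""
    return "SET UTF-8\n"


def write_dictionary(words, directory, name="as_IN"):
    """Write <name>.dic and <name>.aff into `directory` as UTF-8 text."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    (out / f"{name}.dic").write_text(build_dic(words), encoding="utf-8")
    (out / f"{name}.aff").write_text(build_aff(), encoding="utf-8")


# Duplicates and stray whitespace are removed before counting.
sample_words = ["অসম", "শব্দ", "অসম", " ভাষা "]
dic_text = build_dic(sample_words)
```

Calling write_dictionary(sample_words, "build") would place both files in a build directory, from where they could be packaged.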

Once I build the dictionaries, I can package them to be used with
Firefox, OpenOffice etc and it can be used by one and all and also
across platforms.

I would like to know what members think of it. If anyone has any
questions/doubts regarding this, please feel free to ask. There are no
conditions involved in its creation or usage. The aim is primarily to
give full technical support to come up with a base foundation for
Assamese Desktop. All are most welcome to take part in this initiative
and contribute to it.

The rendering in all the major desktop environments for Assamese has
been taken care of, as have the rendering and printing issues in
Firefox and OpenOffice. The next step for a language desktop is to have
a dictionary and subsequently a spell checker. Making a spell checker
is a trivial issue once a dictionary is in place. So, the next step is
building a dictionary, and hence this post.

Please do send me your feedback/suggestions/questions.

Best regards,
Amitakhya.

বিক্ৰম

Feb 3, 2009, 7:02:59 AM
to শব্দ সংঘ - XOBDO, Biraj Kumar Kakati, anjal...@indiatimes.com, anjali sonowal, Partha Protim Sarmah, Priyankoo, Pallav Saikia, Buljit Buragohain, Rupankar Mahanta, Rupkamal Talukdar, prasanta borah, Prasenjit Khanikar, Pankaj Barah, only...@yahoo.co.uk
Priyankoo, Pallav, Biraj, Buljit, Anjal, Anjali Baidew, Partha,
Neelotpol, Prasanta, Prasenjit, Rupankar, Rupkamal and others,

By virtue of your contribution and involvement with XOBDO in various
ways, you are "Major Stakeholders" of XOBDO. It is your hard work that
is stored in XOBDO. Therefore, your consent is important for any major
decision.

Amitakhya Phukan has come up with a proposal. In a nutshell, he is
asking if we would allow XOBDO's database to be shared and used in
open source software like OpenOffice, Firefox etc.

I urge all of you to study his proposal, raise your concerns, ask
questions and finally say your "YES" or "NO".

Thanks,
Bikram

Bikram Baruah

Feb 3, 2009, 12:01:05 PM
to Rupkamal Talukdar, XOBDO groups
Rupkamal.
Great to see you back in XOBDO.
The original mail is in the googlegroup. I'll forward it in a minute.
Thanks,
Bikram


On Tue, Feb 3, 2009 at 7:56 PM, Rupkamal Talukdar <rupkamal...@gmail.com> wrote:
Hi Bikram,

I am happy to return to active mode after a long break. :)

About this proposal, I would have loved to see the original mail.

Going by your mail, I would say it will be a great step if you extend our effort to open source software.
However, we also need to look at the possible impact in case of any error or discrepancy.
I would suggest we ask for a tentative date for this. Based on that, we can decide what we can share. My only concern is that we should be fully error-free and reach a minimum standard of explanation/meaning
for each word before we take such a step.

As far as action points are concerned, we can get the count of words as of now, put a complete freeze on adding new words, and complete the finishing/polishing task for all present words (Assamese/English).

Please let me know your opinion on this.

Thanks


--
Signing off....
Rupkamal Talukdar
Infosys Technologies
Hyderabad



--
Bikram M. Baruah
for XOBDO.ORG
বিক্ৰম ম বৰুৱা
শব্দ.সংঘ

Bikram Baruah

Feb 5, 2009, 3:53:20 AM
to XOBDO groups


---------- Forwarded message ----------
From: Pallav Saikia <pall...@gmail.com>
Date: Thu, Feb 5, 2009 at 12:09 PM
Subject: Re: Creating an Assamese Dictionary
To: বিক্ৰম <bikr...@gmail.com>
Cc: anjali sonowal <sonowa...@yahoo.com>, Priyankoo <priy...@gmail.com>, Buljit Buragohain <bul...@gmail.com>, "anjal...@indiatimes.com" <anjal...@indiatimes.com>, Partha Protim Sarmah <ppsa...@gmail.com>, It's Neelotpal Deka <only...@yahoo.co.uk>, Neelotpal Deka <only...@gmail.com>, prasanta borah <bora.p...@gmail.com>, Prasenjit Khanikar <prasenjit...@gmail.com>, Rupankar Mahanta <rup...@gmail.com>, Rupkamal Talukdar <rupkamal...@gmail.com>


Personally, I am a big fan of Firefox. I use its inline English spell checker for e-mails and other web-based applications, both at home and at work, where I have to type a lot of text. It also does the spell checking for my online Google Docs. If such a dictionary can be extended to Assamese, I think it will serve a great purpose in promoting Assamese as a medium for communication over the net.

I am taking this as just one example to show how an Assamese dictionary plug-in for these open source projects would help. I am not very hopeful that Microsoft will invest in making a spell checker for the Assamese language for its software. If, as a community, Xobdo can contribute towards a spell checker for Open Source projects, it can generate more interest in using the language among our software literates and at the same time help promote wonderful, non-pirated software like OpenOffice.

Xobdo as a group has the responsibility to promote and propagate NE languages in all spheres of society. The linguistic value we are creating here should not be restricted to use only from Xobdo's website. That means that at some point Xobdo should be exploring the idea of creating software that helps use the dictionary along with other software, like this spell check plug-in for Open Source web browsers and office suites.

At this point, Xobdo does not have enough resources for creating a dictionary plug-in for Open Source on its own. Amitakhya's proposal to collaborate towards this looks like a logical extension of our goal. But at the same time, we need to have the confidence that our hard work is not taken for a ride by anyone having a commercial interest in the content Xobdo has created.

I think Xobdo should follow an open copyright policy for its content, like the GPL. It would allow anybody to use the dictionary in any form so long as there is no direct commercial interest on the part of the user and so long as Xobdo gets its due credit, such as having its name as the source of the content.

My final take on this would be to start a dialogue towards creating the plug-in with Xobdo's content, but we need to have ground rules set so that both parties collaborate with mutual trust. Some of these rules might be:
  1. No commercial use of this content (what happens if somebody starts selling software with a dictionary based on Xobdo; do we ask for a share of the royalty, which we reinvest in Xobdo?)
  2. Maintenance of a GPL kind of licence: any software using the content mentions Xobdo as the creator of the content, and the user is free to redistribute it so long as (s)he keeps the text mentioning Xobdo's credit intact.
  3. How and when Xobdo provides updated content.
  4. Decide on the specific software/plug-ins where this content would be used. We can expand this later on mutual agreement.
  5. Right to retract from providing the content when we see any violations, or to develop similar software on our own.
Pretty long mail, sorry about that, but do let me know if I am making any sense.

Thanks,
Pallav

Bikram Baruah

Feb 5, 2009, 4:39:48 AM
to XOBDO groups
---------- Forwarded message ----------
From: Rupankar Mahanta <rup...@gmail.com>
Date: Thu, Feb 5, 2009 at 1:10 PM
Subject: Re: Creating an Assamese Dictionary
To: বিক্ৰম <bikr...@gmail.com>


Hi,
 
I have a few questions:
1) No doubt Amitakhya's project is an ambitious one, but how acceptable it would be to end users is questionable. I mean, I have not seen anyone using a desktop in a language other than English.
2) It is you who planted the seed of Xobdo and took care of it till others started assisting you. If this means parting with your right of ownership, then you should follow your heart. Otherwise I agree with what Pallav said: as long as Xobdo is given credit for the information, you may think of sharing it, and the right of use should be revocable in case of violation.
3) The association should help both the teams.
 
Regards.
 
Rupankar
===============

Bikram Baruah

Feb 5, 2009, 4:40:27 AM
to XOBDO groups


---------- Forwarded message ----------
From: Anjal Borah <anjal...@indiatimes.com>
Date: Thu, Feb 5, 2009 at 1:36 PM


Dear All,

PallavDa mentioned some very important points like "copyright policy" and "commercial interest". I agree with PallavDa. We should set some policies before the content is used in open source software like OpenOffice, Firefox etc.

Thanks,
Anjal

 

prasanta borah

Feb 5, 2009, 4:49:04 AM
to xo...@googlegroups.com
Hi All,

Like Anjal, I also agree with Pallav Da. I think we must set some policies before the content is used in open source software.

No doubt Amitakhya is thinking about a good project. But it is necessary to maintain some policies for its use with OpenOffice, Firefox etc.

Thanking you
PRASANTA BORAH
--
PRASANTA BORAH
GUWAHATI, ASSAM

অমিতাক্ষ ফুকন

Feb 5, 2009, 6:32:59 AM
to শব্দ সংঘ - XOBDO
Hi all,

Nice to see the discussions around this topic.

First of all, let me make it clear that there is absolutely no
question of parting with the ownership of XOBDO.

What I mean is, the GPL license guarantees it. Since some members were
talking about the GPL license, let me just add that the doubts raised are
pretty much taken care of by the GPL. Forking the work for free and open
distribution is allowed as long as the original credits are
maintained. Any deviation from this principle actually makes it a
violation of the GPL, as far as I understand.

http://www.gnu.org/licenses/gpl.html

I would like to request all members to kindly go through this webpage
and study it, rather than me writing a long mail.

The questions that I could sum up from the previous mails are as
follows :
---------------------------------
[From Pallavda's mail]

1. No commercial use of this content (what happens if somebody starts
selling softwares with a dictionary based on Xobdo, do we ask for a
share of royalty which we reinvest in Xobdo?)
2. Maintenance of GPL kind of licence: Any software using the content
mentions Xobdo as the creator of the content and user is free to re-
distribute it so long (s)he is still keeping the text mentioning
Xobdo's credit intact.
3. How and when Xobdo provides updated content
4. Decide on specific software/plug-ins where this content would be
used. We can expand this later on mutual agreement,
5. Right to retract from providing the content when we see any
violations or to develop similar software on it's own.
--------------------------------------------

Well, the GPL basically takes care of all such issues (1, 2 and 5).
Regarding query 4, my plan as of now is to integrate XOBDO's work with
aspell and hunspell (which are in turn used by OpenOffice, Firefox
etc.). Regarding query 3, we can come up with a plan for maintaining
and updating the local dictionaries to keep them in sync with the XOBDO
database. How to do it will materialise after we start the project.
For query 1, I am integrating the work to be distributed with Fedora
(http://fedoraproject.org), which is a free to download and use OS.
Someone might use my work to integrate it with Ubuntu etc. But my work
shall revolve around Fedora only, and hence there should be no concern
about the work being used for commercial purposes.
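
One possible shape for the sync step in query 3, offered only as a sketch since the actual plan was left open above: compare the packaged word list with a fresh export from the database and report which words to add and which to drop. The function name and the sample words are hypothetical.

```python
# Sketch: compute what changed between the packaged word list and a fresh
# export from the database. Names and sample data are hypothetical.
def diff_word_lists(local_words, upstream_words):
    """Return (to_add, to_remove) so that local would match upstream."""
    local, upstream = set(local_words), set(upstream_words)
    to_add = sorted(upstream - local)      # new upstream, missing locally
    to_remove = sorted(local - upstream)   # withdrawn or corrected upstream
    return to_add, to_remove


packaged = ["alpha", "beta", "gamma"]
fresh_export = ["beta", "gamma", "delta"]
to_add, to_remove = diff_word_lists(packaged, fresh_export)
```

A diff like this would also support the right to retract raised in query 5, since withdrawn words show up in the removal list at the next update.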

--------------------------------------------
[From Rupankar's mail]

1) No doubt that Amitakhya's project is an ambitious one, but how
acceptable it would be with end users is questionable. I mean I have
not seen anyone using a desktop other than in English.
2) It is you who had planted the seed of Xobdo and took care of it
till others started assisting you. If this means parting the right of
your ownership, then you should follow your heart. Otherwise I agree
with what Pallav said that as long as Xobdo is given credit for the
information, you may think of sharing it and right of use should be
revocable in case of violation.
3) The association should help both the teams.
----------------------------------------------

1. Thanks for your encouragement. However, I would like to point out that
the choice of desktop language is based upon an individual's preference.
Even on an English desktop (which is my default desktop language), if I
type a letter in Assamese, or maybe even a paragraph, and I want to run
a spell checker and/or dig out a few synonyms, I should be able to do
that using a set of tools around the application I am using. What I
meant was, if all these pieces (dictionary, spell checker, localised
GUI, keyboard and fonts) are together, a full Assamese desktop can be
rolled out. Whether anyone wants to use all the components together
or any one of them is left to the individual's choice.
2. This project does NOT mean, and will NEVER mean parting of
ownership. Just to sum up the ideas behind the Open Source
development, the development of Open Source software rests on
successful collaboration only. Releasing a work under any version of
GPL guarantees it and violation in any form is violation of the
license agreement and hence revocation can be called for, if needed.
3. I do hope for mutual benefit. :)

Those were all the replies I could manage. If I am unclear in any part,
or if I have raised more queries in your minds, please feel free to
ask them, either on this mailing list or by mailing me directly.

Looking forward to your replies,

Best regards,
Amit.

Biraj Kumar Kakati

Feb 5, 2009, 7:35:03 AM
to বিক্ৰম, শব্দ সংঘ - XOBDO, anjal...@indiatimes.com, anjali sonowal, Partha Protim Sarmah, Priyankoo, Pallav Saikia, Buljit Buragohain, Rupankar Mahanta, Rupkamal Talukdar, prasanta borah, Prasenjit Khanikar, Pankaj Barah, only...@yahoo.co.uk

Dear all,

I have a few doubts regarding this issue. Over the last few days we have received a few proposals like this (though not exactly the same). I wonder whether we can get similar proposals aimed at the development of the current XOBDO. Probably my contribution is not more than just a raindrop in the vast ocean of XOBDO. Anyway, I think Pallav-da & Rupankar-da have made everything clear. I am just trying to put forward a few queries that are wiggling in my brain.

1. I also think that it will be nice if we can have a spell checker for the Assamese language in OpenOffice & Mozilla. But in the meantime we should not forget our own policy and goal. We should think about what, and how much, this will contribute to the benefit of the greater Assamese community and vice versa. Moreover, how will XOBDO benefit from this project?

2. The next point that comes to my mind is regarding the license. I suppose that it will be under the GNU General Public License. But what if the license is violated? I am not talking about the end users of the software/spell checker (as there is always a chance of violation there). What will be the solution if any of the project collaborators violates the license?

3. XOBDO is a collective effort (more or less) of each and every contributor. In that case, can we ensure that copyright will not be violated once we transfer the database? What have you planned to preserve the copyright of the contributors as well as of XOBDO?

4. What have you planned if any of the contributors refuses to hand it over? How will we move forward in such a case?

5. As of now, I myself don't think that XOBDO is full-fledged or full-grown. In that case, how far is it justified to hand over the database to someone who is not familiar with it?

6. There may be another option: what if XOBDO (with the help of any governmental or non-governmental funding) comes forward to work in this direction itself?

7. I just wonder how the database can be a crucial problem if you have the rest of the framework in place.

Sorry, for making it a little bit long. Catch you later.


------------------------------------------------------------
With best regards,
Biraj Kumar Kakati
from XOBDO

Pranab

Feb 5, 2009, 8:04:50 AM
to শব্দ সংঘ - XOBDO
Dear Biraj Kaiti,

I agree with all your points! I believe we should concentrate more on
the principles laid down by XOBDO rather than going out for any new
venture! There is a lot more to do. I guess the site was launched around
2-3 years back and the visitor count still hasn't crossed 50K. We need
to work on things that reach out to the masses, like making it available
through WAP, SMS code etc.

And even if the new project is covered under the GNU General Public
License, I say the GPL is not effective! We all use Joomla, WordPress,
Drupal etc. as CMS, but how many sites do we see with the Joomla/Drupal
credits intact? Hehehe... And also there are many ways to nullify any GNU
licence! To know more about GNU hacks, do ping me personally. It won't be
wise to write here!!

Personal opinions!



Pranab

Feb 5, 2009, 8:14:46 AM
to শব্দ সংঘ - XOBDO
Moreover, we will need a dedicated team of developers who can work on the
project full time. Can you give me the figures for Assamese developers
in any Open Source project? We have only a handful of such developers
who work full time on any Open Source project.

So, before we go on with the new initiative, let's build a very good
community of developers who can develop the project and can devote
full time!

There are many projects we could do that are greater than the one the
author of this post has mentioned. We can start working on
Assamese translations of Open Source CMSs, Google in Assamese etc.!




বিক্ৰম

Feb 5, 2009, 9:45:56 AM
to শব্দ সংঘ - XOBDO
Amit & others,

Based on the discussion so far, I can infer that everybody in
principle agrees to share the database, provided...

(1) it is for the greater good of the Assamese language and community.
(2) it is not used for direct commercial gain.
(3) XOBDO and its members get due credit and retain the ownership.
(4) XOBDO has the right to retract if any condition is found to be violated.

Therefore, I propose the following:
XOBDO initially provides a part of the word list available in its
database for Amit to demonstrate its use on a pilot/trial basis.
Once we are convinced that the above conditions are met (and will be met in
future), we further agree on the technical details (like frequency of
updates, ways to retract, an agreed-upon format of data exchange etc.).

Thanks,
Bikram

Pranab Doley

Feb 5, 2009, 9:58:42 AM
to xo...@googlegroups.com
I welcome the proposal of B Baruah Da!


Pranab Doley
Vice President
MisingOnline

Degrees aren't worth anything...if you have a degree,you have a job;If you don't have a job,probably you don't want one and frankly,I don't want one.

Partha Protim Sarmah

Feb 5, 2009, 12:54:19 PM
to xo...@googlegroups.com
Dear All,
 
I agree with the idea of extending XOBDO's reach to the proposed ...
However, we must have a strict legal contract to prevent direct or indirect commercial
or other uses (except academic use) of these resources.
 
XOBDO must have the copyright, and this mark must appear on all
words that the system borrows from XOBDO. Hence there is a
need for developing a "trademark" for XOBDO words which will
automatically appear beside or just above each XOBDO contribution whenever
it is used. And this should be designed in such a way that it cannot be tampered with
or copied (and pasted).
 
I will ask for expert opinion from my colleagues specialising in
copyright and Intellectual Property Rights. We should also study
how journals and other dictionaries etc. maintain that.
 
More later
 
Partha


Navanath Saharia

Feb 6, 2009, 12:08:43 AM
to xo...@googlegroups.com
Hi,
 
I also agree with Biraj da's points, but my personal feeling is that it will also be beneficial to the research groups working on Assamese and other North-East languages if the database is released with some licensing.
 
Regarding the implementation of an Assamese spell checker, an M. Tech project is going on at the Natural Language Processing Lab (NLP Lab) at Tezpur University. It is nearing completion. The people involved in that project are not thinking about commercial application right now. For your kind information, other projects such as a Part-of-Speech Tagger, a morphological analyser and a parser for the Assamese language are also running in the same lab, but these are in the initial phase. The progress of the latter projects is very slow due to the lack of a well-prepared Assamese word database, corpus database etc. So I think this database may help researchers to develop online resources for Assamese as well as other North-East language communities.
 
 
With regards,
 
Navanath Saharia
 
 




Napaam, Assam
INDIA - 784028

Amitakhya Phukan

Feb 6, 2009, 1:24:18 AM
to xo...@googlegroups.com
Hello,

Can you give me any links to this project? What is the status? What is
the project road map? If you can share that information, it will be of
great help. I tried finding it on the Tezpur University website,
but could not.

Best regards,
Amit.

Biraj Kumar Kakati

Feb 6, 2009, 4:03:31 AM
to xo...@googlegroups.com
Hi Navanath,
Thanks for sharing the information. But I just wonder how releasing the database can help a researcher and, if so, how far. Let's assume that it will. In that case, why don't we opt for releasing it to some organization/institution for R&D purposes, provided there is an MoU?

------------------------------------------------------------
With best regards,
Biraj Kumar Kakati
from XOBDO



Biraj Kumar Kakati

Feb 6, 2009, 4:13:55 AM
to xo...@googlegroups.com
Dear Amit,
In this context I would like to introduce you to a few websites. Please have a look at them.
  1. http://tdil.mit.gov.in/homepage.asp
  2. http://egovindia.wordpress.com/category/local-language-assamee/
  3. http://www.iitg.ernet.in/rcilts/
  4. http://www.iitg.ernet.in/rcilts/asamiya.htm
------------------------------------------------------------
With best regards,
Biraj Kumar Kakati
from XOBDO



Amitakhya Phukan

Feb 6, 2009, 4:38:14 AM
to xo...@googlegroups.com
Hi,

I have known about these web sites for the last two or three years. Any
contact with them has been of no result. Most of their work lacks a
"present" working status. Before I started my localization work, I did
consult all these websites. In fact, at present I am maintaining the
upstream Inscript and Phonetic keyboard layouts. The ones that TDIL
developed have many issues like the absence of ৺ and ৎ. The work of IITG
has been questioned by many and there are a lot of flame wars going on, an
area that I am not much interested in venturing into as of now. Since Mr.
Saharia wrote about work done at Tezpur University, I inquired about that
specific one. I could not manage to find the NLP link on the Tezpur
University website.

The main problem with these projects is that we don't know what they are
working on or how. I contacted one person from AMTRON in Guwahati, but he
has no idea about the IITG work either.

http://www.iitg.ernet.in/rcilts/spell_checker.htm
This page only tells about the research papers for the work, doesn't
tell about the product or anything else.


If you come across any resource, please do share it.

Regards,
Amit.

Navanath Saharia

Feb 6, 2009, 4:59:02 AM
to xo...@googlegroups.com
Dear Amitakhya da,
 
We are not publishing all the work on the web, but within February you will find it on the university website. We have already prepared the master plan for it. All the NLP research is going on under the supervision of Dr. Utpal Sharma of Tezpur University.
 
Regarding the status of the spell checker project, at this moment the spell checker can detect a wrong word and correct it with the help of a database (which is one example of why we require a huge database). The suggestion generation part still remains incomplete. It will be completed by April.
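
To make the two stages concrete, here is a minimal sketch of database-backed detection and suggestion generation. It is not the Tezpur University implementation, whose design is not described in this thread; it assumes the common approach of a set lookup for detection and Levenshtein edit distance for suggestions, and the function names and sample lexicon are hypothetical. It also shows why a large word list matters: any correct word absent from the lexicon would be flagged as an error.

```python
# Sketch: lexicon lookup for detection, edit distance for suggestions.
# This is a generic illustration, not the project described above.
def edit_distance(a, b):
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def is_known(word, lexicon):
    """Detection step: a word counts as correct only if the lexicon has it."""
    return word in lexicon


def suggest(word, lexicon, max_distance=1, limit=5):
    """Suggestion step: lexicon words within max_distance, closest first."""
    scored = [(edit_distance(word, cand), cand) for cand in lexicon]
    close = sorted((d, c) for d, c in scored if d <= max_distance)
    return [c for _, c in close[:limit]]


lexicon = {"অসম", "অসমীয়া", "শব্দ"}
suggestions = suggest("শব্ত", lexicon)
```

Scanning the whole lexicon for every misspelling is slow for large lists; production spell checkers use smarter candidate generation, but the principle is the same.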
 
The other projects are in the initial stage. For each project we require a different kind of database, and all the projects are part of an educational degree, not funded by any organization. That is why the copyright automatically goes to Tezpur University.
 
With best regards
 
nava
 


 

Biraj Kumar Kakati

Feb 6, 2009, 10:52:02 AM
to amitakh...@gmail.com, xo...@googlegroups.com
Dear Amitakhya,
Thanks for your mail. I think all the products of IITG are uploaded in the TDIL Data Center. You will get those products free of cost in the following website...
Regarding the missing ৺ and ৎ, I doubt whether you have gone through the keyboard layout. I have seen both of them on the keyboard and also tried them myself. Please go through the attached "read me" file for the keyboard layout. I do not claim that it is faultless, but it seems to be a completed project. If you want more information you can try to contact the persons in the following webpage. They were the project collaborators and investigators.

About Naba's proposal, I can only say that it will be summarily rejected by the members, as he has mentioned that "copyright automatically goes to Tezpur University". A collaborative project should never be like this.

--------------------------------------------------
With best regards,
Biraj Kumar Kakati
from XOBDO




Biraj Kumar Kakati

Feb 6, 2009, 11:36:30 AM
to amitakh...@gmail.com, xo...@googlegroups.com
Dear all,
Please ignore my previous mail. I am sending the mail once more as I forgot to attach the file to the previous one.


With regards,
Biraj Kumar Kakati

--------------------------------------------------------------------------------------


Dear Amitakhya,
Thanks for your mail. I think all the products of IITG are uploaded to the TDIL Data Center. You will get those products free of cost on the following website...
Akruti Assamese MultiFont Engine Readme.pdf

বিক্ৰম

Feb 8, 2009, 12:16:17 AM
to শব্দ সংঘ - XOBDO
Amit,
It is nice talking to you. From our conversation, what I understood is
that you already have a "word list" of around 4000 Assamese words
encoded in Unicode that you have extracted from various sources. Now you
are looking for additional words from XOBDO to boost your spell
checker's ability, and your spell checker can be downloaded for free and
used by any other developer. Please correct me if I have misunderstood.


Here is what I propose:
1. We supply you a list of 5000 words (only the words; no meanings,
   PoS, etc.).
2. You prepare the spell checker with XOBDO's 5000 words plus your
   4000+ words. Here, we will attempt to update XOBDO's database with
   the words collected by you.
3. Once ready, we upload the spell checker to XOBDO's website as well
   as your website for selected people to download and test. Here, we
   will check how XOBDO's credit is presented.
4. After successful testing and satisfaction, we supply an additional
   10000 words. By now, you will have 15000 words (75%) of XOBDO's
   database.
5. Once you upgrade your spell checker with all the words supplied by
   us and collected by yourself, we upload it again to both websites.
   This time, we release it for a bigger audience.
6. We repeat the process at regular intervals, say every 6 months or
   so. At any point in time, we will supply only 75% of XOBDO's words.


Please post your opinion.
Thanks,
Bikram

বিক্ৰম

Feb 8, 2009, 12:17:21 AM
to শব্দ সংঘ - XOBDO
---email received from Amit---
Hello,

Seems fine to me. You are right about it being freely downloadable. The
additional beauty of it being Open Source is that anyone, not only a
developer, can take part in it and keep the updates happening.


1. Supply a list - Agreed.


2. Updating XOBDO's database - I really doubt that. I fear your database
might already have the words that I have. :)


3. The spell checker is first packaged according to the guidelines for
hunspell (http://hunspell.sourceforge.net/). The package will be
submitted upstream by me. It will be in tarball format, so not only
XOBDO but anyone else who wants to download it, package it (with
whatever packaging mechanism they use) and publish it can do so. There
are no selected people; it's for all.
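
To make the packaging step a little more concrete, here is a minimal sketch of turning a plain word list into the two files hunspell reads. A hunspell `.dic` file starts with an (approximate) entry count followed by one word per line, and the `.aff` file declares the encoding. The function names and the `as_IN` base name are assumptions made for the example, and the `.aff` here contains no affix rules at all.

```python
from pathlib import Path


def build_hunspell_files(words):
    """Return (dic_text, aff_text) for a plain word list."""
    unique = sorted({w.strip() for w in words if w.strip()})
    # A hunspell .dic file starts with the (approximate) entry count,
    # followed by one word per line.
    dic_text = "\n".join([str(len(unique))] + unique) + "\n"
    # Minimal .aff file: only declares the encoding, no affix rules.
    aff_text = "SET UTF-8\n"
    return dic_text, aff_text


def write_hunspell_files(words, out_dir, name="as_IN"):
    """Write <name>.dic and <name>.aff into out_dir (name is an assumption)."""
    dic_text, aff_text = build_hunspell_files(words)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / f"{name}.dic").write_text(dic_text, encoding="utf-8")
    (out / f"{name}.aff").write_text(aff_text, encoding="utf-8")
```

The resulting directory can then be put into a tarball for whichever packaging mechanism is used downstream.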


But, you all need to tell me how I can put XOBDO's credit here.


1st issue :
------------
There might be common words between the ones I have and the ones you
give me. How do I determine which one is whose? My script simply takes
the words that appear in a file, sorts them and removes the duplicates.
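
One possible way to answer the "which one is whose" question is to keep the two sources as separate sets before merging. The sketch below is only an illustration, not the actual script; the function names and the choice to normalize to Unicode NFC before comparing are assumptions.

```python
import unicodedata


def load_words(lines):
    """Normalize each word to NFC, strip whitespace, drop blank lines."""
    return {unicodedata.normalize("NFC", w.strip()) for w in lines if w.strip()}


def merge_with_provenance(own_lines, xobdo_lines):
    """Merge two word lists and record where each word came from."""
    own = load_words(own_lines)
    xobdo = load_words(xobdo_lines)
    return {
        "merged": sorted(own | xobdo),      # sorted, duplicates removed
        "only_own": sorted(own - xobdo),    # words only in my list
        "only_xobdo": sorted(xobdo - own),  # words only XOBDO supplied
        "common": sorted(own & xobdo),      # words present in both
    }
```

With this, the `only_xobdo` list is exactly the set of words that would not be in the spell checker without XOBDO's contribution, and `common` shows how much the two lists overlap.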


2nd:
-----
I am just looking for a list of words. Only if the words you provide are
unique does the question of giving credit arise. For example, take the
word চৰাই. Whom can you give credit to for a word that everyone uses
daily?


The point is I am not asking for the "present working structure" of
XOBDO. I am just asking for a list of words, probably in plain text,
preferably in Unicode encoding.
If I were making a dictionary like the ones you have, or like this one
here http://www.indlinux.org/wiki/index.php/Hindi_dict , and I had
requested the algorithm, database implementation etc., then certainly
the question of licensing etc. would come up.


I hope I am able to clarify my point.


4, 5 and 6. Regular update - I am all ok with that.


I would like to emphasize here once more: by the phrase "Using XOBDO's
database", I mean using the "list of Assamese words" that XOBDO has as
of now. When I decide to make a full-fledged open source dictionary
(which includes an in-built spell checker, synonym-antonym suggestion,
explanation, English-Assamese conversion) for Assamese, I will submit a
different proposal. For the time being, it's only a list of words.


Best regards,
Amit.