FW: Interview Article URL


Luan Vannithone

Aug 15, 2012, 8:40:01 PM
to LaoEnGT, Cameron Darke, Sunny's friend - workLiveLaos
Hi all team members,
 
You may be interested in reading (and later discussing) the Lao version of the interview questions and answers about our Google online dictionary and translation project, which Cameron Darke has posted on his WorkLiveLaos website. See:
http://www.worklivelaos.com/google-translate-lao/
 
I'll post work options shortly for discussion to help progress the project further.
 
Hakphaeng
 
Luang Vannithone
 

From: luan_va...@hotmail.com
To: cam...@worklivelaos.com
Subject: RE: Interview Article URL
Date: Wed, 15 Aug 2012 18:50:52 +1000

Cameron,
 
Thank you very much for raising awareness and letting other people in Laos, and your website visitors, know about our effort.
I will increase my effort in the project, try to get more involvement from others, and push the project as far as we can.
 
Cheers,
 
Luang
 
> Subject: Interview Article URL
> From: cam...@worklivelaos.com
> Date: Wed, 15 Aug 2012 00:15:59 +1000
> To: luan_va...@hotmail.com
>
> Hi Luan,
>
> Once again thank you very much for your time. I hope my article will give your project exposure and maybe someone will volunteer. You can read it here:
>
> http://www.worklivelaos.com/google-translate-lao/
>
> Cheers,
> Cameron

Houmphanh

Aug 16, 2012, 7:20:15 AM
to laol10n...@googlegroups.com
Sabaydii Luang et al.,
It is probably timely for us to assess where we are. Basically, we are at the stage of building up a digital Lao corpus, starting with En-La. En-La dictionaries do exist in many forms; if we have them available in digital formats, it is an easy task to write a program that digests the various forms of data and filters the output as needed. The growth of Lao content, e.g. Lao websites, also helps to increase the volume of such data. We can then collate the data and correlate it with other language dictionaries to derive other Lao language pairs, e.g. if we have En-La and Fr-En, then we can derive Fr-La. Vincent from LaoSoftware is also slowly moving forward with his corpus. So let's keep soldiering on :)
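
The derivation of new language pairs described above can be sketched as a dictionary composition that uses English as the pivot. This is only a minimal illustration: the function name `pivot` and the tiny French, English and romanized Lao entries are made-up placeholders, not data from any of the dictionaries mentioned.

```python
from collections import defaultdict


def pivot(src_to_en, en_to_lo):
    """Derive source->Lao candidates by chaining source->English and English->Lao."""
    derived = defaultdict(set)
    for src_word, en_words in src_to_en.items():
        for en_word in en_words:
            # Every Lao entry of the English pivot word becomes a candidate.
            for lo_word in en_to_lo.get(en_word, ()):
                derived[src_word].add(lo_word)
    # Sort candidates so the output is stable and easy to review by hand.
    return {src: sorted(cands) for src, cands in derived.items()}


# Illustrative placeholder entries (romanized Lao), not real dictionary data.
fr_en = {"maison": ["house", "home"], "eau": ["water"], "livre": ["book", "pound"]}
en_lo = {"house": ["heuan"], "home": ["heuan", "ban"], "water": ["nam"], "book": ["pum"]}

fr_lo = pivot(fr_en, en_lo)
for fr_word, lo_words in fr_lo.items():
    print(fr_word, "->", lo_words)
```

In this sample, "livre" keeps only the candidate reached through "book", because the sample En-La table has no entry for "pound". With fuller tables the pivot step can also pull in wrong senses, so derived pairs would still need human review.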

The Thai researchers are also working on their own solution: even though Google Thai works, a better system is being developed. OK, so why does Google Thai work and not Lao? Many people ask!

Simple: the Google system has no linguistic knowledge. It is a statistical translation machine that relies on a brute-force method, with lots of data to refer to and past usage (i.e. digital data), to interpret and guess the most frequently used words in some known context (based on occurrences and use of different words = statistics). Google Lao will not work simply because we have dictionaries; i.e. even if we pay someone to key in a selected En-La dictionary today, Google-La will still not work. It will contribute, but it will still not be functional. Or it will only be a word-for-word translation, sometimes with funny meanings in context.

For a decent Lao Machine Translation (LMT) to work:
- Google-La will need to build up Lao word statistics.
- A hint dictionary must be developed, with context-relevant use of words.
- The hints should include variations of Lao words and spellings in use, plus a root-word/radical dictionary.
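
As a rough illustration of the first and third points, the sketch below counts word frequencies after folding spelling variants onto one canonical form. The variant table is hypothetical; it reuses romanized greetings that appear with different spellings in this thread. Real Lao script is written without spaces between words, so a proper word segmenter would be needed in place of the simple `split()` used here.

```python
from collections import Counter

# Hypothetical hint table: observed spelling -> canonical form.
VARIANTS = {"sabaydii": "sabaidii", "sookdii": "sokdii"}


def normalise(token):
    """Lower-case a token, strip simple punctuation and fold known variants."""
    token = token.lower().strip(",.!")
    return VARIANTS.get(token, token)


def word_stats(texts):
    """Count canonical word forms across a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(normalise(tok) for tok in text.split())
    return counts


stats = word_stats(["Sabaydii Luang", "Sabaidii Pan", "Sookdii Pan", "Sokdii all"])
print(stats.most_common(3))
```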

Some may have seen my LMT working; it is an extension of Google Thai, with a small Th-La dictionary and built-in rules and exceptions to handle Th-La idioms and variations of words. It was originally conceived to facilitate the creation of a digital En-La corpus/dictionary. In fact it did help: we quickly derived some 25,000 words and collated them with the STEA En-La dictionary to filter out a useful set. One application is the NUOL WordNet (http://en.wikipedia.org/wiki/WordNet) dictionary. By the way, the hope is also to derive other dictionaries, e.g. JP-En-Th, Th-La => JP-La, etc.

Best regards

Pan

darast...@aol.com

Aug 17, 2012, 11:20:24 AM
to laol10n...@googlegroups.com
Luan, very nice interview!
 
My question is: where is the page or link where one can scan in English and Lao?
 
This was one solution discussed previously. Thank you, Dara

Luan Vannithone

Aug 18, 2012, 6:56:33 PM
to LaoEnGT
Thanks for your valuable thoughts/comments, Pan. See my replies in RED (in square brackets below).

 
> Simple: the Google system has no linguistic knowledge. It is a statistical translation machine that relies on a brute-force method, with lots of data to refer to and past usage (i.e. digital data), to interpret and guess the most frequently used words in some known context (based on occurrences and use of different words = statistics). Google Lao will not work simply because we have dictionaries; i.e. even if we pay someone to key in a selected En-La dictionary today, Google-La will still not work. It will contribute, but it will still not be functional. Or it will only be a word-for-word translation, sometimes with funny meanings in context.
 
[Pan, you seem to be suggesting that even if we build up 'lots of data to refer to, and past usage ie. digital data, to interpret and guess a best often use of words in some known context (base on occurrences and use of different words= statistics)', similar to what the Thais have done, we still won't get a result similar to what Google Translate has achieved for the Thai language. Really?
If that is the case, then we are wasting our time at this Google place.
Is there any other place where we can spend our time and effort on advancing the Lao language?
I recently heard that our Hmong brothers have scored a success with the Microsoft Translation platform. See the previous post. I cannot assess its quality since I don't read Hmong. I wonder if anyone has investigated it for the Lao language.]


> For a decent Lao Machine Translation (LMT) to work:
> - Google-La will need to build up Lao word statistics.
> - A hint dictionary must be developed, with context-relevant use of words.
> - The hints should include variations of Lao words and spellings in use, plus a root-word/radical dictionary.
 
[Is there any development in this direction for Lao? Any link?]


> Some may have seen my LMT working; it is an extension of Google Thai, with a small Th-La dictionary and built-in rules and exceptions to handle Th-La idioms and variations of words. It was originally conceived to facilitate the creation of a digital En-La corpus/dictionary. In fact it did help: we quickly derived some 25,000 words and collated them with the STEA En-La dictionary to filter out a useful set. One application is the NUOL WordNet (http://en.wikipedia.org/wiki/WordNet) dictionary. By the way, the hope is also to derive other dictionaries, e.g. JP-En-Th, Th-La => JP-La, etc.
 
[Pan, I searched the net and found this:
http://lo.asianwordnet.org/ . I think this is what you are referring to. Is that right?
I will have a good look at it.]

Sokdii all
 
Luang
 

Date: Thu, 16 Aug 2012 18:20:15 +0700

Subject: [LaoEnGT] Interview Article URL
From: p...@Nuvos.Biz
To: laol10n...@googlegroups.com

Houmphanh

Aug 19, 2012, 1:55:40 AM
to laol10n...@googlegroups.com
Sabaydii Luang et al.,
My answers are in Blue below.

On Sun, Aug 19, 2012 at 5:56 AM, Luan Vannithone <luan_va...@hotmail.com> wrote:
> Thanks for your valuable thoughts/comments, Pan. See my reply in RED.

 
 
> [Pan, you seem to be suggesting that even if we build up 'lots of data to refer to, and past usage ie. digital data, to interpret and guess a best often use of words in some known context (base on occurrences and use of different words= statistics)', similar to what the Thais have done, we still won't get a result similar to what Google Translate has achieved for the Thai language. Really?
That is right. It is not what Google has done for Thai; the Thais did it for themselves, having more resources such as NECTEC, which seems to be more forward-looking than Laos's STEA/NAST (now NAPT).
> If that is the case, then we are wasting our time at this Google place.
> Is there any other place where we can spend our time and effort on advancing the Lao language?
I did not mean to discourage anyone, but to realign the focus. What is being done is great; the task of collating a Lao corpus is one of the first steps. OK, a computer is a pretty dumb machine :) Let's say we have a completed En-La dictionary; how will it translate? It looks at word pairs and selects a one-to-one entry. What if there are one-to-many or many (variations)-to-one entries; which one will get selected? Google MT is not linguistically aware, so how will it apply grammar-like corrections? This is where hints and, in the case of Google MT, statistics come in. Given the word frequencies previously collected from texts, combinations of word occurrences in certain contexts (known subject fields) are given a score, weight or rating. So when a text is presented for translation, one approach is to analyse the text and work out its weight (in relation to the frequent words used); having a hint of the context, the MT can then intuitively select a 'closer' word as the translated word. Also, Google tends to prefer matching the longest string, i.e. the more exact meaning, so the more data they have, the better the translation.
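
The longest-string preference mentioned at the end of the paragraph above can be illustrated with a greedy longest-match lookup against a phrase table. This is a simplified sketch, not Google's actual algorithm; the function name `longest_match` and the romanized Lao phrases are illustrative assumptions.

```python
def longest_match(tokens, phrase_table, max_len=4):
    """Translate by always taking the longest phrase found in the table."""
    output = []
    i = 0
    while i < len(tokens):
        # Try the longest window first, then shrink it one token at a time.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = " ".join(tokens[i:i + n])
            if key in phrase_table:
                output.append(phrase_table[key])
                i += n
                break
        else:
            # No entry of any length: pass the token through unchanged.
            output.append(tokens[i])
            i += 1
    return output


# Placeholder phrase table (romanized Lao), for illustration only.
table = {
    "thank you": "khop chai",
    "thank you very much": "khop chai lai lai",
    "you": "chao",
}

print(longest_match("thank you very much Pan".split(), table))
```

With more stored phrases, longer and therefore more exact matches become available, which is the point being made about data volume.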
Google is crawling through Lao text and harvesting the Lao words used as we speak, SO the more the group produces in terms of output, the more it will all contribute to a workable system.
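
The idea of weighting candidate words by the context they usually occur in can be sketched as follows. The co-occurrence counts, the romanized Lao candidates and the function `choose` are all invented for illustration; a real system would learn such counts from a large collected corpus.

```python
from collections import Counter

# Hypothetical statistics: for each (English word, Lao candidate) pair,
# how often each context word was seen near that translation.
CONTEXT_COUNTS = {
    ("bank", "thanakhan"): Counter({"money": 8, "account": 6, "loan": 5}),
    ("bank", "fang"): Counter({"river": 9, "boat": 4, "water": 3}),
}


def choose(word, context_words):
    """Return the candidate whose recorded contexts best overlap the given context."""
    best_candidate, best_score = None, -1
    for (source, candidate), counts in CONTEXT_COUNTS.items():
        if source != word:
            continue
        # A Counter returns 0 for unseen words, so unknown context adds nothing.
        score = sum(counts[c] for c in context_words)
        if score > best_score:
            best_candidate, best_score = candidate, score
    return best_candidate


print(choose("bank", ["the", "river", "boat"]))
print(choose("bank", ["open", "an", "account"]))
```

A plain one-to-one dictionary has no way to make this choice, which is why the hints and statistics are needed in addition to the word pairs.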


> I recently heard that our Hmong brothers have scored a success with the Microsoft Translation platform. See the previous post. I cannot assess its quality since I don't read Hmong. I wonder if anyone has investigated it for the Lao language.]
 

>> For a decent Lao Machine Translation (LMT) to work:
>> - Google-La will need to build up Lao word statistics.
>> - A hint dictionary must be developed, with context-relevant use of words.
>> - The hints should include variations of Lao words and spellings in use, plus a root-word/radical dictionary.

> [Is there any development in this direction for Lao? Any link?]

No, I am only aware of the work by word of mouth.

 
> [Pan, I searched the net and found this:
> http://lo.asianwordnet.org/ . I think this is what you are referring to. Is that right?
> I will have a good look at it.]

No, the NUOL application does not mention WordNet; I had a look at the package and I can see it is an implementation of WordNet using data from STEA. One of the NECTEC teams is working in a big way on WordNet for Thai, and they will deploy it in schools. Please note that WordNet has a concept of hints: spellings, variations, context, etc.

STEA/NAST has now been reorganized into NAPT = National Authority of Post and Telecommunication Ministry, although many physical offices remain as they were before. MT is a very low priority at NAPT.
 
Sookdii


Pan

Luan Vannithone

Aug 20, 2012, 6:46:46 PM
to LaoEnGT

 Sabaidii Pan and all,
 
My further replies and thoughts are in GREEN below.
 
Hakphaeng
 
Luang

Date: Sun, 19 Aug 2012 12:55:40 +0700
Subject: Re: [LaoEnGT] Interview Article URL

From: p...@Nuvos.Biz
To: laol10n...@googlegroups.com

>> [Pan, you seem to be suggesting that even if we build up 'lots of data to refer to, and past usage ie. digital data, to interpret and guess a best often use of words in some known context (base on occurrences and use of different words= statistics)', similar to what the Thais have done, we still won't get a result similar to what Google Translate has achieved for the Thai language. Really?
> That is right. It is not what Google has done for Thai; the Thais did it for themselves, having more resources such as NECTEC, which seems to be more forward-looking than Laos's STEA/NAST (now NAPT).
 
I guess it would be extremely hard to be 'forward looking' when Lao people were busy fleeing war and conflict pretty much continuously since the Siamese expansion and the advent of European colonisation. For us, today, what is the best and most effective way to develop Lao language resources?


So you do think that what we are doing is worthwhile and not a waste of time. What do you mean by '...but realign the focus'? What exactly would you like to see us doing?
>> I recently heard that our Hmong brothers have scored a success with the Microsoft Translation platform. See the previous post. I cannot assess its quality since I don't read Hmong. I wonder if anyone has investigated it for the Lao language.]
 

>>> For a decent Lao Machine Translation (LMT) to work:
>>> - Google-La will need to build up Lao word statistics.
>>> - A hint dictionary must be developed, with context-relevant use of words.
>>> - The hints should include variations of Lao words and spellings in use, plus a root-word/radical dictionary.

>> [Is there any development in this direction for Lao? Any link?]

> No, I am only aware of the work by word of mouth.
 
[If you, by any chance, come across some form of contact or link, please let me know.]

 
>> [Pan, I searched the net and found this:
>> http://lo.asianwordnet.org/ . I think this is what you are referring to. Is that right?
>> I will have a good look at it.]

> No, the NUOL application does not mention WordNet; I had a look at the package and I can see it is an implementation of WordNet using data from STEA. One of the NECTEC teams is working in a big way on WordNet for Thai, and they will deploy it in schools. Please note that WordNet has a concept of hints: spellings, variations, context, etc.
>
> STEA/NAST has now been reorganized into NAPT = National Authority of Post and Telecommunication Ministry, although many physical offices remain as they were before. MT is a very low priority at NAPT.
 
[If you, by any chance, come across some form of contact or link, please let me know.]

 
Sookdii


Pan

Luan Vannithone

Sep 11, 2012, 8:56:34 PM
to LaoEnGT
Hi Dara and all,
 
Sorry for taking a while to reply to your query, Dara.
My computer's hard disk crashed. [crying face] Maybe I overused it. [dog face]
I bought a new disk and have restored the system and most of the data.
 
Anyway back to the topic.
Others may wonder what Dara is on about.
To recap ...
A few months ago I said OK to Dara's suggestion that we/I create a system to help people 'scan' English-Lao texts into Google Translator Toolkit (GTT), thus creating Global Translation Memories, or TM units, as a result.
 
Here's the page ...
https://sites.google.com/site/eng2lao/eng-lao-text-contents-translation-memories-tm
Let me know what you think. Any ways to improve it?
 
BTW ... good news!
I have made contact with a GOOGLE INSIDER: Divon Lan, Google Translate product manager for emerging countries, including Laos.
This came about after he spotted and read the Q and A interview I gave to the WorkLiveLaos website. For info, see http://www.worklivelaos.com/google-translate-lao/
I had a SKYPE chat with him and he is happy to support us with technical and other advice.
 
He also said GOOGLE is currently using a program to 'crawl' through websites known to contain English-Lao translations, process them and store the results in the system as TM units. (Ahh... that term TM again!) This is much faster than doing it manually, but the results won't be as good.
He said that what we envisage doing, as explained on the web page referenced above, would provide a better-quality supplement to what Google themselves are doing internally.
He also said he will investigate the 'dictionary' feature of GTT (as an added resource on top of the Glossary file) for us. He clarified that the Dictionary is more generic and has many more words and terms, while the Glossary is smaller and more specific to a given subject matter (or technical area).
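
To make the TM, Glossary and Dictionary terms concrete, here is a small sketch of how a translation-memory lookup and a glossary-before-dictionary term lookup might work. It does not reflect GTT's real file formats or matching logic; the names `tm_lookup` and `term_lookup`, the 0.8 similarity threshold, the glossary-first precedence and the romanized Lao sample entries are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def tm_lookup(source, memory, threshold=0.8):
    """Return (target, similarity) for the closest stored segment, or (None, similarity)."""
    best_target, best_ratio = None, 0.0
    for seg_source, seg_target in memory:
        ratio = SequenceMatcher(None, source.lower(), seg_source.lower()).ratio()
        if ratio > best_ratio:
            best_target, best_ratio = seg_target, ratio
    if best_ratio >= threshold:
        return best_target, best_ratio
    return None, best_ratio


def term_lookup(term, glossary, dictionary):
    """Prefer the subject-specific glossary, then fall back to the generic dictionary."""
    if term in glossary:
        return glossary[term]
    return dictionary.get(term)


# Each TM unit pairs a source segment with its human translation (placeholders).
memory = [("Thank you very much", "khop chai lai lai"), ("Good luck", "sokdii")]
glossary = {"bank": "thanakhan"}                    # e.g. a finance glossary
dictionary = {"bank": "fang", "river": "mae nam"}   # generic entries

print(tm_lookup("thank you very much", memory))
print(term_lookup("bank", glossary, dictionary))
print(term_lookup("river", glossary, dictionary))
```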
 
Stay tuned.
 
BTW, has anyone investigated the Microsoft Bing translation platform for the Lao language?
 
Luang
 
 

To: laol10n...@googlegroups.com
Subject: [LaoEnGT] Question: where is the page where
From: darast...@aol.com
Date: Fri, 17 Aug 2012 11:20:24 -0400

Houmphanh

Sep 11, 2012, 10:33:36 PM
to laol10n...@googlegroups.com
Sabaydii Luang,
And I thought we had gone on holidays :) . Re. scanning: you don't mean OCR? English is OK; Lao is impossible! Although NAST, with UN funding and NECTEC, did build an experimental system based on OCRopus (google it).
NECTEC has its own system for Thai, offered as a pay-per-page service, but nothing is working as a commercial product as yet. Note that storing scanned pages will take maybe 10 times more space.

Divon Lan: good contact! His face looks familiar; maybe we crossed paths at conferences. It looks like he has been to Laos, and he has a Cambodian wife.

Sookdii

Pan
PS. VTE is hot & humid



Dara Stieglitz

Sep 12, 2012, 3:37:28 PM
to laol10n...@googlegroups.com
Thank you for great news!!!! Dara

Dara Stieglitz

Life is like art, create what you desire.~ Dara Stieglitz