EDICT version?


JimBreen

Mar 23, 2010, 12:39:00 AM
to moji
How up-to-date is the EDICT version used by Moji? As people
probably know, the EDICT file is being corrected/expanded
all the time, and a fresh version is issued each day.

Also, as people who have used the WWWJDIC server will know, the
"Translate Words" option uses a combined EDICT-format file
which contains all the proper nouns (from the ENAMDICT
file) plus data from a lot of other glossaries. It has over 1M
entries instead of the 176k in EDICT (most of the extra ones
are names). Is there any chance of this file being used as an
alternative in Moji?

Jim Breen

Philip Chee

Mar 23, 2010, 5:38:16 AM
to mo...@googlegroups.com
On 23 March 2010 12:39, JimBreen <jimb...@gmail.com> wrote:

> Also, as people who used the WWWJDIC server will know, the
> "Translate Words" option uses a combined EDICT-format file
> which contains all the proper nouns (from the ENAMDICT
> file) plus data from a lot of other glossaries. It has over 1M
> entries instead of the 176k in EDICT (most of the extra ones
> are names.) Is there any chance of this file being used as an
> alternative in Moji?

The ENAMDICT names dictionary, as well as several other dictionaries,
is available from the installation section:
<http://moji.mozdev.org/index.html#install>

As far as I can tell the following are available:
Japanese places mojijplaces-0.6.20071013.xpi (1016.8 KB)
Japanese names mojijpnam-0.6.20071013.xpi (7.5 MB)

The dictionaries all appear to date from 2007 or 2008.

Phil

Gerald Vogt

Mar 23, 2010, 8:29:27 AM
to moji
Hi Jim,

On Mar 23, 5:39 am, JimBreen <jimbr...@gmail.com> wrote:
> How up-to-date is the EDICT version used by Moji? As people
> probably know, the EDICT file is being corrected/expanded
> all the time, and a fresh version is issued each day.

The current English dictionary dates from March 2008. The names and
places dictionaries date from October 2007.

I'll have to make some bigger changes in moji to get it accepted on AMO
again. These changes require an update to the dictionaries as well. As
part of that process I'll update the edict sources to the latest
available version.

> Also, as people who used the WWWJDIC server will know, the
> "Translate Words" option uses a combined EDICT-format file
> which contains all the proper nouns (from the ENAMDICT
> file) plus data from a lot of other glossaries. It has over 1M
> entries instead of the 176k in EDICT (most of the extra ones
> are names.) Is there any chance of this file being used as an
> alternative in Moji?

I'll have to review the contents of the combined file. Generally, moji
does not require combined edict files. On the contrary, I prefer
separate edict files because you can install any set of dictionaries
you want. A combined dictionary would thus be more a matter of
convenience than something the database needs. If you have suggestions
for particularly useful glossaries or similar additions, I can assemble
those as new moji dictionaries.

Cheers,

Gerald

JimBreen

Mar 23, 2010, 5:49:01 PM
to moji
On Mar 23, 11:29 pm, Gerald Vogt <v...@spamcop.net> wrote:
> I'll have to do some bigger changes in moji to get it accepted on AMO
> again. These changes require an update to the dictionaries as well. In
> that process I'll update the edict sources to the latest available
> again.

Thanks. It would be good if updates could be created more regularly.
Is the format easy to generate? One option is that I could generate
the dictionary files as part of my regular distribution process.

> > Also, as people who used the WWWJDIC server will know, the
> > "Translate Words" option uses a combined EDICT-format file
> > which contains all the proper nouns (from the ENAMDICT
> > file) plus data from a lot of other glossaries. It has over 1M
> > entries instead of the 176k in EDICT (most of the extra ones
> > are names.) Is there any chance of this file being used as an
> > alternative in Moji?
>
> I'll have to review the contents of the combined file. Generally, moji
> does not require combined edict files. On the contrary, I prefer
> separated edict files because you can install any set of dictionaries
> you want. A combined dictionary would thus be more for convenience
> than database. If you have some suggestions of very useful and good
> additions for glossaries or similar I can assemble those as new moji
> dictionaries.

The files included are listed at:
http://www.csse.monash.edu.au/~jwb/wwwjdicinf.html#dicfil_tag
I think they are all useful, but the larger ones (life sciences, law,
etc.) are particularly good.

Cheers

Jim

Gerald Vogt

Mar 24, 2010, 1:04:43 PM
to mo...@googlegroups.com
On 23.03.10 22:49, JimBreen wrote:
> On Mar 23, 11:29 pm, Gerald Vogt<v...@spamcop.net> wrote:
>> I'll have to do some bigger changes in moji to get it accepted on AMO
>> again. These changes require an update to the dictionaries as well. In
>> that process I'll update the edict sources to the latest available
>> again.
>
> Thanks. It would be good if updates could be created more regularly.
> Is the format easy to generate? One option is that I could generate
> the dictionary files as part of my regular distribution process.

The basic format is edict. I use some makefiles to generate the edict
files used in moji and to handle the versioning and the packaging. So
basically, everything is automated.

The edict files included in the moji dictionaries are UTF-8 encoded and
sorted in binary order (C locale). That is all there is to the edict
files themselves.
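
As a rough illustration of those two properties (UTF-8 encoding and byte-order sorting), here is a minimal Python sketch. It is not the makefile-based process described above. The function names are hypothetical, and the EUC-JP source encoding is an assumption about the input file; adjust it if your copy of EDICT is already UTF-8.

```python
def sort_edict_lines(lines):
    """Sort lines by the byte order of their UTF-8 encoding.

    This mimics what `sort` does under the C locale on a UTF-8 file.
    """
    return sorted(lines, key=lambda s: s.encode("utf-8"))


def convert_edict(src_bytes, src_encoding="euc_jp"):
    """Decode an EDICT-format file, sort it, and return UTF-8 bytes.

    src_encoding is an assumption: pass "utf-8" for a UTF-8 source file.
    """
    text = src_bytes.decode(src_encoding)
    # Drop empty lines so the sorted output has no leading blank lines.
    lines = [ln for ln in text.splitlines() if ln]
    return ("\n".join(sort_edict_lines(lines)) + "\n").encode("utf-8")
```

To use it, read the source file in binary mode, pass the bytes to `convert_edict`, and write the result to the output file in binary mode.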

The rest are the files required to make a proper xpi file out of
everything. They include a JavaScript file which does some analysis of
the dictionary entries for formatting and for some hints (e.g.
abbreviations are explained in tool tips).
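
To give an idea of what such entry analysis involves, here is a small Python sketch. It is not moji's actual JavaScript. It assumes the usual EDICT line layout of `HEADWORD [READING] /gloss/gloss/`, and the `MARKERS` table is a tiny hypothetical subset of the EDICT part-of-speech codes with paraphrased descriptions.

```python
import re

# Assumed EDICT line layout: HEADWORD [READING] /gloss1/gloss2/
ENTRY_RE = re.compile(
    r"^(?P<head>\S+)(?: \[(?P<reading>[^\]]+)\])? /(?P<glosses>.*)/$"
)

# Illustrative subset only; the real list of markers is much longer.
MARKERS = {
    "n": "noun",
    "abbr": "abbreviation",
    "adj-na": "na-adjective",
    "vs": "noun or participle taking suru",
}


def parse_entry(line):
    """Split one EDICT line into headword, reading, and glosses."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    glosses = [g for g in m.group("glosses").split("/") if g]
    return {
        "headword": m.group("head"),
        "reading": m.group("reading"),  # None when the entry has no reading
        "glosses": glosses,
    }


def explain_markers(gloss):
    """Return (code, description) pairs for known markers in a gloss."""
    found = []
    for group in re.findall(r"\(([^)]+)\)", gloss):
        for code in group.split(","):
            code = code.strip()
            if code in MARKERS:
                found.append((code, MARKERS[code]))
    return found
```

A tool tip could then be built from the descriptions returned by `explain_markers` for each gloss of a parsed entry.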

The last thing is the versioning.

Thus, automation is not difficult. You could generate the full moji xpi
dictionaries and publish them yourself together with an update.rdf file.

Otherwise, I have to do it. Lately I have been quite busy, and some
other problems with moji required my attention, so I forgot about
updating the edict files more often.

For the next release of moji on AMO it is necessary to make some larger
changes which also affect most dictionaries. That would be the next
"natural" occasion to update the dicts. I hope to get to it soon.

> The files included are listed at:
> http://www.csse.monash.edu.au/~jwb/wwwjdicinf.html#dicfil_tag
> I think they are all useful, but the larger ones (life sciences, law,
> etc.)
> are particularly good.

I guess we could package the combined dictionary as separate
dictionaries, to let people choose whether they want the full thing or
only some of it.

Cheers,

Gerald

JimBreen

Mar 31, 2010, 10:48:57 PM
to moji
On Mar 25, 4:04 am, Gerald Vogt <v...@spamcop.net> wrote:
> On 23.03.10 22:49, JimBreen wrote:
> > Thanks. It would be good if updates could be created more regularly.
> > Is the format easy to generate? One option is that I could generate
> > the dictionary files as part of my regular distribution process.
>
> The basic format is edict. I use some makefiles to generate the edict
> files used in the moji, the versioning and the packaging. So basically,
> everything goes automatically.
>
> The edict files included in the moji dictionaries are UTF-8 encoded and
> binary (LOCALE C) sorted. But that's all regarding the edict files.

OK. That keeps it simple.

>
> The rest are the files required to make a proper xpi file out of
> everything. It includes a JavaScript file which does some analysis of
> the dictionary entries to do the formatting and some hints (e.g.
> abbreviations are explained in tool tips.
>
> The last thing is the versioning.
>
> Thus, automation is not difficult. You could generate the full moji xpi
> dictionaries and publish yourself together with an update.rdf file.
>
> Else, I have to do it. Lately, I was quite busy and there were some
> other problems with moji which required my attention and I forgot about
> updating the edict files more often.

Would you be able to send me the other files needed? I could have a
crack at generating the moji .xpi files.

> For the next release of moji on AMO it is necessary to make some larger
> changes which also affect most dictionaries. That would be the next
> "natural" occasion to update the dicts. I hope to get to it soon.

Maybe it would be better to wait for those changes.

> > The files included are listed at:
> >http://www.csse.monash.edu.au/~jwb/wwwjdicinf.html#dicfil_tag
> > I think they are all useful, but the larger ones (life sciences, law,
> > etc.)
> > are particularly good.
>
> I guess we could package the combined dictionary as separated dictionary
> to let people choose if they want the full thing or only some of it.

That sounds a good option.

Cheers

Jim
