Use of jlpt data

webm...@japandict.com

Apr 23, 2016, 3:55:19 AM
to jmdic...@googlegroups.com
Hi,

I'm the owner of the japandict.com Japanese dictionary.

While I was trying to improve the search in the dictionary I discovered this translation project and it looks great! I think it can be very useful to all non-native English speakers.

I also saw in your GitHub repository the files jlpt-n[1-5].csv containing the JMdict IDs for all the entries in each one of the JLPT levels. Actually, this is something I've been looking for for some time now. Did you compile those lists or did you take them from somewhere else?

As far as I can see, this data is CC BY-SA 3.0 (please correct me if I'm wrong). Do you mind if I use this data in japandict.com, adding the appropriate reference in the "Attributions" page?

Thank you

Carles

Alexandre Courbot

Apr 23, 2016, 8:32:44 PM
to jmdic...@googlegroups.com
Hi,

On Sat, Apr 23, 2016 at 4:55 PM, <webm...@japandict.com> wrote:
> Hi,
>
> I'm the owner of the japandict.com Japanese dictionary.
>
> While I was trying to improve the search in the dictionary I discovered this
> translation project and it looks great! I think it can be very useful to all
> non-native English speakers.

Merging the data with the JMdict data is rather trivial, so feel free
to do it for your project! We want to have it merged back into the
JMdict eventually, but this is not happening yet. Note that a sister
project also exists for Kanjidic2.

> I also saw in your GitHub repository the files jlpt-n[1-5].csv containing
> the JMdict IDs for all the entries in each one of the JLPT levels.
> Actually, this is something I've been looking for for some time now. Did you
> compile those lists or did you take them from somewhere else?

Both. They originate from Tagaini Jisho, and here is the license text
regarding them:

JLPT levels for words come from the [JLPT Study
Page](http://www.jlptstudy.com/), the [JLPT Resource
Page](http://www.tanos.co.uk/jlpt/), as well as lists provided by [Thierry
Bézecourt](http://www.thbz.org/kanjimots/jlpt.php3) and [Alain
Côté](http://jetsdencredujapon.blogspot.com).

Note that since there is no official list of JLPT words, this data is
only for informational purposes and should not be taken too seriously.
There are probably mistakes remaining.

Also, if you intend to use it, I suggest you take it from Tagaini (see
https://github.com/Gnurou/tagainijisho/tree/master/src/core/jmdict
) since I suspect the list is more up-to-date there. Looking at your
credits page it seems like we got our data from approximately the same
source. ;)

> As far as I can see, this data is CC BY-SA 3.0 (please correct me if I'm
> wrong). Do you mind if I use this data in japandict.com, adding the
> appropriate reference in the "Attributions" page?

You are correct about the license. If you take it from Tagaini then
Tagaini is the project to credit (since it was initially collected for
it and got used in jmdict-i18n to better organize the data). You are
actually encouraged to use this data especially since your project
seems to be non-commercial.

Cheers,
Alex.

webm...@japandict.com

Apr 24, 2016, 6:33:29 AM
to jmdic...@googlegroups.com
April 24 2016 7:32 AM, "Alexandre Courbot" <gnu...@gmail.com> wrote:
> Merging the data with the JMdict data is rather trivial, so feel free
> to do it for your project! We want to have it merged back into the
> JMdict eventually, but this is not happening yet. Note that a sister
> project also exists for Kanjidic2.

Adding more languages has always been on my wish list, but a lack of time and the fact that until now there weren't many translations made me postpone it. I'll reconsider it now!



> Also, if you intend to use it, I suggest you take it from Tagaini (see
> https://github.com/Gnurou/tagainijisho/tree/master/src/core/jmdict
> ) since I suspect the list is more up-to-date there. Looking at your
> credits page it seems like we got our data from approximately the same
> source. ;)

My scripts for associating the words in the JLPT lists with the JMdict entries produced many mistakes and inaccuracies, even though we had the same sources... Apparently you did a much better job than me at associating them :)

> You are correct about the license. If you take it from Tagaini then
> Tagaini is the project to credit (since it was initially collected for
> it and got used in jmdict-i18n to better organize the data). You are
> actually encouraged to use this data especially since your project
> seems to be non-commercial.

Cool! I've now updated my scripts to use the files from the Tagaini Jisho project, and updated the attributions page accordingly.

Thanks!
Carles
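
For anyone who wants to do the same, here is a rough sketch of how the jlpt-n[1-5].csv files mentioned in this thread could be turned into a lookup from JMdict entry ID to JLPT level. The file layout is an assumption (one numeric JMdict entry ID in the first column of each row); the function name `load_jlpt_levels` and the directory argument are made up for illustration, so check the actual files before relying on it.

```python
import csv
from pathlib import Path


def load_jlpt_levels(directory):
    """Build {jmdict_entry_id: jlpt_level} from jlpt-n1.csv .. jlpt-n5.csv.

    Assumption: the first column of each row holds a numeric JMdict entry ID.
    Rows that do not start with a number (blank lines, comments) are skipped.
    """
    levels = {}
    # Read N5 first so that an ID listed in several files keeps the easiest level.
    for level in range(5, 0, -1):
        path = Path(directory) / f"jlpt-n{level}.csv"
        if not path.exists():
            continue  # tolerate a missing level file
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.reader(handle):
                if not row or not row[0].strip().isdigit():
                    continue
                levels.setdefault(int(row[0]), level)
    return levels
```

Once loaded, something like `levels.get(entry_id)` gives the level for a JMdict entry, or `None` when the entry is not in any list. Keeping the easiest level for duplicates is just one possible choice; since there is no official list of JLPT words, neither choice is authoritative.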
