N-gram counts: common inflections

Adam Nohejl

Sep 15, 2023, 2:38:14 AM
to edict-...@googlegroups.com
First, thank you so much for making Japanese n-gram search available;
it's a great resource!

I've been wondering though what's the rationale for "common verb
inflections"?

Clearly, many common inflections/suffixes (causative, "-ō/-yō"
volitional, "-(i)tai" desiderative, or just the simple adverbial
form/連用形) are omitted, while some rather less common
("-(i)masen'nara") are included.

I would find it very useful to be able to search for the complete
inflections (in the narrow sense, i.e. forms of the first token) of
pentagrade (五段) verbs, e.g.
言わ・言い・言っ・言う・言え・言お, without necessarily
adding suffixes to them (particles, auxiliary verbs etc.). This way one
could easily count occurrences of all forms of a verb.
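A minimal Python sketch of how the first-token forms listed above can be generated, assuming the caller already knows the verb is 五段 (a final る alone cannot distinguish 五段 from 一段 verbs). Only the 行く → 行っ irregularity is handled; other exceptions such as 問う → 問うて are not covered. The name godan_stem_forms and its table are hypothetical, not part of the n-gram search or WWWJDIC code.

```python
# Hypothetical helper: first-token (narrow-sense) forms of a 五段 verb.
GODAN_ROWS = {
    # final kana: (a-row/未然形, i-row/連用形, 音便 stem, e-row, o-row)
    "う": ("わ", "い", "っ", "え", "お"),
    "つ": ("た", "ち", "っ", "て", "と"),
    "る": ("ら", "り", "っ", "れ", "ろ"),
    "く": ("か", "き", "い", "け", "こ"),
    "ぐ": ("が", "ぎ", "い", "げ", "ご"),
    "す": ("さ", "し", "し", "せ", "そ"),  # no 音便: te-form stem equals 連用形
    "ぬ": ("な", "に", "ん", "ね", "の"),
    "ぶ": ("ば", "び", "ん", "べ", "ぼ"),
    "む": ("ま", "み", "ん", "め", "も"),
}


def godan_stem_forms(verb: str) -> list[str]:
    """Return the distinct first-token forms of a 五段 verb in dictionary form."""
    stem, final = verb[:-1], verb[-1]
    if final not in GODAN_ROWS:
        raise ValueError(f"not a 五段 dictionary form: {verb!r}")
    a, i, onbin, e, o = GODAN_ROWS[final]
    if verb.endswith("行く") or verb == "いく":
        onbin = "っ"  # 行く is irregular: 行って, not *行いて
    forms = [stem + a, stem + i, stem + onbin, verb, stem + e, stem + o]
    # dict.fromkeys removes duplicates (e.g. for す verbs) while keeping order
    return list(dict.fromkeys(forms))
```

For example, godan_stem_forms("言う") returns ['言わ', '言い', '言っ', '言う', '言え', '言お'], the same list as above; for a す verb such as 話す the 音便 stem coincides with the 連用形, so only five distinct forms come back.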

--
Adam Nohejl

Jim Breen

Sep 15, 2023, 4:06:41 AM
to edict-...@googlegroups.com
On Fri, 15 Sept 2023 at 16:38, Adam Nohejl <ad...@nohejl.name> wrote:
> I've been wondering though what's the rationale for "common verb
> inflections"?

The code driving that option was lifted from the verb inflection table
code in the WWWJDIC server. If you compare the equivalent pages:
https://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1W%C6%C9%A4%E0_v5m
https://www.edrdg.org/~jwb/cgi-bin/ngramlookup?sent=%E8%AA%AD%E3%82%80&vinfl=on
you'll see the similarity.

The tables in WWWJDIC were based on tables in a couple of textbooks,
and several language teachers assisted with suggestions. It's quite
old code now.

> Clearly, many common inflections/suffixes (causative, "-ō/-yō"
> volitional, "-(i)tai" desiderative, or just the simple adverbial
> form/連用形) are omitted, while some rather less common
> ("-(i)masen'nara") are included.

I see.

> I would find it very useful to be able to search for the complete
> inflections (in the narrow sense, i.e. forms of the first token) of
> pentagrade (五段) verbs, e.g.
> 言わ・言い・言っ・言う・言え・言お, without necessarily
> adding suffixes to them (particles, auxiliary verbs etc.). This way one
> could easily count occurrences of all forms of a verb.

While some tweaking is possible, I doubt that I have the time or
energy to tackle a major revision of that function.
You could script some calls to the server, e.g.
https://www.edrdg.org/~jwb/cgi-bin/ngramlookup?sent=%E8%A8%80%E3%82%8F&topjuku=on&top100=on
and aggregate the results.
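One way the scripted calls might look, as a sketch only: the base URL and the topjuku/top100 parameters are copied from the example link, while fetching a page and extracting a count from it is left to a caller-supplied function (fetch_count, hypothetical), because the response format is not described in this thread. The one-second pause between requests is an arbitrary politeness default, not a documented server requirement.

```python
import time
from collections.abc import Callable, Iterable
from urllib.parse import urlencode

# Base URL and query parameters copied from the example link above.
NGRAM_LOOKUP = "https://www.edrdg.org/~jwb/cgi-bin/ngramlookup"


def query_url(form: str) -> str:
    """Build an n-gram lookup URL for one surface form."""
    # urlencode percent-encodes the text as UTF-8, e.g. 言わ -> %E8%A8%80%E3%82%8F
    return f"{NGRAM_LOOKUP}?{urlencode({'sent': form, 'topjuku': 'on', 'top100': 'on'})}"


def aggregate(
    forms: Iterable[str],
    fetch_count: Callable[[str], int],
    delay: float = 1.0,
) -> tuple[dict[str, int], int]:
    """Query each form in turn and return (per-form counts, total).

    fetch_count is hypothetical and caller-supplied: it should request the
    URL and extract a count from the page. No parser is given here because
    the page format is not documented in this thread.
    """
    counts: dict[str, int] = {}
    for i, form in enumerate(forms):
        if i and delay:
            time.sleep(delay)  # pause between requests to go easy on the server
        counts[form] = fetch_count(query_url(form))
    return counts, sum(counts.values())
```

query_url("言わ") reproduces the example link exactly. Before summing, it is worth checking how the server reports results for a bare stem such as 言わ, so that longer n-grams sharing that stem are not counted more than once.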

HTH

Jim

--
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
http://www.jimbreen.org/
http://nihongo.monash.edu/

Adam Nohejl

Sep 15, 2023, 5:05:34 AM
to edict-...@googlegroups.com
Hi Jim,

> The tables in WWWJDIC were based on tables in a couple of textbooks,
> and several language teachers assisted with suggestions. It's quite
> old code now.

I see.

> While some tweaking is possible, I doubt that I have the time or
> energy to tackle a major revision of that function.

No problem.

> You could script some calls to the server, e.g.
> https://www.edrdg.org/~jwb/cgi-bin/ngramlookup?sent=%E8%A8%80%E3%82%8F&topjuku=on&top100=on
> and aggregate the results.

Now let's see whether I can find the time and energy :-D

--
Adam Nohejl