Weird glosses for 穴だらけ (vs. other 〜だらけ)

Adam Nohejl

Jul 7, 2021, 6:59:46 AM

to edict-...@googlegroups.com

Hello,

I found a definition that is definitely off, but it was changed to its current form from an earlier one that was, in my opinion, better. So I'm writing to the list instead of editing it directly.

Currently 穴だらけ has the following glosses:

hole, aperture, opening, orifice
fault, flaw

For some reason, the glosses have been changed from adjectival/attributive ones (e.g. "being full of holes") to simple nouns. This reflects neither usage (I could not find any occurrence as a subject in BCCWJ) nor, more importantly, meaning: だらけ means that there are many holes/flaws/..., i.e. that something is full of holes, so glosses like "orifice" and "aperture" are simply misleading.

Additionally, it is inconsistent with the other 〜だらけ entries:

皺だらけ wrinkled
毛だらけ hairy; furry
茨だらけ 1. thorny 2. miserable
血だらけ bloodstained; bloody; gory
泥だらけ covered in mud; mud-caked
虱だらけ lousy; lice-ridden; covered in lice

I would therefore suggest the following glosses for 穴だらけ:

full of holes; porous; holey; leaky
full of flaws; full of loopholes

Or something along these lines (perhaps native English speakers can think of better ones). These are similar to what can be found in Eijiro:

https://eow.alc.co.jp/search?q=穴だらけ

I also have one more suggestion. Currently the 〜だらけ entries have several different sets of POS tags: most commonly (adj-no), but also (n,adj-no), (adj-no,adj-na) and (adj-no,adj-na,n).

I think they should all be (adj-no), because:

- They are not full-fledged nouns (there is no occurrence of だらけ as a subject, i.e. followed by the particle が, in BCCWJ).
- Sometimes だらけ is used as a na-adjective, but this is rather marginal. These are the numbers from BCCWJ:
  - 8 occurrences followed by な (not counting なの, なん)
  - 720 occurrences followed by の
  - the vast majority of the rest followed by a copula or に
  - 1,957 occurrences in total

I also could not find any use as a na-adjective in a dictionary.
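As a quick check of how marginal the な usage is, the BCCWJ figures above can be turned into shares of the total. This is only an illustrative sketch: the counts are exactly those quoted in the post, while the dictionary labels and the `share` helper are hypothetical names introduced here.

```python
# BCCWJ counts for だらけ as quoted in the post; labels are illustrative only.
counts = {"followed by な": 8, "followed by の": 720}
TOTAL = 1957  # all occurrences of だらけ reported in the post


def share(n: int, total: int = TOTAL) -> float:
    """Return n as a percentage of total."""
    return 100.0 * n / total


for label, n in counts.items():
    # Print each count with its rounded share of all occurrences.
    print(f"{label}: {n} ({share(n):.1f}%)")
```

With these numbers, だらけ followed by な accounts for roughly 0.4% of occurrences, against roughly 36.8% for の, a ratio of 90 to 1.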

Sorry for the long-winded post; I hope it helps improve the dictionary.

--
Adam Nohejl

Jim Breen

Jul 8, 2021, 7:56:45 AM

to edict-...@googlegroups.com

Thanks. I've proposed amendments based on your comments. See:
https://www.edrdg.org/jmdictdb/cgi-bin/entr.py?svc=jmdict&sid=&q=1856920
You can make such proposals directly via that interface.

I'll look at the POSs for the other 〜だらけ entries.

Jim

> --
> You received this message because you are subscribed to the Google Groups "EDICT-JMdict" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to edict-jmdict...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/edict-jmdict/5C438C1E-FB58-4B9B-BB66-612F4C73DAE8%40nohejl.name.

--
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
http://www.jimbreen.org/
http://nihongo.monash.edu/

Adam Nohejl

Jul 8, 2021, 9:46:16 AM

to edict-...@googlegroups.com

Hi Jim,

Thank you. OK, I will use the interface next time.

I have noticed that you are referring to some numbers of occurrences in
the proposal. What corpus are they from? Is there a preferred corpus to
use as a reference for editing/submitting entries?

Best regards,

--
Adam Nohejl

> https://groups.google.com/d/msgid/edict-jmdict/CABHGxq6u3qqCX-mxDG3cJn%2Brxz41Kttu4%3DP852wSVRZUB47Tow%40mail.gmail.com.

Ben Bullock

Jul 8, 2021, 9:51:44 AM

to edict-...@googlegroups.com

On Thu, 8 Jul 2021 at 20:56, Jim Breen <jimb...@gmail.com> wrote:

> Thanks. I've proposed amendments based on your comments. See:
> https://www.edrdg.org/jmdictdb/cgi-bin/entr.py?svc=jmdict&sid=&q=1856920
> You can make such proposals directly via that interface.
>
> I'll look at the POSs for the other 〜だらけ entries.

There was a discussion of this topic on the sci.lang.japan newsgroup a few years ago, including contributions by muchan:

https://groups.google.com/g/sci.lang.japan/c/XErMtAHPgqo/m/nJq_1bLp2DEJ

The rest of the discussion is just bores nattering though.

Jim Breen

Jul 9, 2021, 4:04:03 AM

to edict-...@googlegroups.com

On Thu, 8 Jul 2021 at 23:46, Adam Nohejl wrote:
> I have noticed that you are referring to some numbers of occurrences in
> the proposal. What corpus are they from? Is there a preferred corpus to
> use as a reference for editing/submitting entries?

They come from the 2007 Google WWW n-gram corpus. The current
pages for looking up the counts are:
https://www.edrdg.org/~jwb/ngramcounts.html
http://nlp.cis.unimelb.edu.au/jwb/ngramcounts.html

The counts have been derived by aggregating the raw n-gram sequences,
then sorting them. For example, the count:
体操座りしていた 31
would have been derived from the 6-gram <体操座りしていた>, which occurred
31 times in Google's trawl of WWW pages.
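The aggregation step described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the actual EDRDG pipeline: it assumes each raw line holds space-separated tokens, a tab, and a count (the layout of Google's Web 1T releases, treated here as an assumption), and the function name `aggregate_ngrams` and the token split used for the example are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def aggregate_ngrams(lines: Iterable[str]) -> list[tuple[str, int]]:
    """Join each n-gram's tokens into one surface string, sum the counts
    of identical strings, and return (string, count) pairs sorted by string.

    Assumed (hypothetical) input line format: "tok1 tok2 ... tokN<TAB>count".
    """
    totals: Counter[str] = Counter()
    for line in lines:
        ngram, count = line.rstrip("\n").split("\t")
        surface = "".join(ngram.split(" "))  # drop the token boundaries
        totals[surface] += int(count)
    return sorted(totals.items())


# Hypothetical segmentation of the 6-gram from the example above.
raw = ["体操 座り し て い た\t31"]
for surface, count in aggregate_ngrams(raw):
    print(surface, count)
```

Summing with a `Counter` means that if the same surface string arises from more than one token sequence, its counts are merged before sorting; whether the real pipeline does this is not stated in the thread.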

I should write a page explaining this a bit more. The data file with
all these counts is about 36 GB.

Jim

Jim Breen

Jul 9, 2021, 4:10:50 AM

to edict-...@googlegroups.com

A few years ago! That was 2004! I was only in my early middle age then.

And those blast-from-the-past names: muchan, Mike Cash, Kevin Gowan,
... I'll have to open another bottle of red or I'll start weeping.

Jim

