Strange search results when selecting multiple tags

freddie matthews

Jul 20, 2021, 10:05:48 AM
to edict-...@googlegroups.com
Hi Stuart and co.,

I noticed when using the advanced search form on JMdictDB that checking "vs" along with another tag seems to give the OR (union) of the results rather than AND (intersection).

For example, checking "exp" and "vs" returns the results for both tags combined. Another example is "adv" and "vs".

I tested some other combinations of tags and most work fine. Though the same behavior seems to manifest with "v1" and "vt". Same thing with "vi", "conj", and "prt" it seems.

Not the most pressing issue of course, and the database is otherwise splendidly performant. :)

– Opencooper

Stuart McGraw

Jul 20, 2021, 1:01:19 PM
to edict-...@googlegroups.com
On 7/20/21 8:05 AM, freddie matthews wrote:
> I noticed when using the advanced search form on JMdictDB that checking "vs" along with another tag seems to give the OR (union) of the results rather than AND (intersection).

If the "another tag" is a tag in the same section ("Part-of-Speech" in this case), then yes, it is ORed with the "vs" tag. If it is from a different section then it is ANDed.

From the Notes section at the bottom of the search page:

* The text search rows and other sections are ANDed together.

* Within the attribute sections, all selections are ORed together. For example, if both "suf" and "pref" are checked in the PoS section, entries that are either suffixes or prefixes will be found.
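The two rules above (OR inside a section, AND between sections) can be illustrated with a minimal sketch. The data model here is a hypothetical in-memory one (each entry maps a section name to the set of tags it carries) and is not taken from JMdictDB's source:

```python
from typing import Dict, Set

# Hypothetical model: section name -> set of tags an entry carries.
Entry = Dict[str, Set[str]]
# Hypothetical search form state: section name -> set of checked tags.
Criteria = Dict[str, Set[str]]


def matches(entry: Entry, criteria: Criteria) -> bool:
    """OR the checked tags within a section, AND the sections together."""
    for section, selected in criteria.items():
        if not selected:
            continue  # nothing checked in this section: no constraint
        # OR within the section: at least one checked tag must be present.
        if not (entry.get(section, set()) & selected):
            return False  # AND across sections: every constraint must hold
    return True


entries = {
    "A": {"pos": {"exp"}},
    "B": {"pos": {"vs"}},
    "C": {"pos": {"n", "vs"}, "misc": {"uk"}},
    "D": {"pos": {"n"}},
}

# Two boxes in the same (PoS) section: union of the results.
print(sorted(k for k, e in entries.items() if matches(e, {"pos": {"exp", "vs"}})))
# -> ['A', 'B', 'C']

# Boxes in different sections: intersection.
print(sorted(k for k, e in entries.items()
             if matches(e, {"pos": {"vs"}, "misc": {"uk"}})))
# -> ['C']
```

This is why checking "exp" and "vs" together returns entries having either tag.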

This seemed to be the most useful behavior when the search page was first written. For example, if one wants to limit results to only verbs, one can click all the verb boxes. (Of course back then there were many fewer choices, eg, the v4* and v2* tags didn't exist.) The same consideration applies to some other sections: searching for "ichi1" OR "ichi2" words is useful but "ichi1" AND "ichi2" is not. In other cases it is a toss-up: one might wish to search for words of dialect "thb" OR "hob", or "thb" AND "hob", so for consistency with the other sections OR is used.

The part-of-speech tags are a little funky because tags with different characteristics are all bundled together. Some (eg "v5k" and "v5t") are mutually exclusive, which makes ANDing them in a search useless, but others (eg "v5k" and "vi") aren't and are more usefully ANDed than ORed.

One possibility I can think of is to add to each section, perhaps on the title line of the section, two boxes, OR and AND, with OR checked by default, that would control how multiple selections within the section are combined in the search criteria.
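A per-section OR/AND switch like the one proposed could be sketched as follows. This is a hypothetical illustration only: the criteria shape (section mapped to a mode plus the checked tags) and the function name `matches_with_modes` are assumptions, not part of JMdictDB:

```python
from typing import Dict, Set, Tuple

# Hypothetical criteria: section -> (mode, checked tags), mode is "OR" or "AND".
Criteria = Dict[str, Tuple[str, Set[str]]]


def matches_with_modes(entry: Dict[str, Set[str]], criteria: Criteria) -> bool:
    """Combine tags within a section per its mode; sections are still ANDed."""
    for section, (mode, selected) in criteria.items():
        if not selected:
            continue  # nothing checked: section imposes no constraint
        have = entry.get(section, set())
        if mode == "AND":
            ok = selected <= have       # entry must carry every checked tag
        elif mode == "OR":
            ok = bool(selected & have)  # entry needs at least one checked tag
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        if not ok:
            return False
    return True
```

With the PoS section set to AND, checking "v5k" and "vi" finds only intransitive v5k verbs, while checking "v5k" and "v5t" would find nothing, since those tags are mutually exclusive.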

Another possibility is to add more structure to the Part-of-Speech search options, maybe subsections like "verb", "adjective", etc., each with the relevant subset of tags, allowing the subsections to be ANDed/ORed in searches.
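The subsection idea can be sketched by treating each PoS subsection like a separate section: ORed internally, ANDed with the others. The grouping below is purely hypothetical and lists only a handful of tags; the real tag set is much larger and the actual subsections would need careful design:

```python
from typing import Dict, Set

# Hypothetical split of a few Part-of-Speech tags into subsections.
POS_SUBSECTIONS: Dict[str, Set[str]] = {
    "verb class": {"v1", "v5k", "v5t", "vs"},
    "transitivity": {"vt", "vi"},
    "other": {"n", "adv", "exp", "conj", "prt"},
}


def pos_matches(entry_pos: Set[str], checked: Set[str]) -> bool:
    """OR the checked tags within each subsection, AND across subsections.

    Checked tags not listed in any subsection are ignored by this sketch.
    """
    for tags in POS_SUBSECTIONS.values():
        wanted = checked & tags
        if wanted and not (entry_pos & wanted):
            return False
    return True
```

Under this grouping, checking "v1" and "vt" requires both tags (different subsections), while checking "v5k" and "v5t" still means either one (same subsection).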

And then there is also the question of how to work NOT into this.

Counterbalancing all of this is making an already complex and maybe intimidating page even more so.

So the upshot is that I would love to hear any suggestions for improving things.

Thanks for raising the issue.

-- Stuart


On 7/20/21 8:05 AM, freddie matthews wrote:
> Hi Stuart and co.,
>
> I noticed when using the advanced search form on JMdictDB that checking "vs" along with another tag seems to give the OR (union) of the results rather than AND (intersection).
>
> For example, checking "exp" and "vs" <https://www.edrdg.org/jmdictdb/cgi-bin/srchres.py?svc=jmdict&s1=1&y1=2&t1=&s2=1&y2=1&t2=&s3=1&y3=1&t3=&idtyp=seq&idval=&src=1&stat=2&appr=appr&appr=unappr&nfcmp=&nfval=&pos=13&pos=46&snote=&snotem=0&smtr=&smtrm=0&ts0=&ts1=&refs=&refsm=0&cmts=&cmtsm=0&mt=0&grp=&search=Search> has both results. Another example is "adv" and "vs" <https://www.edrdg.org/jmdictdb/cgi-bin/srchres.py?svc=jmdict&s1=1&y1=2&t1=&s2=1&y2=1&t2=&s3=1&y3=1&t3=&idtyp=seq&idval=&src=1&stat=2&appr=appr&appr=unappr&nfcmp=&nfval=&pos=6&pos=46&snote=&snotem=0&smtr=&smtrm=0&ts0=&ts1=&refs=&refsm=0&cmts=&cmtsm=0&mt=0&grp=&search=Search>.
>
> I tested some other combinations of tags and most work fine. Though the same behavior seems to manifest with "v1" and "vt" <https://www.edrdg.org/jmdictdb/cgi-bin/srchres.py?svc=jmdict&s1=1&y1=2&t1=&s2=1&y2=1&t2=&s3=1&y3=1&t3=&idtyp=seq&idval=&src=1&stat=2&appr=appr&appr=unappr&nfcmp=&nfval=&pos=28&pos=50&snote=&snotem=0&smtr=&smtrm=0&ts0=&ts1=&refs=&refsm=0&cmts=&cmtsm=0&mt=0&grp=&search=Search>. Same thing with "vi", "conj", and "prt" it seems.

freddie matthews

Jul 20, 2021, 5:06:54 PM
to edict-...@googlegroups.com
Sorry, I somehow missed the notes at the bottom of the page. I see now this was intended behavior and considered in depth. I feel the situation with the tags is already quite complex and the way it's being handled now is at least intuitive within the sections. I realize now I have relied on that to search both jmdict and jmnedict simultaneously for example. I can't think of any solution which wouldn't make the already busy interface more complex. Maybe if there were some natural way to break up the PoS section further… Thanks.

– Opencooper


Jim Breen

Jul 20, 2021, 10:28:34 PM
to edict-...@googlegroups.com
Interesting discussion.

To do AND searches within the one attribute class, I usually open up
the EDICT2 version in an editor (vim) and use a regular expression to
find the entries with the combination I want.
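For those comfortable with a little scripting, the same AND-within-a-section search can be approximated in Python. This sketch assumes the EDICT2 convention that tags appear as comma-separated lists in parentheses (eg "(n,vs)"); it also picks up any other parenthesised text such as sense numbers or glosses, so it is a rough filter rather than an exact parser. The function names are hypothetical:

```python
import re
from typing import Iterable, Iterator, Set

# Contents of one pair of parentheses, eg "n,vs" from "(n,vs)".
PAREN_GROUP = re.compile(r"\(([^()]*)\)")


def entry_tags(line: str) -> Set[str]:
    """Collect comma-separated tokens from every (...) group on a line."""
    tags: Set[str] = set()
    for group in PAREN_GROUP.findall(line):
        tags.update(t.strip() for t in group.split(","))
    return tags


def lines_with_all(lines: Iterable[str], required: Set[str]) -> Iterator[str]:
    """Yield lines whose parenthesised tags include every required tag (AND)."""
    for line in lines:
        if required <= entry_tags(line):
            yield line
```

Splitting on commas, rather than matching a bare substring, keeps a search for "vs" from also matching "vs-s". To use it on a local copy, something like `lines_with_all(open("edict2", encoding="euc_jp"), {"v1", "vt"})` should work; the file name and the EUC-JP encoding are assumptions about that copy.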

I realise this is not an option open to everyone (well it IS open to
everyone but you may have to do some environment and software
installation), so if there was any interest I could add a regex search
option to WWWJDIC, e.g. in the Advanced Search option
(https://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1P)

Jim

--
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
http://www.jimbreen.org/
http://nihongo.monash.edu/