search for lemmas in Japanese


Eugenia Diegoli

May 22, 2024, 4:24:06 AM
to AntConc-Discussion
Hello, 

Thank you very much for making AntConc available. I've been using it for a while now and really like it!

I have a Japanese corpus tagged with TagAnt (with the option simple_word_pos_headword_indexer). I want to look for the collocates of the verb okoru 起こる. I tried using the wildcard * (起こ*, 起こ*_V*), but I can't find a way to distinguish between okoru 起こる and okosu 起こす. Do you have any advice on this?

Thank you very much for your support, I really appreciate it. 

All the best, 

Eugenia

Laurence Anthony

May 22, 2024, 8:53:54 PM
to ant...@googlegroups.com
Hi,

>  I want to look for the collocates of the verb okoru 起こる

For this task, you don't need to use wildcards in the search. Just search for "起こる" directly in the Collocates tool.

I hope this helps.

Laurence.

###############################################################
Laurence ANTHONY, Ph.D.
Professor of Applied Linguistics
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.laurenceanthony.net/
###############################################################



Eugenia Diegoli

May 22, 2024, 9:22:25 PM
to ant...@googlegroups.com
Hi Laurence, 

thank you very much for your prompt reply.

If I search for 起こる in the Collocates tool or in the concordancer, it returns only instances of the verb exactly as typed (the shushikei or rentaikei), without inflected forms (e.g. 起こります, 起こって, etc.). I attach a screenshot of the concordance query.
スクリーンショット 2024-05-23 10.20.30.png

I'm sure I'm doing something wrong but can't figure out what.

Eugenia

Laurence Anthony

May 22, 2024, 9:52:22 PM
to ant...@googlegroups.com
Hi,

So, it seems that you want to search for 起こる as a lemma headword. In TagAnt, choose the word+pos_tag+lemma option when you tag your data. You should get something like the following:

何_代名詞_何 が_助詞-格助詞_が 起こり_動詞-一般_起こる ます_助動詞_ます ?_補助記号-句点_?


Then, after you load that into AntConc, you can search for *_*_起こる to get what you want.
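To illustrate why the lemma field resolves the original problem, here is a minimal Python sketch. It is not AntConc's implementation: it assumes that AntConc's `*` wildcard behaves roughly like a shell-style wildcard, which `fnmatchcase` approximates. The first sentence in `TAGGED` is the example above; the second sentence (with 起こす) is hypothetical and only mimics the same word_pos_lemma format.

```python
from fnmatch import fnmatchcase

# First sentence: the tagged example from this thread.
# Second sentence: hypothetical, added only to contrast 起こす with 起こる.
TAGGED = (
    "何_代名詞_何 が_助詞-格助詞_が 起こり_動詞-一般_起こる ます_助動詞_ます ?_補助記号-句点_? "
    "事故_名詞-普通名詞-一般_事故 を_助詞-格助詞_を 起こし_動詞-一般_起こす た_助動詞_た"
)


def match_tokens(tagged_text, pattern):
    """Return the whitespace-separated tokens matching a shell-style wildcard pattern."""
    return [tok for tok in tagged_text.split() if fnmatchcase(tok, pattern)]


def surface_forms(tagged_text, lemma):
    """Return the surface word of every word_pos_lemma token whose lemma field equals `lemma`."""
    forms = []
    for tok in tagged_text.split():
        parts = tok.split("_")
        if len(parts) == 3 and parts[2] == lemma:
            forms.append(parts[0])
    return forms


by_prefix = match_tokens(TAGGED, "起こ*")      # prefix search: catches both verbs
by_lemma = match_tokens(TAGGED, "*_*_起こる")  # lemma search: catches only 起こる
print(by_prefix)
print(by_lemma)
print(surface_forms(TAGGED, "起こる"))
```

The prefix pattern returns both the 起こり and 起こし tokens, while the lemma pattern returns only the token whose third field is 起こる, whatever its inflected surface form.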


I hope that helps.


Laurence.



Eugenia Diegoli

May 24, 2024, 1:40:04 AM
to ant...@googlegroups.com
Hi Laurence, 

thank you very much for your reply.

I tagged my corpus with the word+pos_tag+lemma option, but when I try to load it into AntConc something goes wrong. I tried both the simple_word_indexer and the simple_word_pos_headword_indexer options. The only other option available seems to be bibertag. What am I missing?

Thank you and sorry for taking up so much of your time. 

Eugenia

Laurence Anthony

May 24, 2024, 1:55:24 AM
to ant...@googlegroups.com
Hi,

> when I try to upload it onto AntConc something goes wrong

What goes wrong? Can you provide details?

Laurence.


Eugenia Diegoli

May 24, 2024, 2:05:08 AM
to ant...@googlegroups.com
This is the description of the corpus created from the tagged data with the simple_word_pos_headword_indexer option selected. The token count is way too high.
スクリーンショット 2024-05-24 15.02.05.png

And these are the items it reads as words.

スクリーンショット 2024-05-24 15.04.15.png

Laurence Anthony

May 24, 2024, 2:11:48 AM
to ant...@googlegroups.com
Hi,

None of the types are displaying correctly. Are you sure you are using UTF-8 encoded files? Perhaps you can send me one tagged file so I can check.
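For readers who want to check the encoding themselves, a minimal Python sketch follows. It only tests whether a file's bytes decode as UTF-8; the function name `is_utf8` and the temporary file names are hypothetical, and the demo writes two small files so that the check can be run as-is.

```python
import tempfile
from pathlib import Path


def is_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8, else False."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Demo: the same Japanese word saved in UTF-8 and in Shift_JIS.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "utf8.txt"
    bad = Path(tmp) / "sjis.txt"
    good.write_bytes("起こる".encode("utf-8"))
    bad.write_bytes("起こる".encode("shift_jis"))
    utf8_ok = is_utf8(good)
    sjis_ok = is_utf8(bad)

print(utf8_ok, sjis_ok)
```

A file that passes this check can still be the wrong kind of file. Any pure-ASCII file, for example, is also valid UTF-8, so a passing result does not prove the file is plain tagged text.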

Laurence.


Eugenia Diegoli

May 24, 2024, 2:14:32 AM
to ant...@googlegroups.com
Hi Laurence, 

this is an excerpt from the tagged file. 

遠_接頭辞_遠 距離_名詞-普通名詞-一般_距離 恋愛_名詞-普通名詞-サ変可能_恋愛 2_名詞-数詞_2 年_名詞-普通名詞-助数詞可能_年 目_接尾辞-名詞的-一般_目 です_助動詞_です 。_補助記号-句点_。  _SPACE_  昨日_名詞-普通名詞-副詞可能_昨日 彼_代名詞_彼 から_助詞-格助詞_から 写メ_名詞-普通名詞-一般_写メ が_助詞-格助詞_が 送ら_動詞-一般_送る れ_助動詞_れる て_助詞-接続助詞_て 来_動詞-非自立可能_来る まし_助動詞_ます た_助動詞_た 。_補助記号-句点_。  _SPACE_  明らか_形状詞-一般_明らか に_助動詞_だ 誰_代名詞_誰 か_助詞-副助詞_か に_助詞-格助詞_に 撮っ_動詞-一般_撮る て_助詞-接続助詞_て もらっ_動詞-非自立可能_もらう て_助詞-接続助詞_て いる_動詞-非自立可能_いる 、_補助記号-読点_、 自宅_名詞-普通名詞-一般_自宅 で_助詞-格助詞_で は_助詞-係助詞_は ない_形容詞-非自立可能_ない 写メ_名詞-普通名詞-一般_写メ と_助詞-格助詞_と 思わ_動詞-一般_思う れる_助動詞_れる もの_名詞-普通名詞-サ変可能_もの です_助動詞_です 。_補助記号-句点_。


Hope it helps. 


Eugenia


Laurence Anthony

May 24, 2024, 2:16:14 AM
to ant...@googlegroups.com
Hi,

Copying the contents of a file into email will not serve as a proper test. I need the actual file.

Laurence.


Eugenia Diegoli

May 24, 2024, 2:19:17 AM
to ant...@googlegroups.com
que_ans_0_人間関係_5000_lemma_AntConc_1.rtf

Laurence Anthony

May 24, 2024, 5:05:56 AM
to ant...@googlegroups.com
You're trying to load a rich text file (.rtf). You need to use a plain text file, a .docx file, or a PDF.

TagAnt doesn't output rtf files, so I'm not sure what you did to generate this.
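One way to catch this before loading a corpus is to look at the first bytes of each file, since an RTF document begins with the control word `{\rtf`. The sketch below is a minimal Python illustration; the function name and both sample byte strings are hypothetical. RTF typically stores non-ASCII characters as escape sequences alongside formatting control words, which is a likely (though unconfirmed here) explanation for the garbled types and inflated token count.

```python
def looks_like_rtf(data: bytes) -> bool:
    """Return True if the bytes start with the RTF signature (an opening brace plus the rtf control word)."""
    # An RTF document begins with "{\rtf"; leading whitespace is ignored to be lenient.
    return data.lstrip().startswith(b"{\\rtf")


# Hypothetical samples: a tiny RTF header and a plain UTF-8 tagged token.
rtf_sample = b"{\\rtf1\\ansi\\deff0 {\\fonttbl{\\f0 Helvetica;}}\\f0 text}"
plain_sample = "遠_接頭辞_遠".encode("utf-8")

print(looks_like_rtf(rtf_sample))
print(looks_like_rtf(plain_sample))
```

To check a real file, pass it the first few bytes, for example `looks_like_rtf(open(path, "rb").read(16))`, where `path` is whatever file is about to be loaded.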

Laurence.



Eugenia Diegoli

May 24, 2024, 5:08:30 AM
to ant...@googlegroups.com
I see, thank you very much!!

All the best, 

Eugenia


Laurence Anthony

May 24, 2024, 5:15:16 AM
to ant...@googlegroups.com
Have you managed to resolve the issue?

Laurence.


Eugenia Diegoli

May 24, 2024, 5:27:54 AM
to ant...@googlegroups.com
I’ll try tomorrow and keep you posted :) 


Laurence Anthony

May 24, 2024, 7:42:19 AM
to ant...@googlegroups.com
Hi again,

I saved your RTF file as plain text and generated the following results:

[Five screenshots attached: image.png]

So, AntConc looks to be working fine!

I hope that helps.

Laurence.



Eugenia Diegoli

May 27, 2024, 12:27:54 AM
to ant...@googlegroups.com
Dear Laurence,

Apologies for the slow reply. I have just updated my files in AntConc and everything works perfectly. Thank you very much for your help!

All the best, 

Eugenia

Eugenia Diegoli

Jun 4, 2024, 12:46:46 AM
to AntConc-Discussion
Hi Laurence, 

I'm really sorry to bother you again. Hopefully this time I won't take too much of your time. 

I'm using AntConc version 4.2.4 but don't seem to be able to select the logDice measure for collocational analysis (see screenshot).

According to the manual it should be there. What am I missing?

Thank you as always for your help. 

Eugenia
スクリーンショット 2024-06-04 13.45.23.png

Laurence Anthony

Jun 4, 2024, 12:54:30 AM
to ant...@googlegroups.com
Hi,

LogDice is one of the effect size measures that you can use. You can select it in the tool settings.
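For reference, logDice is usually defined (following Rychlý, 2008) as 14 + log2(2 * f_xy / (f_x + f_y)), where f_xy is the co-occurrence frequency of node and collocate and f_x, f_y are their individual frequencies. The Python sketch below implements that standard definition; whether AntConc computes exactly this is an assumption, and the counts in the demo are hypothetical.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """Standard logDice: 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy: co-occurrence frequency of node and collocate
    f_x:  frequency of the node word
    f_y:  frequency of the collocate
    """
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("all frequencies must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical counts: node occurs 100 times, collocate 60 times, together 10 times.
score = log_dice(f_xy=10, f_x=100, f_y=60)
print(score)  # 11.0

# Theoretical maximum: the two words always occur together.
print(log_dice(f_xy=50, f_x=50, f_y=50))  # 14.0
```

Because the score depends only on these three frequencies and not on corpus size, values are easier to compare across corpora of different sizes than many other association measures.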

Laurence



Eugenia Diegoli

Jun 4, 2024, 1:05:43 AM
to ant...@googlegroups.com