Forced parsing

Daniel Krauße

May 26, 2016, 8:10:53 PM

to Shoebox/Toolbox Field Linguist's Toolbox

Hey everyone,

I've got an issue with this word from Javanese:

"ngaku"

It should be parsed like this:

ngaku

N- aku

VBZ- 1SG

pref- pers.pron

So there is a verbalizer and the 1st person singular pronoun. I have already specified in my dictionary that N- (a nasal verbalizer) can have these forms: ny-, n-, nge-, ng-, m- or nga-. So I expected ngaku to be parsed as ng-aku without trouble, but Toolbox doesn't offer me any ambiguity and instead parses it as nga-ku (which doesn't make sense, although "ku" is also in my dictionary with another meaning). I tried forcing ng-aku by specifying the word formula "pre pers.pron" or "pref pers.pron", but neither works. Any ideas?

Thanks in advance

Daniel
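
To make the ambiguity in the question concrete, here is a minimal Python sketch that lists every prefix + root split of "ngaku" allowed by a tiny lexicon. It is not Toolbox's parsing algorithm. The names PREFIX_ALLOMORPHS, ROOTS and candidate_parses are made up for illustration, and treating "ku" as a root is an assumption based only on the remark that "ku" is in the dictionary.

```python
# Hypothetical mini-lexicon built from the forms mentioned in the post.
# This only enumerates splits; it does not model how Toolbox chooses one.
PREFIX_ALLOMORPHS = {"N-": ["ny", "n", "nge", "ng", "m", "nga"]}
ROOTS = {"aku", "ku"}  # assumption: "ku" is treated as a root here


def candidate_parses(word):
    """Return every (surface prefix, root, underlying prefix) split of word."""
    parses = []
    for underlying, surface_forms in PREFIX_ALLOMORPHS.items():
        for surface in surface_forms:
            rest = word[len(surface):]
            if word.startswith(surface) and rest in ROOTS:
                parses.append((surface + "-", rest, underlying))
    return sorted(parses)


for prefix, root, underlying in candidate_parses("ngaku"):
    print(prefix, root, "(underlying", underlying + ")")
```

Under these assumptions both ng- aku and nga- ku come out as candidates, which is why one would expect an ambiguity prompt rather than a silent choice.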

Alex Francois

May 28, 2016, 12:54:11 PM

to shoeboxtoolbox-fiel...@googlegroups.com

hi Daniel,

Have you tried using an underlying form \u so as to force the parsing?

I once learned this technique using Shoebox, and I'm not sure it is still implemented in the latest versions of Toolbox.

If it is, your entry would look like this:

\lx aku
(...)
\a ngaku
\u N- aku

…using just a space between N- and aku.

(NB: it doesn't really matter under which \lx you add the \a field, as long as it is immediately followed by \u)

Try that first. If necessary, you could even specify the nature of the morphemes by adding their glosses in curly brackets (no spaces):

\a ngaku
\u N-{VBZ} aku{1SG}
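
As a rough illustration of the \a + \u convention described above, the following Python sketch reads dictionary text and collects each surface form together with its forced parse, including the optional glosses in curly brackets. It only mirrors the layout shown in this message; it says nothing about how Toolbox stores or applies these fields internally, and the function name forced_parses is hypothetical.

```python
import re

# Sample entry copied from the message above (raw string keeps the backslashes).
SFM_ENTRY = r"""\lx aku
\a ngaku
\u N-{VBZ} aku{1SG}
"""


def forced_parses(sfm_text):
    """Map each \\a form to the morphemes of the \\u line that directly follows it."""
    table = {}
    pending = None  # surface form from the most recent \a line, if any
    for line in sfm_text.splitlines():
        marker, _, value = line.strip().partition(" ")
        if marker == r"\a":
            pending = value.strip()
        elif marker == r"\u" and pending is not None:
            morphemes = []
            for token in value.split():
                # A token is a morpheme optionally followed by {gloss}.
                match = re.fullmatch(r"([^{}]+)(?:\{([^{}]*)\})?", token)
                if match:
                    morphemes.append((match.group(1), match.group(2)))
            table[pending] = morphemes
            pending = None
        else:
            pending = None  # \u must come immediately after \a
    return table


print(forced_parses(SFM_ENTRY))
```

A morpheme without a gloss, as in "\u N- aku", simply gets None as its second element.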

You may want to read more about this in this PDF file (especially pp. 4-5), as well as in this thread from 2008.

best,

Alex

_________

Alex François
Directeur, LACITO-CNRS, France
Australian National University, Canberra
Academia page – Personal homepage

Les Carnets du LaCiTO
__________________

--
You received this message because you are subscribed to the Google Groups "Shoebox/Toolbox Field Linguist's Toolbox" group.
To unsubscribe from this group and stop receiving emails from it, send an email to shoeboxtoolbox-field-ling...@googlegroups.com.
To post to this group, send email to shoeboxtoolbox-fiel...@googlegroups.com.
Visit this group at https://groups.google.com/group/shoeboxtoolbox-field-linguists-toolbox.
For more options, visit https://groups.google.com/d/optout.

Daniel Krauße

May 28, 2016, 1:04:21 PM

to Shoebox/Toolbox Field Linguist's Toolbox

Hi Alex,

man, that was easy haha. I had been using the underlying form for other purposes before and hadn't thought of using it for forced parsing. It does work indeed! Thanks a lot.

All the best,

Daniel

Margaux Dubuis

Dec 9, 2021, 4:41:02 AM

to Shoebox/Toolbox Field Linguist's Toolbox

Dear Daniel and Alex,

I also encountered such a problem and tried out the suggested solution. Unfortunately, it didn't work.

I'm working on Shipibo-Konibo and the segment I want to gloss is:

jabe

ja=be

3SG:ABS=COM

=be is stored in my dictionary, but Toolbox only suggests -be (which is a different suffix). I checked the language encoding and made sure that the '=' is ignored as a character, but I still don't get any ambiguity for it.

Any idea how to fix it?

Best and thanks for your help!

Margaux

Alex Francois

Dec 9, 2021, 5:11:51 AM

to shoeboxtoolbox-fiel...@googlegroups.com

dear Margaux,

Did you mean ja= be or ja =be? Which one is the clitic? From your message, I assume it's =be.

If so, you should indicate the right parsing to Toolbox:

\a jabe
\u ja =be

These two lines could be stored under \lx ja or under \lx =be (but in fact you could have them stored under any \lx in your dictionary).

Have you tried this?

Now if it doesn't work, it must be about the handling of the '=' sign. I'm not sure that "ignoring" it is the correct thing to do; but I can't remember the exact rules about special separators like this.

best

Alex

Alex François

LaTTiCe — CNRS–ENS–Sorbonne nouvelle
Australian National University

Personal homepage

_________________________________________

To view this discussion on the web visit https://groups.google.com/d/msgid/shoeboxtoolbox-field-linguists-toolbox/06ef04d7-370b-4569-907f-b0c1272a9159n%40googlegroups.com.

Margaux Dubuis

Dec 9, 2021, 5:56:01 AM

to Shoebox/Toolbox Field Linguist's Toolbox

Dear Alex,

Thank you very much for your answer!

I tried it and I still get the ambiguity selection only with the suffix -be and not the clitic =be...

Regarding the '=' sign: in another question asked in this forum, I read that the sign should be set as 'ignored' in the language encoding for the sort order. This is what I meant by 'ignored' above.

But, well... Obviously I'm still missing something!

If you have any idea, it's welcome!

Thanks very much for your help

Margaux

Alex Francois

Dec 9, 2021, 6:22:59 AM

to shoeboxtoolbox-fiel...@googlegroups.com

hi Margaux,

Indeed, in one of my Toolbox projects, I have the equal sign listed under "ignore characters", by contrast with the hyphen which appears in the previous box "Secondary characters ordered after…":

(I can't recall if I had done that manually myself, or if it was the default.)

Perhaps you could try to move the equal sign up to the previous box, so that it behaves like the hyphen? Then you'd get something like this:

I can't really do the test myself, because I don't actually have entries that use the equal sign.

Also, this part of Toolbox remains a bit mysterious to me…

bonne chance

Alex

ToolBox Support

Dec 9, 2021, 1:26:57 PM

to shoeboxtoolbox-fiel...@googlegroups.com

Hi, Margaux,

Just to be sure I understand...

Your text contains: jabe as a whole word.

Your lexicon contains: \lx ja and \lx =be and \lx -be

If you do Database, Properties, Interlinear

and select the Parse process

and click on Modify

you see the following:

The hyphen and the equals are the default Morpheme break characters and you would probably know if you had removed the equals for some reason.

When I put those three entries into my test dictionary, I get ambiguity between ja -be and ja =be. If I add

\lx ja

\a jabe
\u ja =be

(in my case, I added it to the \lx ja entry), I get no ambiguity, just the ja =be parse.

I observe that both - and = are in the Ignore field of the sort sequence.

Do you have all these pieces in your Toolbox setup? If so, do you have any other \a-\u combinations involving the sequence jabe?
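
If the dictionary is large, one way to look for competing \a-\u combinations is to scan an exported copy of it outside Toolbox. The Python sketch below is only an illustration under that assumption: conflicting_forced_parses is a hypothetical helper, the sample text is invented to show a clash, and the script knows nothing about Toolbox's own lookup order.

```python
from collections import defaultdict

# Invented sample: the same surface form "jabe" is given two different parses.
SAMPLE = r"""\lx ja
\a jabe
\u ja =be

\lx -be
\a jabe
\u ja -be
"""


def conflicting_forced_parses(sfm_text):
    """Return surface forms whose \\a field is paired with more than one \\u value."""
    seen = defaultdict(set)
    pending = None  # surface form from the most recent \a line, if any
    for line in sfm_text.splitlines():
        marker, _, value = line.strip().partition(" ")
        if marker == r"\a":
            pending = value.strip()
        elif marker == r"\u" and pending is not None:
            seen[pending].add(" ".join(value.split()))
            pending = None
        else:
            pending = None  # only a \u directly after \a counts
    return {form: sorted(parses) for form, parses in seen.items() if len(parses) > 1}


print(conflicting_forced_parses(SAMPLE))
```

An empty result means no surface form was found with two different forced parses in the text that was scanned.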

Karen

Toolbox Support

Margaux Dubuis

Jan 25, 2022, 4:05:36 PM

to Shoebox/Toolbox Field Linguist's Toolbox

Dear Alex and Karen,

Thanks for your suggestions!

@Alex, what we did first was to include the '-' and the '=' as characters in the language encoding.

Then I understood that in my dictionary each \a field is supposed to hold only one form, so I corrected that, and the ambiguity window finally suggested more or less all the options. (Until then I had, for example, "\a ja; ha", which doesn't work because Toolbox only sees the first one.)

In cases where a word still wouldn't parse correctly, I added an \a with the full form (in this case jabe) and a \u with the correct morpheme break (here ja =be), and then it worked!
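
For anyone with many fields like "\a ja; ha", the clean-up could be scripted on a backup copy of the dictionary file. The Python sketch below is one possible approach, not a Toolbox feature: split_alternate_forms is a hypothetical helper, and it assumes that the forms are separated by semicolons and that each field sits on a single line.

```python
def split_alternate_forms(sfm_text, separator=";"):
    """Rewrite every multi-valued \\a line as one \\a line per form."""
    out = []
    for line in sfm_text.splitlines():
        marker, _, value = line.partition(" ")
        if marker == r"\a" and separator in value:
            for form in value.split(separator):
                if form.strip():  # skip empty pieces such as a trailing ";"
                    out.append(r"\a " + form.strip())
        else:
            out.append(line)  # every other line is kept unchanged
    return "\n".join(out)


before = "\\lx ja\n\\a ja; ha"
print(split_alternate_forms(before))
```

It is worth comparing the result against the original file before opening it in Toolbox again.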