dependency LMs: bug in Head LM retrieval?


Federico Flego

Oct 19, 2012, 1:09:36 PM
to jane-...@googlegroups.com
Hello all,

I'm running Jane to obtain the dependency LM scores, but it seems to me that there's a bug in the code.

I've got the following sentences:

adel ibrahim wrote :
wrote adel ibrahim :
wrote ibrahim adel :

with the corresponding parses:

(ROOT
  (S
    (NP (JJ adel) (NN ibrahim))
    (VP (VBD wrote))
    (: :)))

amod(ibrahim-2, adel-1)
nsubj(wrote-3, ibrahim-2)
root(ROOT-0, wrote-3)

(ROOT
  (NP (NNP wrote) (NNP adel) (NN ibrahim) (: :)))

nn(ibrahim-3, wrote-1)
nn(ibrahim-3, adel-2)
root(ROOT-0, ibrahim-3)

(ROOT
  (NP (NNP wrote) (NNP ibrahim) (NN adel) (: :)))

nn(adel-3, wrote-1)
nn(adel-3, ibrahim-2)
root(ROOT-0, adel-3)

The scores I obtain are:

3.96619 104.265 -0 -1 -2 -0
100 200 -0 -1 -2 -0
6.02887 12.655 -0 -1 -2 -0

I've got the following entries in the head dep LM:

-3.96619    wrote-as-head
-4.799524    ibrahim-as-head
-6.028873    adel-as-head
-5.522673    </s>

So in the second sentence the code is not applying the head dependency LM score for ibrahim-as-head (it is applied correctly in the 1st and 3rd sentences). A penalty of 100 is returned instead.
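The scores above are consistent with the head score being the negated log10 probability from the head LM (3.96619 for wrote-as-head, 6.02887 for adel-as-head), with a fixed value of 100 substituted when the lookup fails. A minimal sketch of that mapping, assuming this is how the scorer behaves; the function name `headCost` and its default penalty are hypothetical and not taken from Jane's source:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not Jane's actual code: turns a log10 probability
// into a cost, falling back to a fixed penalty when the lookup failed.
double headCost(double logProb, double penalty = 100.0) {
    assert(penalty >= 0.0);
    // A failed lookup shows up as -inf (or NaN); use the penalty instead.
    if (!std::isfinite(logProb)) {
        return penalty;
    }
    return -logProb;  // cost = negated log probability
}
```

Under this assumption, a cost of exactly 100 in the output signals that the LM lookup returned no finite probability, rather than that the entry has a poor score.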

The initial output of Jane for 2nd sentence:

[19/10/2012 14:36:56 dependencylmscorer] calculating score to this sentence: wrote(2) adel(1) ibrahim(0)
[19/10/2012 14:36:56 dependencylmscorer] Heads: wrote(5) adel(4) ibrahim(3)  3
[19/10/2012 14:36:56 dependencylmscorer] and this dependency:  2  2  -2

Then I did some debugging in the function

Cost DependencyLanguageModel::getLMScore

Looking for head word 'adel' (working):

(gdb) p sriLm_->wordProb(4, &vocabBuffer_[2])
$111 = -6.02887297

Looking for head word 'wrote' (working):

(gdb) p sriLm_->wordProb(5, &vocabBuffer_[2])
$112 = -3.9661901

Looking for head word 'ibrahim' (NOT WORKING):

(gdb) p sriLm_->wordProb(3, &vocabBuffer_[2])
$110 = -inf

Just to check with non-head indexes:

(gdb) p sriLm_->wordProb(0, &vocabBuffer_[2])
$113 = -inf
(gdb) p sriLm_->wordProb(1, &vocabBuffer_[2])
$114 = -inf
(gdb) p sriLm_->wordProb(2, &vocabBuffer_[2])
$115 = -5.52267313

The last one is very strange, because the only entry I have in the head LM with -5.52267313 is '</s>'!

Is something strange going on with the index handling here?

Thank you very much for your help!!!

Federico

Jan-Thorsten Peter

Oct 25, 2012, 8:08:14 AM
to jane-...@googlegroups.com
Hi Federico,
Sorry for the delayed response.

You're right, that looks strange, but I could not reproduce the error with my own test set.
It would be nice if you could send me your exact Jane setup.

You could also try starting Jane with this additional parameter:
--CubePrune.Dependency.headLM.verbosity insaneDebug
A little warning: "insaneDebug" really means insane; you can get a lot of output even for short sentences.

Best regards,
Jan

Federico Flego

Oct 26, 2012, 6:13:33 AM
to jane-...@googlegroups.com
Hello Jan,

Thanks for your help! I think I've found the bug.
In the DependencyLanguageModel constructor (src/Syntax/DependencyLanguageModel.cc) the function:

   fillDummyWords()

is not called. This leaves the two vocabularies, sriVocabulary_ and dependencyAlphabet_, out of sync.
Below is the fix I've applied. I'm not sure the behaviour is now what it should be, but I no longer have the problem above and I get nicer numbers ;)

    DependencyLanguageModel::DependencyLanguageModel(const Core::Configuration &config,
            StaticAlphabetRef dependencyAlphabet, bool justTestingFunctionality) :
        Component(config),
        userOrder_(paramLmOrder_(config)),
        fname_(paramFname_(config)),
        penaltyNoEntry_(paramPenaltyNoEntry_(config)),
        penaltyNotFound_(paramPenaltyNotFound_(config)),
        sriVocabulary_(new Vocab(0)), sriLm_(0),
        vocabBuffer_(new VocabIndex[50]), vocabBufferSize_(50),
        dependencyAlphabet_(dependencyAlphabet) {
        fillDummyWords(dependencyAlphabet_); /* ff257 otherwise sriVocabulary_ and dependencyAlphabet_ are not synced! */
        [...]
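
The effect of two unsynchronised vocabularies can be illustrated with a toy model. This is only a sketch: `ToyVocab`, `syncVocab` and `inSync` are hypothetical names, not Jane's StaticAlphabet or SRILM's Vocab, and it assumes (as an inference from this thread, not from Jane's source) that fillDummyWords() registers the alphabet's words in the LM vocabulary so that the same index refers to the same word on both sides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy vocabulary that hands out indices in insertion order (hypothetical).
class ToyVocab {
public:
    // Returns the existing index of `word`, or assigns the next free one.
    std::size_t add(const std::string &word) {
        auto it = index_.find(word);
        if (it != index_.end()) return it->second;
        std::size_t id = words_.size();
        index_.emplace(word, id);
        words_.push_back(word);
        return id;
    }
    const std::string &word(std::size_t id) const {
        assert(id < words_.size());
        return words_[id];
    }
    std::size_t size() const { return words_.size(); }

private:
    std::unordered_map<std::string, std::size_t> index_;
    std::vector<std::string> words_;
};

// Registers every alphabet word in lmVocab, in alphabet order. If lmVocab
// is empty beforehand, each word gets the same index in both vocabularies.
void syncVocab(const ToyVocab &alphabet, ToyVocab &lmVocab) {
    for (std::size_t i = 0; i < alphabet.size(); ++i) {
        lmVocab.add(alphabet.word(i));
    }
}

// True if every alphabet index names the same word in both vocabularies.
bool inSync(const ToyVocab &alphabet, const ToyVocab &lmVocab) {
    if (lmVocab.size() < alphabet.size()) return false;
    for (std::size_t i = 0; i < alphabet.size(); ++i) {
        if (alphabet.word(i) != lmVocab.word(i)) return false;
    }
    return true;
}
```

If the LM vocabulary is filled from the LM file first, an alphabet index can land on an unrelated entry such as '</s>', which would match the kind of behaviour seen in the gdb session above.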

Cheers,

Federico

Jan-Thorsten Peter

Oct 29, 2012, 6:06:23 AM
to jane-...@googlegroups.com
Hi Federico,

fillDummyWords() is usually called by the SriLMInterface for the targetAlphabet before it is passed on to the DependencyLanguageModel as dependencyAlphabet.
The bug shows up if this does not happen in that order, or does not happen at all.
I fixed it in our repository, thanks a lot!
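
One way to remove this kind of call-order dependency is to let the consuming class perform the synchronisation itself and to make that step idempotent. The following is a minimal sketch under that assumption; `ToyDependencyLm` and its members are hypothetical, and this is not necessarily how the fix in the repository was done:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a dependency LM wrapper; not Jane's class.
// Assumes the alphabet contains each word only once.
class ToyDependencyLm {
public:
    // The constructor syncs the LM vocabulary itself, so correctness does
    // not depend on another component having done so beforehand.
    explicit ToyDependencyLm(const std::vector<std::string> &alphabet)
        : alphabet_(alphabet) {
        sync();
    }

    // Idempotent: words already present keep their index.
    void sync() {
        for (const std::string &w : alphabet_) {
            if (std::find(lmVocab_.begin(), lmVocab_.end(), w) == lmVocab_.end()) {
                lmVocab_.push_back(w);
            }
        }
    }

    // The word the LM side sees at a given alphabet index.
    const std::string &lmWord(std::size_t id) const {
        assert(id < lmVocab_.size());
        return lmVocab_[id];
    }

    std::size_t lmVocabSize() const { return lmVocab_.size(); }

private:
    std::vector<std::string> alphabet_;
    std::vector<std::string> lmVocab_;
};
```

Because a second call to sync() changes nothing, it does not matter whether another component has already triggered it.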

Cheers,

Jan