Sorry for the late reply.
J C <cha...@hotmail.com> writes:
> I am emailing you in regards to HunPOS which does not
> seem to have any active support.
There is the hunpos mailing list:
However, there's not much traffic there either (read: it's safe to
subscribe ;-). I propose that we discuss this further on that list,
because Rachel asked the same question there and it seems to be the
appropriate place.
> I notice each of you have used HunPOS
> in the past and I would like to know if you had any troubles building
> your own model using the hunpos-train.
I'm still using HunPOS and have developed an incremental mode for my
bachelor thesis. I haven't pushed this upstream yet, though :-/
> However to train a model using hunpos-train, a file is
> necessary containing a single word and part of speech separated by a
> tab per line. Furthermore as the specifications state, sentences are
> escaped by empty lines.
That's correct.
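For anyone who wants to check a corpus against this format before training, here is a minimal sketch in Python. It assumes exactly one tab between word and tag and treats empty lines as sentence boundaries; the function name check_training_lines is made up for illustration and is not part of HunPOS.

```python
def check_training_lines(lines):
    """Check a word<TAB>tag corpus; return (n_tokens, n_sentences, bad_lines).

    Hypothetical helper, not part of HunPOS. It only checks the layout
    described above: one word and one tag per line, separated by a tab,
    with empty lines between sentences.
    """
    n_tokens = 0
    n_sentences = 0
    in_sentence = False
    bad_lines = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line == "":
            # An empty line closes the current sentence, if there is one.
            if in_sentence:
                n_sentences += 1
                in_sentence = False
            continue
        fields = line.split("\t")
        if len(fields) != 2 or not fields[0] or not fields[1]:
            bad_lines.append((lineno, line))
        else:
            n_tokens += 1
        in_sentence = True
    if in_sentence:
        # The last sentence may not be followed by an empty line.
        n_sentences += 1
    return n_tokens, n_sentences, bad_lines
```

Called as check_training_lines(open("input.txt", encoding="utf-8")), it reports how many tokens and sentences were found and which lines are not in word<TAB>tag form.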
> [snip]
> hunpos-train model.model < input.txt
> reading training corpus compiling probabilities Fatal error: exception
> Failure("empty context_trie")
I didn't come across this until I wanted to answer this mail with
"worksforme". Now I tried to train a model by hand and got the same
error. The problem seems to be (at least here) that the input is
too small:
This works:
sed 's/ /\t/' somefile | head -n 35 | ./trainer.native models/test
This doesn't, yielding the same error as mentioned above:
sed 's/ /\t/' somefile | head -n 20 | ./trainer.native models/test
Does this information help? I'll try to dig deeper into this if you
(Rachel and/or John) still have the problem and provide me with your
training corpus.
Greetings,
Arne
Well, the main developer has practically abandoned the project and
moved on to a different full-time occupation, unfortunately.
>
> > [snip]
> > hunpos-train model.model < input.txt
>
> > reading training corpus compiling probabilities Fatal error: exception
> > Failure("empty context_trie")
This came up some time ago already; please find Peter's answer from then below.
Best,
Csaba Oravecz
----------------------------------------------------------------------
From: Peter Halacsy <pe...@halacsy.com>
To: hun...@googlegroups.com
In-Reply-To: <e4ef9313-55e2-4c81...@e25g2000prg.googlegroups.com>
Subject: Re: Curious about hunpos
Date: Wed, 30 Jan 2008 19:48:43 +0100
On Jan 30, 2008, at 2:03 PM, zeljk...@gmail.com wrote:
>
> Peter Halacsy wrote:
>
>> <cut />
>
> BTW, Peter, on many training files that I have available, the training
> procedure breaks down stating
>
> reading training corpus
> compiling probabilities
> Fatal error: exception Failure("empty context_trie")
>
> However, I was not able to determine why this happens on some files
> and yet does not on others. I use input files in the format
>
> wordform1[SPACES or TABS]tag1
> ...
>
> with newlines as sentence delimiters. Do you know what causes this
> kind of error?
>
This is a bug.
In your training file there are not enough tokens matching the regular
expressions defining cardinals. These are
For some regular expressions Hunpos learns the tag distribution of the
training corpus separately to give more reliable estimates for open
class items like numbers unseen during training
(see http://mokk.bme.hu/archive/halacsy07acl).
What can we do?
1. factor out the hard-wired regular expressions and make them configurable
(a test version is done on my computer)
2. check whether there are too few samples for any of the open token classes
3. add some dummy data to the training corpus
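
Until this is fixed, the check in option 2 can be approximated outside Hunpos. Below is a minimal sketch in Python. Note the assumptions: the pattern in ASSUMED_PATTERNS is a placeholder for illustration only and is not the expression hard-wired in Hunpos; the threshold `minimum` is likewise a guess, since how many samples count as "enough" is not stated here.

```python
import re

# HYPOTHETICAL placeholder: the real expressions are hard-wired in Hunpos
# and are not reproduced here. Replace with the actual ones if you have them.
ASSUMED_PATTERNS = {"cardinal": re.compile(r"[0-9]+")}


def count_open_class_tokens(lines, patterns=ASSUMED_PATTERNS):
    """Count word forms that fully match each pattern in a word<TAB>tag corpus."""
    counts = {name: 0 for name in patterns}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # empty line = sentence boundary
        word = line.split("\t", 1)[0]
        for name, pattern in patterns.items():
            if pattern.fullmatch(word):
                counts[name] += 1
    return counts


def classes_below(counts, minimum=1):
    """Return the names of classes with fewer than `minimum` samples (assumed threshold)."""
    return sorted(name for name, c in counts.items() if c < minimum)
```

If classes_below(count_open_class_tokens(open("input.txt", encoding="utf-8"))) returns a non-empty list, the corpus is a candidate for the dummy data mentioned in option 3.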
I hope this helps
peter