Dear Dan,
We tried to build a lang from a 2G text corpus, but it failed because of insufficient memory.
So we created an EC2 instance with 64GB of RAM and added 100G of swap, which solved that problem.
When we went ahead to create the graph, the same problem occurred, so I
increased the swap to 130G to resolve the memory issue. Because the process
also stopped responding, I split the following step:
fsttablecompose $lang/L_disambig.fst $lang/G.fst | fstdeterminizestar --use-log=true | \
fstminimizeencoded | fstpushspecial | \
fstarcsort --sort_type=ilabel > $lang/tmp/LG.fst.$$ || exit 1;
mv $lang/tmp/LG.fst.$$ $lang/tmp/LG.fst
fstisstochastic $lang/tmp/LG.fst || echo "[info]: LG not stochastic."
into these steps:
fsttablecompose $lang/L_disambig.fst $lang/G.fst > $1/tmp/LG.fst.1
#echo "Compose Complete"
fstdeterminizestar $1/tmp/LG.fst.1 > $1/tmp/LG.fst.2
echo "Determinizer Complete"
fstminimizeencoded $1/tmp/LG.fst.2 > $1/tmp/LG.fst.3
echo "Encoded Complete"
fstpushspecial $1/tmp/LG.fst.3 > $1/tmp/LG.fst.4
echo "Special Complete"
fstarcsort --sort_type=ilabel $1/tmp/LG.fst.4 > $1/tmp/LG.fst.5
mv $1/tmp/LG.fst.5 $1/tmp/LG.fst
echo "Renaming Complete"
fstisstochastic $1/tmp/LG.fst || echo "[info]: LG not stochastic."
This segmentation solved the problem of composing LG.fst.
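The staged approach above could also be wrapped in a small helper that reports which stage failed and with what exit status. This is only a minimal sketch, not part of the script I ran: the function name run_stage and its messages are assumptions made for illustration.

```shell
# Hypothetical helper (not part of the original script): run one stage,
# redirect its stdout to a file, and report success or the failing exit status.
run_stage() {
  local stage_name=$1 stage_out=$2
  shift 2
  if "$@" > "$stage_out"; then
    echo "$stage_name complete"
  else
    local stage_status=$?
    echo "$stage_name failed with exit status $stage_status" >&2
    return "$stage_status"
  fi
}

# Intended usage, mirroring the first two stages above (commented out because
# it needs the Kaldi binaries and the $lang directory):
# run_stage "Compose" $lang/tmp/LG.fst.1 fsttablecompose $lang/L_disambig.fst $lang/G.fst || exit 1
# run_stage "Determinize" $lang/tmp/LG.fst.2 fstdeterminizestar $lang/tmp/LG.fst.1 || exit 1
```

With a helper like this, the log would show the last stage that completed and the exit status of the one that did not.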
However, in the next step, which composes Ha.fst with the CLG graph, a
"Hangup" problem occurs. Note that I ran a shell script that checks the
memory every 15 seconds, and its log shows that 35G out of 194G was still
free just before the "Hangup" problem. The following is the original segment of code:
fsttablecompose $dir/Ha.fst "$clg" | fstdeterminizestar --use-log=true \
| fstrmsymbols $dir/disambig_tid.int | fstrmepslocal | \
fstminimizeencoded > $dir/HCLGa.fst.$$ || exit 1;
mv $dir/HCLGa.fst.$$ $dir/HCLGa.fst
fstisstochastic $dir/HCLGa.fst || echo "HCLGa is not stochastic"
and the modified one is:
fsttablecompose $dir/Ha.fst "$clg" > $dir/Ha.fst.1
fstdeterminizestar --use-log=true $dir/Ha.fst.1 > $dir/Ha.fst.2
fstrmsymbols $dir/disambig_tid.int $dir/Ha.fst.2 > $dir/Ha.fst.3
fstrmepslocal $dir/Ha.fst.3 > $dir/Ha.fst.4
fstminimizeencoded $dir/Ha.fst.4 > $dir/HCLGa.fst.$$ || exit 1;
mv $dir/HCLGa.fst.$$ $dir/HCLGa.fst
fstisstochastic $dir/HCLGa.fst || echo "HCLGa is not stochastic"
I left the server running and came back the next day to check the log file.
I found "Hangup" in the log; the process had stopped and the memory was free.
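For reference, a memory check of the kind mentioned above can be sketched as follows. This is an illustration only, not the exact script I ran: the function name log_mem and the log file name mem.log are assumptions, and it relies on the MemAvailable and SwapFree fields of /proc/meminfo on Linux.

```shell
# Hypothetical helper: print one timestamped line with available RAM and free
# swap in kB, read from a meminfo-style file (defaults to /proc/meminfo).
log_mem() {
  local meminfo=${1:-/proc/meminfo}
  awk -v ts="$(date '+%Y-%m-%d %H:%M:%S')" '
    /^MemAvailable:/ { avail = $2 }
    /^SwapFree:/     { swap = $2 }
    END { printf "%s MemAvailable=%skB SwapFree=%skB\n", ts, avail, swap }
  ' "$meminfo"
}

# Example loop, sampling every 15 seconds (commented out so that sourcing
# this sketch has no side effects):
# while true; do log_mem >> mem.log; sleep 15; done
```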
I have attached the modified make-graph shell script to this email.
I would be thankful for your help.
Thank you
Best Regards
Nour Alhuda Damer