Hi Matt,
First, I realized that the problem with GIZA++ still persists. After analyzing the log file, I found that it is a problem with the symal package, as shown below:
[.../alignments/0/giza.log]
....
Executing: bash -c rm -f alignments/0/giza.fr.0-en.0/fr.0-en.0.A3.final.gz
Executing: bash -c gzip alignments/0/giza.fr.0-en.0/fr.0-en.0.A3.final
Waiting for second GIZA process...
(3) generate word alignment @ Sat Jan 16 00:36:15 GMT 2016
Combining forward and inverted alignment from files:
alignments/0/giza.en.0-fr.0/en.0-fr.0.A3.final.{bz2,gz}
alignments/0/giza.fr.0-en.0/fr.0-en.0.A3.final.{bz2,gz}
Executing: bash -c mkdir -p alignments/0/model
Executing: bash -c /home/elma/git/joshua/scripts/training/symal/giza2bal.pl -d <(gzip -cd alignments/0/giza.fr.0-en.0/fr.0-en.0.A3.final.gz) -i <(gzip -cd alignments/0/giza.en.0-fr.0/en.0-fr.0.A3.final.gz) |/home/elma/git/joshua/scripts/training/symal/symal -alignment="grow" -diagonal="yes" -final="yes" -both="no" -o=alignments/0/model/aligned.grow-diag-final
bash: /home/elma/git/joshua/scripts/training/symal/symal: No such file or directory
bash: /home/elma/git/joshua/scripts/training/symal/giza2bal.pl: No such file or directory
Exit code: 127
ERROR: Can't generate symmetrized alignment file
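Both "No such file or directory" messages point at the symal directory, so a quick check of which files are actually present may help narrow this down. The following is only a sketch: the helper name check_tools and the SYMAL_DIR variable are made up for illustration, and the default directory is simply the path copied from the log above.

```shell
#!/usr/bin/env bash
# Sketch: report which symal tools are missing or not executable.
# check_tools and SYMAL_DIR are hypothetical names used for illustration;
# the default directory is the path that appears in giza.log.

check_tools() {
  local dir="$1"; shift
  local tool status=0
  for tool in "$@"; do
    if [ ! -e "$dir/$tool" ]; then
      echo "missing: $dir/$tool"
      status=1
    elif [ ! -x "$dir/$tool" ]; then
      echo "not executable: $dir/$tool"
      status=1
    else
      echo "ok: $dir/$tool"
    fi
  done
  return "$status"
}

# The two files that bash could not find in the log above.
check_tools "${SYMAL_DIR:-/home/elma/git/joshua/scripts/training/symal}" symal giza2bal.pl || true
```

If symal is reported as missing, the C++ binary was presumably never built; if giza2bal.pl is missing as well, the whole directory may be absent from the checkout.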
The Berkeley aligner works fine, but the pipeline fails in later steps:
- The problem with the run in my previous mail was an OutOfMemory error (the RAM of my virtual machine was exhausted).
- I am still continuing my experiments, but I always run into problems: either an OutOfMemory exception or this NullPointerException:
[.../tune/joshua.log]
Input 0: <s> what i 'm going to show you first , as quickly as i can , is some foundational work , some new technology that we brought to microsoft as part of an acquisition almost exactly a year ago . this is seadragon , </s>
Input 0: Collecting options took 0.000 seconds
Input 0: FATAL UNCAUGHT EXCEPTION: null
java.lang.NullPointerException
at joshua.decoder.phrase.Candidate.score(Candidate.java:214)
at joshua.decoder.phrase.Candidate.compareTo(Candidate.java:136)
at joshua.decoder.phrase.Candidate.compareTo(Candidate.java:19)
at java.util.HashMap.compareComparables(HashMap.java:371)
at java.util.HashMap$TreeNode.treeify(HashMap.java:1920)
at java.util.HashMap.treeifyBin(HashMap.java:771)
at java.util.HashMap.putVal(HashMap.java:643)
at java.util.HashMap.put(HashMap.java:611)
at java.util.HashSet.add(HashSet.java:219)
at joshua.decoder.phrase.Stack.addCandidate(Stack.java:125)
at joshua.decoder.phrase.Stacks.search(Stacks.java:166)
at joshua.decoder.DecoderThread.translate(DecoderThread.java:113)
at joshua.decoder.Decoder$DecoderThreadRunner.run(Decoder.java:218)
I used --type {phrase,moses} in my latest experiments to avoid the OutOfMemory problem, but then the NullPointerException above is triggered.
Finally, I decided to migrate to CentOS to avoid these problems with building Joshua's C++ components. I have installed the OS and started preparing the environment today.
Thank you!