Z-MERT takes a long time to finish

Reza Lesmana

Jun 11, 2015, 8:17:29 PM
to joshua_d...@googlegroups.com
Dear devs,

My Joshua Decoder has been running for quite a while, and it still seems to be in the Z-MERT step.

These are the last lines of the log.

[mert] rebuilding...
  dep=/home/rezalesmana/thesis/working_directory/data/tune/corpus.en [CHANGED]
  dep=/home/rezalesmana/thesis/working_directory/tune/model/joshua.config [CHANGED]
  dep=tune/model/grammar.filtered.gz.packed/slice_00000.source [CHANGED]
  dep=/home/rezalesmana/thesis/working_directory/tune/joshua.config.final [NOT FOUND]
  cmd=/home/rezalesmana/joshua-v6.0.3/scripts/training/run_zmert.py /home/rezalesmana/thesis/working_directory/data/tune/corpus.en /home/rezalesmana/thesis/working_directory/data/tune/corpus.id --tunedir /home/rezalesmana/thesis/working_directory/tune --tuner zmert --decoder-config /home/rezalesmana/thesis/working_directory/tune/model/joshua.config

I think it's been running for more than 24 hours.

I tried to check whether it is short of memory or CPU. According to "top", the machine still has plenty of free memory,
but the process has almost 7 GB of VIRT, and most of that seems to be on disk rather than in RAM.
As for CPU, it seems to be running on only one processor, at 100%.

This is the "top" output.

$ top
top - 23:54:52 up 1 day, 8 min,  1 user,  load average: 1.00, 1.01, 1.05
Tasks: 256 total,   1 running, 255 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   7137852 total,  2764884 used,  4372968 free,    62408 buffers
KiB Swap:        0 total,        0 used,        0 free.  1190772 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 5934 rezales+  20   0 6675824 1.248g  15700 S  99.8 18.3   1404:08 java
 5919 rezales+  20   0 6710992  47404  15236 S   0.3  0.7   1:34.03 java
    1 root      20   0   33632   4016   2536 S   0.0  0.1   0:02.08 init
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.01 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.62 ksoftirqd/0
    4 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:+
    7 root      20   0       0      0      0 S   0.0  0.0   0:08.64 rcu_sched
    8 root      20   0       0      0      0 S   0.0  0.0   0:02.25 rcuos/0
    9 root      20   0       0      0      0 S   0.0  0.0   0:03.55 rcuos/1
   10 root      20   0       0      0      0 S   0.0  0.0   0:03.27 rcuos/2
   11 root      20   0       0      0      0 S   0.0  0.0   0:04.31 rcuos/3
   12 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/4
   13 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/5
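
For reading the numbers in this output, here is a minimal Python sketch. It is not part of Joshua, and the helper names to_kib and cpu_minutes are made up for illustration. It converts the VIRT, RES and TIME+ fields of the busy java process into comparable units. VIRT is the address space the process has reserved and RES is the part that actually sits in RAM; since the Swap line reports 0 total, nothing can have been paged out to disk. The sketch assumes TIME+ is shown as minutes:seconds, which is how top prints large values.

```python
def to_kib(field: str) -> float:
    """Convert a top memory field such as '6675824' or '1.248g' to KiB."""
    units = {"m": 1024, "g": 1024 ** 2, "t": 1024 ** 3}
    suffix = field[-1].lower()
    if suffix in units:
        return float(field[:-1]) * units[suffix]
    return float(field)  # plain numbers are already KiB


def cpu_minutes(time_plus: str) -> float:
    """Convert a TIME+ value like '1404:08' (assumed minutes:seconds) to minutes."""
    minutes, seconds = time_plus.split(":")
    return int(minutes) + float(seconds) / 60


# Process line copied from the top output above.
line = "5934 rezales+  20   0 6675824 1.248g  15700 S  99.8 18.3   1404:08 java"
fields = line.split()

virt_kib = to_kib(fields[4])   # VIRT column
res_kib = to_kib(fields[5])    # RES column
swap_total_kib = 0             # from the "KiB Swap" line

print(f"VIRT: {virt_kib / 1024 ** 2:.2f} GiB reserved")
print(f"RES:  {res_kib / 1024 ** 2:.2f} GiB in RAM")
print(f"Swap: {swap_total_kib} KiB configured")
print(f"CPU:  {cpu_minutes(fields[10]) / 60:.1f} hours")
```

Under those assumptions the process holds about 1.25 GiB in RAM out of a 6.37 GiB reservation and has used roughly 23.4 hours of CPU time, which is consistent with a single core being busy for the whole run.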

Am I missing something?

I'm running the pipeline with the basic parameters (--corpus input/train --tune input/tune --test input/test --source en --target id --aligner berkeley).

I have noticed the "--joshua-mem" parameter. If I set it, will Z-MERT be able to use more RAM instead of virtual memory on disk?
Should I add another parameter to make Z-MERT run in parallel so that it speeds up?

FYI, I'm running on an Azure Ubuntu VM and might be able to scale the machine vertically to get more memory and processors for now
(until I run out of Azure credits :D).

Thanks a lot for your help. 

Regards,
Reza Lesmana

Matt Post

Jun 11, 2015, 11:01:43 PM
to joshua_d...@googlegroups.com
What version of Joshua are you using? What languages are you translating between? How many lines are in /home/rezalesmana/thesis/working_directory/data/tune/corpus.en? What files are in the tune/ directory?

matt

--
You received this message because you are subscribed to the Google Groups "Joshua Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to joshua_develop...@googlegroups.com.
To post to this group, send email to joshua_d...@googlegroups.com.
Visit this group at http://groups.google.com/group/joshua_developers.
For more options, visit https://groups.google.com/d/optout.

Reza Lesmana

Jun 11, 2015, 11:26:28 PM
to joshua_d...@googlegroups.com
Hi, Matt

I'm using Joshua v6.0.3, translating English to Indonesian.
There are around 1200 lines in /home/rezalesmana/thesis/working_directory/data/tune/corpus.en

And these are the files in the tune/ directory.

corpus.en  grammar.filtered.gz  tune.en.gz  tune.tok.en.gz  tune.tok.lc.en
corpus.id  grammar.glue         tune.id.gz  tune.tok.id.gz  tune.tok.lc.id

Regards,
Reza Lesmana

Matt Post

Jun 11, 2015, 11:43:14 PM
to joshua_d...@googlegroups.com
> corpus.en grammar.filtered.gz tune.en.gz tune.tok.en.gz tune.tok.lc.en
> corpus.id grammar.glue tune.id.gz tune.tok.id.gz tune.tok.lc.id
>
> Regards,
> Reza Lesmana

No, that is data/tune, where data is stored. The actual tuning run is in tune/, and mert should be generating files there as it works. It can take quite some time. What is the output of ls -ltr working_dir/tune?

matt

Reza Lesmana

Jun 12, 2015, 1:11:03 AM
to joshua_d...@googlegroups.com
I'm sorry, I thought you were talking about the data/tune folder.

Here is the content of the working_directory/tune folder. 

$ ls -ltr working_directory/tune
total 189340
drwxrwxr-x 3 rezalesmana rezalesmana      4096 Jun 10 05:26 model
-rwxr-xr-x 1 rezalesmana rezalesmana       400 Jun 10 23:55 decoder_command
-rw-rw-r-- 1 rezalesmana rezalesmana       931 Jun 10 23:55 mert.config
lrwxrwxrwx 1 rezalesmana rezalesmana        67 Jun 10 23:55 joshua.config.ZMERT.orig -> /home/rezalesmana/thesis/working_directory/tune/model/joshua.config
-rw-rw-r-- 1 rezalesmana rezalesmana       437 Jun 10 23:55 params.txt
-rw-rw-r-- 1 rezalesmana rezalesmana      4854 Jun 10 23:55 joshua.config
-rw-rw-r-- 1 rezalesmana rezalesmana 101110513 Jun 11 00:32 output.nbest
-rw-rw-r-- 1 rezalesmana rezalesmana    460199 Jun 11 00:32 joshua.log
-rw-rw-r-- 1 rezalesmana rezalesmana      2027 Jun 11 00:32 mert.log
-rw-rw-r-- 1 rezalesmana rezalesmana  38476581 Jun 11 00:32 ZMERT.temp.sents.it1
-rw-rw-r-- 1 rezalesmana rezalesmana  53788672 Jun 11 00:32 ZMERT.temp.feats.it1

If I'm reading this right, it has been a while since any of these files was modified, right?
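
The same check can be scripted. The following is a minimal Python sketch, not anything Joshua ships: the function names newest_file and looks_stalled and the one-hour threshold are assumptions chosen for illustration. It looks at the modification times that "ls -ltr" displays and reports whether anything in a directory has changed recently.

```python
import time
from pathlib import Path


def newest_file(directory):
    """Return (path, mtime) of the most recently modified regular file, or None."""
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    if not files:
        return None
    newest = max(files, key=lambda p: p.stat().st_mtime)
    return newest, newest.stat().st_mtime


def looks_stalled(directory, max_idle_seconds=3600, now=None):
    """True if no file in the directory was modified within max_idle_seconds."""
    found = newest_file(directory)
    if found is None:
        return True  # nothing has been written at all
    now = time.time() if now is None else now
    return now - found[1] > max_idle_seconds


# "." is a placeholder; point it at the tuning directory to check a real run.
tune_dir = "."
found = newest_file(tune_dir)
if found is not None:
    idle_hours = (time.time() - found[1]) / 3600
    print(f"newest file: {found[0].name}, idle for {idle_hours:.1f} hours")
print("looks stalled:", looks_stalled(tune_dir))
```

A quiet directory is only a hint. A long-running step that writes nothing until it finishes would look the same, so the threshold should be picked with that in mind.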

Matt Post

Jun 12, 2015, 8:56:05 AM
to joshua_d...@googlegroups.com
Yes, it seems stalled. Try tailing mert.log and joshua.log to see if there are clues there.
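
Where the "tail" command is not at hand, the same look at the end of both logs can be sketched in Python. This is a hypothetical helper, not a Joshua tool. The two file names come from the directory listing above, and the sketch skips any file that does not exist in the current directory.

```python
from collections import deque
from pathlib import Path


def tail(path, n=10):
    """Return the last n lines of a text file, keeping only n lines in memory."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


# Run from inside the tuning directory; missing files are skipped.
for name in ("mert.log", "joshua.log"):
    if Path(name).exists():
        print(f"==> {name} <==")
        print("\n".join(tail(name)))
```

Unlike "tail -f", this reads each file once and returns, so it shows the current end of the log without waiting for new lines.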


Reza Lesmana

Jun 12, 2015, 11:12:44 AM
to joshua_d...@googlegroups.com
Hi, Matt,

This is the tail of mert.log (using "tail -f mert.log"):

----------------------------------------------------------------------------------
$ tail -f mert.log
----------------------------------------------------
Z-MERT run started @ Wed Jun 10 23:55:46 UTC 2015
----------------------------------------------------

Initial lambda[]: {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, -2.844814, 1.0}

--- Starting Z-MERT iteration #1 @ Wed Jun 10 23:55:47 UTC 2015 ---
Decoding using initial weight vector {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, -2.844814, 1.0}
Running external decoder...
...finished decoding @ Thu Jun 11 00:32:10 UTC 2015
------------------------------------------------------------------------------------


And this is the tail of joshua.log (using "tail -f joshua.log"):

----------------------------------------------------------------------------------
$ tail -f joshua.log
Input 1217: <s> thank you very much . </s>
Input 1217: Translation took 0.014 seconds
Memory used after sentence 1217 is 1153.3 MB
Translation 1217: -19.318 terima kasih banyak .
Input 1217: 300-best extraction took 0.332 seconds
Input 1218: <s>  </s>
Translation 1218: Translation took 0 seconds
Decoding completed.
Memory used 302.0 MB
Total running time: 2182 seconds
-----------------------------------------------------------------------------------

It seems to have finished decoding (translating) the tuning set (tune.en), but nothing has happened since then.
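
As a cross-check of the two logs, here is a short Python sketch that computes how long the decoding step took from the two timestamps in mert.log. The timestamps are copied from the log above; the format string and the name parse_stamp are assumptions made for this example, and parsing the day and month names relies on an English locale.

```python
from datetime import datetime

# Matches stamps such as "Wed Jun 10 23:55:47 UTC 2015"; "UTC" is treated as literal text.
STAMP_FORMAT = "%a %b %d %H:%M:%S UTC %Y"


def parse_stamp(text):
    """Parse one mert.log timestamp into a naive datetime (all values are UTC)."""
    return datetime.strptime(text, STAMP_FORMAT)


started = parse_stamp("Wed Jun 10 23:55:47 UTC 2015")   # iteration #1 started
finished = parse_stamp("Thu Jun 11 00:32:10 UTC 2015")  # "...finished decoding"

decode_seconds = int((finished - started).total_seconds())
print(f"decoding took {decode_seconds} s ({decode_seconds / 60:.1f} min)")
```

This gives 2183 seconds, about 36 minutes, which agrees with the "Total running time: 2182 seconds" line in joshua.log. That suggests the decoder itself completed normally and that the time since then has been spent after the decoding step rather than in it.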

Is this normal? What should I do next?

Regards,
Reza Lesmana

Matt Post

Jun 16, 2015, 7:37:45 AM
to joshua_d...@googlegroups.com
Hi Reza,

Sorry about the late response (there was a deadline yesterday). There were some bugs in Joshua 6.0.3, but I have just released 6.0.4, which I hope will fix them. Please try it out and let me know.

matt

Reza Lesmana

Jun 16, 2015, 10:54:25 PM
to joshua_d...@googlegroups.com
Hi Matt,

Thanks a lot. I will try it as soon as I finish work and get home today.
I will update you as soon as possible.

Regards,
Reza Lesmana

Reza Lesmana

Jun 17, 2015, 1:06:49 AM
to joshua_d...@googlegroups.com
Hi Matt,

I'm taking a break at work and tried to download Joshua v6.0.4,
but I can't find the download link for version 6.0.4.

Could you point me to the link for Joshua v6.0.4?

Thanks a lot for your help.

Regards,
Reza Lesmana

Matt Post

Jun 17, 2015, 10:10:45 AM
to joshua_d...@googlegroups.com
Fixed

Reza Lesmana

Jun 17, 2015, 10:29:50 AM
to joshua_d...@googlegroups.com
Hi, Matt

I'm sorry to bother you again, but it seems that the link has not been fixed.

The link on the Joshua Decoder home page, which redirects to http://cs.jhu.edu/~post/files/joshua-v6.0.4.tgz,
returns an HTTP 404 Not Found response.

Regards,
Reza Lesmana

Matt Post

Jun 17, 2015, 12:23:04 PM
to joshua_d...@googlegroups.com
Sorry, fixed, try again!