NMT training using fairseq

sas cam

Jan 5, 2021, 5:23:16 AM
to fairseq Users
Dear All,

Could anybody please share the steps required to train an NMT model using fairseq?

Any help would be appreciated.

Sascam

Sunil Kumar

Jan 5, 2021, 12:23:55 PM
to fairseq Users
The first step is to get a parallel corpus, followed by tokenisation and then preprocessing into fairseq's binary format. Then training can be done, followed by inference. It would be great if you could share anything specific you want to know, as https://fairseq.readthedocs.io/en/latest/getting_started.html#training-a-new-model covers pretty much the entire pipeline.
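The pipeline described above can be sketched as a sequence of fairseq command lines. This is a minimal, hedged sketch rather than an official recipe: it only builds and prints the argument lists (nothing is executed), it assumes tokenisation/BPE has already produced files named <split>.<lang> (for example data/train.en and data/train.hi), and the directory names and hyperparameters are hypothetical placeholders.

```python
import shlex
from typing import List


def pipeline_commands(src: str, tgt: str,
                      text_dir: str = "data",        # hypothetical: tokenised train/valid/test.<lang> files
                      bin_dir: str = "data-bin",     # hypothetical: output directory of fairseq-preprocess
                      save_dir: str = "checkpoints") -> List[List[str]]:
    """Return the preprocess -> train -> generate steps as argv lists (not executed)."""
    preprocess = [
        "fairseq-preprocess",
        "--source-lang", src, "--target-lang", tgt,
        # fairseq appends ".<lang>" to each prefix, e.g. data/train.en and data/train.hi
        "--trainpref", f"{text_dir}/train",
        "--validpref", f"{text_dir}/valid",
        "--testpref", f"{text_dir}/test",
        "--destdir", bin_dir,
    ]
    train = [
        "fairseq-train", bin_dir,
        "--arch", "transformer",
        "--optimizer", "adam", "--lr", "5e-4",                  # placeholder hyperparameters
        "--lr-scheduler", "inverse_sqrt", "--warmup-updates", "4000",
        "--criterion", "label_smoothed_cross_entropy", "--label-smoothing", "0.1",
        "--max-tokens", "4000",
        "--save-dir", save_dir,
    ]
    generate = [
        "fairseq-generate", bin_dir,
        "--path", f"{save_dir}/checkpoint_best.pt",
        "--beam", "5", "--batch-size", "128",
    ]
    return [preprocess, train, generate]


if __name__ == "__main__":
    for argv in pipeline_commands("en", "hi"):
        # Print each step as a copy-pasteable shell command line.
        print(" ".join(shlex.quote(arg) for arg in argv))
```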
Cheers,
--

sas cam

Jan 6, 2021, 5:24:15 AM
to Sunil Kumar, fairseq Users
Thanks for the suggestion. I have gone through the fairseq documentation for training a new model. What I understood is that we have to run the scripts.
I was looking for the precise training step commands (like the Moses training step commands available for SMT).
Could you please provide the list of step commands to train a new NMT model?

Looking forward to hearing from you.

Thanking you,
Sascam


Sunil Kumar

Jan 6, 2021, 9:20:27 AM
to fairseq Users
Well, you can open the scripts and see the steps inside them, rather than just running them. If you are comfortable with basic shell scripting, you can easily follow the code. Thanks.

sas cam

Jan 18, 2021, 5:35:22 AM
to Sunil Kumar, fairseq Users
Dear all,

When I try to run the preprocessing step, I get the following error. Could you please help me fix this?

"AttributeError: module 'torch' has no attribute 'BoolTensor'"

fairseq-preprocess --source-lang en --target-lang hi \
    --trainpref $TEXT/IITB.en-hi --validpref $TEXT/IITB.en-hi --testpref $TEXT/IITB.en-hi \
    --destdir data-bin/IITB.en-hi
Traceback (most recent call last):
  File "/home/sreela/.local/bin/fairseq-preprocess", line 5, in <module>
    from fairseq_cli.preprocess import cli_main
  File "/home/sreela/.local/lib/python3.6/site-packages/fairseq_cli/preprocess.py", line 18, in <module>
    from fairseq import options, tasks, utils
  File "/home/sreela/.local/lib/python3.6/site-packages/fairseq/__init__.py", line 19, in <module>
    import fairseq.criterions  # noqa
  File "/home/sreela/.local/lib/python3.6/site-packages/fairseq/criterions/__init__.py", line 13, in <module>
    from fairseq.criterions.fairseq_criterion import (  # noqa
  File "/home/sreela/.local/lib/python3.6/site-packages/fairseq/criterions/fairseq_criterion.py", line 9, in <module>
    from fairseq import metrics, utils
  File "/home/sreela/.local/lib/python3.6/site-packages/fairseq/utils.py", line 20, in <module>
    from fairseq.data import iterators
  File "/home/sreela/.local/lib/python3.6/site-packages/fairseq/data/__init__.py", line 7, in <module>
    from .dictionary import Dictionary, TruncatedDictionary
  File "/home/sreela/.local/lib/python3.6/site-packages/fairseq/data/dictionary.py", line 13, in <module>
    from fairseq.data import data_utils
  File "/home/sreela/.local/lib/python3.6/site-packages/fairseq/data/data_utils.py", line 491, in <module>
    def lengths_to_padding_mask(lens: torch.LongTensor) -> torch.BoolTensor:
AttributeError: module 'torch' has no attribute 'BoolTensor'

Sunil Kumar

Jan 18, 2021, 12:47:24 PM
to fairseq Users
Hi Sreela, 
It would be helpful if you could share your PyTorch and fairseq versions, as well as the path of the repository if you are trying to replicate some scripts.
Thanks.

sascam

Jan19, 2021, 12:31:03 AM
to Sunil Kumar, fairseq Users
Hi,

fairseq version: 0.10.1
PyTorch version: 1.7.1
CUDA version: 10.0.130

I am not trying to replicate any scripts. I ran:

fairseq-preprocess --source-lang en --target-lang hi \
    --trainpref $TEXT/IITB.en-hi --validpref $TEXT/IITB.en-hi --testpref $TEXT/IITB.en-hi \
    --destdir data-bin/IITB.en-hi

fairseq is installed in /home/sreela/work/fairseq_temp/fairseq-master

Hope this helps.

Sunil Kumar

Jan 19, 2021, 10:32:50 PM
to fairseq Users
Hi, 
I am not able to replicate the error. The only difference is that I have CUDA 10.2.
Generally, the BoolTensor error is encountered with older versions of PyTorch, which used ByteTensor where newer versions use BoolTensor.
Troubleshooting:
a) Start the Python REPL, import torch, and see if you can declare and work with Bool Tensors. 
Thanks.
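Step (a) can be scripted along these lines. A minimal sketch, assuming that bool tensors (torch.bool / torch.BoolTensor) first appeared in PyTorch 1.2, with older releases using torch.ByteTensor for masks. It also prints torch.__file__: the traceback above points into ~/.local/lib/python3.6/site-packages, so it is worth confirming which PyTorch installation the fairseq command actually imports (a suggestion, not a diagnosis). The helper names are hypothetical.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '1.7.1', '1.1.0' or '1.6.0+cu101'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_bool_tensor(version: str) -> bool:
    """Assumption: bool tensors (torch.bool / torch.BoolTensor) exist from PyTorch 1.2 on."""
    return parse_version(version) >= (1, 2)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not importable from this interpreter")
    else:
        print("torch version:", torch.__version__)
        print("imported from:", torch.__file__)  # which installation is actually used?
        print("expected to support bool tensors:", supports_bool_tensor(torch.__version__))
        if hasattr(torch, "BoolTensor"):
            mask = torch.tensor([1, 0, 2]) > 0   # comparisons return bool tensors from 1.2 on
            print("bool tensor works:", mask, mask.dtype)
        else:
            print("torch.BoolTensor is missing: this PyTorch is too old for this fairseq")
```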

sascam

Jan 21, 2021, 3:26:26 AM
to Sunil Kumar, fairseq Users
Hi,
 
I created a new environment and started a fresh installation. I installed a lower version of PyTorch using:
    conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch
then installed fairseq using:
    python setup.py build develop
After that, when I execute
    PYTHONPATH=/path/to/this/fairseq python -m fairseq_cli.train

I get the following RuntimeError:

PYTHONPATH=/home/sreela/fairseq python -m fairseq_cli.train

Traceback (most recent call last):
  File "/home/sreela/anaconda3/envs/NMT/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/sreela/anaconda3/envs/NMT/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sreela/fairseq/fairseq_cli/train.py", line 417, in <module>
    cli_main()
  File "/home/sreela/fairseq/fairseq_cli/train.py", line 404, in cli_main
    args = options.parse_args_and_arch(parser, modify_parser=modify_parser)
  File "/home/sreela/fairseq/fairseq/options.py", line 142, in parse_args_and_arch
    raise RuntimeError()
RuntimeError

Could you please help me resolve the error?

Looking forward to hearing from you soon.

Sunil Kumar

Jan 24, 2021, 3:55:14 AM
to fairseq Users
It looks like you are not providing any arguments. Instead, use the command-line tools installed by fairseq directly.
Something like this: 

CUDA_VISIBLE_DEVICES=0,1,2,3 fairseq-train \
    data-bin/papiamento_pap_en_bpe3000/ \
    --log-format json \
    --source-lang pap --target-lang en \
    --arch transformer --share-all-embeddings \
    --encoder-layers 4 --decoder-layers 4 \
    --encoder-embed-dim 512 --decoder-embed-dim 512 \
    --encoder-ffn-embed-dim 1024 --decoder-ffn-embed-dim 1024 \
    --encoder-attention-heads 4 --decoder-attention-heads 4 \
    --encoder-normalize-before --decoder-normalize-before \
    --dropout 0.4 --attention-dropout 0.2 --relu-dropout 0.2 \
    --weight-decay 0.0001 \
    --label-smoothing 0.2 --criterion label_smoothed_cross_entropy \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0 \
    --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-7 \
    --lr 1e-3 --min-lr 1e-9 \
    --max-tokens 4000 \
    --max-epoch 100 --save-interval 10
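To make the learning-rate flags in the command above more concrete (--lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-7 --lr 1e-3), here is a small re-implementation of the inverse_sqrt schedule as I understand it from the fairseq source: a linear warmup from --warmup-init-lr to --lr, then decay proportional to the inverse square root of the update number. It is an illustrative sketch, not fairseq's own code.

```python
import math


def inverse_sqrt_lr(num_updates: int,
                    lr: float = 1e-3,
                    warmup_updates: int = 4000,
                    warmup_init_lr: float = 1e-7) -> float:
    """Learning rate after `num_updates` optimizer steps (sketch of the inverse_sqrt schedule)."""
    if num_updates < warmup_updates:
        # Linear warmup: warmup_init_lr -> lr over warmup_updates steps.
        step = (lr - warmup_init_lr) / warmup_updates
        return warmup_init_lr + num_updates * step
    # After warmup: lr * sqrt(warmup_updates / num_updates), which equals lr at the boundary.
    decay_factor = lr * math.sqrt(warmup_updates)
    return decay_factor / math.sqrt(num_updates)


if __name__ == "__main__":
    for n in (0, 1000, 4000, 16000, 64000):
        print(f"update {n:>6}: lr = {inverse_sqrt_lr(n):.2e}")
```

With these values the learning rate peaks at 1e-3 at update 4000, has halved to 5e-4 by update 16000, and is 2.5e-4 at update 64000.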

I hope this helps. Before training, though, you have to run fairseq-preprocess with the appropriate arguments.
Something like this:

fairseq-preprocess \
  --source-lang $SRC --target-lang $TGT \
  --trainpref $TMP/train.bpe --validpref $TMP/valid.bpe --testpref $TMP/test.bpe \
  --destdir $DATABIN \
  --joined-dictionary \
  --tokenizer space \
  --workers 4
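fairseq-preprocess treats --trainpref, --validpref and --testpref as prefixes and appends the language code, so --trainpref $TMP/train.bpe with --source-lang $SRC --target-lang $TGT expects the line-aligned files $TMP/train.bpe.$SRC and $TMP/train.bpe.$TGT. A minimal pre-flight sketch that checks for those files and compares their line counts before binarizing; the helper names and the line-count check are hypothetical additions, not part of fairseq.

```python
import os
import tempfile
from typing import Dict, List


def expected_files(prefixes: Dict[str, str], src: str, tgt: str) -> List[str]:
    """List the files fairseq-preprocess will look for, e.g. train.bpe.en and train.bpe.hi."""
    return [f"{prefix}.{lang}" for prefix in prefixes.values() for lang in (src, tgt)]


def check_parallel_files(prefixes: Dict[str, str], src: str, tgt: str) -> List[str]:
    """Return problems found: missing files or source/target line-count mismatches."""
    problems = []
    for split, prefix in prefixes.items():
        paths = [f"{prefix}.{src}", f"{prefix}.{tgt}"]
        missing = [p for p in paths if not os.path.isfile(p)]
        if missing:
            problems.extend(f"{split}: missing {p}" for p in missing)
            continue
        counts = []
        for p in paths:
            with open(p, encoding="utf-8") as f:
                counts.append(sum(1 for _ in f))
        if counts[0] != counts[1]:
            problems.append(f"{split}: {counts[0]} source vs {counts[1]} target lines")
    return problems


if __name__ == "__main__":
    # Tiny self-contained demo in a temporary directory (hypothetical file contents).
    with tempfile.TemporaryDirectory() as tmp:
        for lang, text in (("en", "hello\nworld\n"), ("hi", "नमस्ते\n")):
            with open(os.path.join(tmp, f"train.bpe.{lang}"), "w", encoding="utf-8") as f:
                f.write(text)
        prefixes = {"train": os.path.join(tmp, "train.bpe"),
                    "valid": os.path.join(tmp, "valid.bpe")}
        for problem in check_parallel_files(prefixes, "en", "hi"):
            print(problem)
```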

sascam

Jan 27, 2021, 1:16:53 AM
to Sunil Kumar, fairseq Users
Hi,

Thanks for the suggestion. I have done the preprocessing successfully, but when I try to train, I get the error "ValueError: --optimizer is required!".
I used:

CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/IITB.en-hi --lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 --arch fconv_iwslt_de_en --save-dir checkpoints/fconv

Could you please help me correct the error?

Error from the terminal:

2021-01-27 11:37:28 | WARNING | fairseq.tasks.fairseq_task | 3 samples have invalid sizes and will be skipped, max_positions=(1022, 1022), first few sample ids=[1290557, 1445892, 1491388]

Traceback (most recent call last):
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/bin/fairseq-train", line 11, in <module>
    sys.exit(cli_main())
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq_cli/train.py", line 352, in cli_main
    distributed_utils.call_main(args, main)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq/distributed_utils.py", line 301, in call_main
    main(args, **kwargs)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq_cli/train.py", line 114, in main
    disable_iterator_cache=task.has_sharded_data("train"),
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq/checkpoint_utils.py", line 216, in load_checkpoint
    trainer.lr_step(epoch_itr.epoch)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq/trainer.py", line 779, in lr_step
    self.lr_scheduler.step(epoch, val_loss)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq/trainer.py", line 193, in lr_scheduler
    self._build_optimizer()  # this will initialize self._lr_scheduler
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq/trainer.py", line 219, in _build_optimizer
    self._optimizer = optim.build_optimizer(self.args, params)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq/optim/__init__.py", line 45, in build_optimizer
    return _build_optimizer(optimizer_cfg, params, *extra_args, **extra_kwargs)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq/registry.py", line 45, in build_x
    raise ValueError("--{} is required!".format(registry_name))
ValueError: --optimizer is required!

sascam

Jan 27, 2021, 2:17:23 AM
to Sunil Kumar, fairseq Users
Sorry, it was an optimizer argument error. I have resolved it.
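For reference, the fconv_iwslt_de_en training command in the fairseq Getting Started documentation also passes --optimizer nag, and the error message indicates that this fairseq version requires --optimizer to be given explicitly. The following is a hypothetical pre-flight check (not a fairseq feature) that flags missing options in a command before launching it; which options count as required is an assumption of this example.

```python
from typing import Iterable, List, Sequence

# Assumption: options this example treats as mandatory for fairseq-train.
REQUIRED_TRAIN_OPTIONS = ("--arch", "--optimizer")


def missing_options(argv: Sequence[str],
                    required: Iterable[str] = REQUIRED_TRAIN_OPTIONS) -> List[str]:
    """Return required options absent from argv (accepts '--opt value' and '--opt=value')."""
    present = {arg.split("=", 1)[0] for arg in argv if arg.startswith("--")}
    return [opt for opt in required if opt not in present]


original = ("fairseq-train data-bin/IITB.en-hi --lr 0.25 --clip-norm 0.1 --dropout 0.2 "
            "--max-tokens 4000 --arch fconv_iwslt_de_en --save-dir checkpoints/fconv").split()
# Insert the optimizer right after the data directory, as in the documentation example.
fixed = original[:2] + ["--optimizer", "nag"] + original[2:]

if __name__ == "__main__":
    print("missing in original:", missing_options(original))  # ['--optimizer']
    print("missing in fixed:", missing_options(fixed))        # []
    print(" ".join(fixed))
```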

sascam

Feb 5, 2021, 5:43:21 AM
to Sunil Kumar, fairseq Users
Hi,

How do I provide test data for translation with fairseq-generate?

I have tried two ways. In the first, I gave the data-bin location, and it kept printing output for days.

1) fairseq-generate data-bin/IITB.en-hi --path checkpoints/fconv/checkpoint_best.pt --batch-size 128 --beam 5 --skip-invalid-size-inputs-valid-test > translate.out

In the second, I gave a directory path where I keep the test data file test.en. It raises an exception: "Exception: Could not infer language pair, please provide it explicitly".

2) fairseq-generate /home/sreela/NMT/corpus/test-en-hi/ --path checkpoints/fconv/checkpoint_best.pt --batch-size 128 --beam 5 --skip-invalid-size-inputs-valid-test

Could you please clarify the right way to specify test data for translation?

View from the terminal:

(nmtPT-1.5) sreela@dcsgpu:~/fairseq$ fairseq-generate /home/sreela/NMT/corpus/test-en-hi/ --path checkpoints/fconv/checkpoint_best.pt --batch-size 128 --beam 5 --skip-invalid-size-inputs-valid-test
2021-02-05 14:52:49 | INFO | fairseq_cli.generate | Namespace(all_gather_list_size=16384, batch_size=128, batch_size_valid=128, beam=5, bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, checkpoint_suffix='', constraints=None, cpu=False, criterion='cross_entropy', curriculum=0, data='/home/sreela/NMT/corpus/test-en-hi/', data_buffer_size=10, dataset_impl=None, ddp_backend='c10d', decoding_format=None, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, distributed_wrapper='DDP', diverse_beam_groups=-1, diverse_beam_strength=0.5, diversity_rate=-1.0, empty_cache_freq=0, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=False, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', iter_decode_eos_penalty=0.0, iter_decode_force_max_iter=False, iter_decode_max_iter=10, iter_decode_with_beam=1, iter_decode_with_external_reranker=False, left_pad_source='True', left_pad_target='False', lenpen=1, lm_path=None, lm_weight=0.0, load_alignments=False, localsgd_frequency=3, log_format=None, log_interval=100, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=0, max_len_b=200, max_source_positions=1024, max_target_positions=1024, max_tokens=None, max_tokens_valid=None, memory_efficient_bf16=False, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, model_overrides='{}', model_parallel_size=1, nbest=1, no_beamable_mm=False, no_early_stop=False, no_progress_bar=False, no_repeat_ngram_size=0, no_seed_provided=False, nprocs_per_node=1, num_batch_buckets=0, 
num_shards=1, num_workers=1, optimizer=None, path='checkpoints/fconv/checkpoint_best.pt', pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, prefix_size=0, print_alignment=False, print_step=False, profile=False, quantization_config_path=None, quiet=False, remove_bpe=None, replace_unk=None, required_batch_size_multiple=8, required_seq_len_multiple=1, results_path=None, retain_dropout=False, retain_dropout_modules=None, retain_iter_history=False, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, score_reference=False, scoring='bleu', seed=1, shard_id=0, skip_invalid_size_inputs_valid_test=True, slowmo_algorithm='LocalSGD', slowmo_momentum=None, source_lang=None, target_lang=None, task='translation', temperature=1.0, tensorboard_logdir=None, threshold_loss_scale=None, tokenizer=None, tpu=False, train_subset='train', truncate_source=False, unkpen=0, unnormalized=False, upsample_primary=1, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, warmup_updates=0, zero_sharding='none')

Traceback (most recent call last):
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/bin/fairseq-generate", line 11, in <module>
    sys.exit(cli_main())
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq_cli/generate.py", line 379, in cli_main
    main(args)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq_cli/generate.py", line 41, in main
    return _main(args, sys.stdout)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq_cli/generate.py", line 74, in _main
    task = tasks.setup_task(args)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq/tasks/__init__.py", line 28, in setup_task
    return TASK_REGISTRY[task_cfg.task].setup_task(task_cfg, **kwargs)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq/tasks/translation.py", line 262, in setup_task
    "Could not infer language pair, please provide it explicitly"
Exception: Could not infer language pair, please provide it explicitly
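The exception comes from the translation task trying to guess the language pair from file names in the data directory when --source-lang/--target-lang are not given. As far as I can tell from the fairseq 0.10 source, it looks for names of the form <split>.<src>-<tgt>.<...>, such as the test.en-hi.en.bin files that fairseq-preprocess writes to --destdir; a raw test.en does not match. The sketch below re-implements that filename check for illustration only (the function names are hypothetical).

```python
import os
from typing import Iterable, Optional, Tuple


def infer_pair_from_names(filenames: Iterable[str]) -> Tuple[Optional[str], Optional[str]]:
    """Mimic the '<split>.<src>-<tgt>.<...>' naming of a fairseq data-bin directory."""
    for name in filenames:
        parts = name.split(".")
        if len(parts) >= 3 and len(parts[1].split("-")) == 2:
            src, tgt = parts[1].split("-")
            return src, tgt
    return None, None


def infer_pair_from_dir(path: str) -> Tuple[Optional[str], Optional[str]]:
    """Apply the same check to the files actually present in a directory."""
    return infer_pair_from_names(sorted(os.listdir(path)))


if __name__ == "__main__":
    # A raw-text directory like the one in the second attempt: nothing can be inferred.
    print(infer_pair_from_names(["test.en"]))  # (None, None)
    # A directory written by fairseq-preprocess --destdir: the pair is found.
    print(infer_pair_from_names(["dict.en.txt", "test.en-hi.en.bin", "test.en-hi.en.idx"]))  # ('en', 'hi')
```

Passing --source-lang en --target-lang hi skips the inference, but fairseq-generate still expects binarized data plus dict.en.txt and dict.hi.txt in the directory. One way to prepare a new test set, assuming you reuse the training dictionaries, is to run fairseq-preprocess on it with --srcdict and --tgtdict pointing at data-bin/IITB.en-hi/dict.en.txt and data-bin/IITB.en-hi/dict.hi.txt.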

sascam

Feb 12, 2021, 5:15:29 AM
to fairseq Users, Sunil Kumar
What is the exact command to generate translations for new test data using a trained fairseq model?

Is it possible to get the translations as a plain text file, like the source text file (without the model scores), so that the BLEU score can be computed?

Looking forward to the commands.

Thanking you

Sascam
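On the plain-text question: fairseq-generate writes tab-separated lines such as S-<id> (source), T-<id> (reference), H-<id> (hypothesis with its score) and D-<id> (detokenized hypothesis), and the sentences do not come out in input order. A minimal sketch, assuming that format, which collects the H- lines sorted by sentence id into plain text, one translation per line, for use with a BLEU tool.

```python
import re
import sys
from typing import Dict, Iterable, List

# Assumed fairseq-generate line format: "H-<id>\t<score>\t<hypothesis tokens>"
HYPO_LINE = re.compile(r"^H-(\d+)\t(\S+)\t(.*)$")


def extract_hypotheses(lines: Iterable[str]) -> List[str]:
    """Return hypotheses ordered by sentence id (keeps the first, i.e. best, one per id)."""
    hypos: Dict[int, str] = {}
    for line in lines:
        match = HYPO_LINE.match(line.rstrip("\n"))
        if match:
            hypos.setdefault(int(match.group(1)), match.group(3))
    return [hypos[i] for i in sorted(hypos)]


if __name__ == "__main__":
    # Usage (hypothetical file names): python extract_hyp.py translate.out > translate.hyp.txt
    if len(sys.argv) == 2:
        with open(sys.argv[1], encoding="utf-8") as f:
            for hypo in extract_hypotheses(f):
                print(hypo)
```

If BPE was applied before preprocessing, --remove-bpe should also be passed to fairseq-generate. Since --skip-invalid-size-inputs-valid-test can drop sentences, compare the line count with the reference file before computing BLEU.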