NMT training using fairseq

sas cam

Jan 5, 2021, 5:23:16 AM

to fairseq Users

Dear All,

Could anybody please share the steps required to train an NMT model using fairseq?

Any help would be appreciated.

Sascam

Sunil Kumar

Jan 5, 2021, 12:23:55 PM

to fairseq Users

The first step is to get a parallel corpus, then tokenise it, and then preprocess it into the binary format fairseq expects. After that you can train the model and run inference. If there is something specific you want to know, please share it; otherwise https://fairseq.readthedocs.io/en/latest/getting_started.html#training-a-new-model covers pretty much the entire pipeline.
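The order of those steps can be sketched as three commands. This is only an illustration, not a tuned recipe: the paths, language codes, and hyperparameter values below are hypothetical placeholders, and the corpus is assumed to be already tokenised and split into train, valid, and test files.

```python
import shlex

# Hypothetical paths and language codes; substitute your own.
SRC, TGT = "en", "hi"
TEXT = "corpus/tok"          # assumed to hold train.en, train.hi, valid.*, test.*
DATABIN = "data-bin/en-hi"
CKPT = "checkpoints/en-hi"


def pipeline():
    """Return the three fairseq commands in the order they are run."""
    preprocess = [
        "fairseq-preprocess",
        "--source-lang", SRC, "--target-lang", TGT,
        "--trainpref", f"{TEXT}/train",
        "--validpref", f"{TEXT}/valid",
        "--testpref", f"{TEXT}/test",
        "--destdir", DATABIN,
    ]
    train = [
        "fairseq-train", DATABIN,
        "--arch", "fconv_iwslt_de_en",
        "--optimizer", "nag", "--lr", "0.25",   # placeholder values
        "--max-tokens", "4000",
        "--save-dir", CKPT,
    ]
    generate = [
        "fairseq-generate", DATABIN,
        "--path", f"{CKPT}/checkpoint_best.pt",
        "--beam", "5",
    ]
    return [preprocess, train, generate]


# Print the commands so they can be reviewed before running them in a shell.
for cmd in pipeline():
    print(shlex.join(cmd))
```

Each command consumes what the previous one produced: preprocessing writes the binarised data directory, training reads it and writes checkpoints, and generation reads both.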

Cheers,

--

sas cam

Jan 6, 2021, 5:24:15 AM

to Sunil Kumar, fairseq Users

Thanks for the suggestion. I have gone through the fairseq documentation on training a new model. What I understood is that we have to run the scripts.

I was looking for the precise commands for each training step (like the Moses training step commands that are available for SMT).

Could you please provide the list of commands needed to train a new NMT model?

Looking forward to hearing from you.

Thank you,

Sascam

--
You received this message because you are subscribed to a topic in the Google Groups "fairseq Users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/fairseq-users/QKPonaA3D4A/unsubscribe.
To unsubscribe from this group and all its topics, send an email to fairseq-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/fairseq-users/0ebb2bf3-4388-46ce-8738-c4ddb88f028bn%40googlegroups.com.

Sunil Kumar

Jan 6, 2021, 9:20:27 AM

to fairseq Users

Well, you can open the scripts and see the steps inside them, rather than just running them. If you are comfortable with basic shell scripting, you can easily follow the code. Thanks.

sas cam

Jan 18, 2021, 5:35:22 AM

to Sunil Kumar, fairseq Users

Dear all,

When I run the preprocessing step, I get the following error. Could you please help me fix it?

"AttributeError: module 'torch' has no attribute 'BoolTensor'"

fairseq-preprocess --source-lang en --target-lang hi \
    --trainpref $TEXT/IITB.en-hi --validpref $TEXT/IITB.en-hi --testpref $TEXT/IITB.en-hi \
    --destdir data-bin/IITB.en-hi
Traceback (most recent call last):
  File "/home/sreela/.local/bin/fairseq-preprocess", line 5, in <module>
    from fairseq_cli.preprocess import cli_main
  File "/home/sreela/.local/lib/python3.6/site-packages/fairseq_cli/preprocess.py", line 18, in <module>
    from fairseq import options, tasks, utils
  File "/home/sreela/.local/lib/python3.6/site-packages/fairseq/__init__.py", line 19, in <module>
    import fairseq.criterions # noqa
  File "/home/sreela/.local/lib/python3.6/site-packages/fairseq/criterions/__init__.py", line 13, in <module>
    from fairseq.criterions.fairseq_criterion import ( # noqa
  File "/home/sreela/.local/lib/python3.6/site-packages/fairseq/criterions/fairseq_criterion.py", line 9, in <module>
    from fairseq import metrics, utils
  File "/home/sreela/.local/lib/python3.6/site-packages/fairseq/utils.py", line 20, in <module>
    from fairseq.data import iterators
  File "/home/sreela/.local/lib/python3.6/site-packages/fairseq/data/__init__.py", line 7, in <module>
    from .dictionary import Dictionary, TruncatedDictionary
  File "/home/sreela/.local/lib/python3.6/site-packages/fairseq/data/dictionary.py", line 13, in <module>
    from fairseq.data import data_utils
  File "/home/sreela/.local/lib/python3.6/site-packages/fairseq/data/data_utils.py", line 491, in <module>
    def lengths_to_padding_mask(lens: torch.LongTensor) -> torch.BoolTensor:
AttributeError: module 'torch' has no attribute 'BoolTensor'


Sunil Kumar

Jan 18, 2021, 12:47:24 PM

to fairseq Users

Hi Sreela,

It would be helpful if you could share your PyTorch and fairseq versions, and also the path of the repository if you are trying to replicate some scripts.

Thanks.

sascam

Jan 19, 2021, 12:31:03 AM

to Sunil Kumar, fairseq Users

Hi,

fairseq version is 0.10.1

Pytorch is 1.7.1

CUDA is 10.0.130

I am not trying to replicate any scripts. I executed:

fairseq-preprocess --source-lang en --target-lang hi \
    --trainpref $TEXT/IITB.en-hi --validpref $TEXT/IITB.en-hi --testpref $TEXT/IITB.en-hi \
    --destdir data-bin/IITB.en-hi

fairseq is installed in /home/sreela/work/fairseq_temp/fairseq-master

Hope this helps.


Sunil Kumar

Jan 19, 2021, 10:32:50 PM

to fairseq Users

Hi,

I am not able to replicate the error. The only difference is that I have CUDA 10.2.

Generally, the BoolTensor error shows up when Python is picking up an older version of PyTorch, one that predates BoolTensor and used ByteTensor in its place.

Troubleshooting:

a) Start the Python REPL, import torch, and see if you can declare and work with Bool Tensors.
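One way to run that check is the small sketch below. It is worth running with the same interpreter that launches fairseq-preprocess, because the traceback above loads fairseq from ~/.local/lib/python3.6/site-packages, which may not be the installation you expect. The helper name describe_torch is hypothetical; it only reads attributes off whatever module it is given.

```python
import importlib


def describe_torch(module):
    """Summarise which torch build a module object represents."""
    return {
        "version": getattr(module, "__version__", "unknown"),
        "file": getattr(module, "__file__", "unknown"),
        "has_bool_tensor": hasattr(module, "BoolTensor"),
    }


try:
    torch = importlib.import_module("torch")
except ImportError:
    print("torch is not importable from this interpreter")
else:
    # "file" shows where torch was loaded from, which reveals a stray
    # older install shadowing the one you meant to use.
    print(describe_torch(torch))
```

If has_bool_tensor comes back False, or the reported version is not the one you installed, the interpreter is most likely importing a different PyTorch than intended.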

Thanks.

sascam

Jan 21, 2021, 3:26:26 AM

to Sunil Kumar, fairseq Users

Hi,

I created a new environment and started a fresh installation. I installed a lower version of PyTorch using:

conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch

then installed fairseq using

python setup.py build develop

After that, when I execute

PYTHONPATH=/path/to/this/fairseq python -m fairseq_cli.train

I get the following RuntimeError:

PYTHONPATH=/home/sreela/fairseq python -m fairseq_cli.train
Traceback (most recent call last):
  File "/home/sreela/anaconda3/envs/NMT/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/sreela/anaconda3/envs/NMT/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sreela/fairseq/fairseq_cli/train.py", line 417, in <module>
    cli_main()
  File "/home/sreela/fairseq/fairseq_cli/train.py", line 404, in cli_main
    args = options.parse_args_and_arch(parser, modify_parser=modify_parser)
  File "/home/sreela/fairseq/fairseq/options.py", line 142, in parse_args_and_arch
    raise RuntimeError()
RuntimeError

Could you please help me resolve this error?

Looking forward to hearing from you soon.


Sunil Kumar

Jan 24, 2021, 3:55:14 AM

to fairseq Users

It looks like you are not providing any arguments. Instead, use the command-line tools installed by fairseq directly.

Something like this:

CUDA_VISIBLE_DEVICES=0,1,2,3 fairseq-train \
    data-bin/papiamento_pap_en_bpe3000/ \
    --log-format json \
    --source-lang pap --target-lang en \
    --arch transformer --share-all-embeddings \
    --encoder-layers 4 --decoder-layers 4 \
    --encoder-embed-dim 512 --decoder-embed-dim 512 \
    --encoder-ffn-embed-dim 1024 --decoder-ffn-embed-dim 1024 \
    --encoder-attention-heads 4 --decoder-attention-heads 4 \
    --encoder-normalize-before --decoder-normalize-before \
    --dropout 0.4 --attention-dropout 0.2 --relu-dropout 0.2 \
    --weight-decay 0.0001 \
    --label-smoothing 0.2 --criterion label_smoothed_cross_entropy \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0 \
    --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-7 \
    --lr 1e-3 --min-lr 1e-9 \
    --max-tokens 4000 \
    --max-epoch 100 --save-interval 10

I hope this helps. Before running training, though, you have to run fairseq-preprocess with appropriate arguments.

Something like this:

fairseq-preprocess \
    --source-lang $SRC --target-lang $TGT \
    --trainpref $TMP/train.bpe --validpref $TMP/valid.bpe --testpref $TMP/test.bpe \
    --destdir $DATABIN \
    --joined-dictionary \
    --tokenizer space \
    --workers 4

sascam

Jan 27, 2021, 1:16:53 AM

to Sunil Kumar, fairseq Users

Hi,

Thanks for the suggestion. The preprocessing completed successfully. But when I try to train, I get the error "ValueError: --optimizer is required!".

I used:

CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/IITB.en-hi --lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 --arch fconv_iwslt_de_en --save-dir checkpoints/fconv

Could you please help me fix this error?

Error from the terminal:

2021-01-27 11:37:28 | WARNING | fairseq.tasks.fairseq_task | 3 samples have invalid sizes and will be skipped, max_positions=(1022, 1022), first few sample ids=[1290557, 1445892, 1491388]

Traceback (most recent call last):
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/bin/fairseq-train", line 11, in <module>
    sys.exit(cli_main())
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq_cli/train.py", line 352, in cli_main
    distributed_utils.call_main(args, main)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq/distributed_utils.py", line 301, in call_main
    main(args, **kwargs)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq_cli/train.py", line 114, in main
    disable_iterator_cache=task.has_sharded_data("train"),
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq/checkpoint_utils.py", line 216, in load_checkpoint
    trainer.lr_step(epoch_itr.epoch)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq/trainer.py", line 779, in lr_step
    self.lr_scheduler.step(epoch, val_loss)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq/trainer.py", line 193, in lr_scheduler
    self._build_optimizer() # this will initialize self._lr_scheduler
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq/trainer.py", line 219, in _build_optimizer
    self._optimizer = optim.build_optimizer(self.args, params)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq/optim/__init__.py", line 45, in build_optimizer
    return _build_optimizer(optimizer_cfg, params, *extra_args, **extra_kwargs)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq/registry.py", line 45, in build_x
    raise ValueError("--{} is required!".format(registry_name))
ValueError: --optimizer is required!


sascam

Jan 27, 2021, 2:17:23 AM

to Sunil Kumar, fairseq Users

Sorry, it was an optimizer argument error. I resolved it.
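For anyone hitting the same ValueError: the training command quoted earlier has no --optimizer flag, and the error says this fairseq version insists on one. The sketch below appends the flag to that command when it is missing. The thread does not record which optimizer was actually chosen, so the value "nag" is an assumption, and ensure_flag is a hypothetical helper rather than part of fairseq.

```python
import shlex


def ensure_flag(argv, flag, value):
    """Return a copy of argv with `flag value` appended if flag is absent."""
    if flag in argv:
        return list(argv)
    return list(argv) + [flag, value]


# The training command exactly as posted in the thread.
cmd = shlex.split(
    "fairseq-train data-bin/IITB.en-hi --lr 0.25 --clip-norm 0.1 "
    "--dropout 0.2 --max-tokens 4000 --arch fconv_iwslt_de_en "
    "--save-dir checkpoints/fconv"
)

# "nag" is an assumption; the thread does not say which optimizer was used.
fixed = ensure_flag(cmd, "--optimizer", "nag")
print(shlex.join(fixed))
```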

sascam

Feb 5, 2021, 5:43:21 AM

to Sunil Kumar, fairseq Users

Hi,

How do I supply test data for translation with fairseq-generate?

I have tried two ways. In the first, I gave the data-bin location, and it kept printing output for days.

1) fairseq-generate data-bin/IITB.en-hi --path checkpoints/fconv/checkpoint_best.pt --batch-size 128 --beam 5 --skip-invalid-size-inputs-valid-test > translate.out

In the second, I gave the path of a directory that contains my test file, test.en. This raises an exception: "Exception: Could not infer language pair, please provide it explicitly".

2) fairseq-generate /home/sreela/NMT/corpus/test-en-hi/ --path checkpoints/fconv/checkpoint_best.pt --batch-size 128 --beam 5 --skip-invalid-size-inputs-valid-test

Could you please clarify whether this is the right way to specify test data for translation?

View from the terminal:

(nmtPT-1.5) sreela@dcsgpu:~/fairseq$ fairseq-generate /home/sreela/NMT/corpus/test-en-hi/ --path checkpoints/fconv/checkpoint_best.pt --batch-size 128 --beam 5 --skip-invalid-size-inputs-valid-test
2021-02-05 14:52:49 | INFO | fairseq_cli.generate | Namespace(all_gather_list_size=16384, batch_size=128, batch_size_valid=128, beam=5, bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, checkpoint_suffix='', constraints=None, cpu=False, criterion='cross_entropy', curriculum=0, data='/home/sreela/NMT/corpus/test-en-hi/', data_buffer_size=10, dataset_impl=None, ddp_backend='c10d', decoding_format=None, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, distributed_wrapper='DDP', diverse_beam_groups=-1, diverse_beam_strength=0.5, diversity_rate=-1.0, empty_cache_freq=0, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=False, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', iter_decode_eos_penalty=0.0, iter_decode_force_max_iter=False, iter_decode_max_iter=10, iter_decode_with_beam=1, iter_decode_with_external_reranker=False, left_pad_source='True', left_pad_target='False', lenpen=1, lm_path=None, lm_weight=0.0, load_alignments=False, localsgd_frequency=3, log_format=None, log_interval=100, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=0, max_len_b=200, max_source_positions=1024, max_target_positions=1024, max_tokens=None, max_tokens_valid=None, memory_efficient_bf16=False, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, model_overrides='{}', model_parallel_size=1, nbest=1, no_beamable_mm=False, no_early_stop=False, no_progress_bar=False, no_repeat_ngram_size=0, no_seed_provided=False, nprocs_per_node=1, num_batch_buckets=0, 
num_shards=1, num_workers=1, optimizer=None, path='checkpoints/fconv/checkpoint_best.pt', pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, prefix_size=0, print_alignment=False, print_step=False, profile=False, quantization_config_path=None, quiet=False, remove_bpe=None, replace_unk=None, required_batch_size_multiple=8, required_seq_len_multiple=1, results_path=None, retain_dropout=False, retain_dropout_modules=None, retain_iter_history=False, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, score_reference=False, scoring='bleu', seed=1, shard_id=0, skip_invalid_size_inputs_valid_test=True, slowmo_algorithm='LocalSGD', slowmo_momentum=None, source_lang=None, target_lang=None, task='translation', temperature=1.0, tensorboard_logdir=None, threshold_loss_scale=None, tokenizer=None, tpu=False, train_subset='train', truncate_source=False, unkpen=0, unnormalized=False, upsample_primary=1, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, warmup_updates=0, zero_sharding='none')

Traceback (most recent call last):
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/bin/fairseq-generate", line 11, in <module>
    sys.exit(cli_main())
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq_cli/generate.py", line 379, in cli_main
    main(args)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq_cli/generate.py", line 41, in main
    return _main(args, sys.stdout)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq_cli/generate.py", line 74, in _main
    task = tasks.setup_task(args)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq/tasks/__init__.py", line 28, in setup_task
    return TASK_REGISTRY[task_cfg.task].setup_task(task_cfg, **kwargs)
  File "/home/sreela/anaconda3/envs/nmtPT-1.5/lib/python3.6/site-packages/fairseq/tasks/translation.py", line 262, in setup_task
    "Could not infer language pair, please provide it explicitly"
Exception: Could not infer language pair, please provide it explicitly
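A note on both attempts, offered as a likely reading rather than a confirmed diagnosis. fairseq-generate expects a binarised directory produced by fairseq-preprocess, so a folder holding only test.en gives it nothing to infer the language pair from. The first attempt probably ran for days because the earlier preprocess command used the same IITB.en-hi prefix for --trainpref, --validpref, and --testpref, which makes the test split as large as the training corpus. For a raw text file, fairseq-interactive is the usual tool. The sketch below only assembles such a command; interactive_cmd is a hypothetical helper, and the input file is assumed to be tokenised the same way as the training data.

```python
import shlex


def interactive_cmd(data_bin, checkpoint, src, tgt, input_file, batch_size=128):
    """Build a fairseq-interactive command for translating a raw text file.

    data_bin must be the directory written by fairseq-preprocess, because
    the dictionaries are read from it.
    """
    return [
        "fairseq-interactive", data_bin,
        "--path", checkpoint,
        "--source-lang", src, "--target-lang", tgt,
        "--input", input_file,
        "--beam", "5",
        # The buffer is kept at least as large as the batch size.
        "--batch-size", str(batch_size),
        "--buffer-size", str(batch_size),
    ]


cmd = interactive_cmd(
    "data-bin/IITB.en-hi",
    "checkpoints/fconv/checkpoint_best.pt",
    "en", "hi",
    "/home/sreela/NMT/corpus/test-en-hi/test.en",
)
print(shlex.join(cmd) + " > translate.out")
```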


sascam

Feb 12, 2021, 5:15:29 AM

to fairseq Users, Sunil Kumar

What is the exact command to generate translations for new test data using a model trained with fairseq?

Is it possible to get the translations as a plain text file, in the same form as the source text file (without the model scores), so that the BLEU score can be computed?

Looking forward to the commands.

Thank you,

Sascam
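On getting a plain text file: the output of fairseq-generate and fairseq-interactive mixes several tab-separated line types, and the hypothesis lines start with "H-" followed by the sentence id, then the score, then the text. The sketch below keeps only those lines and restores the original sentence order, since fairseq-generate batches sentences by length and prints them out of order. The function name and the sample lines are made up for illustration.

```python
def extract_hypotheses(lines):
    """Return hypothesis strings from fairseq output, ordered by sentence id."""
    hyps = {}
    for line in lines:
        if not line.startswith("H-"):
            continue  # skip source (S-), target (T-), score (P-) and log lines
        tag, _score, text = line.rstrip("\n").split("\t", 2)
        hyps[int(tag[2:])] = text
    return [hyps[i] for i in sorted(hyps)]


# Made-up sample in the same layout as translate.out.
sample = [
    "S-1\thello world\n",
    "T-1\treference one\n",
    "H-1\t-0.5\thypothesis one\n",
    "P-1\t-0.4 -0.6\n",
    "S-0\tgood morning\n",
    "H-0\t-0.3\thypothesis zero\n",
]
print("\n".join(extract_hypotheses(sample)))
```

To apply it to a real run, read translate.out with open(..., encoding="utf-8") and write the returned list one sentence per line. If the data was BPE-encoded or tokenised, that still has to be undone before scoring against a detokenised reference.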
