Beyond English-Centric Multilingual Machine Translation: Preprocessing

dhamaraiselvi sekar

Oct 26, 2023, 3:33:51 AM
to fairse...@googlegroups.com
 python /home/dhamaraiselvi/fairseq/examples/m2m_100/process_data/clean_histogram.py --src --tgt --src-file /home/dhamaraiselvi/source/file --tgt-file /home/dhamaraiselvi/output/file --src-output-file source_output. --tgt-output-file target_output. --histograms /home/dhamaraiselvi/histograms
usage: clean_histogram.py [-h] [--src SRC] [--tgt TGT] [--src-file SRC_FILE] [--tgt-file TGT_FILE] [--src-output-file SRC_OUTPUT_FILE] [--tgt-output-file TGT_OUTPUT_FILE] [--threshold THRESHOLD]
                          [--threshold-character THRESHOLD_CHARACTER] [--histograms HISTOGRAMS]
clean_histogram.py: error: argument --src: expected one argument
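The argparse error above means `--src` was followed directly by `--tgt`, so it received no value. The `source_output.`, `train.spm.-` and `spm..` names in this log point the same way: the source and target language variables appear to have been empty when the script ran, so their unquoted expansions vanished from the command line. A minimal sketch reproducing this, assuming a parser built from a subset of the options in the usage message (`build_parser`, `command` and `parses` are hypothetical helpers, not part of fairseq) and using `ta`/`en` only as example language codes:

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of a subset of the options listed in the
    # usage message; the real clean_histogram.py may differ in details.
    parser = argparse.ArgumentParser(prog="clean_histogram.py")
    for flag in ("--src", "--tgt", "--src-file", "--tgt-file",
                 "--src-output-file", "--tgt-output-file", "--histograms"):
        parser.add_argument(flag)  # each option takes exactly one value
    return parser


def command(src: str, tgt: str) -> str:
    # Mimics an unquoted $src/$tgt expansion in the shell: an empty value
    # leaves nothing behind, so "--src" ends up directly before "--tgt".
    return (f"--src {src} --tgt {tgt} "
            f"--src-file source/file --tgt-file output/file "
            f"--src-output-file source_output.{src} "
            f"--tgt-output-file target_output.{tgt}")


def parses(argv: list[str]) -> bool:
    """Return True if argv is accepted, False if argparse rejects it."""
    try:
        build_parser().parse_args(argv)
        return True
    except SystemExit:
        # argparse prints the usage and error to stderr, then exits with status 2
        return False


print(shlex.split(command("", ""))[:2])          # ['--src', '--tgt']
print(parses(shlex.split(command("", ""))))      # False
print(parses(shlex.split(command("ta", "en"))))  # True
```

What matters is that the language variables are set before the commands run; in bash, writing `${src:?}` instead of `$src` makes the script stop with an error when the variable is unset or empty, instead of silently continuing.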
+ wget https://dl.fbaipublicfiles.com/m2m_100/spm.128k.model
--2023-10-26 12:44:06--  https://dl.fbaipublicfiles.com/m2m_100/spm.128k.model
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 18.161.229.119, 18.161.229.36, 18.161.229.68, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|18.161.229.119|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2423393 (2.3M) [application/octet-stream]
Saving to: ‘spm.128k.model’

spm.128k.model                                     100%[================================================================================================================>]   2.31M  1.64MB/s    in 1.4s    

2023-10-26 12:44:09 (1.64 MB/s) - ‘spm.128k.model’ saved [2423393/2423393]

+ python /home/dhamaraiselvi/fairseq/scripts/spm_encode.py --model spm.128k.model --output_format=piece --inputs=/home/dhamaraiselvi/fairseq/examples/translation/input/2015CHIKShampooTelevisionCommercialCHIKSpinStyleTamil.txt --outputs=/home/dhamaraiselvi/fairseq/examples/translation/output/2015CHIKShampooTelevisionCommercialCHIKSpinStyleTamil.txt
skipped 0 empty lines
filtered 0 lines
+ perl mosesdecoder/scripts/training/clean-corpus-n.perl --ratio 3 /home/dhamaraiselvi/training/data/train.spm.- /home/dhamaraiselvi/output/directory/train.spm.- 1 250
Can't open perl script "mosesdecoder/scripts/training/clean-corpus-n.perl": No such file or directory
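The Perl script is invoked through a relative path, so it is only found when a checkout of the Moses decoder scripts sits in a `mosesdecoder/` directory under the current working directory. The arguments also look shifted: `clean-corpus-n.perl` expects `corpus l1 l2 clean-corpus min max`, and with empty language variables the two language arguments disappeared. For reference, the filtering requested by `--ratio 3 ... 1 250` can be approximated as below. This is a simplified sketch of the length and ratio checks only (the real script has further options), and `keep_pair` is a hypothetical name:

```python
def keep_pair(src_line: str, tgt_line: str,
              min_len: int = 1, max_len: int = 250, ratio: float = 3.0) -> bool:
    # Simplified sketch of the length/ratio checks in clean-corpus-n.perl.
    # Tokens are whitespace-separated, i.e. SentencePiece pieces after spm_encode.
    n_src = len(src_line.split())
    n_tgt = len(tgt_line.split())
    if n_src == 0 or n_tgt == 0:
        return False  # drop pairs with an empty side
    if not (min_len <= n_src <= max_len and min_len <= n_tgt <= max_len):
        return False  # drop pairs outside [min_len, max_len]
    # drop pairs whose length ratio exceeds `ratio` in either direction
    return n_src / n_tgt <= ratio and n_tgt / n_src <= ratio


pairs = [
    ("a b c", "x y"),          # 3 vs 2 pieces: kept
    ("a b c d e f g", "x y"),  # 7 vs 2 pieces, ratio 3.5: dropped
    ("", "x"),                 # empty source side: dropped
]
print([keep_pair(s, t) for s, t in pairs])  # [True, False, False]
```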
+ wget https://dl.fbaipublicfiles.com/m2m_100/data_dict.128k.txt
--2023-10-26 12:44:09--  https://dl.fbaipublicfiles.com/m2m_100/data_dict.128k.txt
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 18.161.229.119, 18.161.229.68, 18.161.229.24, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|18.161.229.119|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1991068 (1.9M) [application/octet-stream]
Saving to: ‘data_dict.128k.txt’

data_dict.128k.txt                                 100%[================================================================================================================>]   1.90M  1.75MB/s    in 1.1s    

2023-10-26 12:44:11 (1.75 MB/s) - ‘data_dict.128k.txt’ saved [1991068/1991068]

+ fairseq-preprocess --source-lang --target-lang --testpref spm.. --thresholdsrc 0 --thresholdtgt 0 --destdir data_bin --srcdict data_dict.128k.txt --tgtdict data_dict.128k.txt
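Here `fairseq-preprocess` received `--source-lang --target-lang` with no values and a `--testpref` of `spm..`, the same empty-variable pattern. This particular run stopped even earlier, at import time, but once that is resolved the empty values would still break the command. A minimal sketch of a guard that fails early instead of passing empty values along, assuming a hypothetical `preprocess_argv` helper that only assembles the flags already used in the command above; `ta`, `en` and `spm.ta-en` are example values:

```python
import shlex


def preprocess_argv(src: str, tgt: str, testpref: str,
                    dict_path: str = "data_dict.128k.txt",
                    destdir: str = "data_bin") -> list[str]:
    # Hypothetical helper: refuse to build the command if any value is empty,
    # which is what produced "--source-lang --target-lang" in the log.
    for name, value in (("src", src), ("tgt", tgt), ("testpref", testpref)):
        if not value:
            raise ValueError(f"{name} is empty")
    return [
        "fairseq-preprocess",
        "--source-lang", src, "--target-lang", tgt,
        "--testpref", testpref,
        "--thresholdsrc", "0", "--thresholdtgt", "0",
        "--destdir", destdir,
        "--srcdict", dict_path, "--tgtdict", dict_path,
    ]


print(shlex.join(preprocess_argv("ta", "en", "spm.ta-en")))
```

Passing the returned list to `subprocess.run(argv, check=True)` runs the command without a shell, so nothing can be dropped by variable expansion. With these example values, fairseq-preprocess would look for test files named `<testpref>.<lang>`, e.g. `spm.ta-en.ta` and `spm.ta-en.en`.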
Traceback (most recent call last):
  File "/home/dhamaraiselvi/anaconda3/bin/fairseq-preprocess", line 5, in <module>
    from fairseq_cli.preprocess import cli_main
  File "/home/dhamaraiselvi/fairseq/fairseq_cli/preprocess.py", line 18, in <module>
    from fairseq import options, tasks, utils
  File "/home/dhamaraiselvi/fairseq/fairseq/__init__.py", line 29, in <module>
    from fairseq.dataclass.initialize import hydra_init
  File "/home/dhamaraiselvi/fairseq/fairseq/dataclass/initialize.py", line 8, in <module>
    from hydra.core.config_store import ConfigStore
  File "/home/dhamaraiselvi/anaconda3/lib/python3.11/site-packages/hydra/__init__.py", line 5, in <module>
    from hydra import utils
  File "/home/dhamaraiselvi/anaconda3/lib/python3.11/site-packages/hydra/utils.py", line 10, in <module>
    from hydra._internal.utils import (
  File "/home/dhamaraiselvi/anaconda3/lib/python3.11/site-packages/hydra/_internal/utils.py", line 21, in <module>
    from hydra.core.utils import get_valid_filename, split_config_path
  File "/home/dhamaraiselvi/anaconda3/lib/python3.11/site-packages/hydra/core/utils.py", line 19, in <module>
    from hydra.core.hydra_config import HydraConfig
  File "/home/dhamaraiselvi/anaconda3/lib/python3.11/site-packages/hydra/core/hydra_config.py", line 6, in <module>
    from hydra.conf import HydraConf
  File "/home/dhamaraiselvi/anaconda3/lib/python3.11/site-packages/hydra/conf/__init__.py", line 62, in <module>
    class JobConf:
  File "/home/dhamaraiselvi/anaconda3/lib/python3.11/site-packages/hydra/conf/__init__.py", line 87, in JobConf
    @dataclass
     ^^^^^^^^^
  File "/home/dhamaraiselvi/anaconda3/lib/python3.11/dataclasses.py", line 1230, in dataclass
    return wrap(cls)
           ^^^^^^^^^
  File "/home/dhamaraiselvi/anaconda3/lib/python3.11/dataclasses.py", line 1220, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dhamaraiselvi/anaconda3/lib/python3.11/dataclasses.py", line 958, in _process_class
    cls_fields.append(_get_field(cls, name, type, kw_only))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dhamaraiselvi/anaconda3/lib/python3.11/dataclasses.py", line 815, in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'hydra.conf.JobConf.JobConfig.OverrideDirname'> for field override_dirname is not allowed: use default_facto
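The traceback comes from Python 3.11's `dataclasses` module: since 3.11, any unhashable default value is rejected, and an instance of an ordinary (`eq=True`, non-frozen) dataclass is unhashable. The installed hydra-core declares `override_dirname` with such an instance as its default, so the import fails before fairseq-preprocess does any work. This is consistent with an older hydra-core (fairseq's `setup.py` pins `hydra-core` below 1.1) running on Python 3.11; a commonly suggested workaround is an environment with Python 3.10 or earlier. The sketch below reproduces the rule with hypothetical stand-in classes (`OverrideDirname`, `JobConfig`, `FixedJobConfig`), not hydra's real definitions:

```python
import sys
from dataclasses import dataclass, field


@dataclass
class OverrideDirname:
    # Hypothetical stand-in for hydra's nested config class. A plain
    # @dataclass (eq=True, not frozen) sets __hash__ to None, so its
    # instances are unhashable.
    kv_sep: str = "="


def instance_default_allowed() -> bool:
    """Try to declare a field whose default is a dataclass instance."""
    try:
        @dataclass
        class JobConfig:
            override_dirname: OverrideDirname = OverrideDirname()
        return True
    except ValueError:
        # Python 3.11+: "mutable default ... is not allowed: use default_factory"
        return False


@dataclass
class FixedJobConfig:
    # default_factory builds a fresh OverrideDirname per instance, which is
    # accepted on every Python version that has dataclasses.
    override_dirname: OverrideDirname = field(default_factory=OverrideDirname)


print(sys.version_info[:2], instance_default_allowed())
print(FixedJobConfig().override_dirname)  # OverrideDirname(kv_sep='=')
```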
Thanks & Regards,
Dhamaraiselvi Sekar

Cavin Infotech Private Limited
Data Scientist
M49, Cactus Corporate Coworking, #173, 8th Floor, Block B, Tecci Park, OMR, Sholinganallur, Chennai 600 119
Email: dhamaraiselvi.s...@cavininfotech.com
