Guidance on using FAIRseq for seq2seq tasks


neel g

Aug 22, 2020, 6:40:58 AM
to fairseq Users

Hi, I want to use fairseq for a custom seq2seq task but have a few questions about using it:

  1. What format does the dataset need to be in? I have a .csv file that maps inputs to their corresponding outputs, but I found nothing in the docs that explicitly specifies the expected data format (I have never trained a model from the command line, so I am more used to customizing things myself). My plan was to split it into two separate files, one containing the inputs and the other containing the line-aligned outputs, but I don't see any parameters that allow training/preprocessing that way. Can anyone clarify how exactly I can accomplish this? [BTW, I have a 500 MB file with 10 million lines/data points.] Does it require tokenization on the fly?

  2. Looking at the fairseq-preprocess docs, it seems that fairseq does not require pre-tokenized input. The HuggingFace tokenizers library does not use the GPU, so it takes a lot of time along with a LOT of memory. Does the --tokenizer or --bpe flag in the preprocessing step use GPU resources, or does it still rely on the standard CPU+RAM combination? I did try training with the HF library, but I would prefer GPU/TPU for this task.

  3. Lastly, in the TPU training section the --tpu flag is documented as "use TPU instead of CUDA". Is there any way to set which TPU strategy to use? The Distributed Training section only offers options for GPU training, not a way to set a TPU strategy.

Finally, I would be extremely grateful if someone could say how much GPU time it took to train the fairseq models. Please keep such references in the docs themselves so that people can at least get an idea of how many resources it takes to train to a significant level. I am planning on using 10 hours of a TPU v2 (180 TFLOPS) OR 3x TPU v3 (480 TFLOPS) for 6 total TPU-hours. How much training can I accomplish with that?

Myle Ott

Aug 22, 2020, 7:31:35 AM
to neel g, fairseq Users
Please take a look at this paper for a sense of how long it will take to train models: 

And here’s the corresponding README to reproduce results from that paper: 
https://github.com/pytorch/fairseq/tree/master/examples/scaling_nmt. You can look at the data format there to see what you need to do. Fairseq doesn’t do any kind of tokenization or BPE, so that must be done on your own as a preprocessing step.
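The "tokenize first, then binarize" workflow Myle describes could look roughly like the sketch below. The toy tokenizer here is only a stand-in assumption; in practice you would use a real tool (e.g. the Moses scripts or subword-nmt for BPE) before calling fairseq-preprocess.

```python
import re

def tokenize(line):
    """Lowercase and put spaces around punctuation so tokens are space-separated."""
    line = re.sub(r"([.,!?;:])", r" \1 ", line.lower())
    return " ".join(line.split())

def tokenize_file(in_path, out_path):
    # Apply the tokenizer line by line, preserving the source/target alignment.
    with open(in_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(tokenize(line) + "\n")

# After tokenizing train.src/train.tgt into train.tok.src/train.tok.tgt
# (and likewise for a validation split), the binarization step would look
# something like this, using flag names from the fairseq-preprocess docs:
#
#   fairseq-preprocess --source-lang src --target-lang tgt \
#       --trainpref train.tok --validpref valid.tok \
#       --destdir data-bin --workers 8
```

The binarized output in `data-bin` is then what fairseq-train takes as its data argument.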

TPU support is experimental and I haven’t done much benchmarking yet. You’re welcome to try it, but no promises on how fast it will be.

-Myle
