Hi Chris and All:
I ran into a couple of difficulties when running the Swivel model on a larger corpus and on more GPUs. I would appreciate your help and comments on these.
1. I tried to run distributed Swivel using 4 GPUs with 1 PS and 8 workers on the text8 dataset (~71,290 unique frequent words in the vocabulary).
It ran much slower than with 2 GPUs — about 2x slower.
2. I tried to run Swivel on a dataset of ~3 billion tokens with ~2.7 million unique frequent tokens.
prep.py stopped with a "too many open files" error. After I increased the shard size, prep.py produced a list of tmp files (e.g., shard-001-015.tmp, ~240 MB each), then terminated without any error message and without producing all the files needed by swivel.py.
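For the "too many open files" error, one workaround might be to raise the per-process file-descriptor limit before prep.py opens its per-shard tmp files. A minimal sketch (the target of 65536 is just an example value, not something from the Swivel code):

```python
import resource

# Current soft/hard limits on open file descriptors for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# With a ~2.7M-word vocabulary, prep.py may keep thousands of shard tmp
# files open at once, so raise the soft limit toward the hard limit.
target = 65536
if hard != resource.RLIM_INFINITY:
    target = min(target, hard)

# An unprivileged process may raise its soft limit up to the hard limit.
resource.setrlimit(resource.RLIMIT_NOFILE, (max(soft, target), hard))

print("open-file soft limit is now",
      resource.getrlimit(resource.RLIMIT_NOFILE)[0])
```

The same effect can come from `ulimit -n` in the shell before launching prep.py; raising the hard limit itself usually needs root.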
I also tried glove_to_shards.py to create TFRecord-format shards from a GloVe co-occurrence matrix built from the same ~2.7 million-word vocabulary.
glove_to_shards.py likewise terminated after producing temporary files, with no error message explaining why it stopped.
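A process dying silently on a matrix this large makes me suspect the kernel OOM killer. One generic way to check (this is not part of the Swivel tools, just a sketch) is to launch the script through `subprocess` and inspect the return code: a negative value means the process was terminated by a signal, e.g. -9 for the SIGKILL that the Linux OOM killer sends.

```python
import subprocess
import sys

def run_and_report(cmd):
    """Run cmd and report whether it exited normally or died from a signal."""
    proc = subprocess.run(cmd)
    if proc.returncode < 0:
        # Negative returncode means killed by signal -returncode
        # (SIGKILL = 9 is what the Linux OOM killer delivers).
        print("killed by signal", -proc.returncode)
    else:
        print("exited with status", proc.returncode)
    return proc.returncode

# Trivial stand-in command; in practice this would be something like
# [sys.executable, "glove_to_shards.py", ...] with the real arguments.
rc = run_and_report([sys.executable, "-c", "print('ok')"])
```

Checking `dmesg` for "Out of memory" lines right after the script dies would confirm or rule this out.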
Also, the swivel.py code does not have an option to read shards.recs.
Chris, do you have a version that reads shards.recs to train Swivel, instead of the input files produced by prep.py?
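In case it is useful: I have not verified what framing shards.recs actually uses, but assuming it is standard TFRecord framing (8-byte little-endian record length, 4-byte masked CRC of the length, the payload, then a 4-byte masked CRC of the payload), a minimal pure-Python reader would look like this:

```python
import struct

def read_tfrecords(path):
    """Yield raw record payloads from a TFRecord-framed file.

    Assumes standard TFRecord framing: uint64 little-endian length,
    4-byte length CRC, payload bytes, 4-byte payload CRC. The CRCs are
    skipped here rather than verified.
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                return  # clean end of file
            (length,) = struct.unpack("<Q", header)
            f.read(4)               # skip masked CRC of the length
            payload = f.read(length)
            f.read(4)               # skip masked CRC of the payload
            yield payload
```

Each payload would then still need to be parsed as whatever proto the shards contain (e.g., a serialized tf.train.Example) before it could feed training.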
Thanks for your time.
Best Regards,
Phuong