Swivel run using 4 GPUs on text8: low accuracy


thuphu...@gmail.com

May 3, 2017, 2:33:04 AM
to Swivel Embeddings
Hi, I am running Swivel on the text8 corpus using 4 GPUs, and it does not give me good embedding vectors.
I tried tuning the learning rate and the number of training epochs, but the model still gives me low accuracy.

Any help would be really appreciated. 

I set learning_rate=0.15 and got the results below:

./wordsim.py --vocab vocab.txt --embeddings vecs.bin *.ws.tab
0.328 ws353rel.ws.tab
0.266 ws353sim.ws.tab

./analogy --vocab vocab.txt --embeddings vecs.bin q_data/*.txt
0.026 q_data/capital-common-countries.txt
0.004 q_data/capital-world.txt
0.013 q_data/city-in-state.txt
0.000 q_data/currency.txt
0.014 q_data/family.txt
0.000 q_data/gram1-adjective-to-adverb.txt
0.000 q_data/gram2-opposite.txt
0.001 q_data/gram3-comparative.txt
0.001 q_data/gram4-superlative.txt
0.000 q_data/gram5-present-participle.txt
0.012 q_data/gram6-nationality-adjective.txt
0.000 q_data/gram7-past-tense.txt
0.002 q_data/gram8-plural.txt

Thanks for your time.
Phuong


Phuong Nguyen

May 3, 2017, 1:42:49 PM
to Swivel Embeddings
I think I ran Swivel with the wrong tuning parameters. Can anyone help?

For comparison, I ran GloVe on the same text8 corpus and evaluated the word analogies using the same set of questions.

GloVe gave me the following accuracy:

capital-common-countries.txt:
ACCURACY TOP1: 59.49% (301/506)
capital-world.txt:
ACCURACY TOP1: 27.05% (964/3564)
currency.txt:
ACCURACY TOP1: 5.03% (30/596)
city-in-state.txt:
ACCURACY TOP1: 29.14% (679/2330)
family.txt:
ACCURACY TOP1: 40.95% (172/420)
gram1-adjective-to-adverb.txt:
ACCURACY TOP1: 5.14% (51/992)
gram2-opposite.txt:
ACCURACY TOP1: 3.57% (27/756)
gram3-comparative.txt:
ACCURACY TOP1: 23.35% (311/1332)
gram4-superlative.txt:
ACCURACY TOP1: 8.06% (80/992)
gram5-present-participle.txt:
ACCURACY TOP1: 14.39% (152/1056)
gram6-nationality-adjective.txt:
ACCURACY TOP1: 57.26% (871/1521)
gram7-past-tense.txt:
ACCURACY TOP1: 15.58% (243/1560)
gram8-plural.txt:
ACCURACY TOP1: 25.90% (345/1332)

Chris Waterson

May 3, 2017, 1:44:48 PM
to Phuong Nguyen, Swivel Embeddings
Hi Phuong. There were some recent changes submitted to support multiple GPUs that may be causing problems. I'm investigating. Thanks for your patience.

--
You received this message because you are subscribed to the Google Groups "Swivel Embeddings" group.
To unsubscribe from this group and stop receiving emails from it, send an email to swivel-embeddings+unsubscribe@googlegroups.com.
To post to this group, send email to swivel-embeddings@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/swivel-embeddings/c28d2365-0f5d-4a54-b15e-09c3bcd69478%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Phuong Nguyen

May 3, 2017, 8:33:08 PM
to Swivel Embeddings
Thank you very much for taking the time to look into the problem. I am looking forward to hearing the good news.

Chris Waterson

May 4, 2017, 2:28:25 AM
to Phuong Nguyen, Swivel Embeddings
Hi Phuong, I've been meaning to refactor swivel.py to use the distributed TF API for a long time, and this was a good excuse to do so. Please take a look at this pull request and let me know if it works for you:


This should make it easy to run Swivel on a single machine with multiple GPUs, or across a cluster of machines.

I was able to test this on a 2 GPU machine and achieved the following results after 40 epochs on text8:

~/src/models/swivel $ python wordsim.py --embeddings ~/tmp/swivel/out/vecs.bin --vocab ~/tmp/swivel/in/row_vocab.txt ~/tmp/swivel/eval/*.ws.tab
0.660 /home/waterson/tmp/swivel/eval/men.ws.tab
0.660 /home/waterson/tmp/swivel/eval/mturk.ws.tab
0.283 /home/waterson/tmp/swivel/eval/rarewords.ws.tab
0.258 /home/waterson/tmp/swivel/eval/simlex999.ws.tab
0.614 /home/waterson/tmp/swivel/eval/ws353rel.ws.tab
0.716 /home/waterson/tmp/swivel/eval/ws353sim.ws.tab
~/src/models/swivel $ ./analogy --embeddings ~/tmp/swivel/out/vecs.bin --vocab ~/tmp/swivel/in/row_vocab.txt ~/tmp/swivel/eval/*.an.tab
0.277 /home/waterson/tmp/swivel/eval/mikolov.an.tab
0.131 /home/waterson/tmp/swivel/eval/msr.an.tab

N.B., I used RMSProp instead of AdaGrad, and eliminated some of the hyper-parameters.  Since we posted the article about Swivel, it's become clear that some of the hyper-parameter settings were really just ways of making up for AdaGrad's clamping behavior.
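The AdaGrad behavior Chris alludes to can be illustrated in plain Python (a toy sketch of the two update rules, not the actual swivel.py code): with a constant gradient, AdaGrad's squared-gradient accumulator grows without bound, so its effective step size shrinks toward zero, while RMSProp's decayed moving average stays bounded and the step size settles near the learning rate.

```python
import math

def adagrad_step(grad, accum, lr=0.1, eps=1e-8):
    """AdaGrad: the accumulator only ever grows, so steps shrink forever."""
    accum += grad * grad
    return lr * grad / math.sqrt(accum + eps), accum

def rmsprop_step(grad, avg, lr=0.1, decay=0.9, eps=1e-8):
    """RMSProp: a decayed moving average keeps the step size bounded."""
    avg = decay * avg + (1.0 - decay) * grad * grad
    return lr * grad / math.sqrt(avg + eps), avg

ada_accum, rms_avg = 0.0, 0.0
ada_steps, rms_steps = [], []
for _ in range(100):  # feed a constant gradient of 1.0
    s, ada_accum = adagrad_step(1.0, ada_accum)
    ada_steps.append(s)
    s, rms_avg = rmsprop_step(1.0, rms_avg)
    rms_steps.append(s)

# AdaGrad's step keeps shrinking; RMSProp's settles near the learning rate.
print(ada_steps[-1], rms_steps[-1])
```

This clamping is why hyper-parameters tuned around AdaGrad may stop helping once the optimizer is swapped.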


Phuong Nguyen

May 5, 2017, 11:45:35 AM
to Swivel Embeddings
Chris, 

I would like to thank you and your team very much for sharing your new distributed version of Swivel.

I was able to run Swivel using 2 GPUs, evaluated it on the text8 dataset, and got high accuracy.

I had to revert your new hyper-parameters to the old hyper-parameters :) since tf.contrib.training.HParams is not available in the TensorFlow 1.0.1 that comes with IBM PowerAI. I used an IBM Minsky node, which has 4 GPUs, for my test. The comparison is still apples to oranges, since the Swivel and GloVe settings do not produce exactly the same co-occurrence matrix nor exactly the same vocabulary.
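For anyone else on an older TensorFlow without tf.contrib.training.HParams, a minimal stand-in is easy to write. This is a hypothetical sketch, not part of swivel.py; the hyper-parameter names and values below are examples only:

```python
class HParams(object):
    """Minimal stand-in for tf.contrib.training.HParams: stores
    hyper-parameters as attributes and allows overriding known ones."""

    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)

    def override(self, **kwargs):
        for name, value in kwargs.items():
            if not hasattr(self, name):
                raise ValueError('unknown hyper-parameter: %s' % name)
            setattr(self, name, value)
        return self

# Illustrative hyper-parameters only.
hparams = HParams(learning_rate=0.15, num_epochs=40, embedding_size=300)
hparams.override(learning_rate=0.05)
print(hparams.learning_rate)
```

It only covers attribute storage and overrides, not parsing from command-line strings, but that is usually enough to keep older TF versions running.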

We are happy with the results and would like to evaluate Swivel on 4 GPUs, and then on 2 nodes with multiple GPUs; hopefully we will be able to use Swivel to produce word embeddings for our dataset.

Text8 corpus analogy questions     Swivel accuracy %   GloVe accuracy %
capital-common-countries.txt       89.9                59.49
capital-world.txt                  51.6                27.05
city-in-state.txt                  53.5                29.14
currency.txt                        6.2                 5.03
family.txt                         34.5                40.95
gram1-adjective-to-adverb.txt       4.6                 5.14
gram2-opposite.txt                  4.9                 3.57
gram3-comparative.txt              31.1                23.35
gram4-superlative.txt               7.7                 8.06
gram5-present-participle.txt       15.6                14.39
gram6-nationality-adjective.txt    77.3                57.26
gram7-past-tense.txt               18.0                15.58
gram8-plural.txt                   43.7                25.90

Chris Waterson

May 5, 2017, 11:54:34 AM
to Phuong Nguyen, Swivel Embeddings
On Fri, May 5, 2017 at 8:45 AM, Phuong Nguyen <thuphu...@gmail.com> wrote:
Chris, 

I would like to thank you and your team very much for sharing your new distributed version of Swivel.

I was able to run Swivel using 2 GPUs, evaluated it on the text8 dataset, and got high accuracy.

That's great news... I'm glad it worked for you! Thanks for letting me know.

 
I had to revert your new hyper-parameters to the old hyper-parameters :) since tf.contrib.training.HParams is not available in the TensorFlow 1.0.1 that comes with IBM PowerAI. I used an IBM Minsky node, which has 4 GPUs, for my test.

Ah... well that change was a bit gratuitous. I will back it out so that it's compatible with TF 1.0 before merging the change to head.
 
The comparison is still apples to oranges, since the Swivel and GloVe settings do not produce exactly the same co-occurrence matrix nor exactly the same vocabulary.

FWIW, there is a utility, glove_to_shards.py, which will convert the binary GloVe co-occurrence matrix to the tf.Record format if you're interested in doing a more careful comparison.

In my experience, Glove works very well so long as the co-occurrence matrix is dense enough. Swivel and word2vec start to shine as the matrix gets sparser and sparser.
 

Phuong Nguyen

May 5, 2017, 12:30:17 PM
to Swivel Embeddings
Chris,
I am so glad that Swivel worked, too!
Thanks for sharing your experience with dense and sparse co-occurrence matrices. One of the reasons we would like to try Swivel is that we want a solution that works for sparse co-occurrence matrices. I found your paper, and you show that Swivel works well for both common and rare features.
Best Regards,
Phuong