Problems with GPU support

Pavlos Triantaris

Mar 23, 2021, 1:02:51 PM

to tensorflow-compression

Hello.

Ever since the 17th of March, I have been experiencing difficulties with tensorflow-compression on Colab, even though I did not change anything in my code, which used to work without problems.

I managed to resolve the import problem as per Issue #74 on GitHub, but even so, only the CPU is available as hardware for training once I install TFC.

For reference, here is a Colab notebook reproducing the problem:

https://colab.research.google.com/drive/1z_pReD8l0hlStNh-sgX5q1Z9l8-iEQQx?usp=sharing

Thank you in advance for your help.

Pavlos Triantaris

Mar 23, 2021, 1:11:15 PM

to tensorflow-compression

I have changed the notebook privacy settings to public. Sorry I forgot to do so from the get-go.

Johannes Ballé

Mar 23, 2021, 1:18:07 PM

to tensorflow-compression

Hi Pavlos,

could you try the same instructions with tf-nightly-gpu instead of tf-nightly, and let us know how that goes?

Thanks!

Johannes

--
You received this message because you are subscribed to the Google Groups "tensorflow-compression" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tensorflow-compre...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tensorflow-compression/77d57b15-5493-4570-aa34-3652b6dc77c0n%40googlegroups.com.

Pavlos Triantaris

Mar 23, 2021, 1:29:33 PM

to tensorflow-compression

I tried what you suggested and it still recognises CPU only.

Additionally, when importing TFC I get the following error message:

I think this is the same error message that I started getting a week ago, before changing to tensorflow-nightly.

The changes detailed above are saved in the notebook for reference.

Thanks.

Pavlos

Johannes Ballé

Mar 23, 2021, 1:42:29 PM

to tensorflow-compression

I see. Unfortunately, the tf-nightly dependency of the TFC package seems to pull it back in, and then it is being used instead of the -gpu variant :(

We'll have to try it this way:

!pip uninstall -y tensorflow tf-nightly

!pip install tf-nightly-gpu==2.5.0.dev20210312 tensorflow_probability~=0.12.1 scipy~=1.5

!pip install --no-deps tensorflow-compression==2.1

Then restart the Colab kernel, and see if it works.
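One way to "see if it works" after the restart is to ask TensorFlow which GPUs it can see. The following is only a minimal sketch: the helper name visible_gpus is hypothetical, and it assumes a TensorFlow version that exposes list_physical_devices either under tf.config or under tf.config.experimental.

```python
def visible_gpus():
    """Return the GPU device names TensorFlow reports, or None if TF cannot be imported.

    Hypothetical helper for illustration; not part of TF or TFC.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Newer TF releases expose this directly on tf.config; older ones
    # keep it under tf.config.experimental, so fall back if needed.
    lister = getattr(tf.config, "list_physical_devices", None)
    if lister is None:
        lister = tf.config.experimental.list_physical_devices
    return [device.name for device in lister("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow could not be imported")
elif not gpus:
    print("TensorFlow sees no GPU (CPU only)")
else:
    print("TensorFlow sees:", gpus)
```

An empty list here only says that TensorFlow sees no GPU; it does not say whether the cause is the installed package or the runtime itself.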

Pavlos Triantaris

Mar 23, 2021, 2:09:03 PM

to tensorflow-compression

I tried running the three commands which you suggested, then restarting the runtime, and running the rest of the notebook. This fixes the import issue but only CPU is available. (Changes saved in notebook).

Johannes Ballé

Mar 23, 2021, 4:23:03 PM

to tensorflow-compression

I just tried to work around this problem myself, but unfortunately it looks like Colab uses a special version of TF that is not consistent with the way tf-nightly-gpu/tensorflow pip packages access the GPU. So it looks like we can't support GPU in Colab for the time being. Once TF 2.5 is released, this should be resolved (projected date AFAIK is in April).

The only workaround I can recommend for now is to use TFC 1.3 and TF 1.15. You can switch to TF1 in Colab with the %tensorflow_version command, and then install TFC 1.3 using pip. Of course that doesn't give you the new implementation of the entropy models, unfortunately.

Johannes.

Pavlos Triantaris

Mar 23, 2021, 4:43:56 PM

to tensorflow-compression

Right, so I have basically replaced the installation commands with:

!pip uninstall -y tensorflow

!pip install tensorflow-gpu==1.15

!pip install tensorflow-compression==1.3

An error is still present in the installation cell outputs, but nothing else happens. However, the GPU is still not available.

Could you have another look at the code and help me understand what might be going wrong, please?

Johannes Ballé

Mar 23, 2021, 4:53:09 PM

to tensorflow-compression

Please do the following:

- Factory reset your runtime (from the menu).

- Then run the following commands:

%tensorflow_version 1.x

!pip install tensorflow-compression==1.3

This should (hopefully) do it.

Pavlos Triantaris

Mar 23, 2021, 5:30:14 PM

to tensorflow-compression

Thanks for the suggestion. Unfortunately, I am still unable to resolve the problem, even with this modification, and a number of other possible ideas that I tried.

Pavlos Triantaris

Mar 23, 2021, 5:32:44 PM

to tensorflow-compression

I found the bug. For some reason, the runtime had switched itself back to "No accelerator".

Thanks for the help.
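Since the cause turned out to be the runtime's accelerator setting rather than the installed packages, it can help to check for an attached GPU before debugging the TensorFlow install. The sketch below is an assumption-laden illustration: the helper name accelerator_attached is hypothetical, and it assumes that a GPU runtime has the nvidia-smi tool on its PATH.

```python
import shutil
import subprocess


def accelerator_attached():
    """Best-effort check that an NVIDIA GPU is attached to this runtime.

    Hypothetical helper: assumes nvidia-smi is on PATH when a GPU is present.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # "nvidia-smi -L" prints one line per GPU it can find.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if accelerator_attached():
    print("A GPU appears to be attached to the runtime")
else:
    print("No GPU found; check the runtime's accelerator setting")
```

This check does not involve TensorFlow at all, so it separates "no GPU attached" from "GPU attached but not used by TF".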

Johannes Ballé

Mar 23, 2021, 6:45:07 PM

to tensorflow-compression

Good to hear!

Pavlos Triantaris

Mar 23, 2021, 6:58:11 PM

to tensorflow-compression

If I may ask for one more clarification, please.

I am using the old entropy_bottleneck layer, now as before, and I pass a 1-dimensional input through it (i.e. each batch is an input of shape [N,128], where N is the minibatch size -- the number of elements of the dataset which is passed to the neural network in a single step).

If I have understood correctly, then, the command:

bits = tf.reduce_sum(tf.log(likelihoods), axis=1) / -np.log(2)

returns a vector of dimensions Nx1, with the element [j,1] being the number of bits passed through the entropy bottleneck for the j-th element of the minibatch.

Have I understood this right?

Johannes Ballé

Mar 23, 2021, 9:27:11 PM

to tensorflow-compression

Hi Pavlos,

almost – `bits` should be a vector of length N, i.e. its shape would evaluate to [N] rather than [N, 1].

Hope this helps.

Johannes
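The shape difference can be illustrated with a small sketch. It uses NumPy's np.sum and np.log as stand-ins for tf.reduce_sum and tf.log, on the assumption that both reduce over an axis in the same way; the likelihood values are placeholders rather than output of a real entropy bottleneck.

```python
import numpy as np

N, C = 4, 128  # hypothetical minibatch size and number of features
# Placeholder likelihoods: every element costs exactly one bit.
likelihoods = np.full((N, C), 0.5)

# Same expression as in the question, with NumPy in place of TF.
bits = np.sum(np.log(likelihoods), axis=1) / -np.log(2)

# Keeping the reduced axis gives the [N, 1] shape instead.
bits_2d = np.sum(np.log(likelihoods), axis=1, keepdims=True) / -np.log(2)

print(bits.shape)     # (4,)
print(bits_2d.shape)  # (4, 1)
```

Reducing over axis=1 drops that axis, so element j of `bits` is the bit count for the j-th minibatch element. The [N, 1] shape only appears if the reduced axis is kept explicitly.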
