TensorFlow: every iteration of for-loop gets slower and slower


kshen...@gmail.com

Nov 18, 2017, 8:09:39 PM
to Discuss
Hi,

I'm evaluating my algorithm by comparing generated images to a ground-truth image, computing three different types of loss between each pair of images. The logic of the code is:

1. I loop over all ground truth images
2. For each ground truth image, I loop over the relevant generated images and check each against the ground truth image by computing 3 losses

As the output below shows, the running time increases with every iteration, so the code can't finish in a reasonable amount of time. What could be causing this?

The code is included below. I'm also using the Edward library for TensorFlow, if that's relevant. I create my session with the following command:

sess = ed.get_session()

Help is much appreciated!

Thanks in advance,

Kevin

Starting evaluation...
100%|█████████████████████████████████████████████| 40/40 [01:36<00:00,  2.53s/it]
---------- Summary Image 001 ------------
Starting evaluation...
100%|█████████████████████████████████████████████| 40/40 [01:44<00:00,  2.61s/it]
---------- Summary Image 002 ------------
Starting evaluation...
100%|█████████████████████████████████████████████| 40/40 [01:57<00:00,  3.59s/it]
---------- Summary Image 003 ------------
Starting evaluation...
100%|█████████████████████████████████████████████| 40/40 [02:16<00:00,  3.34s/it]
---------- Summary Image 004 ------------
Starting evaluation...
100%|█████████████████████████████████████████████| 40/40 [02:25<00:00,  3.56s/it]
---------- Summary Image 005 ------------
Starting evaluation...
100%|█████████████████████████████████████████████| 40/40 [02:45<00:00,  4.00s/it]
---------- Summary Image 006 ------------
Starting evaluation...
100%|█████████████████████████████████████████████| 40/40 [02:54<00:00,  4.19s/it]
---------- Summary Image 007 ------------
Starting evaluation...
100%|█████████████████████████████████████████████| 40/40 [03:11<00:00,  4.58s/it]
---------- Summary Image 008 ------------
Starting evaluation...
100%|████████████████████████████████████████████| 40/40 [03:26<00:00,  5.02s/it]
---------- Summary Image 009 ------------
Starting evaluation...
100%|████████████████████████████████████████████| 40/40 [03:38<00:00,  5.58s/it]
---------- Summary Image 010 ------------
Starting evaluation...
100%|████████████████████████████████████████████| 40/40 [03:51<00:00,  5.77s/it]


for i in range(inference_batch_size):
    compare_vae_hmc_loss(model.decode_op, model.encode_op, model.discriminator_l_op,
                         x_ad[i:i+1], samples_to_check[:, i, :], config)


def compare_vae_hmc_loss(P, Q, DiscL, x_gt, samples_to_check, config):
    print("Starting evaluation...")

    x_samples_to_check = ...

    for i, sample in enumerate(tqdm(x_samples_to_check)):

        for j in range(sample_to_vis):
            plot_save(x_samples_to_check[j],
                      './out/{}_mcmc_sample_{}.png'.format(img_num, j + 1))

        avg_img = np.mean(x_samples_to_check, axis=0)
        plot_save(avg_img, './out/{}_mcmcMean.png'.format(img_num))

        r_loss = recon_loss(x_gt, sample)
        l_loss = l2_loss(x_gt, sample)
        lat_loss = l_latent_loss(l_th_x_gt, l_th_layer_samples[i:i+1])
        total_recon_loss += r_loss
        total_l2_loss += l_loss
        total_latent_loss += lat_loss

        if r_loss < best_recon_loss:
            best_recon_sample = sample
            best_recon_loss = r_loss

        if l_loss < best_l2_loss:
            best_l2_sample = sample
            best_l2_loss = l_loss

        if lat_loss < best_latent_loss:
            best_latent_sample = sample
            best_latent_loss = lat_loss


def l2_loss(x_gt, x_hmc):
    if jernej_Q_P:
        return tf.norm(x_gt - x_hmc).eval()
    else:
        return tf.norm(x_gt - x_hmc).eval()


def recon_loss(x_gt, x_hmc):
    if jernej_Q_P:
        return tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(logits=x_hmc, labels=x_gt), 1).eval()
    else:
        return tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(logits=x_hmc[1], labels=x_gt), 1).eval()


def l_latent_loss(l_th_x_gt, l_th_x_hmc):
    return tf.norm(l_th_x_gt - l_th_x_hmc).eval()


Dirk Toewe

Nov 19, 2017, 12:40:15 AM
to kshen...@gmail.com, Discuss
Hi Kevin,

if I am not mistaken, it's the usual suspect: every time you call a TensorFlow operation like tf.norm, you are not evaluating a norm, you are building a new TensorFlow computation. TensorFlow has two phases: first you construct a computation, then you execute it using sess.run() or the less obvious .eval(). In your case those two phases aren't cleanly separated: for each pair of samples you build a brand-new computation. These computations are all added to the so-called (computation) graph and are therefore never garbage collected, which is why every iteration gets slower.
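You can actually watch the graph grow by counting its ops. A tiny demonstration (toy data, not your code):

  import numpy as np
  import tensorflow as tf

  x = np.random.rand(4)
  with tf.Session() as sess:
    for _ in range(3):
      loss = tf.norm(tf.constant(x))  # builds fresh graph nodes every pass
      sess.run(loss)
      print(len(tf.get_default_graph().get_operations()))  # keeps climbing

There are multiple ways to fix this: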

1) If you're unfamiliar with TensorFlow, the easiest fix might be to convert your norm functions into function-like TensorFlow computations using tf.placeholder(), built once and then fed new values on every run:

  import tensorflow as tf

  a = tf.placeholder(dtype=tf.float64)
  b = tf.placeholder(dtype=tf.float64)
  norm_func = tf.norm(a - b)  # tf.norm takes one tensor: the norm of the difference

  with tf.Session() as sess:
    for sample_i, sample_j in sample_pairs:
      sess.run(norm_func, feed_dict={
        a: sample_i,
        b: sample_j
      })

2) If you're familiar with NumPy-style broadcasting, you can use it to compare the whole stack of samples against the ground truth at once.
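For example, something along these lines (shapes and names are made up for illustration):

  import numpy as np
  import tensorflow as tf

  x_gt_ph    = tf.placeholder(tf.float64, shape=[1, 784])
  samples_ph = tf.placeholder(tf.float64, shape=[None, 784])
  # the subtraction broadcasts [1, 784] against [N, 784], so a single
  # sess.run yields the L2 loss of every sample at once
  l2_all = tf.norm(x_gt_ph - samples_ph, axis=1)

  with tf.Session() as sess:
    losses = sess.run(l2_all, feed_dict={
      x_gt_ph:    np.random.rand(1, 784),
      samples_ph: np.random.rand(40, 784)
    })
    print(losses.shape)  # (40,)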

3) You could read up on TensorFlow's control-flow operations and convert the Python loops and if statements into TensorFlow computations. It may take a while to get the hang of it, but it can be really rewarding: you will understand TensorFlow much better once you understand control flow.
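A rough sketch of the idea (again, all names and shapes made up), keeping the per-sample loop inside the graph so it is built exactly once:

  import tensorflow as tf

  samples_ph = tf.placeholder(tf.float64, shape=[None, 784])
  x_gt_ph    = tf.placeholder(tf.float64, shape=[1, 784])
  n = tf.shape(samples_ph)[0]

  def cond(i, total):
    return i < n

  def body(i, total):
    # accumulate the L2 loss of sample i against the ground truth
    return i + 1, total + tf.norm(x_gt_ph[0] - samples_ph[i])

  _, total_l2 = tf.while_loop(cond, body,
                              [tf.constant(0), tf.constant(0.0, tf.float64)])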

4) There is also something called eager mode in TensorFlow, which might be more intuitive to use, but I can't vouch for it since I'm unfamiliar with it.

Generally speaking, all calls beginning with "tf." should happen before and outside of a tf.Session.

Hope this helps
Dirk


kshen...@gmail.com

Nov 19, 2017, 11:08:31 AM
to Discuss, kshen...@gmail.com
Thank you for your replies. You were right: each call to tf.norm was growing the computation graph at every iteration. Thanks, Dirk, for the recommended solutions. I considered the placeholder approach you suggested, but I ended up doing all the tf.norm computations beforehand as tensor operations and then passing the NumPy array results to the for-loop for evaluation, roughly as in the sketch below. It was a bit more work, but it works very well!
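In case it's useful to anyone else, what I did looks roughly like this (toy shapes; essentially Dirk's option 2 followed by a plain-Python bookkeeping loop):

import numpy as np
import tensorflow as tf

# all tf.* calls happen once, up front
x_gt_ph    = tf.placeholder(tf.float64, shape=[1, 784])
samples_ph = tf.placeholder(tf.float64, shape=[None, 784])
l2_all     = tf.norm(x_gt_ph - samples_ph, axis=1)  # one L2 loss per sample

x_gt               = np.random.rand(1, 784)
x_samples_to_check = np.random.rand(40, 784)

with tf.Session() as sess:
    all_l2 = sess.run(l2_all, feed_dict={x_gt_ph: x_gt,
                                         samples_ph: x_samples_to_check})

# the loop below only touches NumPy arrays, so the graph never grows
best_l2_loss, best_l2_sample = np.inf, None
for sample, l_loss in zip(x_samples_to_check, all_l2):
    if l_loss < best_l2_loss:
        best_l2_loss, best_l2_sample = l_loss, sample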

Cheers,

Kevin