FP32 Img/sec, Regular Batch Size

| Model | 1 GPU | 2 GPU | 4 GPU | Batch Size |
| --- | --- | --- | --- | --- |
| ResNet50 | 314.87 | 590.3 | 952.8 | 64 |
| ResNet152 | 127.71 | 232.42 | 418.44 | 64 |
| InceptionV3 | 207.53 | 386.86 | 655.45 | 64 |
| InceptionV4 | 102.41 | 191.4 | 337.44 | 64 |
| VGG16 | 188.91 | 337.38 | 536.95 | 64 |
| NASNET | 160.42 | 280.07 | 510.15 | 64 |
| Alexnet | 4103.27 | 7814.04 | 10491.22 | 512 |
FP32 Img/sec, Large Batch Size

| Model | 1 GPU | 2 GPU | 4 GPU | Batch Size |
| --- | --- | --- | --- | --- |
| ResNet50 | 322.66 | 622.41 | 1213.3 | 512 |
| ResNet152 | 137.12 | 249.58 | 452.77 | 256 |
| InceptionV3 | 216.27 | 412.75 | 716.47 | 256 |
| InceptionV4 | 105.2 | 201.49 | 345.79 | 256 |
| VGG16 | 166.55 | 316.46 | 617 | 512 |
| NASNET | 187.69 | 348.71 | 614 | 512 |
| Alexnet | 2825.61 | 4421.97 | 8482.39 | 8192 |
FP16 Img/sec, Regular Batch Size

| Model | 1 GPU | 2 GPU | 4 GPU | Batch Size |
| --- | --- | --- | --- | --- |
| ResNet50 | 544.16 | 972.89 | 1565.18 | 64 |
| ResNet152 | 246.56 | 412.25 | 672.87 | 64 |
| InceptionV3 | 334.28 | 596.65 | 1029.24 | 64 |
| InceptionV4 | 178.41 | 327.89 | 540.52 | 64 |
| VGG16 | 347.01 | 570.53 | 637.97 | 64 |
| NASNET | 155.44 | 282.78 | 517.06 | 64 |
| Alexnet | 6013.64 | 11275.54 | 14960.97 | 512 |
FP16 Img/sec, Large Batch Size

| Model | 1 GPU | 2 GPU | 4 GPU | Batch Size |
| --- | --- | --- | --- | --- |
| ResNet50 | 604.76 | 1184.52 | 2338.84 | 1024 |
| ResNet152 | 285.85 | 529.05 | 1062.13 | 512 |
| InceptionV3 | 391.3 | 754.94 | 1471.66 | 512 |
| InceptionV4 | 203.67 | 384.29 | 762.32 | 512 |
| VGG16 | 276.16 | 528.88 | 983.85 | 512 |
| NASNET | 196.52 | 367.6 | 726.85 | 512 |
| Alexnet | 5911.6 | 11456.11 | 21828.99 | 8192 |
ALEXNET Img/sec

| Configuration | 1 GPU | 2 GPU | 4 GPU | Batch Size |
| --- | --- | --- | --- | --- |
| Alexnet FP16 (Large Batch) | 5911.6 | 11456.11 | 21828.99 | 8192 |
| Alexnet FP16 (Normal Batch) | 6013.64 | 11275.54 | 14960.97 | 512 |
| Alexnet FP32 (Large Batch) | 2825.61 | 4421.97 | 8482.39 | 8192 |
| Alexnet FP32 (Normal Batch) | 4103.27 | 7814.04 | 10491.22 | 512 |
--
You received this message because you are subscribed to the Google Groups "Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss+u...@tensorflow.org.
To post to this group, send email to dis...@tensorflow.org.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/discuss/50e30ed9-6fa5-4db5-9de6-28b38ea119ce%40tensorflow.org.
Nice, that puts the RTX pretty close to the V100-SXM2. I assume the RTX 8000 is a fraction of the cost as well as more workstation friendly. Thank you for running the command tweaks. Cool data points.
It is zero-indexed, so what this does is make only the first GPU visible, and only that GPU. That environment variable is not strictly needed. We found that running on a single GPU on a multi-GPU machine (specifically DGX-1 setups) had a very small perf penalty when TF, or maybe CUDA, or the combination could see all of the GPUs. We never figured out why, because it is a rare situation and the perf hit is not large. I almost removed the environment variable before sending it, but that is exactly what I run and I was in a hurry.

Toby
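For readers following along, the environment variable in question is `CUDA_VISIBLE_DEVICES`; a minimal sketch of pinning a run to the first GPU looks like this (it must be set before the CUDA runtime initializes, i.e. before importing TensorFlow):

```python
import os

# Device indices are zero-based: "0" exposes only the first physical GPU
# to CUDA and hence to TensorFlow. Set this before importing TensorFlow.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# An empty string would hide all GPUs (CPU-only run):
# os.environ["CUDA_VISIBLE_DEVICES"] = ""
```

Equivalently, it can be set inline on the shell command, e.g. `CUDA_VISIBLE_DEVICES=0 python train.py` (script name here is illustrative).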
Jon,

You might not find this super professional, but for now we just check the memory usage of the process via a side thread. We had a lot of OOMs in TF 2.0, as in ResNet50 using 360 GB of system memory after 20 epochs, so a simple total-memory-usage check was good enough. The tooling we created to run tests externally is called perfzero. It is meant to be very simple and lightweight. It is far from magical, but it has been super useful for my needs and I hope we expand it. We had a goal to add very basic always-on profiling to TF (basic memory usage and a few other things directly from the TF allocator), but some team shuffling I need to figure out has slowed things down. I said a lot and I wish I were giving you more exact answers. Good luck.

Toby
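The "check memory usage from a side thread" approach can be sketched in a few lines of standard-library Python. This is a hypothetical illustration, not the actual perfzero code; the function name and the warning behavior are assumptions:

```python
import resource
import threading


def watch_memory(limit_mb, interval_s=10.0):
    """Poll peak RSS from a daemon thread; warn if it exceeds limit_mb.

    Hypothetical sketch of a simple total-memory-usage check. Returns an
    Event; call .set() on it to stop the watcher.
    """
    stop_event = threading.Event()

    def poll():
        while not stop_event.is_set():
            # ru_maxrss is in kilobytes on Linux (bytes on macOS).
            peak_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
            if peak_mb > limit_mb:
                print(f"WARNING: peak RSS {peak_mb:.0f} MB exceeds {limit_mb} MB")
            stop_event.wait(interval_s)

    threading.Thread(target=poll, daemon=True).start()
    return stop_event
```

A real harness would likely log the samples rather than print, but even this is enough to catch a run whose memory grows epoch over epoch.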
On Mon, Apr 1, 2019 at 9:43 AM Jon Wang <jon.w...@gmail.com> wrote:
Thanks Toby,

What about CPU memory then? I would guess similar consumption on a CPU cluster with the same model and batch size. I would like to verify that I have the right idea of how much memory consumption to expect for a given batch size.

Jinzhen
On Apr 1, 2019, 12:26 -0400, Toby Boyd <toby...@google.com>, wrote:
I do not have a good answer for you on GPU memory. The memory test we run increases the batch size until OOM; this does not give memory usage, it only verifies we are not regressing. I want to run a test that validates exact memory, but we currently do not have it instrumented, or I would send you the commands or info. I suspect with the right VLOG settings you could figure it out, but sadly I do not have a good answer. It is a priority, as we have been bitten by small and even large regressions.
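The "increase batch size until OOM" regression check described above can be sketched as a doubling search. This is a hypothetical illustration of the idea, not the actual test harness; `run_step` is an assumed callback that raises on out-of-memory:

```python
def find_max_batch_size(run_step, start=64, cap=16384):
    """Double the batch size until a training step fails with OOM.

    run_step(batch_size) is expected to raise (e.g. RuntimeError or
    MemoryError) when the device runs out of memory. Returns the largest
    batch size that succeeded, or None if even `start` failed.
    """
    last_ok = None
    bs = start
    while bs <= cap:
        try:
            run_step(bs)
        except (MemoryError, RuntimeError):
            break
        last_ok = bs
        bs *= 2
    return last_ok
```

As Toby notes, this only catches regressions (the maximum batch size shrinking between releases); it does not report how much memory a given batch size actually uses.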