I've been a big fan of VGG-style networks for a while, but I recently decided to try out a SqueezeNet-style network for classification and see if I could reduce the size of my nets and get a nice speed boost. As promised, the networks are significantly smaller while still maintaining very good accuracy: in my case, I was able to reduce my networks from ~130MB down to ~5MB, with no additional compression (pruning, quantization, etc.). All of this was implemented in Torch.
I expected that with such a large reduction in the number of parameters (25x), the nets would run significantly faster than my VGG-style nets. Disappointingly, I'm only getting a 33% speed boost. On an AWS K520 GPU, inference time for my old nets was ~7.5ms for a 128x128 image; for my new nets it's ~5ms. Maybe this isn't surprising: the number of layers in my old and new networks is pretty similar, and the GPU can't parallelize computation across different layers. (On the CPU, the SqueezeNet-style network runs 3x faster than the VGG-style network.) So maybe for increased speed on the GPU I really need to go wider instead of deeper, but are there any tricks to improve the speed of my SqueezeNet-style network within the Torch framework (or otherwise)?
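(For reference, here's roughly how I'm measuring inference time. This is a sketch with a toy stand-in model, not my actual architecture; the key detail is calling `cutorch.synchronize()` before reading the timer, since CUDA kernels launch asynchronously.)

```lua
require 'nn'
require 'cunn'

-- Toy stand-in for the real network.
local model = nn.Sequential()
  :add(nn.SpatialConvolution(3, 16, 3, 3, 1, 1, 1, 1))
  :add(nn.ReLU())
  :cuda()

local input = torch.CudaTensor(1, 3, 128, 128):uniform()

model:forward(input)        -- warm-up pass (allocation, algorithm selection)
cutorch.synchronize()

local nRuns = 100
local timer = torch.Timer()
for i = 1, nRuns do
  model:forward(input)
end
cutorch.synchronize()       -- wait for all queued kernels before reading the timer
print(string.format('mean forward time: %.2f ms', timer:time().real * 1000 / nRuns))
```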
--
You received this message because you are subscribed to the Google Groups "torch7" group.
To unsubscribe from this group and stop receiving emails from it, send an email to torch7+unsubscribe@googlegroups.com.
To post to this group, send email to tor...@googlegroups.com.
Visit this group at https://groups.google.com/group/torch7.
For more options, visit https://groups.google.com/d/optout.
Usually, for the GPU, the batch size has to be larger than 1 image.
Try a batch size of 256 inputs, and you'll see a wider gap in performance.
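Something along these lines (a sketch with a toy model; swap in your own net):

```lua
require 'nn'
require 'cunn'

-- Toy stand-in for the real network.
local model = nn.Sequential()
  :add(nn.SpatialConvolution(3, 16, 3, 3, 1, 1, 1, 1))
  :add(nn.ReLU())
  :cuda()

for _, batchSize in ipairs({1, 16, 64, 256}) do
  local input = torch.CudaTensor(batchSize, 3, 128, 128):uniform()
  model:forward(input)      -- warm-up
  cutorch.synchronize()
  local timer = torch.Timer()
  model:forward(input)
  cutorch.synchronize()     -- kernels are async; sync before reading the timer
  print(string.format('batch %3d: %.3f ms/image',
      batchSize, timer:time().real * 1000 / batchSize))
end
```

Per-image time should drop sharply as the batch grows, because a single image can't saturate the GPU.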
On Wed, Aug 17, 2016 at 4:03 PM, Alexander Weiss via torch7 <torch7+APn2wQcsaR8bcOeppHpjYu30PfpJTwn3luLMBUUyH0EVaRyRp1q7UJHGY@googlegroups.com> wrote:
My GPU doesn't have enough memory to batch 256 images. However, the inference time of the SqueezeNet-style network appears to scale linearly with batch size up to 64 images. Interestingly, it seems like I can push through larger images (like 256x256) without any significant increase in inference time over the 128x128 images. (Just to be clear, this is a fully convolutional architecture, so I can input images of different sizes.)
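(What I mean by fully convolutional, as a toy sketch, not my actual architecture: no Linear or View layers, so the output's spatial size simply tracks the input's.)

```lua
require 'nn'

-- Toy fully convolutional net: a 3x3 conv with padding 1 and a
-- 1x1 "classifier" conv both preserve the input's spatial size.
local model = nn.Sequential()
  :add(nn.SpatialConvolution(3, 16, 3, 3, 1, 1, 1, 1))
  :add(nn.ReLU())
  :add(nn.SpatialConvolution(16, 10, 1, 1))

print(model:forward(torch.Tensor(1, 3, 128, 128)):size())  -- 1x10x128x128
print(model:forward(torch.Tensor(1, 3, 256, 256)):size())  -- 1x10x256x256
```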
On Wednesday, August 17, 2016 at 3:11:45 PM UTC-5, smth chntla wrote:
Are you using cudnn, and are you using the option `cudnn.benchmark = true`?
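i.e. something like this (a sketch with a toy model; `cudnn.convert` swaps nn modules for their cudnn counterparts in place):

```lua
require 'nn'
require 'cunn'
require 'cudnn'

cudnn.benchmark = true   -- autotune: try conv algorithms on first forward, cache the fastest
cudnn.fastest = true     -- prefer the fastest algorithm even if it needs more workspace

-- Toy stand-in; in practice, convert your trained model.
local model = nn.Sequential()
  :add(nn.SpatialConvolution(3, 16, 3, 3, 1, 1, 1, 1))
  :add(nn.ReLU())
  :cuda()
cudnn.convert(model, cudnn)
print(model)  -- convolutions now show up as cudnn.SpatialConvolution
```

One caveat: with `benchmark = true`, the first forward pass for each new input shape is slow (it's trying algorithms), and it re-benchmarks whenever the input size changes, which matters if you feed variable-size inputs to a fully convolutional net. Warm up before timing.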
On Wed, Aug 17, 2016 at 4:33 PM, Alexander Weiss via torch7 <torch7+APn2wQcsaR8bcOeppHpjYu30PfpJTwn3luLMBUUyH0EVaRyRp1q7UJHGY@googlegroups.com> wrote:
No. I gave up on cudnn because of mysterious memory leaks (which apparently only affect me). I can try reinstalling it and running some tests, but it's not an ideal solution for me. Can I really expect a significant boost from it? I honestly never noticed significant speed differences with cudnn, at least not at inference time; maybe that's because my batches weren't large enough.
On Wednesday, August 17, 2016 at 3:51:35 PM UTC-5, smth chntla wrote: