Data layer prefetch queue empty hang


dsb

Dec 20, 2015, 11:54:25 PM
to Caffe Users
I am trying to run Caffe on a cluster with 4 GPUs. When I do, training consistently hangs with this error:

I1221 05:50:07.595844 17763 blocking_queue.cpp:50] Data layer prefetch queue empty

It runs fine with up to 3 GPUs. With 4 or more, the error occurs randomly, sometimes after 300 iterations on the CIFAR-10 data and sometimes after 9800, but it hangs in about 9 out of 10 runs. In some rare cases it passes...

The same thing happens with the MNIST dataset. I am trying to run on a system with up to 8 GPUs.

Is there a particular fix for this issue?

Thanks

Frank Liu

Dec 22, 2015, 12:40:17 AM
to Caffe Users
Since you're running it on a cluster, I'm assuming this occurs because your data I/O rate varies. You could probably remedy it by bumping the "prefetch" parameter, like so:

data_param {
  source: "./data/ilsvrc12/ilsvrc12_train_lmdb"
  batch_size: 32
  backend: LMDB
  prefetch: 20
}

The default prefetch value is 4, if I remember correctly.
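For context, here is a sketch of where that parameter would sit in a complete Data layer definition. The layer/top names and the LMDB path are just placeholders for illustration, and whether `prefetch` is honored depends on your Caffe version:

```
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  data_param {
    source: "./data/ilsvrc12/ilsvrc12_train_lmdb"
    batch_size: 32
    backend: LMDB
    prefetch: 20   # number of batches the prefetch thread keeps queued
  }
}
```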

dsb

Jan 3, 2016, 8:39:42 PM
to Caffe Users
Thanks Frank. That helped.

siebe...@googlemail.com

May 5, 2016, 7:58:24 PM
to Caffe Users
I have no idea how this could have helped. The prefetch data_param is not used internally at all; see: https://github.com/BVLC/caffe/issues/4100

Nehal Doiphode

Jun 17, 2016, 6:01:54 AM
to Caffe Users
Tried it; it doesn't work.

Hou Yunqing

Jul 13, 2016, 11:55:15 PM
to Caffe Users
Is there a recommended disk transfer rate for single/multi-GPU training with Caffe? I'm getting the same error, and apparently my HDD's read rate is saturated. I'm wondering whether I should get a big SSD or two HDDs in RAID 0. It would help if someone could point out what read rates Caffe can achieve when training on ImageNet with various configurations.
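To confirm whether the disk really is the bottleneck before buying hardware, a rough sequential-read benchmark takes only a few lines of Python. This is just a sketch: the helper name and file sizes are mine, and for a meaningful number you should read a file larger than RAM so the OS page cache doesn't mask the disk speed.

```python
import os
import tempfile
import time

def read_throughput_mb_s(path, block_size=4 * 1024 * 1024):
    """Read the whole file sequentially and return throughput in MB/s."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / elapsed

# Demo on a small temporary file; on real data, point this at your LMDB files
# (and use a file bigger than RAM for an honest disk measurement).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(16 * 1024 * 1024))  # 16 MiB of dummy data
    path = f.name
print(f"{read_throughput_mb_s(path):.1f} MB/s")
os.remove(path)
```

If the measured rate is near your drive's rated sequential speed while training stalls, the data layer is I/O-bound and a faster disk (or RAID 0) should help.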

Thanks