Training a Classifier Problem: BLAS: Program is Terminated. Because you tried to allocate too many memory regions


Somebody Else

Mar 21, 2016, 7:35:47 PM
to CMU-OpenFace
In the past, with less CPU, less RAM, and fewer images, I was able to create my own classifier. I am now on a second round with a bigger test set (143k samples), but I run into two different problems:

1) I use an 8-core CPU with 32 GB RAM, and when I run:

./batch-represent/main.lua -outDir /data-final/ -data /data-aligned/

I get the following output:

{
  data : "/data-aligned/"
  imgDim : 96
  cache : false
  model : "/root/src/openface/models/openface/nn4.small2.v1.t7"
  outDir : "/data-final/"
  cuda : false
  batchSize : 50
}
/data-aligned/
cache lotation :         /data-aligned/cache.t7
Creating metadata for cache.
{
  sampleSize :
    {
      1 : 3
      2 : 96
      3 : 96
    }
  split : 0
  verbose : true
  paths :
    {
      1 : "/data-aligned/"
    }
  samplingMode : "balanced"
  loadSize :
    {
      1 : 3
      2 : 96
      3 : 96
    }
}
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
now combine all the files to a single large file
load the large concatenated list of sample paths to self.imagePath
143789 samples found. [======================================== 140000/143789 ========>...] ETA: 1ms | Step: 0ms
Updating classList and imageClass appropriately
 
[================================================================ 9324/9324 ==========>] Tot: 41s734ms | Step: 4ms
Cleaning up temporary files
Splitting training and test sets to a ratio of 0/100
nImgs :  143789
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
Segmentation fault (core dumped)

I briefly googled the error message, but it only confused me more. Any idea what I am doing wrong? I tried smaller and bigger batch sizes as well, and I also pruned the directories beforehand.
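In case it helps, this is how I varied the batch size (assuming the command-line flag matches the batchSize field shown in the options dump above):

./batch-represent/main.lua -outDir /data-final/ -data /data-aligned/ -batchSize 10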

2) After the above problem I tried to run the same step on a GPU machine (8 cores with 15 GB RAM and, I believe, 2 GPUs), but I got the error below when running:

./batch-represent/main.lua -outDir /data-final/ -data /data-aligned/ -cuda

Output:
{
  data : "/data-aligned/"
  imgDim : 96
  cache : false
  model : "/root/src/openface/models/openface/nn4.small2.v1.t7"
  outDir : "/data-final/"
  cuda : true
  batchSize : 50
}
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/trepl/init.lua:383: module 'cutorch' not found:No LuaRocks module found for cutorch
        no field package.preload['cutorch']
        no file '/root/.luarocks/share/lua/5.1/cutorch.lua'
        no file '/root/.luarocks/share/lua/5.1/cutorch/init.lua'
        no file '/root/torch/install/share/lua/5.1/cutorch.lua'
        no file '/root/torch/install/share/lua/5.1/cutorch/init.lua'
        no file './cutorch.lua'
        no file '/root/torch/install/share/luajit-2.1.0-beta1/cutorch.lua'
        no file '/usr/local/share/lua/5.1/cutorch.lua'
        no file '/usr/local/share/lua/5.1/cutorch/init.lua'
        no file '/root/.luarocks/lib/lua/5.1/cutorch.so'
        no file '/root/torch/install/lib/lua/5.1/cutorch.so'
        no file '/root/torch/install/lib/cutorch.so'
        no file './cutorch.so'
        no file '/usr/local/lib/lua/5.1/cutorch.so'
        no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
        [C]: in function 'error'
        /root/torch/install/share/lua/5.1/trepl/init.lua:383: in function 'require'
        ./batch-represent/main.lua:22: in main chunk
        [C]: in function 'dofile'
        /root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x00406670


Feedback on the GPU issue is welcome as well, since my next round will be with over 1 million faces from 60,000 different people, and there I am quite sure a GPU will help.

3) On a side note, I noticed that cropping and aligning those 143k faces took many days (as far as I remember, I stopped the process after 3-4 days because I wanted to move forward). I used quite a slow computer (2 CPUs with only 4 GB RAM). Do you think this task (cropping/aligning etc.) will go faster on the above-mentioned systems (8 CPUs/32 GB RAM and the GPU machine)? If yes, which system should I choose: more RAM or a GPU? How much time should I budget for 1 million images to be cropped?
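For reference, this is roughly the alignment command I ran, reconstructed from memory (the input/output paths here are just placeholders for my directories):

./util/align-dlib.py /data-raw/ align outerEyesAndNose /data-aligned/ --size 96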

Thank you for the feedback and help. I really like OpenFace a lot.

Domi

Somebody Else

Mar 22, 2016, 2:42:47 AM
to CMU-OpenFace
OK, it looks like error 2) is from not having cutorch and cunn (?) installed on the machine. Is it correct that in the Docker version those dependencies for GPU processing are missing? If yes, can you list all the needed dependencies and how to set them up?

Thank you!


Brandon Amos

Mar 22, 2016, 4:28:09 PM
to Somebody Else, CMU-OpenFace
> BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
> Segmentation fault (core dumped)

I haven't seen this before.
I'm surprised using a smaller batch size doesn't resolve this.
What BLAS library are you using?
If you aren't using OpenBLAS, I recommend installing and trying it.
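One quick way to check, as a sketch (the exact library path depends on how Torch was installed; on the Docker image it is probably under /root/torch/install/lib), is to see what Torch's core math library links against:

ldd /root/torch/install/lib/libTH.so | grep -i blas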

> /root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/trepl/init
> .lua:383: module 'cutorch' not found:No LuaRocks module found for cutorch
>
Sure, I've added cutorch to the required package list in our setup guide:
http://cmusatyalab.github.io/openface/setup/

cutorch and cunn are the only prerequisites for GPU execution,
though I've never tried it from inside a Docker container.
Do you have OpenFace running with CUDA in a Docker container?
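Once they're installed, a quick sanity check (just a sketch, assuming the standard th interpreter is on your PATH) is to confirm Torch can see the GPU:

th -e "require 'cutorch'; print(cutorch.getDeviceCount())"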

> Do you think this task (cropping/align etc). will go faster with
> above mentioned systems (8 CPU/32GB RAM and the GPU computer)? If
> yes, which system should I choose?

I don't have any specific hardware recommendations or time
estimations, but a GPU does *not* help our current detection and
alignment code since they're CPU-only.
If you're interested in adding GPU-based face detection I'd
be happy for a PR, as it should greatly improve the
batch-alignment and inference times.
Unfortunately I'm not aware of good GPU-based face detection software.

-Brandon.

Somebody Else

Mar 22, 2016, 5:27:33 PM
to CMU-OpenFace
Thank you Brandon.

-> What BLAS library are you using?
I use the Docker version of OpenFace. I will check tomorrow and report back.

Domi

Somebody Else

Mar 23, 2016, 4:19:00 PM
to CMU-OpenFace
I figured out that the above-mentioned error

BLAS : Program is Terminated. Because you tried to allocate too many memory regions.

with the pre-built Docker version of OpenFace occurs on machines with more than 2 CPUs. I have no problems generating the CSV files on any machine with just 2 CPUs; once the machine has 4 or more CPUs, the BLAS error occurs.

I also tried to install cutorch and cunn from within the pre-built Docker image, but I had no luck with it. I tried the commands:

luarocks install cunn
luarocks install cutorch

but I get an error about not being able to clone the git repository (although I can clone a repository from the Docker container's command line).

Questions:
1) Would you suggest building OpenFace from source at this stage and no longer using the Docker version?
2) Do you think the smaller GPU instance from Amazon AWS (g2.2xlarge -> https://aws.amazon.com/ec2/instance-types/) will work with OpenFace for GPU processing when setting everything up outside the Docker container and building from source?





Brandon Amos

Mar 23, 2016, 4:38:36 PM
to Somebody Else, CMU-OpenFace
Hi Domi,

> with the pre-built docker version of Openface occurs with machines having
> more than 2 CPU's. I have no problems generating the csv files with any
> machine having just 2 CPU's. Once the machine has 4 or more CPU's the BLAS
> error is occuring. I am using the pre-built docker version.

Interesting; on my 8-core machines the Docker container works well.

> but I get error about not being able to clone into the git (although I can
> clone a git from the docker container command line).

I haven't seen these errors, but somebody in the Torch community might
be able to help.

> Questions:
> 1) Would you suggest me building Openface at this stage now from the source
> and no longer use the docker version?

Yes, if you want to use CUDA. I've never used it in the Docker
container and always build manually.

> 2) Do you think the smaller GPU instance from Amazon AWS
> (g2.2xlarge -> https://aws.amazon.com/ec2/instance-types/)
> will work with Openface for GPU processing when setting up everything
> outside from the docker container and build it from source?

Yes, I think it should work, but I've never tried it myself.

-Brandon.

Somebody Else

Mar 23, 2016, 6:19:12 PM
to CMU-OpenFace
Thank you, Brandon. I will continue with OpenFace tomorrow, or at the latest over the weekend.

I now have the training part of the classifier running with those 143k sample faces. The manual says it takes just seconds for thousands of images, but it has already been running for several hours. I can see the Python process is running, but there is no other output than the initial

Loading embeddings.

text. I will keep it running until tomorrow morning/evening, but I am a bit confused that this takes so long when the docs say it should take just seconds.
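For reference, this is roughly the command I have running (a sketch; the directory is just where my labels.csv and reps.csv ended up):

./demos/classifier.py train /data-final/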

Domi

Somebody Else

Mar 23, 2016, 6:20:44 PM
to CMU-OpenFace

-rw-r--r-- 1 root root 8.5M Mar 23 21:54 labels.csv
-rw-r--r-- 1 root root 321M Mar 23 21:54 reps.csv


are the file sizes, if that helps for giving advice.

Somebody Else

Mar 24, 2016, 6:17:54 AM
to CMU-OpenFace
I will keep it running until tomorrow morning/evening, but I am a bit confused that this takes so long when the docs say it should take just seconds.

-> The server was unresponsive in the morning. Now trying the training part with 8 cores and 16 GB RAM.

Somebody Else

Mar 25, 2016, 4:35:12 AM
to CMU-OpenFace
Hi Brandon,

I was able to generate a new classifier now. I also found the solution to the BLAS error: I had to run the following commands on my 8-core machine:

export OPENBLAS_NUM_THREADS=1
export GOTO_NUM_THREADS=1
export OMP_NUM_THREADS=1
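Since I use the pre-built Docker image, I now pass these variables when starting the container instead of exporting them inside it (a sketch; the image name and options are just what I happen to use):

docker run -e OPENBLAS_NUM_THREADS=1 -e GOTO_NUM_THREADS=1 -e OMP_NUM_THREADS=1 -t -i bamos/openface /bin/bash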

Brandon Amos

Mar 25, 2016, 10:22:03 AM
to Somebody Else, CMU-OpenFace
Interesting! Thanks for sharing. I'd expect this in general
to give a performance decrease, but it seems like too many
threads are being launched from somewhere.
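If you want to see how many OpenMP threads Torch itself is using (separate from the BLAS settings above, just a quick sketch), you can check from the shell:

th -e "print(torch.getnumthreads())"

Calling torch.setnumthreads(1) at the top of a script has a similar capping effect on Torch's own thread pool.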

-Brandon.

Big~Newbie

Mar 20, 2017, 10:23:47 AM
to CMU-OpenFace
Awesome! I ran into the same issue, struggled for a day to figure it out, and almost gave up; your tip solved my problem when running multiple threads in Torch7. But I don't understand why you set the thread count to 1. Shouldn't it be greater than 1, since we are using multiple threads?

On Friday, March 25, 2016 at 4:35:12 PM UTC+8, Somebody Else wrote: