How to handle multiple input images in caffe


vicky

Jun 8, 2015, 5:06:53 AM
to caffe...@googlegroups.com
Hi all,
I want to design a CNN with two input images, say an RGB image and a depth image.
But I am confused about handling these multiple inputs.
Do I need to convert these two images into a single LMDB? Or do I convert them into two LMDBs and use a CONCAT layer to combine them?
Thanks

Shi Yemin

Jun 8, 2015, 7:05:39 AM
to caffe...@googlegroups.com
Merging these two images when converting them to LMDB could help.
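A minimal sketch of that merging step, assuming the RGB image is H×W×3 and the depth image is H×W×1 (the shapes here are placeholders, not from the Caffe conversion tools):

```python
import numpy as np

# Hypothetical inputs: an RGB image and a single-channel depth image
# with the same height and width.
rgb = np.zeros((480, 640, 3), dtype=np.uint8)    # H x W x 3
depth = np.zeros((480, 640, 1), dtype=np.uint8)  # H x W x 1

# Stack along the channel axis to get one 4-channel image.
merged = np.concatenate([rgb, depth], axis=2)    # H x W x 4

# Caffe datums store channels first, so transpose to C x H x W
# before serializing (e.g. with caffe.io.array_to_datum).
merged_chw = merged.transpose((2, 0, 1))
print(merged_chw.shape)  # (4, 480, 640)
```

The resulting 4-channel array can then be written to a single LMDB, so the network sees one input blob instead of two.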

On Monday, June 8, 2015 at 5:06:53 PM UTC+8, vicky wrote:

vicky

Jun 8, 2015, 10:59:44 AM
to caffe...@googlegroups.com
You mean merging the two images when I create the LMDB?
I have read the code in convert_imagenet; it seems that if I want to merge these two images into an LMDB, I should change the relevant code in "ReadImageToDatum".

But my concern is that once I have created an LMDB of 4-channel data, do I need to make any additional changes when I use it in the network?

I have also read the relevant code in "data_layer.cpp" (BTW, it is hard for me to follow). I notice that there is a function "void DataLayer<Dtype>::InternalThreadEntry()" in "data_layer.cpp", and it calls the function "DecodeDatumToCVMat". The problem is that "DecodeDatumToCVMat" reads an image from a buffer in memory, so if my LMDB is 4-channel this function cannot work.

I do not know whether I have misunderstood something in this part.
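For what it's worth, "DecodeDatumToCVMat" is only used for datums stored as encoded images (JPEG/PNG); a datum written with raw pixel data just stores a flat byte buffer plus its channels/height/width, so any channel count round-trips. A numpy-only sketch of that flatten/reshape round trip (the shapes are illustrative):

```python
import numpy as np

# A hypothetical 4-channel sample in Caffe's C x H x W layout.
sample = np.arange(4 * 5 * 6, dtype=np.uint8).reshape(4, 5, 6)

# A non-encoded Datum stores the raw bytes plus the dimensions...
channels, height, width = sample.shape
raw_bytes = sample.tobytes()

# ...and the data layer reshapes them back. No image decoding is
# involved, so the fourth channel survives intact.
restored = np.frombuffer(raw_bytes, dtype=np.uint8).reshape(channels, height, width)
print(np.array_equal(sample, restored))  # True
```

So writing raw 4-channel datums (e.g. via caffe.io.array_to_datum) should sidestep the decode path entirely.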



On Monday, June 8, 2015 at 7:05:39 PM UTC+8, Shi Yemin wrote:

vicky

Jun 9, 2015, 1:03:13 AM
to caffe...@googlegroups.com
up


On Monday, June 8, 2015 at 5:06:53 PM UTC+8, vicky wrote:
Hi all,

Floris Gaisser

Jun 9, 2015, 9:24:42 PM
to caffe...@googlegroups.com
I haven't delved into the code, but I know OpenCV can handle 4-channel data in its cv::Mat containers.
So I see no problem in using the RGBD data in your network.

Manuel Lopez Antequera

Jun 12, 2015, 4:45:31 AM
to caffe...@googlegroups.com
If you have your images as RGB + D in two different files, you may simply use two image data layers as well. It's probably slower, but you can avoid preprocessing the images.
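A hedged sketch of what that could look like in the net prototxt. The layer names, list files, and batch size are placeholders, not from a tested setup; both ImageData layers need their list files in the same order so RGB and depth stay paired (hence no shuffling):

```protobuf
layer {
  name: "data_rgb"
  type: "ImageData"
  top: "rgb"
  top: "label"
  image_data_param { source: "rgb_list.txt" batch_size: 32 shuffle: false }
}
layer {
  name: "data_depth"
  type: "ImageData"
  top: "depth"
  top: "label_depth"  # duplicate labels; this copy can be ignored with a Silence layer
  image_data_param { source: "depth_list.txt" batch_size: 32 shuffle: false }
}
layer {
  name: "concat"
  type: "Concat"
  bottom: "rgb"
  bottom: "depth"
  top: "rgb_d"
  concat_param { axis: 1 }  # concatenate along the channel axis
}
```

The Concat layer can also sit deeper in the net if you want each modality to pass through its own convolutional branch first.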

Oliver Coleman

Jun 13, 2015, 10:05:25 PM
to caffe...@googlegroups.com
Here's some code I wrote to do exactly this for my own RGB+D images. It generates training and test examples from a single pair of RGB and depth images, using a target image to generate labels/categories for a window on the RGB+D image, but it should be easy to adapt to multiple pairs of RGB+D images instead.

import sys
import math
import random
import numpy as np
import caffe
import lmdb
from PIL import Image
import shutil

def generate_db(image_path_base, max_examples_per_category, window_size, output_path):
    max_examples_per_category = int(max_examples_per_category)
    window_size = int(window_size)
    halfws = window_size // 2
    
    # Load RGB image.
    image = np.array(Image.open(image_path_base + "_rgb.png")).transpose((1,0,2))
    # Load depth image.
    image_depth = np.array(Image.open(image_path_base + "_d.png")).transpose((1,0,2))
    # Combine RGB and depth images, only keeping blue and alpha channels from depth image.
    # Final array has form [x][y][R,G,B,Argb,D,Ad] 
    # where Argb is the alpha channel from the RGB image and Ad is the alpha channel from the depth image.
    image = np.append(image, image_depth[:,:,2:], 2)
    
    image_target = np.array(Image.open(image_path_base + "_t.png")).transpose((1,0,2))
    
    potential_sample_pixels = [(x, y) for x in range(halfws, image.shape[0]-halfws) for y in range(halfws, image.shape[1]-halfws)]
    
    # Determine categories and collect each pixel coordinate for each category.
    category_pixels = {}
    for pix_xy in potential_sample_pixels:
        target = tuple(image_target[pix_xy[0]][pix_xy[1]])
        category_pixels.setdefault(target, list()).append(pix_xy)
    
    # Make sure an equal number of examples are created for each category.
    for category_pixels_xy in category_pixels.viewvalues():
        max_examples_per_category = min(max_examples_per_category, len(category_pixels_xy))
    #    print colour, category_pixels_xy
    
    print "Found", len(category_pixels), "categories in target image."
    
    # Delete old DBs if they exist.
    shutil.rmtree(output_path + "_train", ignore_errors=True)
    shutil.rmtree(output_path + "_test", ignore_errors=True)
    
    in_idx = 0
    in_idx_shuffle = range(len(category_pixels) * max_examples_per_category)
    random.shuffle(in_idx_shuffle)
    
    total_test = 0
    total_train = 0
    
    train_db = lmdb.open(output_path + "_train", map_size=int(1e12))
    test_db = lmdb.open(output_path + "_test", map_size=int(1e12))
    with train_db.begin(write=True) as train_txn:
        with test_db.begin(write=True) as test_txn:
            category_index = 0
            for (target_colour, category_pixels_xy) in category_pixels.viewitems():
                print "Label", category_index, "corresponds to category colour", target_colour
                print "  Found", len(category_pixels_xy), "pixels in target image belonging to this category."
                
                #num_examples_for_this_cat = min(max_examples_per_category, len(category_pixels_xy))
                num_examples_for_this_cat = max_examples_per_category
                # At least one test example per category.
                num_test_for_this_cat = int(math.ceil(num_examples_for_this_cat * 0.2))
                
                print "  Adding", (num_examples_for_this_cat - num_test_for_this_cat), "training examples and", num_test_for_this_cat, "test examples."
                
                category_sample_count = 0
                
                for xy in random.sample(category_pixels_xy, len(category_pixels_xy)):
                    window = image[xy[0]-halfws:xy[0]+halfws+1, xy[1]-halfws:xy[1]+halfws+1]
                    
                    # Check if any pixels are transparent (missing data)
                    for wp in np.nditer(window, flags=['external_loop'], order='C'):
                        if wp[3] != 255 or wp[5] != 255:
                            break
                    else:
                        # All pixels in the window have data, add sample to DB.
                        # Remove alpha channels.
                        window = np.delete(window, [3, 5], 2)
                        
                        # Normalise so total sum is 0
                        window = window - window.mean()
                        
                        # transpose to channels, height, width for caffe.io.array_to_datum
                        window = window.transpose((2,1,0))
                        
                        datum = caffe.io.array_to_datum(window, category_index)
                        
                        if category_sample_count < num_test_for_this_cat:
                            test_txn.put('{:0>10d}'.format(in_idx_shuffle[in_idx]), datum.SerializeToString())
                            total_test += 1
                        else:
                            train_txn.put('{:0>10d}'.format(in_idx_shuffle[in_idx]), datum.SerializeToString())
                            total_train += 1
                        
                        category_sample_count += 1
                        
                        in_idx += 1
                        
                    if category_sample_count == num_examples_for_this_cat:
                        break
                    
                category_index += 1
            
    train_db.close()
    test_db.close()
    
    print "Added", total_train, "total training examples and", total_test, "total test examples."
    
    # Inspect a specified datum from the DB.
#     env = lmdb.open(output_path, readonly=True)
#     with env.begin() as txn:
#         raw_datum = txn.get('{:0>10d}'.format(10))
#      
#     datum = caffe_pb2.Datum()
#     datum.ParseFromString(raw_datum)
#      
#     flat_x = np.fromstring(datum.data, dtype=np.uint8)
#     x = flat_x.reshape(datum.channels, datum.height, datum.width)
#     y = datum.label
#     print y, x

陈凯

Jul 8, 2015, 9:34:31 PM
to caffe...@googlegroups.com
Hi, Manuel

I read your answer and I am interested in your solution, can you tell me more about it ?

That is, for image X, I have two sub-images to represent it, denoted x1 and x2. I would prefer to use two data layers as the net input; then x1 and x2 travel through two separate network branches and are combined into one feature map by a concat layer.

Besides, in the test phase, I want to use the two images x1 and x2 as the input and get the correct classification result from the network.

Now the question is, how could I implement it ?

Thank you very much!

Lionel


On Friday, June 12, 2015 at 4:45:31 PM UTC+8, Manuel Lopez Antequera wrote:

Kemal ÇİZMECİLER

Mar 22, 2016, 9:48:53 AM
to Caffe Users
Did you find a way?

On Thursday, July 9, 2015 at 04:34:31 UTC+3, Lionel wrote: