I'll post my working solution, but I'm almost sure you'll have to adapt a lot of it to your needs.
I hope it gives you some direction; that's the best I can do.
I use CaffeNet, which comes with a deploy.prototxt that I need for classification.
I had to change the name of the output layer (fc8 in CaffeNet) in deploy.prototxt, just as I had to change it in train_val.prototxt (which also comes with Caffe).
I also changed the 'num_output' parameter in the output layer from 1000 to 2.
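The two deploy.prototxt edits above can also be scripted. This is only a sketch: the fragment is an abbreviated, illustrative stand-in for the real file, and `retarget_output_layer` and the new layer name `fc8_binary` are hypothetical names of mine, not part of Caffe.

```python
import re

# Abbreviated, illustrative fragment of a CaffeNet-style deploy.prototxt.
DEPLOY_FRAGMENT = '''
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  inner_product_param {
    num_output: 1000
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc8"
  top: "prob"
}
'''


def retarget_output_layer(prototxt, old_name, new_name, num_classes):
    """Rename the output layer everywhere and set its num_output (hypothetical helper)."""
    # Rename every quoted reference to the old layer: name, top and bottom.
    renamed = prototxt.replace('"{}"'.format(old_name), '"{}"'.format(new_name))
    # Change only the first num_output that follows the renamed layer's name.
    pattern = re.compile(
        r'(name:\s*"{}".*?num_output:\s*)\d+'.format(re.escape(new_name)),
        re.DOTALL)
    return pattern.sub(lambda m: m.group(1) + str(num_classes), renamed, count=1)


new_prototxt = retarget_output_layer(DEPLOY_FRAGMENT, "fc8", "fc8_binary", 2)
print(new_prototxt)
```

Whatever name you pick has to match the output layer name used in train_val.prototxt. Editing the file by hand works just as well.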
My images are 256x256 because that's the size CaffeNet requires for training. The size required for classification, on the other hand, is 227x227 (images are cropped during training).
I could have kept 2 folders with 2 different image sizes, but I didn't want that, so I used a workaround:
I 'resize' my images when I insert them into the lmdb, using the --resize_height and --resize_width arguments.
This means that I don't resize the actual files, only the images inside the lmdb.
Then I create the image mean from the train_lmdb. During classification,
after loading an image with caffe.io.load_image, I center-crop the numpy array (which is the loaded image).
Note that you need to apply the same transformations you applied to your training images.
For this I use caffe.io.Transformer. The exact transformations may be model dependent.
For example:
transformer.set_raw_scale('data', 255) # rescale from [0, 1] to [0, 255] - needed for CaffeNet/AlexNet.
Check which preprocessing your model needs.
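To make the effect of these Transformer settings concrete, here is the arithmetic for a single pixel in plain Python. It is a sketch under assumptions: `preprocess_pixel` is a hypothetical helper, the mean values are made up, and the order (channel swap, then raw scale, then mean subtraction) is how I read caffe/python/caffe/io.py, so check it against your Caffe version.

```python
def preprocess_pixel(rgb, mean_bgr, raw_scale=255.0):
    """Mimic channel swap, raw scale and mean subtraction for one pixel.

    rgb      -- (r, g, b) values in [0, 1], as caffe.io.load_image returns them
    mean_bgr -- per-channel mean in BGR order and in the [0, 255] range (illustrative values)
    """
    bgr = list(rgb)[::-1]  # channel swap (2, 1, 0): RGB -> BGR
    # Rescale to [0, 255] first, then subtract the mean of the matching channel.
    return [channel * raw_scale - mean for channel, mean in zip(bgr, mean_bgr)]


pixel = preprocess_pixel((1.0, 0.5, 0.0), (100.0, 110.0, 120.0))
print(pixel)  # [-100.0, 17.5, 135.0]
```

The point is that the mean has to be in the same channel order and value range as the swapped, rescaled image, which is the case when it is computed from the training lmdb.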
Here are the relevant functions.
Call them in this order:
first create train.txt and val.txt,
then call make_lmdb,
then make_image_mean_binaryproto,
and then binary_classification_from_raw_images.
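The first step, creating train.txt and val.txt, is not covered by the functions in this answer. A minimal sketch, assuming each line is "filename label" and the label is derived from a keyword in the filename: the keyword "positive", the filenames, the 80/20 split and both helper names are hypothetical.

```python
import random


def make_label_lines(filenames, positive_keyword):
    """Return 'filename label' lines: label 1 if the keyword occurs in the name, else 0."""
    return ["{} {}".format(name, 1 if positive_keyword in name else 0)
            for name in filenames]


def split_train_val(lines, val_fraction=0.2, seed=0):
    """Shuffle reproducibly and split into (train_lines, val_lines)."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical filenames; in practice they would come from os.listdir(my_model_data).
names = (["positive_{:03d}.jpg".format(i) for i in range(5)] +
         ["negative_{:03d}.jpg".format(i) for i in range(5)])
train_lines, val_lines = split_train_val(make_label_lines(names, "positive"))
print(len(train_lines), len(val_lines))
```

Writing "\n".join(train_lines) to train.txt and "\n".join(val_lines) to val.txt in the images folder gives the files the later steps read. Filenames containing spaces would break the simple split(' ') parsing used during classification.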
My code is a little rough in some places; I had to get a prototype going.
Note that I have some global variables:
caffe_root = $CAFFE_ROOT
caffe_tools - path to $CAFFE_ROOT/build/tools
my_model_data - path to my images folder
my_model_mean_binaryproto - path to the image mean binaryproto, which is created in my images folder
my_model - just the name of my project
import os
import random
import subprocess
import sys

import numpy as np
import PIL.Image

CLASSIFICATION_IMAGE_SIZE = 227  # could be different for VGG
def binary_classification_from_raw_images(model, weights):
    """
    :param model: path to deploy.prototxt, which comes with CaffeNet.
        You need to change the name of the output layer to the name you gave your output layer in train_val.prototxt,
        and also change num_output to 2.
    :param weights: path to the .caffemodel snapshot of the fine-tuned model. How it is saved is defined in your solver.prototxt.
    :return: (y_test, y_pred), the true and predicted labels in the order the images were classified.
    """
    #### set your paths
    # caffe_root = Caffe root.
    # my_model_mean_binaryproto = image mean binaryproto of my training images.
    # my_model_data = path to my images folder. All my images are in the same folder and I differentiate them by keywords in the filename.
    # In my images folder I also have val.txt and train.txt with filenames and their labels for training and validation.
    # From these files I also create the lmdbs.

    # * Load `caffe`.
    # The caffe module needs to be on the Python path;
    # we'll add it here explicitly.
    sys.path.insert(0, caffe_root + '/python')
    import caffe

    # * Set Caffe to CPU mode and load the net from disk.
    caffe.set_mode_cpu()
    net = caffe.Net(model,       # defines the structure of the model
                    weights,     # contains the trained weights
                    caffe.TEST)  # use test mode (e.g., don't perform dropout)

    # * Set up input preprocessing. (We'll use Caffe's `caffe.io.Transformer` to do this, but this step is independent of other parts of Caffe, so any custom preprocessing code may be used.)
    #
    # CaffeNet is configured to take images in BGR format. Values are expected to start in the range [0, 255] and then have the training-set mean subtracted from them. In addition, the channel dimension is expected as the first (_outermost_) dimension.
    #
    # As caffe.io.load_image loads images with values in the range [0, 1] in RGB format with the channel as the _innermost_ dimension, we are arranging for the needed transformations here.

    #### I don't have a .npy array with the mean, so the ImageNet mean from the Caffe example stays commented out.
    # load the mean ImageNet image (as distributed with Caffe) for subtraction
    # mu = np.load(caffe_root + '/python/caffe/imagenet/ilsvrc_2012_mean.npy')
    # mu = mu.mean(1).mean(1)  # average over pixels to obtain the mean (BGR) pixel values
    # print('mean-subtracted values:', list(zip('BGR', mu)))

    # create transformer for the input called 'data'
    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})

    # I don't have my image mean in a .npy file but in a binaryproto, so I convert it to a numpy array.
    # Took me some time to figure this out.
    blob = caffe.proto.caffe_pb2.BlobProto()
    with open(my_model_mean_binaryproto, 'rb') as mean_file:
        blob.ParseFromString(mean_file.read())
    mu = np.array(caffe.io.blobproto_to_array(blob))
    mu = mu.squeeze()  # the output array has one redundant dimension.

    transformer.set_transpose('data', (2, 0, 1))  # move image channels to outermost dimension
    transformer.set_mean('data', mu)  # subtract the dataset mean
    transformer.set_raw_scale('data', 255)  # rescale from [0, 1] to [0, 255] - needed for CaffeNet/AlexNet.
    transformer.set_channel_swap('data', (2, 1, 0))  # swap channels from RGB to BGR

    # * CPU classification
    #
    # Now we're ready to perform classification. The net is reshaped to a batch size of 32, but each
    # forward pass below classifies a single image: assigning it to net.blobs['data'].data[...]
    # broadcasts it to every slot in the batch, and only the first output is read.
    # set the size of the input (we can skip this if we're happy
    # with the default; we can also change it later, e.g., for different batch sizes)
    net.blobs['data'].reshape(32,  # batch size
                              3,   # 3-channel (BGR) images
                              CLASSIFICATION_IMAGE_SIZE, CLASSIFICATION_IMAGE_SIZE)  # image size is 227x227

    correct = 0
    count = 1
    # save labels and predictions in lists for further statistical analysis.
    y_test, y_pred = [], []
    # in my case, I'm reading the image file names from val.txt.
    with open(my_model_data + "/val.txt") as val_file:
        val_images = val_file.readlines()
    # You don't need to shuffle. This is just how I want to see the output.
    random.shuffle(val_images)
    # I'm saving misclassified filenames in a file.
    misclassified = open(my_model_data + "/misclassified.txt", "w")
    for image_name_n_label in val_images:
        # in val.txt every line contains: filename label
        image_file, label = my_model_data + "/" + image_name_n_label.split(' ')[0], int(image_name_n_label.split(' ')[1])
        y_test.append(label)
        print(count, ". image: " + os.path.basename(image_file) + " label: " + str(label))
        image = caffe.io.load_image(image_file)
        # the loaded image has shape (256, 256, 3). we want it to be (227, 227, 3) for CaffeNet.
        if image.shape[0] != CLASSIFICATION_IMAGE_SIZE:
            # I'm cropping the numpy array on the fly so that I don't have to mess with resizing
            # the actual images in a separate folder each time.
            image = center_crop_image(image, CLASSIFICATION_IMAGE_SIZE, CLASSIFICATION_IMAGE_SIZE)
            if image.shape[0] != CLASSIFICATION_IMAGE_SIZE:
                print("!!!!!!! cropped shape ", image.shape)
        transformed_image = transformer.preprocess('data', image)
        # copy the image data into the memory allocated for the net
        net.blobs['data'].data[...] = transformed_image
        ### perform classification
        output = net.forward()
        output_prob = output['prob'][0]  # the output probability vector for the first image in the batch
        print(count, ". prob for 0 class: {0:.5f}, prob for 1 class: {1:.5f}".format(output_prob[0], output_prob[1]))
        predicted_label = output_prob.argmax()
        y_pred.append(predicted_label)
        if predicted_label == label:
            correct += 1
        else:
            print("!!!!!!!!!!!!!!! misclassified")
            misclassified.write(os.path.basename(image_file) + " " + str(label) + " " + str(predicted_label) + "\n")
            # display misclassified images.
            image = PIL.Image.open(image_file)
            image.show()
        accuracy = ((100. * correct) / (count))
        print(count, '. predicted class is: ', output_prob.argmax())
        print(count, ". accuracy: " + str(accuracy))
        print("")
        count += 1
    misclassified.close()
    return y_test, y_pred
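The y_test and y_pred lists collected in the loop are meant for further statistics. One possible follow-up, sketched in plain Python for the two-class case: the helper names are mine and the label lists are made-up examples, not real results.

```python
def binary_confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for labels in {0, 1}, treating 1 as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def precision_recall(tp, fp, fn):
    """Return (precision, recall), using 0.0 when a denominator is zero."""
    precision = tp / float(tp + fp) if (tp + fp) else 0.0
    recall = tp / float(tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels, only to show the call pattern.
example_true = [1, 0, 1, 1, 0]
example_pred = [1, 0, 0, 1, 1]
tp, fp, tn, fn = binary_confusion_counts(example_true, example_pred)
print(tp, fp, tn, fn, precision_recall(tp, fp, fn))
```

With only two classes, accuracy alone can hide a classifier that mostly predicts one label, which is why precision and recall are worth printing next to it.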
# -------------------------------------------------------------------------------------------------------
def center_crop_image(image, new_width, new_height):
    height, width, chan = image.shape
    # Compute the top-left corner of the crop and slice with explicit end indices.
    # With an odd size difference the extra pixel is cut from the bottom/right.
    # Negative end indices are avoided because a cut of 0 would give image[0:-0], which is empty.
    top = (height - new_height) // 2
    left = (width - new_width) // 2
    return image[top:top + new_height, left:left + new_width]
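The crop arithmetic can be checked without Caffe or numpy. The sketch below uses explicit start/stop bounds instead of negative end indices. `center_crop_bounds` and `center_crop_rows` are hypothetical helpers working on nested lists, not the function used above.

```python
def center_crop_bounds(size, new_size):
    """Return (start, stop) slice bounds of a centered window of length new_size."""
    if new_size > size:
        raise ValueError("crop is larger than the image")
    start = (size - new_size) // 2
    return start, start + new_size


def center_crop_rows(rows, new_width, new_height):
    """Center-crop a 2-D list of lists (rows of pixels)."""
    top, bottom = center_crop_bounds(len(rows), new_height)
    left, right = center_crop_bounds(len(rows[0]), new_width)
    return [row[left:right] for row in rows[top:bottom]]


print(center_crop_bounds(256, 227))  # (14, 241)
grid = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
print(center_crop_rows(grid, 2, 2))  # [[5, 6], [9, 10]]
```

Explicit bounds also handle the case where the size already matches: center_crop_bounds(227, 227) gives (0, 227), whereas a slice written as [0:-0] would select nothing.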
# -------------------------------------------------------------------------------------------------------
def make_lmdb(resize):
    curr_dir = os.getcwd()
    os.chdir(my_model_data)
    # remove lmdbs left over from a previous run
    subprocess.call(r"rm -r -f train_lmdb", shell=True)
    subprocess.call(r"rm -r -f val_lmdb", shell=True)
    # the images are resized only inside the lmdb; the files on disk stay untouched
    subprocess.call(r"GLOG_logtostderr=1 " + caffe_tools + "/convert_imageset --resize_height={size} --resize_width={size} --shuffle ./ train.txt train_lmdb".format(size=resize), shell=True)
    subprocess.call(r"GLOG_logtostderr=1 " + caffe_tools + "/convert_imageset --resize_height={size} --resize_width={size} --shuffle ./ val.txt val_lmdb".format(size=resize), shell=True)
    os.chdir(curr_dir)
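If you would rather not build shell strings, the same convert_imageset call can be expressed as an argument list. This is a sketch of an alternative, not what I actually ran: `convert_imageset_command` is a hypothetical helper and the tools path in the example is made up.

```python
import os


def convert_imageset_command(caffe_tools, image_root, list_file, db_name, resize):
    """Build the convert_imageset argument list (same flags as the shell string above)."""
    return [
        os.path.join(caffe_tools, "convert_imageset"),
        "--resize_height={}".format(resize),
        "--resize_width={}".format(resize),
        "--shuffle",
        image_root,  # root folder that is prepended to each filename in the list file
        list_file,   # e.g. train.txt, one "filename label" per line
        db_name,     # output lmdb directory
    ]


cmd = convert_imageset_command("/path/to/caffe/build/tools", "./", "train.txt", "train_lmdb", 227)
print(" ".join(cmd))
# To run it from the images folder with glog output on stderr:
# subprocess.check_call(cmd, cwd=my_model_data, env=dict(os.environ, GLOG_logtostderr="1"))
```

Passing cwd and env to subprocess avoids both the os.chdir round trip and shell=True.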
# ---------------------------------------------------------------------------------------
def make_image_mean_binaryproto():
    # remove the mean file left over from a previous run
    subprocess.call(r"rm -r -f " + my_model_data + "/" + my_model + "_mean.binaryproto", shell=True)
    cmd = caffe_tools + "/compute_image_mean -backend=lmdb " + my_model_data + "/train_lmdb " + my_model_data + "/" + my_model + "_mean.binaryproto"
    subprocess.call(cmd, shell=True)
# ---------------------------------------------------------------------------------------