With the current data pipeline you have to define two data layers, one for the actual input image and one for the ground truth "image" that defines the output labels. This means you have to generate two input DBs, with the inputs and their corresponding ground truths stored in the same order. Exactly how to do this is on our list for documentation and examples, but hasn't quite materialized yet. However, once the fully convolutional model and the data DBs are defined, training is a breeze, since many Caffe losses are happy to take vector / matrix predictions and ground truths.
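As a rough sketch of what that looks like in the net definition (the layer names and DB paths here are placeholders, not from any shipped example), the two data layers simply read the two LMDBs in lockstep, and a dense loss consumes both:

```
# Hypothetical prototxt fragment: paired data layers over two LMDBs.
layer {
  name: "data"
  type: "Data"
  top: "data"
  data_param { source: "image-lmdb" backend: LMDB batch_size: 1 }
}
layer {
  name: "label"
  type: "Data"
  top: "label"
  data_param { source: "label-lmdb" backend: LMDB batch_size: 1 }
}
# A loss like SoftmaxWithLoss then takes the dense predictions and labels.
```

Because each layer iterates its DB in key order, writing both DBs with identical keys keeps every image aligned with its ground truth.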
To help you along, check out this code sample for generating an LMDB in Python with custom data:
```python
import caffe
import lmdb

in_db = lmdb.open('image-lmdb', map_size=int(1e12))
with in_db.begin(write=True) as in_txn:
    for in_idx, in_ in enumerate(inputs):
        # load as H x W x C in [0, 1], then reorder to Caffe's C x H x W
        im = caffe.io.load_image(in_)
        im_dat = caffe.io.array_to_datum(im.transpose((2, 0, 1)))
        # zero-pad the key so LMDB's key order matches insertion order
        # (under Python 3, encode the key to bytes first)
        in_txn.put('{:0>10d}'.format(in_idx), im_dat.SerializeToString())
in_db.close()
```
While this code makes an image DB, you can make the ground truth DB in the same way by forming the array of window labels and calling `caffe.io.array_to_datum`. Note that the indices are zero-padded to preserve their order: LMDB sorts keys lexicographically, so bare integers written as strings come back out of order (for instance, '10' sorts before '2').
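Here's a standalone illustration (plain Python, nothing Caffe-specific) of why the zero-padding matters:

```python
# Lexicographic order, as LMDB uses for its keys, scrambles bare integer
# strings but preserves the numeric order of fixed-width zero-padded keys.
plain = [str(i) for i in range(12)]
padded = ['{:0>10d}'.format(i) for i in range(12)]

print(sorted(plain)[:4])   # '10' and '11' jump ahead of '2'
print(sorted(padded)[:4])  # zero-padded keys stay in insertion order
```

The same `'{:0>10d}'` format used for the image DB keys should be used for the label DB keys so the two stay paired.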
If you are able to get this working on your own, contributing back a LeNet detector example done in the fully convolutional way would be a great help. Good luck!