The problem is the following: you need to give the label as the _second_ bottom blob, not the first. The loss and accuracy layers have to assume which blob contains what, because the two are not symmetric/interchangeable: usually the data blob (first bottom blob) contains something like a discrete posterior distribution over the classes for every sample, whereas the label blob (second bottom blob) contains just a single integer (the zero-based class index) for every sample. Swap the two in your loss layer and everything should work.
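To illustrate why the two bottoms cannot be swapped, here is a minimal pure-Python sketch of what a softmax loss and an accuracy computation expect from their inputs. This is not Caffe's actual implementation; the function names `softmax_loss` and `accuracy` and the list-of-lists layout are assumptions made only for this example.

```python
import math

def softmax(row):
    # Numerically stable softmax over one sample's class scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_loss(scores, labels):
    # scores: first input, one list of C class scores per sample.
    # labels: second input, one zero-based integer class index per sample.
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same batch size")
    total = 0.0
    for row, label in zip(scores, labels):
        total -= math.log(softmax(row)[label])
    return total / len(labels)

def accuracy(scores, labels):
    # Predicted class = index of the largest score in each sample's row.
    hits = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        hits += int(predicted == label)
    return hits / len(labels)

scores = [[2.0, 0.5, 0.1], [0.2, 0.1, 3.0]]  # 2 samples, 3 classes
labels = [0, 2]                              # one class index per sample
print(softmax_loss(scores, labels), accuracy(scores, labels))
```

The first argument holds one value per class and sample, the second only one integer per sample, so the two roles are not interchangeable. In this sketch, passing them in the wrong order fails immediately because an integer label cannot be treated as a row of scores.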
Regarding axis_index: it is the zero-based index of the axis in the first bottom blob along which the softmax is computed (in the loss layer) or along which the class prediction is taken by argmax (in the accuracy layer). Usually this is 1, since the first axis (index 0) denotes the sample index within the batch. It can be set to a different value (via a configuration variable of the layer in the net's prototxt), which can make sense in certain settings, but not in yours.
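As a rough illustration of what the choice of axis means, the following pure-Python sketch takes the argmax of a small 2-D "blob" (samples along axis 0, classes along axis 1) along either axis. The helper `argmax_along_axis` is hypothetical and only handles the 2-D case; it is not a Caffe function.

```python
def argmax_along_axis(blob, axis):
    # blob: list of rows; axis 0 indexes samples, axis 1 indexes classes.
    if axis == 1:
        # One result per sample: the index of its best-scoring class.
        return [max(range(len(row)), key=row.__getitem__) for row in blob]
    if axis == 0:
        # One result per class: the index of the sample scoring highest.
        columns = list(zip(*blob))
        return [max(range(len(col)), key=col.__getitem__) for col in columns]
    raise ValueError("this sketch only supports axis 0 or 1")

blob = [[0.1, 0.7, 0.2],
        [0.8, 0.1, 0.1]]  # 2 samples, 3 classes

print(argmax_along_axis(blob, 1))  # class prediction per sample
print(argmax_along_axis(blob, 0))  # not a class prediction at all
```

With axis 1 the result has one entry per sample, which is what a class prediction should look like. With axis 0 it has one entry per class, which is why the default of 1 is the right choice when the batch runs along the first axis.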
Jan