A 'normal' NN with some fully connected layers at the top generally outputs labels (e.g. which number is in the image). A siamese network differs in architecture and, in a sense, has a different goal.
For example, 'Dimensionality Reduction by Learning an Invariant Mapping' performs dimensionality reduction, similar in spirit to a non-linear PCA. (Do you remember EigenFaces?)
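The siamese training objective from that paper is the contrastive loss: similar pairs are pulled together, dissimilar pairs are pushed at least a margin apart. A minimal NumPy sketch (the embeddings and pair labels here are stand-ins, not output of a real network):

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, similar, margin=1.0):
    """Contrastive loss (Hadsell et al.): similar pairs (similar=1)
    are penalized by their squared distance; dissimilar pairs
    (similar=0) are penalized only when closer than `margin`."""
    d = np.linalg.norm(emb_a - emb_b, axis=1)                      # Euclidean distance per pair
    sim_term = similar * d ** 2                                    # pull similar pairs together
    dis_term = (1 - similar) * np.maximum(0.0, margin - d) ** 2    # push dissimilar pairs apart
    return 0.5 * np.mean(sim_term + dis_term)
```

A similar pair with identical embeddings and a dissimilar pair already beyond the margin both contribute zero loss, which is what makes the learned mapping invariant.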
I don't know what your goal is. By prediction I assume you mean a classification of which number is written in the image?
A siamese network can still be used for classical classification: take its trained weights into a 'normal' NN, remove the contrastive loss layer, and either:
- add a fully connected layer on top and fine-tune (this is what I'm currently trying to figure out), or
- use an SVM, KNN or any other classifier with the output of the last layer as a feature vector.
I haven't compared the classification accuracy of such a construction, though, as my interests lie elsewhere.
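The second option above can be sketched like this, treating the siamese net's last-layer outputs as fixed feature vectors. Plain arrays stand in for the embedding step here, and the 1-NN classifier is a minimal hand-rolled version (in practice you'd use a library implementation such as scikit-learn's):

```python
import numpy as np

def knn_predict(train_feats, train_labels, test_feats, k=1):
    """Classify each test embedding by majority vote among its
    k nearest training embeddings (Euclidean distance)."""
    preds = []
    for f in test_feats:
        d = np.linalg.norm(train_feats - f, axis=1)    # distance to every training embedding
        nearest = train_labels[np.argsort(d)[:k]]      # labels of the k closest
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])          # majority vote
    return np.array(preds)
```

Because the contrastive loss already clusters same-class images in embedding space, even a simple nearest-neighbor rule on top of it can work reasonably well.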
But if you want to use it in a different way, like finding similar images, this is the way to go.
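For that similarity use case, retrieval is just a nearest-neighbor lookup in the embedding space. A small sketch, again with stand-in embeddings rather than real network outputs:

```python
import numpy as np

def most_similar(query_emb, gallery_embs, top_n=3):
    """Return indices of the top_n gallery embeddings closest to the
    query embedding, i.e. the most similar images."""
    d = np.linalg.norm(gallery_embs - query_emb, axis=1)  # distance to each gallery image
    return np.argsort(d)[:top_n]                          # closest first
```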
P.S. I've created a PR that adds some code to easily make your own siamese datasets from images; it might be of use.