In general there are 2 face recognition schemes:
1. same / not same (often called face verification)
2. person identification.
For same / not same:
Each instance consists of 2 facial images, and your model should return whether both images show the same person.
Advantages:
From a relatively small dataset (e.g. LFW) you can create huge training datasets by pairing images on your own.
Given 2 images, you can get a same / not same classification even for persons outside of your training dataset.
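To make the pairing idea concrete, here is a minimal sketch using only the standard library. The `make_pairs` helper, the file names, and the one-negative-per-positive balancing are all assumptions made for illustration, not part of any particular dataset's tooling; it also assumes at least two persons are present.

```python
import itertools
import random

def make_pairs(images_by_person, negatives_per_positive=1, seed=0):
    """Build (image_a, image_b, label) triples; label 1 = same, 0 = not same."""
    rng = random.Random(seed)
    people = sorted(images_by_person)
    pairs = []
    for person in people:
        # Every unordered pair of images of one person is a positive example.
        for a, b in itertools.combinations(images_by_person[person], 2):
            pairs.append((a, b, 1))
            # Balance each positive with randomly drawn negatives.
            for _ in range(negatives_per_positive):
                other = rng.choice([p for p in people if p != person])
                pairs.append((a, rng.choice(images_by_person[other]), 0))
    return pairs

# Hypothetical file names, standing in for real image paths.
dataset = {
    "alice": ["alice_1.jpg", "alice_2.jpg", "alice_3.jpg"],
    "bob": ["bob_1.jpg", "bob_2.jpg"],
    "carol": ["carol_1.jpg"],
}
pairs = make_pairs(dataset)
```

The number of positive pairs grows quadratically with the number of images per person, and the pool of possible negative pairs is larger still, which is why a small dataset can yield a very large training set.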
Disadvantages:
To get an actual identification you need to compare the query image against every person in your reference database (and you might get several "same" results).
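A rough sketch of that identification-by-comparison loop is below. It assumes the same / not same model produces an embedding vector per face and that two faces count as "same" when their cosine similarity passes a threshold; the embeddings, the `identify` helper, and the 0.8 threshold are made up for illustration.

```python
import math

def cosine_similarity(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

def identify(query, reference_db, threshold=0.8):
    """Return every (name, score) judged "same" as the query, best score first."""
    matches = []
    # One comparison per person in the reference database.
    for name, embedding in reference_db.items():
        score = cosine_similarity(query, embedding)
        if score >= threshold:
            matches.append((name, score))
    return sorted(matches, key=lambda m: m[1], reverse=True)

# Hypothetical 3-dimensional embeddings; real ones are much larger.
reference_db = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.8, 0.3, 0.1],
    "carol": [0.0, 0.2, 0.9],
}
matches = identify([1.0, 0.2, 0.0], reference_db)
```

With these made-up numbers both alice and bob pass the threshold, so the caller still has to decide between several "same" results, for example by taking the highest score.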
For person identification:
Each instance is a single image, and you want to classify it as the correct identity.
The number of classes is the same as the number of persons in your data.
You can use a "regular" CNN for this type of classification.
Advantages:
(Assuming a softmax output) you get a probability for each person in your reference database, so you can check how close the top match is to "second place".
Network structure is simpler (although the siamese architecture is not really that complicated).
Disadvantages:
Training a classifier for K classes is a lot harder than training one for 2.
Your model is limited to finding an identity within your reference database (you can't classify a new person)
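The "second place" check on a softmax output can be sketched as follows. The logits and identity names are invented, and `top_two` is a hypothetical helper; in practice the logits would come from the last layer of your CNN.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_two(logits, identities):
    """Return the best identity, its probability, the runner-up, and the margin."""
    probs = softmax(logits)
    ranked = sorted(zip(identities, probs), key=lambda t: t[1], reverse=True)
    (best, p1), (runner_up, p2) = ranked[0], ranked[1]
    return best, p1, runner_up, p1 - p2

identities = ["alice", "bob", "carol"]
best, p_best, runner_up, margin = top_two([2.0, 1.0, 0.1], identities)
```

A small margin suggests the model is torn between two identities, so you may want to reject the prediction or ask for another image. Note that the probabilities always sum to 1 over the known identities, so a person outside the database still gets assigned to someone.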
None of the above covers any pre-processing you might want to apply to facial images.
It's debatable how much pre-processing is required with deep learning models, but it is practically a "must" for non-deep-learning methods (and I believe it improves deep learning as well).
Some of the pre-processing might include:
Facial landmark detection
Face alignment (2D) or Frontalization (3D)
Image cropping
and more...
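As one illustration of how landmarks feed into 2D alignment and cropping, the sketch below levels the line between the two eyes and derives a square crop around them. It assumes the eye coordinates already come from some landmark detector; the helper names and the `scale=2.5` crop factor are arbitrary choices for the example, and it only computes the geometry rather than warping an actual image.

```python
import math

def eye_alignment(left_eye, right_eye):
    """Return (angle_degrees, center) describing the tilt of the eye line."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = math.degrees(math.atan2(dy, dx))
    center = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
    return angle, center

def rotate_point(point, center, angle_degrees):
    """Rotate a point about center by -angle, which undoes the measured tilt."""
    theta = math.radians(-angle_degrees)
    x, y = point[0] - center[0], point[1] - center[1]
    return (center[0] + x * math.cos(theta) - y * math.sin(theta),
            center[1] + x * math.sin(theta) + y * math.cos(theta))

def crop_box(center, eye_distance, scale=2.5):
    """Square box (x0, y0, x1, y1) around the eye midpoint; scale is arbitrary."""
    half = scale * eye_distance / 2
    return (center[0] - half, center[1] - half, center[0] + half, center[1] + half)

# Hypothetical landmark positions in pixel coordinates.
left_eye, right_eye = (30.0, 40.0), (70.0, 60.0)
angle, center = eye_alignment(left_eye, right_eye)
aligned_left = rotate_point(left_eye, center, angle)
aligned_right = rotate_point(right_eye, center, angle)
box = crop_box(center, math.dist(left_eye, right_eye))
```

After the rotation both eyes sit at the same height, so every face fed to the model has a consistent orientation and scale. With a real image you would apply the same rotation to the pixels and clamp the crop box to the image bounds.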
A good place to start, in my opinion, is LFW (Labeled Faces in the Wild), which gives a relatively large dataset (~13K good quality images).