Yes, this is true. FaceNet is a spectacular, state-of-the-art paper - a must-read for anyone in deep learning.
However, it is also an example of how industry and academia are pulling away from each other in deep learning. In other words, if you are a grad student, there is generally no way you will get access to the quantity/quality of training data they have, not to mention the computing resources needed to reproduce these results (never mind beating them). Interning at Google/Facebook would work, though.
These embeddings are the way of the future and, IMO, far superior to extracting representations from nets as discussed above. "Gardening" for representations - training a net on some data for a given task and then picking out internal reps (then throwing them into 1-vs-all SVMs or whatever) - has been the standard way to go over the last couple of years.
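For context on what "learning embeddings" means here: FaceNet trains the net directly with a triplet loss over L2-normalised embeddings, rather than a classification loss. A minimal PyTorch sketch of that loss (the 0.2 margin and 128-D size follow the paper; the random tensors are just stand-ins for embeddings coming out of a real network):

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style triplet loss: pull the anchor toward the positive
    (same identity) and push it away from the negative (different identity)
    by at least `margin` in embedding space."""
    # Embeddings are L2-normalised so distances live on the unit hypersphere.
    anchor = F.normalize(anchor, dim=1)
    positive = F.normalize(positive, dim=1)
    negative = F.normalize(negative, dim=1)
    pos_dist = (anchor - positive).pow(2).sum(dim=1)
    neg_dist = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(pos_dist - neg_dist + margin).mean()

# Toy usage: a batch of 8 triplets of 128-D embeddings (stand-ins only).
emb = lambda: torch.randn(8, 128, requires_grad=True)
loss = triplet_loss(emb(), emb(), emb())
loss.backward()
```

The real work in the paper is in the triplet selection (mining hard positives/negatives at scale), which is exactly the part that needs the data and compute mentioned above.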
Lacking sufficient training data (as you do), someone else has grown the garden for you (pre-trained the net) and you just pick out the representations that work best. For example, on the dogs vs cats dataset (Kaggle), this simple approach reaches 97% or so, which is still very effective. If you think about how the AlexNet feature garden was grown (a classification task over 1000 classes), then of course you cannot expect it to do anywhere near as well as FaceNet (which learns embeddings). But it's a good exercise in any case for a CNN course or just for learning.
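To make the "pick out the representations" recipe concrete, here is a rough sketch using torchvision's pre-trained AlexNet and a linear SVM from scikit-learn. The exact torchvision API (pretrained=True, classifier layer indexing) varies by version, and loading the dogs-vs-cats images is left out, so treat this as an outline rather than a recipe that hits 97% out of the box:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.svm import LinearSVC

# Load an ImageNet-pretrained AlexNet and chop off the final classifier layer,
# so forward() returns the 4096-D fc7 activations instead of 1000 class scores.
net = models.alexnet(pretrained=True)
net.classifier = torch.nn.Sequential(*list(net.classifier.children())[:-1])
net.eval()

# Standard ImageNet preprocessing, matching how the feature "garden" was grown.
preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def features(images):
    """images: list of PIL images -> (N, 4096) array of fc7 features."""
    batch = torch.stack([preprocess(img) for img in images])
    with torch.no_grad():
        return net(batch).numpy()

# train_images / train_labels would come from the Kaggle dogs-vs-cats data;
# a linear SVM (1-vs-all for multi-class) then sits on top of the frozen features:
# clf = LinearSVC().fit(features(train_images), train_labels)
# print(clf.score(features(test_images), test_labels))
```

Nothing in the net gets updated here; the only thing being trained is the cheap linear classifier on top, which is why this works even with very little data.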