From reading the implementation of the `StackedDenoisingAutoencoder` class, I could not figure out whether training is done one layer at a time (greedy layer-wise), or by using the gradient and Jacobian of the entire pipeline, thus training all the layers simultaneously.
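To make the question concrete, here is a minimal NumPy sketch of the two strategies I have in mind. It is my own illustration, not code from the library. The functions `train_dae`, `train_layerwise`, `train_joint` and `reconstruct` are hypothetical, and I assume sigmoid units, masking noise, untied encoder/decoder weights and a squared-error loss.

```python
import numpy as np

# Hypothetical illustration only: not the library's code or API.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def corrupt(x, rate, rng):
    # Masking noise (assumed): zero out a random fraction `rate` of the inputs.
    return x * (rng.random(x.shape) >= rate)

def init_layer(n_in, n_hid, rng):
    # Untied encoder (W, b) and decoder (V, c) weights for one layer.
    return {"W": rng.normal(0, 0.1, (n_in, n_hid)), "b": np.zeros(n_hid),
            "V": rng.normal(0, 0.1, (n_hid, n_in)), "c": np.zeros(n_in)}

def train_dae(layer, x, epochs, lr, rate, rng):
    """Train a single denoising autoencoder to reconstruct clean x from corrupted x."""
    n = x.shape[0]
    for _ in range(epochs):
        xt = corrupt(x, rate, rng)
        h = sigmoid(xt @ layer["W"] + layer["b"])
        r = sigmoid(h @ layer["V"] + layer["c"])
        # Gradient of the squared error (averaged over samples) through the sigmoids.
        d_r = (r - x) * r * (1 - r) / n
        d_h = (d_r @ layer["V"].T) * h * (1 - h)
        layer["V"] -= lr * h.T @ d_r
        layer["c"] -= lr * d_r.sum(0)
        layer["W"] -= lr * xt.T @ d_h
        layer["b"] -= lr * d_h.sum(0)
    return layer

def train_layerwise(x, sizes, epochs, lr, rate, rng):
    """Strategy A: one layer at a time; layer k only sees the codes of layer k-1."""
    layers, inp = [], x
    for n_in, n_hid in zip(sizes[:-1], sizes[1:]):
        layer = train_dae(init_layer(n_in, n_hid, rng), inp, epochs, lr, rate, rng)
        layers.append(layer)
        inp = sigmoid(inp @ layer["W"] + layer["b"])  # clean codes feed the next layer
    return layers

def train_joint(x, sizes, epochs, lr, rate, rng):
    """Strategy B: all layers updated together by backprop through the whole stack."""
    layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
    # Encoders in order, then decoders in reverse order: x -> ... -> code -> ... -> x_hat.
    ops = [(l, "W", "b") for l in layers] + [(l, "V", "c") for l in reversed(layers)]
    n = x.shape[0]
    for _ in range(epochs):
        acts = [corrupt(x, rate, rng)]
        for layer, w, b in ops:
            acts.append(sigmoid(acts[-1] @ layer[w] + layer[b]))
        # A single error signal from the final reconstruction drives every layer.
        delta = (acts[-1] - x) * acts[-1] * (1 - acts[-1]) / n
        for i in reversed(range(len(ops))):
            layer, w, b = ops[i]
            grad_w, grad_b = acts[i].T @ delta, delta.sum(0)
            if i > 0:  # propagate using the weights before they are updated
                delta = (delta @ layer[w].T) * acts[i] * (1 - acts[i])
            layer[w] -= lr * grad_w
            layer[b] -= lr * grad_b
    return layers

def reconstruct(x, layers):
    # Encode through the full stack, then decode back to input space.
    h = x
    for layer in layers:
        h = sigmoid(h @ layer["W"] + layer["b"])
    for layer in reversed(layers):
        h = sigmoid(h @ layer["V"] + layer["c"])
    return h
```

For example, `train_layerwise(x, [20, 12, 6], 200, 0.5, 0.2, np.random.default_rng(0))` trains a 20-12-6 stack greedily, while `train_joint` with the same arguments updates both layers from the end-to-end reconstruction error. Many implementations combine the two (layer-wise pretraining followed by joint fine-tuning), which is partly why I could not tell which one the class does.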
By the way, where can I find the autoencoder training code? It is not in the `autoencoder.py` module.
Thanks!