work by adding and multiplying a constant (or a vector or blob of constants) to their one input. This constant is a learnable parameter of the network.
Eltwise adds or multiplies two input blobs together - so an Eltwise layer with a PROD operation does
almost the same thing Scale does, except the blobs it multiplies together are produced by other layers of the network, while in Scale one of the blobs is a learnable parameter.