Yeah, you should definitely have a look at the relevant scientific literature, maybe starting with
http://research.microsoft.com/pubs/192769/tricks-2012.pdf.
In short: for most of these settings there are heuristics that have worked well in the past, but we don't have any real insight into most of them, e.g. into what their values should be in a specific case. For weight decay one usually chooses something in the range [1e-6, 1e-4], afaik.

The lr_base, the policy and its associated parameters (gamma, power, step) are also crucial to training success, but I (like probably most other people) am not sure what the best choice is here. Maybe try the recommendations from the referenced paper?

It also depends somewhat on the solver type: from past experience I would recommend AdaDelta with a fixed learning rate of 1.0 (the adaptation of the gradient updates is handled entirely by AdaDelta, so you don't have to care about the learning-rate schedule anymore).
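To make the role of gamma, power and step a bit more concrete, here is a minimal Python sketch of how a few common policies turn lr_base into a per-iteration learning rate. It assumes Caffe-style definitions of these policies (the parameter names suggest Caffe, but that is my assumption), and learning_rate, it and max_iter are hypothetical names used only for illustration.

```python
def learning_rate(policy, base_lr, it, gamma=None, power=None, step=None, max_iter=None):
    """Hypothetical helper: learning rate at iteration `it`.

    Assumes Caffe-style policy definitions; this is an illustration,
    not any framework's actual API.
    """
    if policy == "fixed":
        # constant rate, no schedule at all
        return base_lr
    if policy == "step":
        # multiply by gamma once every `step` iterations
        return base_lr * gamma ** (it // step)
    if policy == "exp":
        # multiply by gamma at every single iteration
        return base_lr * gamma ** it
    if policy == "inv":
        # smooth decay controlled by both gamma and power
        return base_lr * (1 + gamma * it) ** (-power)
    if policy == "poly":
        # polynomial decay that reaches zero at max_iter
        return base_lr * (1 - it / max_iter) ** power
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    # compare a step schedule and an inv schedule at a few iterations
    for it in (0, 1000, 10000):
        print(it,
              learning_rate("step", 0.01, it, gamma=0.1, step=5000),
              learning_rate("inv", 0.01, it, gamma=1e-4, power=0.75))
```

Printing a few values like this is a cheap way to see whether a schedule decays too fast or too slowly before committing to a long training run.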
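Why a fixed rate of 1.0 makes sense for AdaDelta can be seen from the update rule in Zeiler's 2012 paper: the step size is the ratio of two running RMS values, and the original algorithm has no global learning rate at all. Frameworks that multiply the AdaDelta step by a global rate reproduce the original rule with 1.0. Below is a minimal scalar sketch; adadelta_minimize and its defaults (rho=0.95, eps=1e-6, weight_decay=1e-5) are my own illustrative assumptions, and weight decay is modeled as a plain L2 term added to the gradient.

```python
import math


def adadelta_minimize(grad_fn, w, lr=1.0, rho=0.95, eps=1e-6,
                      weight_decay=1e-5, iters=2000):
    """Hypothetical scalar AdaDelta sketch (update rule from Zeiler, 2012).

    `lr` is a plain multiplier on the AdaDelta step; lr=1.0 gives the
    original formulation, which has no global learning rate.
    """
    acc_grad = 0.0   # running average of squared gradients, E[g^2]
    acc_delta = 0.0  # running average of squared updates, E[dx^2]
    for _ in range(iters):
        # L2 weight decay simply adds weight_decay * w to the gradient
        g = grad_fn(w) + weight_decay * w
        acc_grad = rho * acc_grad + (1 - rho) * g * g
        # step = RMS of past updates / RMS of gradients, times the gradient
        delta = -math.sqrt(acc_delta + eps) / math.sqrt(acc_grad + eps) * g
        acc_delta = rho * acc_delta + (1 - rho) * delta * delta
        w += lr * delta
    return w


if __name__ == "__main__":
    # minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
    print(adadelta_minimize(lambda w: 2.0 * (w - 3.0), 0.0))
```

Note that the first steps are tiny (on the order of sqrt(eps)) and only grow as the running average of past updates builds up, because that average starts at zero.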
Jan