So it did somewhat better when you decreased the LR more often, yes?
How about starting from a lower base LR? Say, 1e-4 or 1e-5?
What do you mean by "fine tune flicker training"? I do fine-tuning all the time, but solver params depend on a lot of things; it's not a "one rule to solve them all" kind of thing :P
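
To make the LR questions above a bit more concrete, here's a rough sketch of how a lower base LR and more frequent drops play out under a plain step-decay schedule. I'm assuming step decay here (LR multiplied by `gamma` every `stepsize` iterations); the function name `step_lr` and all the numbers are made up for illustration, not taken from your setup.

```python
def step_lr(base_lr, gamma, stepsize, iteration):
    """Step decay: multiply the LR by gamma once every `stepsize` iterations."""
    return base_lr * gamma ** (iteration // stepsize)

# Hypothetical settings, only to compare the shape of the schedules.
schedules = {
    "base 1e-3, drop every 10000": (1e-3, 0.1, 10000),
    "base 1e-4, drop every 10000": (1e-4, 0.1, 10000),
    "base 1e-4, drop every 5000": (1e-4, 0.1, 5000),
}

for name, (base_lr, gamma, stepsize) in schedules.items():
    # Sample the LR at a few checkpoints to see how fast it shrinks.
    lrs = [step_lr(base_lr, gamma, stepsize, it) for it in (0, 5000, 10000, 20000)]
    print(name, ["%.0e" % lr for lr in lrs])
```

Swap in your own base LR, `gamma` and `stepsize` to see what LR you actually end up training with at a given iteration. Which combination works best still depends on the model and data.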