Hi. By normalization of the data, do you mean a linear baseline subtraction of the time series data as part of preprocessing, or are you referring to normalization of time-frequency power values? Power values cannot be negative, so negative values are not a concern during normalization. In other words, you would first compute the PSD, and then apply the baseline normalization to the time-frequency power values.
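To make that order of operations concrete, here is a minimal sketch, assuming a decibel conversion as the baseline normalization (other choices, such as percent change, follow the same pattern). The simulated signal, the sampling rate, and the 0 to 1.5 s baseline window are all made up for illustration.

```python
import numpy as np
from scipy.signal import spectrogram

# Simulated data (hypothetical): noise plus a 10 Hz burst that starts at 2 s.
fs = 250.0
rng = np.random.default_rng(0)
t = np.arange(0, 4.0, 1 / fs)
x = np.sin(2 * np.pi * 10 * t) * (t > 2.0) + 0.5 * rng.standard_normal(t.size)

# Step 1: compute time-frequency power with a short-time FFT.
# power has shape (n_freqs, n_times) and contains no negative values.
freqs, times, power = spectrogram(x, fs=fs, window="hann", nperseg=128, noverlap=96)

# Step 2: average power over the baseline period, separately per frequency.
base_mask = (times >= 0.0) & (times <= 1.5)  # assumed baseline window
baseline = power[:, base_mask].mean(axis=1, keepdims=True)

# Step 3: normalize the power values (not the raw time series) to decibels.
power_db = 10 * np.log10(power / baseline)
```

The raw signal `x` has plenty of negative samples, but they never reach the normalization step; only `power` does.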
As for your second question, multitapering increases spectral smoothing relative to short-time FFT or wavelet convolution: it gives a less noisy power estimate at the cost of frequency resolution. If you want to maximize spectral precision, FFT would be the best. For the short-time FFT, the precise choice of window doesn't really matter (Kaiser, Hann, Hamming, Blackman, etc.). If you want to apply more than one taper (i.e., multitaper), then it's important that the tapers be mutually orthogonal, which is why the Slepian tapers are used.
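If it helps to see the orthogonality point numerically, here is a small sketch using SciPy's `dpss` function to generate Slepian tapers. The window length and the time-half-bandwidth product are arbitrary values picked for illustration, and the simple average over tapers is one common way to combine them, not the only one.

```python
import numpy as np
from scipy.signal.windows import dpss, hann, hamming

n_samples = 256        # window length in samples (hypothetical)
nw = 3                 # time-half-bandwidth product (hypothetical)
n_tapers = 2 * nw - 1  # common rule of thumb for how many tapers to keep

# Rows are Slepian (DPSS) tapers; with Kmax set, each row has unit L2 norm.
tapers = dpss(n_samples, nw, Kmax=n_tapers)

# Gram matrix of pairwise dot products: close to the identity matrix,
# meaning the tapers are mutually orthogonal.
gram = tapers @ tapers.T

# For contrast, two conventional windows overlap almost completely,
# so stacking them would add very little independent information.
w1 = hann(n_samples)
w1 = w1 / np.linalg.norm(w1)
w2 = hamming(n_samples)
w2 = w2 / np.linalg.norm(w2)
overlap = float(w1 @ w2)

# Multitaper power for one data segment: taper, FFT, then average over tapers.
rng = np.random.default_rng(0)
segment = rng.standard_normal(n_samples)
tapered_power = np.abs(np.fft.rfft(tapers * segment, axis=1)) ** 2
multitaper_power = tapered_power.mean(axis=0)
```

Because the tapers are orthogonal, each tapered spectrum is an approximately independent estimate, which is what makes averaging them worthwhile.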
Hope that helps.