I'm unsure how close to the hardware you're working, but it looks like the UART baud rate generator is using 13x oversampling when you were expecting 16x oversampling. This is controlled by the MODESELECT field in the UART's MDR1 register.
So long as you're using the standard internal clock rates, the UARTs take a 48 MHz clock as input, divide that down by the divisor you provide (UART registers DLH and DLL), then use the resulting clock to oversample each bit period by either 13x or 16x. Since you are trying to get 1 Mbaud, a divisor of 3 with 16x oversampling would get you there (48 MHz / 3 / 16 = 1 Mbaud). It appears as if you have a divisor of 3 with 13x oversampling, which produces approximately 1.23 Mbaud (48 MHz / 3 / 13).
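If it helps to check the arithmetic, here is a minimal sketch of the calculation described above. It assumes the 48 MHz input clock and the two oversampling factors mentioned; the function and constant names are made up for illustration and do not come from any driver or vendor library.

```python
# Assumed UART input clock, per the standard internal clock rates above.
UART_CLOCK_HZ = 48_000_000


def baud_rate(divisor, oversampling, clock_hz=UART_CLOCK_HZ):
    """Return the baud rate for a given divisor and oversampling factor.

    divisor      -- combined DLH:DLL value (hypothetical parameter name)
    oversampling -- 13 or 16, as selected through MODESELECT in MDR1
    """
    if divisor < 1:
        raise ValueError("divisor must be at least 1")
    if oversampling not in (13, 16):
        raise ValueError("oversampling must be 13 or 16")
    return clock_hz / (divisor * oversampling)


if __name__ == "__main__":
    # Divisor 3 with each oversampling mode.
    print(f"16x: {baud_rate(3, 16):.0f} baud")
    print(f"13x: {baud_rate(3, 13):.0f} baud")
```

With a divisor of 3, the 16x case comes out at exactly 1 Mbaud and the 13x case at roughly 1.23 Mbaud, which matches what you are seeing.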
One small change, switching MODESELECT to the 16x mode, and you may be good to go.