I took a hard look at your file. It's a little disconcerting, given that I think it comes from FFmpeg?
My hunch is that Firefox plays it back in sync because it ignores the timestamps, while Chrome maybe honors them.
I wrote a little tool to examine the file, checking which video frames and audio samples are in the file and comparing them to the frame/sample you might expect at that timestamp. The program and the output are attached.
I think the main problem is that the number of audio samples at any point in time is lower than what I'd expect given the timestamp and the sample rate. The gap grows over time, so that by the end of the video there are about 43000 fewer samples than expected, which is nearly one second (about 0.9 s) at the 48 kHz sample rate.
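To make the arithmetic concrete, here is a rough sketch of that comparison in Python. It is not the attached tool, just an illustration: the function names are made up, and it assumes a constant 48 kHz sample rate and timestamps already converted to seconds.

```python
SAMPLE_RATE = 48_000  # Hz, the sample rate of the audio track


def expected_samples(timestamp_s, sample_rate=SAMPLE_RATE):
    """Samples that should have gone by at timestamp_s at a constant rate."""
    return round(timestamp_s * sample_rate)


def sample_deficit(timestamp_s, samples_seen, sample_rate=SAMPLE_RATE):
    """How many samples are missing relative to what the timestamp implies."""
    return expected_samples(timestamp_s, sample_rate) - samples_seen


def deficit_seconds(deficit, sample_rate=SAMPLE_RATE):
    """Convert a sample deficit into an audio/video offset in seconds."""
    return deficit / sample_rate


# The roughly 43000 missing samples at the end of the file:
print(f"{deficit_seconds(43_000):.3f} s")  # 0.896 s
```

A player that schedules audio by counting samples and video by timestamps would drift apart by about that much by the end of the file.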
Other issues I see:
1. There is no frame rate specified and the frames have uneven timecode separations, which can confuse a program that tries to guess the frame rate. My program came up with 13.672 frames per second, although the overall rate is more like 27.5 fps.
2. The first video frame has timecode 30000000, not 0.
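As an illustration of how uneven spacing can throw off a frame rate guess, here is a small Python sketch. The timestamps are invented (in milliseconds, not taken from your file), the function names are hypothetical, and the median-gap heuristic is only one plausible way a program might guess; I'm not claiming it is what any particular player does.

```python
from statistics import median


def rebase(timestamps_ms):
    """Shift timestamps so the first frame starts at 0."""
    first = timestamps_ms[0]
    return [t - first for t in timestamps_ms]


def overall_fps(timestamps_ms):
    """Average rate over the whole clip: intervals divided by total duration."""
    span_ms = timestamps_ms[-1] - timestamps_ms[0]
    return (len(timestamps_ms) - 1) * 1000 / span_ms


def median_gap_fps(timestamps_ms):
    """Guess the rate from the typical gap between consecutive frames."""
    gaps = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    return 1000 / median(gaps)


# Invented example: frames arrive in uneven pairs and start at 30 ms, not 0.
stamps = [30, 100, 110, 180, 190, 260]

print(rebase(stamps))                    # [0, 70, 80, 150, 160, 230]
print(f"{median_gap_fps(stamps):.2f}")   # 14.29
print(f"{overall_fps(stamps):.2f}")      # 21.74
```

The two estimates disagree even though they come from the same frames, which is the same kind of mismatch as the 13.672 vs 27.5 fps I saw. The non-zero start doesn't affect either estimate here, but a player that assumes the first frame is at 0 would begin with video offset from audio.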