It's also possible that the test code is the problem, if it's only the RISC-V tests and benchmarks that time out. IIRC they don't have the magic load instructions needed to signal pass/fail, so they would need to be modified to include them, or built and linked against our syscall.c or something similar. The test_end_checker at the chipset crossbar looks for a load to the pass/fail address and then sets the corresponding wire high, which causes the UART to send the code to the host. If the load isn't part of the program, the code never gets sent.
But if hello world etc. also fail, then it could just as well be that the pyserial package is now returning a slightly different binary string because the semantics changed in Python 3. In that case our code could be checking for the wrong thing. Worth a look.
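To illustrate the kind of mismatch I mean, here is a rough sketch rather than our actual script. Under Python 3, pyserial's read() returns bytes, so a comparison against a str literal that worked under Python 2 quietly stops matching. The "PASS" payload and the saw_code helper are made up for the example, and the received value stands in for a real read from the port.

```python
def saw_code(received: bytes, expected: str) -> bool:
    """Return True if the expected code appears in the bytes read from the port.

    Hypothetical helper: encodes the expected text so both sides are bytes.
    """
    return expected.encode("ascii") in received


# Stand-in for the result of serial.Serial.read() under Python 3 (bytes, not str).
# The payload itself is an assumption, not what our UART really sends.
received = b"hello world\r\nPASS\r\n"

# Python 2 style check: bytes compared against str is always False in Python 3.
naive_match = (received == "hello world\r\nPASS\r\n")

# Comparing bytes with bytes behaves as intended.
fixed_match = saw_code(received, "PASS")
```

If the host script does something like the naive comparison, it would time out even though the right bytes arrived, which would match what we're seeing.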
Thanks,
Jon