Hi Yue,
The time difference comes from the different number of nuclides loaded during initialization.
For example, in a pure neutron transport run, only the nuclides you added in your materials.xml are loaded. In a depletion simulation, however, every nuclide in the chain file that has neutron-induced reaction cross sections in cross_sections.xml is loaded. Cross section lookup, whose cost grows with the number of nuclides, takes up a significant portion of the run time in Monte Carlo particle transport. You could certainly try the simplified chain file you mentioned, together with MPI or OpenMP parallel computation, to improve the efficiency.
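To make the difference concrete, here is a minimal plain-Python sketch of which nuclides end up loaded in each mode. It does not call OpenMC at all; the three nuclide sets and both function names are made up for illustration, and the real bookkeeping inside the code is more involved.

```python
# Toy illustration only: all nuclide sets below are hypothetical.
material_nuclides = {"U235", "U238", "O16"}            # what materials.xml lists
chain_nuclides = {"U235", "U238", "O16", "Xe135",      # what the chain file lists
                  "I135", "Pu239", "Cs137", "Sm149"}
library_nuclides = {"U235", "U238", "O16", "Xe135",    # what cross_sections.xml offers
                    "I135", "Pu239", "Sm149", "H1"}


def loaded_for_transport(materials):
    """Pure transport: only the nuclides present in the materials."""
    return set(materials)


def loaded_for_depletion(materials, chain, library):
    """Depletion: material nuclides plus every chain nuclide with cross sections."""
    return set(materials) | (set(chain) & set(library))


transport = loaded_for_transport(material_nuclides)
depletion = loaded_for_depletion(material_nuclides, chain_nuclides, library_nuclides)

# The depletion run carries more nuclides through every cross section lookup.
print(len(transport), len(depletion))
```

With a full chain file the second number is typically in the thousands rather than single digits, which is why a reduced chain can noticeably shorten the transport step.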
Anyway, I would go with 100 batches and 40,000 particles per batch in this case.
As for the fission-q problem, it might be caused by a mistake in the units (the values are expected in eV). But I cannot be sure unless you provide more details of the model.
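In case it helps, here is a small sketch of the kind of unit check I have in mind, assuming your fission-q values were typed in MeV while eV is expected. The Q values are round illustrative numbers, not evaluated data, and the helper names `to_ev` and `suspicious` are my own.

```python
MEV_TO_EV = 1.0e6

# Illustrative round numbers in MeV, NOT evaluated nuclear data.
fission_q_mev = {"U235": 200.0, "Pu239": 210.0}


def to_ev(q_mev):
    """Convert a {nuclide: Q in MeV} dict to {nuclide: Q in eV}."""
    return {nuclide: q * MEV_TO_EV for nuclide, q in q_mev.items()}


def suspicious(q_values, threshold_ev=1.0e6):
    """Return nuclides whose Q value is too small to plausibly be in eV.

    Fission releases on the order of 1e8 eV, so anything below
    threshold_ev (an arbitrary cutoff chosen here) was probably given in MeV.
    """
    return sorted(n for n, q in q_values.items() if q < threshold_ev)


fission_q_ev = to_ev(fission_q_mev)

print(suspicious(fission_q_mev))  # both entries are flagged as likely MeV
print(suspicious(fission_q_ev))   # nothing is flagged after conversion
```

If your values look like the first dict, the computed power normalization would be off by a factor of a million, which would show up as a very wrong depletion rate.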
Best,
Jiankai