One likely cause of the slowdown is I/O.
The following changes could reduce the time spent in I/O operations:
1. I would switch the SCF from semidirect to direct, so that integrals are recomputed as needed instead of being cached on disk, by changing the line
semidirect filesize 100000000
to
direct
2. By default, the trajectory file is written at every time step. Increasing the value of print_xyz makes it get written less often.
For example, the following input causes the trajectory file to be written every 5 time steps:
qmd
print_xyz 5
...
end
Your memory line also seems too large to me. The memory setting applies to each process, so do you have 3840 (120*32) gigabytes of memory on your system?
(Do you see any swap memory being used while your job is running?)
I would change the line to use more reasonable values, for example:
memory stack 1000 mb heap 100 mb global 500 mb noverify
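To make the arithmetic behind that concern explicit, here is a rough Python sketch. It assumes that the 120 in "120*32" is the per-process request in gigabytes and that 32 is the number of processes. That split is my reading of your job, not something I can verify, and the function name is just for illustration.

```python
def total_memory(per_process: int, n_processes: int) -> int:
    """Aggregate memory requested when every process allocates per_process."""
    return per_process * n_processes

# Assumption: 32 processes, each asking for 120 GB (from the "120*32" estimate).
N_PROCESSES = 32
requested_gb = total_memory(120, N_PROCESSES)

# Suggested line: stack 1000 mb + heap 100 mb + global 500 mb per process.
suggested_per_process_mb = 1000 + 100 + 500
suggested_total_mb = total_memory(suggested_per_process_mb, N_PROCESSES)

print(f"current request:   {requested_gb} GB in total")
print(f"suggested request: {suggested_total_mb} MB in total")
```

With the suggested values the whole job asks for about 51 GB instead of 3840 GB, which is far more likely to fit in physical memory without swapping.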
One more tip: switching from the default cartesian basis functions to spherical ones should speed up the calculation, too:
basis "ao basis" spherical