As you observe, the continuous relaxation is quickly solved to give an upper bound of 3487.4. Other initial root-node processing (starting solution, presolve, probing, symmetry detection) also appears to take little time. After about 10 minutes, CPLEX's heuristics find an integer-feasible solution with objective value 3166, giving an optimality gap of (3487.4 - 3166) / 3166 = 10.15%.
CPLEX then turns to cut generation. It reports generating 6889 additional cut candidates, but after another 80+ minutes of work, these only decrease the gap marginally to 10.09%. At that point, since you specified the CPLEX option nodes=0, the run is terminated.
Since cut generation does not seem to be helping much, I would next try adding cutpass=-1 to turn off cuts entirely, while dropping nodes=0 so that CPLEX continues with the branching phase after root-node processing is finished. Hopefully the time to find an integer-feasible solution will remain at about 10 minutes; if not, you can allow limited cut generation with cutpass=1, 2, etc. After this, you may want to compare performance using different mipemphasis values. (See CPLEX Options for AMPL for details.)
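As a rough sketch of the settings suggested above, the option string might look like the following. It assumes your model and data are already loaded, and that any other directives you currently pass in cplex_options (for example, display settings) are kept alongside these; only nodes=0 is removed.

```ampl
option solver cplex;

# First attempt: no cut generation at all, and no nodes=0,
# so CPLEX proceeds to branching after the root node.
option cplex_options 'cutpass=-1';
solve;

# Alternatives to try in later runs, one at a time
# (the values shown are only examples):
#   option cplex_options 'cutpass=1';                 # allow one pass of cuts
#   option cplex_options 'cutpass=-1 mipemphasis=1';  # emphasize feasibility
```

Changing one directive per run makes it easier to see which setting is responsible for any difference in the time to the first integer-feasible solution.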
After the branching phase has been running for more time than the root-node processing, you should start to see some substantial parallelism. Branching search is not easy to parallelize, however; there is a nontrivial amount of communication overhead, which increases with the number of threads. As a result, you can expect that there will be some maximal number of useful threads, above which additional threads fail to reduce total run time. (In fact, if IBM's comments on the threads parameter are correct, CPLEX should never use more than 32 threads.) The actual maximum number of useful threads is problem-dependent, however, and so can only be determined (approximately) by experimenting with different values of the threads option.
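One way to run such an experiment is a small AMPL script that solves the same problem with several thread counts and reports the elapsed time of each solve. This is only a sketch under assumptions: the thread counts and the one-hour time limit are arbitrary choices, the dummy index name nthreads is hypothetical and must not clash with a name in your model, and the model and data are assumed to be loaded already.

```ampl
option solver cplex;

for {nthreads in {1, 2, 4, 8, 16, 32}} {
   # Build the option string for this run; cutpass=-1 follows the
   # suggestion above, and timelimit (in seconds) caps each run.
   option cplex_options ('threads=' & nthreads & ' cutpass=-1 timelimit=3600');
   solve;
   printf "threads=%d  elapsed seconds=%g\n", nthreads, _solve_elapsed_time;
}
```

Note that AMPL may pass the solution from one solve to the next as a starting point, which could favor the later runs; for a fair comparison it is safer to run each thread count in a fresh AMPL session. Elapsed (wall-clock) time is the relevant measure here, since total CPU time summed over threads tends to rise as threads are added.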