I guess my first question is: what is the breakdown of the results you're processing from the 3 models? If you run `test_op2 -c model.op2`, you'll see:
DEBUG: op2.py:530 -------- reading op2 with read_mode=1 (array sizing) -----
INFO: op2_scalar.py:1305 op2_filename = 'loadstep_elements.op2'
DEBUG: op2_scalar.py:1536 table_name='GEOM1S'
DEBUG: op2_scalar.py:1536 table_name='GEOM2S'
DEBUG: op2_scalar.py:1536 table_name='GEOM3S'
DEBUG: op2_scalar.py:1536 table_name='GEOM4S'
DEBUG: op2_scalar.py:1536 table_name='EPTS'
DEBUG: op2_scalar.py:1536 table_name='MPTS'
DEBUG: op2_scalar.py:1536 table_name='DIT'
DEBUG: op2_scalar.py:1536 table_name='OESCP'
DEBUG: op2.py:542 -------- reading op2 with read_mode=2 (array filling) ----
DEBUG: op2_scalar.py:1536 table_name='GEOM1S'
DEBUG: op2_scalar.py:1536 table_name='GEOM2S'
DEBUG: op2_scalar.py:1536 table_name='GEOM3S'
DEBUG: op2_scalar.py:1536 table_name='GEOM4S'
DEBUG: op2_scalar.py:1536 table_name='EPTS'
DEBUG: op2_scalar.py:1536 table_name='MPTS'
DEBUG: op2_scalar.py:1536 table_name='OESCP'
DEBUG: op2.py:750 combine_results
DEBUG: op2.py:550 finished reading op2
displacements[1]
isubcase = 1
type=RealDisplacementArray ntimes=4 nnodes=43, table_name=OUGV1
data: [t1, t2, t3, r1, r2, r3] shape=[4, 43, 6] dtype=float32
node_gridtype.shape = (43, 2)
sort1
lftsfqs = [ 0.25 0.5 0.75 1. ]
crod_stress[1]
type=RealRodStressArray ntimes=4 nelements=2
data: [ntimes, nnodes, 2] where 2=[axial, torsion]
data.shape = (4, 2, 2)
element.shape = (2,)
element name: CROD
sort1
load_steps = [ 0.25 0.5 0.75 1. ]
etc.
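For reference, the arrays in the stats above follow a SORT1 layout: `data[itime, inode_or_ielement, iresult]`. A minimal numpy sketch of slicing such an array (synthetic stand-in values, not from a real OP2; the shapes mirror the `crod_stress[1]` dump above):

```python
import numpy as np

# Synthetic stand-in for crod_stress[1].data: [ntimes, nelements, 2] = [axial, torsion]
ntimes, nelements = 4, 2
data = np.arange(ntimes * nelements * 2, dtype='float32').reshape(ntimes, nelements, 2)
load_steps = np.array([0.25, 0.5, 0.75, 1.0])

# axial stress of the first CROD across all load steps
axial_history = data[:, 0, 0]
# all elements/components at the final load step (t=1.0)
final_state = data[-1, :, :]

print(axial_history.shape)  # (4,)
print(final_state.shape)    # (2, 2)
```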
Without that, I'd be guessing. Even then, I'd need an example. SOL 109 isn't used as much, so while you get a numpy array, the read method might not be vectorized.
It's just very suspicious that on the same geometry there is such a discrepancy in the read rates. The vectorized OP2 reader is ~500x faster than a non-vectorized read, so a 10x slowdown indicates one of the following:
- one or more non-trivially sized, non-dominating results that aren't vectorized
- you're suppressing 95% of your elements in the SOL 109 case
- you just have a very small model relative to the number of time steps
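To illustrate why the vectorized/non-vectorized gap is so large (the exact numbers are workload-dependent), here's a toy comparison of the two reading styles on a fake binary block: one `struct.unpack` per value versus a single `np.frombuffer` over the whole block. This is a sketch of the general technique, not pyNastran's actual reader code:

```python
import struct
import numpy as np

# Fake binary payload: 100,000 float32 values, standing in for one result block
nvalues = 100_000
payload = np.random.default_rng(0).random(nvalues).astype('<f4').tobytes()

# Non-vectorized: one struct.unpack call per value (how a slow table reader behaves)
slow = [struct.unpack('<f', payload[4 * i:4 * i + 4])[0] for i in range(nvalues)]

# Vectorized: a single frombuffer call interprets the whole block at once
fast = np.frombuffer(payload, dtype='<f4')

assert np.allclose(slow, fast)
```

Timing either version shows the per-call Python overhead dominating the loop, which is why a single non-vectorized result table can drag down the whole read.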
The 300,000 time step loop is certainly not helping, but I'm not sure it's the most important problem to solve, and it certainly doesn't have the easiest solution. The issue is that your data wants to be in SORT2 format, but you're reading it as SORT1. pyNastran doesn't support SORT2 well, and neither does MSC or NX Nastran (MSC is better, though), so despite you requesting SORT2, it gives you SORT1.
It's probably possible to speed up the 300,000 time step loop, but at a significant memory cost: you could stack the data, cast it all at once, and then reshape it, but that's a major change to the way the code works internally.
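A rough sketch of that stack/cast/reshape idea in plain numpy (synthetic per-time-step blocks and a much smaller step count than 300,000, just to show the pattern):

```python
import numpy as np

ntimes, nnodes, nvals = 300, 43, 6  # small stand-in for the real step count

# Per-time-step raw blocks as they'd come off the file (plain lists of floats)
raw_blocks = [[float(i)] * (nnodes * nvals) for i in range(ntimes)]

# Loop pattern: convert and assign one time step per iteration
out = np.empty((ntimes, nnodes, nvals), dtype='float32')
for itime, block in enumerate(raw_blocks):
    out[itime] = np.array(block, dtype='float32').reshape(nnodes, nvals)

# Stack pattern: cast everything in one shot, then reshape once
# (faster, but the full raw data is held in memory at peak)
stacked = np.array(raw_blocks, dtype='float32').reshape(ntimes, nnodes, nvals)

assert np.array_equal(out, stacked)
```

The trade-off is exactly the one noted above: the one-shot cast doubles the peak memory while both the raw blocks and the final array are alive.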