A quick look at your programs suggests that you may be doing about as well as is possible. One thing you might try, though I'm not sure it will in fact make a difference, is (at least for your piston program) to do what is done in
which is to do all the calculations with ordinary vectors rather than with sphere.pos vectors, as the latter cost extra time when you assign to them, because these "attribute" vectors have to set a flag that the sphere object data has changed (this is used by the 3D render machinery). I'm guessing you won't see a significant difference, because the flag-setting operation is pretty fast.
Incidentally, in a copy of the gas program I timed the animation of 1000 spheres, with rate(150000) and found that each loop cost about 135 ms.
Next I ran the same program using true Python (VPython 7) and found that each loop cost about 500 ms. This is presumably due to the fact that JavaScript (to which GlowScript VPython compiles) is inherently faster than Python (Python is interpreted, JavaScript is compiiled).
Then I did the same with the gas program from the suite of programs distributed with Classic VPython, which used numpy for speed. Running in Classic VPython, a loop cost about 1000 ms. Running in VPython 7 a loop cost about 800 ms. I'm deeply puzzled why the numpy version is slower in VPython 7 (800 ms) than the non-numpy version (500 ms); this doesn't make sense. But whatever mistake I've made, it's clear that for multiparticle VPython programs GlowScript VPython is significantly faster than VPython that uses installed Python.
Because rendering is done in the GPU, not the CPU, changing lighting etc. is unlikely to have an effect. The number of particles does matter, though, because the CPU has to send all the particle data to the GPU.