Version 11.6.0 of NTL now uses a carry-less multiply (CLMUL) instruction on AArch64 to implement GF2X arithmetic.
Because of this, NTL's built-in GF2X arithmetic is much faster than NTL built with the external GF2X library (which does not currently use a CLMUL instruction on AArch64).
On my Apple M1 Max MacBook, NTL's built-in GF2X is 9x faster for degree-1000 poly mul, and 5x faster for degree-1000000 poly mul, than NTL+GF2X-library.
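
For readers unfamiliar with the term: a carry-less multiply treats two machine words as polynomials over GF(2) and multiplies them, so partial products are combined with XOR instead of addition. The following is only a portable sketch of that operation, not NTL's code; the function name clmul64 is hypothetical. A hardware CLMUL instruction produces the same 128-bit result in a single instruction, which is presumably where the speedup above comes from.

```cpp
#include <cstdint>
#include <utility>

// Portable (slow) carry-less multiply of two 64-bit words.
// Bit k of each word is the coefficient of x^k in a polynomial over GF(2).
// Returns {low, high}: the two 64-bit halves of the 128-bit product.
// NOTE: clmul64 is an illustrative name, not part of NTL or any library.
std::pair<std::uint64_t, std::uint64_t> clmul64(std::uint64_t a, std::uint64_t b) {
    std::uint64_t lo = 0, hi = 0;
    for (int i = 0; i < 64; ++i) {
        if ((b >> i) & 1u) {
            // Add (XOR) a * x^i into the 128-bit accumulator.
            lo ^= a << i;
            if (i > 0) {
                // Bits of a that shift past bit 63 land in the high word.
                // The i > 0 guard avoids an undefined 64-bit shift.
                hi ^= a >> (64 - i);
            }
        }
    }
    return {lo, hi};
}
```

For example, clmul64(3, 3) gives a low word of 5 and a high word of 0, because (x + 1)^2 = x^2 + 1 over GF(2), whereas ordinary integer multiplication would give 9.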