I think I see things slightly clearer now. One problem is, that, by design, we successively reduce to polynomial rings with fewer generators. More precisely, MPolynomial.gcd converts the input to a univariate polynomial in one of the generators, and UniqueFactorizationDomains.ParentMethods._gcd_univariate_polynomial then relies on the existence of a gcd for the new base ring.
To do this, the new base ring, i.e., the ring with one fewer generator, must be constructed, and this is rather expensive. If I understand correctly, almost all the time is spent there. This is done in `MPolynomial.polynomial` using `S = PolynomialRing(R.base_ring(), Z)`
So, the questions are:
1. is there a (much) faster way to construct the univariate polynomial ring? I tried to do S = R.__class__(R.base_ring(), len(Z), [str(z) for z in Z], R.term_order()) when Z has at least two variables (because of the difference between univariate and multivariate polynomial rings) but to my surprise this was actually slower.
2. can we bypass the construction of polynomial rings entirely?