While there are sometimes easy ways of bounding the size of the derivative of an expression, you rapidly
run out of rules for more general "problems/solutions". For instance, here is a problem:
expand (x+1)^(2^n) and
characterize the number of terms as a function of the integer n.
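For small n the count can be checked directly. The sketch below is plain Python, with function names that are illustrative only and not taken from any computer algebra system. It builds the dense coefficient list of (x+1)^(2^n) by squaring n times and counts the nonzero entries. By the binomial theorem every coefficient C(2^n, k) is positive, so the count should come out as 2^n + 1.

```python
def square(coeffs):
    """Square a dense polynomial given as a list of integer coefficients."""
    result = [0] * (2 * len(coeffs) - 1)
    for i, a in enumerate(coeffs):
        for j, b in enumerate(coeffs):
            result[i + j] += a * b
    return result


def term_count(n):
    """Count the nonzero terms in the expansion of (x+1)^(2^n)."""
    poly = [1, 1]  # coefficients of x + 1
    for _ in range(n):
        poly = square(poly)  # each squaring doubles the exponent
    return sum(1 for c in poly if c != 0)


# Counts for n = 0..5, to compare against 2^n + 1.
counts = [term_count(n) for n in range(6)]
print(counts)
```

The dense representation itself has 2^n + 1 entries, so this check is only practical for small n.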
Simulation of random expression problems could, in general, produce expressions whose size grows super-exponentially.
Another study of "statistics" might be -- given a free on-line computer algebra system that
does some task (you can pick indefinite integration), what is the distribution of inputs
from [random?] clients?
I suspect you will get (a) homework problems and (b) people just trying it out to see if
it works. People can ask ChatGPT to do math, and it sometimes gets the right answer.
It sometimes doesn't.
A (much) earlier experiment we ran at Berkeley, TILU, collected some problems
(maybe a few hundred?) and mostly found that people were unlikely to master the
first stage of the problem: getting the syntax right. Thus we collected stuff like
sin x, sinx, sin(x), Sin(x), Sin[x], SinX.
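To make the variety concrete, here is a minimal sketch of a tolerant normalizer that maps all six spellings above to one canonical form. It is a hypothetical illustration and not what TILU did. The function name and the single-letter-variable restriction are assumptions made for the example.

```python
import re

# Hypothetical pattern: "sin" in any case, an optional opening ( or [,
# one single-letter variable, an optional closing ) or ].
# It does not check that the brackets match, so "sin(x]" is also accepted.
_SIN_CALL = re.compile(r"\s*sin\s*[\(\[]?\s*([a-z])\s*[\)\]]?\s*", re.IGNORECASE)


def normalize_sin(text):
    """Return 'sin(v)' for a recognized spelling, or None otherwise."""
    m = _SIN_CALL.fullmatch(text)
    if m is None:
        return None
    return f"sin({m.group(1).lower()})"


variants = ["sin x", "sinx", "sin(x)", "Sin(x)", "Sin[x]", "SinX"]
normalized = [normalize_sin(v) for v in variants]
print(normalized)
```

A pattern this loose guesses at intent. A real front end would have to decide how much guessing is acceptable before rejecting the input.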
As for whether this is interesting or not, I would not expect simulation -- where you write
a program P to generate problems -- to reveal much other than the behavior of program P.
RJF