This is a good question and I can see this going in a few directions. First, I think the overall goal is to avoid as much API bloat as possible. NumPy is pretty big, and once you add the routines in SciPy that could arguably fit into the NumPy API, it's massive. I think the Nx API should generally stay limited, but not at the cost of productivity. It would be pretty painful to have to reimplement beta, bernoulli, gamma, etc. every time you needed them.
So then this can go in two directions:
1) We implement some common distributions, probably under an `Nx.Random` namespace. The good thing is that all of these functions can be implemented in terms of other Nx primitives, so compiler writers wouldn't need to implement these routines. Another positive to this approach is that it would be easier (I think) to shell out to custom kernels if need be. I know JAX uses some custom CUDA PRNG kernels. XLA also has some routines specifically for PRNGs.
2) We shell this out to a library of numerical definitions. This keeps the API slim, but anytime somebody wants common RNGs they have to bring in another dependency (maybe not that bad), which might be kind of annoying from a productivity perspective. You also lose out on the ability to (easily) plug in custom kernels where necessary.
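To make option (1) a bit more concrete, here's a rough Python sketch (plain stdlib as a stand-in for Nx primitives; all function names here are hypothetical, not an actual Nx API) of how common distributions reduce to a single uniform primitive, which is why compiler writers wouldn't need to implement each one:

```python
import math
import random

def uniform(rng):
    # stand-in for a backend-provided uniform [0, 1) primitive;
    # in Nx this would be the one routine compilers actually implement
    return rng.random()

def bernoulli(rng, p):
    # derived: compare a uniform draw against p
    return 1 if uniform(rng) < p else 0

def exponential(rng, lam):
    # derived: inverse-CDF transform of a uniform draw
    return -math.log(1.0 - uniform(rng)) / lam
```

The point is just that everything downstream of `uniform` is pure composition, so it can live in an `Nx.Random` namespace as ordinary numerical definitions.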
We could also do something in between and add primitives for creating PRNG keys and a few other random primitives. I could see stuff like `jax.random.shuffle` being annoying to implement as a numerical definition. I'm somewhere in between these two choices; I'd be curious to hear other opinions.
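For the in-between idea, the splittable-key style JAX uses could look roughly like this, sketched in Python with `hashlib` standing in for a real counter-based PRNG (all names hypothetical, purely illustrative):

```python
import hashlib

def new_key(seed):
    # a key is just opaque bytes deterministically derived from a seed
    return hashlib.sha256(str(seed).encode()).digest()

def split(key, n=2):
    # derive n independent subkeys; splitting the same key
    # always yields the same subkeys, so there's no hidden state
    return [hashlib.sha256(key + bytes([i])).digest() for i in range(n)]

def random_bits(key):
    # turn a key into a deterministic 64-bit integer
    return int.from_bytes(key[:8], "big")
```

Only `new_key`/`split`/`random_bits`-style primitives would need backend support; the statelessness is what makes it easy to swap in custom CUDA kernels underneath, like JAX does.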
Sean