I don't recall exactly what was intended with that comment. I don't
think that it refers to GOMAXPROCS since that is an env var. I suspect
that it is referring to the more general idea that parallelising code
can often be as simple as adding a small amount of logic, bearing in
mind that the article is written at least in part as a comparison with
Python and MATLAB, where doing this is much harder.
The second question is easier; for the most part, parallelisation is
under the control of the user. There are some packages where the
functions themselves do work in parallel; the generalised matrix
multiply is an example of this. Routines in diff/fd, integrate/quad and
optimize all do work concurrently where parts of the computation can be
done in parallel.
It is not the case that Gonum always uses bindings to a C-backed
BLAS/LAPACK, although this can be done. By default all Gonum code is
pure Go/ASM. If you do choose to use the OpenBLAS implementations for
BLAS and LAPACK, then you will get that package's parallelisation.
Dan