The latter reduction is more like what you'd get with laziness.
Neither of the "a"s is distinguished, though; they're just the same
value. It's more like "let a = 4*4 in a*a" ==> "let a = 16 in a*a" ==>
"256".
Cale has a nice explanation of different evaluation schemes:
<
http://www.reddit.com/r/haskell/comments/ywula/yet_another_reason_not_to_be_lazy_or_imperative/c5znykk>.
The most useful way to think about this, unless you have a reason to
care about low-level details, isn't as "thunks" but as sharing nodes
in a graph that's being reduced. See e.g.
<
http://www.vex.net/~trebla/haskell/lazy.xhtml>. This is enough to
reason about the behavior of most lazily-evaluated programs, and is
much higher-level.
Note that, although every practical implementation uses it, the
Haskell Report doesn't specify lazy evaluation explicitly, just
non-strict semantics. You could write a compliant Haskell
implementation that uses call-by-name, because at the end of a
reduction call-by-name and laziness both produce the same results.
The monomorphism restriction is a mostly unrelated issue -- it comes
from the fact that you might write something like "let x :: Num a =>
a; x = 5*5 in ...", in which case the computed value 25 *won't* get
shared between uses of x (because x gets compiled into a function, not
a numeric value). So with the MR, when you write "let x = 5*5 in ...",
its type will be required to be concrete.
Shachaf