Hello guys,
I ran into a question while working through this week's review questions.
According to the video, if the number of features n is greater than 10^4, we had better use gradient descent. That seems to imply that we don't have to consider the size of m when choosing between gradient descent and the normal equation. Is that true? The normal equation is "theta = pinv( X' * X ) * X' * Y", and X is supposed to be of size m*(n+1), so I think we should also consider the size of m. Am I wrong? (The answer on the website says that we only have to consider the size of n, not m.)
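
To make the question concrete, here is a rough sketch of the matrix shapes in the formula above. The function name `normal_equation_shapes` and the multiply counts are my own illustration, not something from the course: the counts assume plain textbook matrix multiplication and a cubic-cost inversion, so they are only order-of-magnitude estimates.

```python
def normal_equation_shapes(m, n):
    """Shapes and rough multiply counts for theta = pinv(X' * X) * X' * Y.

    m: number of training examples
    n: number of features (X gets one extra intercept column)
    """
    cols = n + 1  # n features plus the intercept column

    shapes = {
        "X": (m, cols),
        "X'X": (cols, cols),  # m does not appear in this shape
        "X'Y": (cols, 1),
        "theta": (cols, 1),
    }

    # Rough estimates only (assumption: naive multiplication, cubic inversion).
    costs = {
        "form X'X": m * cols * cols,  # grows linearly with m
        "form X'Y": m * cols,         # grows linearly with m
        "invert X'X": cols ** 3,      # depends only on n
    }
    return shapes, costs


shapes, costs = normal_equation_shapes(m=1000, n=10)
print(shapes["X'X"])        # (11, 11)
print(costs["form X'X"])    # 121000
print(costs["invert X'X"])  # 1331
```

So under these assumptions m does show up, but only in the cost of forming X' * X and X' * Y, where it enters linearly. The matrix passed to pinv is (n+1)*(n+1) no matter how large m is.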