
> "By making the steps small, having a lot of them, we are getting an averaging effect that takes you down to the bottom of the bowl." (Bernard Widrow)
If intelligence is a bowl and we are all sliding toward the bottom, what does it mean to stop?
If you haven’t read the first post, you can find it here. It begins with the ducklings, the perceptron, and the quiet shift from understanding to optimization.
In this post, we continue with Chapters 3 and 4 of Why Machines Learn, where the book starts to draw the “bowl” itself: the mathematical landscape of error.
It is here that learning becomes geometry, and intelligence begins to take the shape of a surface we can fall into, step by step, gradient by gradient.
## What does the bowl mean here?
Two new concepts appear here: the loss function and gradient descent.
The bowl represents the shape of the loss function, and the bottom of it is where the loss reaches its minimum.
The method we use to find that minimum is gradient descent. It calculates the slope of the loss at the current point and moves the model slightly in the downhill direction, slowly descending toward the lowest point.
This matters because a smaller error usually means a better prediction.
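To make the descent concrete, here is a tiny sketch of my own (not from the book): a one-dimensional “bowl,” `loss(w) = (w - 3)**2`, whose true bottom sits at `w = 3`. The learning rate and starting point are illustrative choices.

```python
# Gradient descent on a one-dimensional "bowl":
# loss(w) = (w - 3)**2, whose minimum sits at w = 3.

def loss(w):
    return (w - 3) ** 2

def gradient(w):
    # Derivative of (w - 3)**2 with respect to w.
    return 2 * (w - 3)

w = 0.0              # start somewhere on the slope
learning_rate = 0.1  # size of each small step

for step in range(100):
    w -= learning_rate * gradient(w)  # move a little downhill

print(round(w, 4))  # ends very close to 3, the bottom of the bowl
```

Each step only looks at the local slope, yet the many small moves average out into a slide toward the bottom, exactly the picture in Widrow’s quote.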
But interestingly, once the model reaches a bottom, it stops. This leads to the idea of local and global minima.
A local minimum is a point that looks like the bottom from a close view but is not the lowest place overall, while the global minimum is the true lowest point in the entire surface. A model can easily get trapped in a local minimum, thinking it has “learned enough,” when in fact it has only found a comfortable but imperfect answer.
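The trap can be shown with a toy loss of my own invention (not from the book): a curve with two valleys, where the starting point decides which one plain gradient descent falls into.

```python
# A loss with two valleys: a global minimum near w = -1
# and a shallower local minimum near w = +1.
# loss(w) = (w**2 - 1)**2 + 0.3*w

def loss(w):
    return (w ** 2 - 1) ** 2 + 0.3 * w

def gradient(w):
    return 4 * w * (w ** 2 - 1) + 0.3

def descend(w, learning_rate=0.01, steps=2000):
    # Plain gradient descent: always follow the local slope downhill.
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w

w_right = descend(2.0)   # starts on the right, falls into the local minimum
w_left = descend(-2.0)   # starts on the left, finds the global minimum

print(round(w_right, 2), round(loss(w_right), 2))
print(round(w_left, 2), round(loss(w_left), 2))
```

Both runs stop moving, but only the second one actually found the lowest point; the first settled for the comfortable, imperfect answer.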
## Why don’t we stay at the bottom?
However, in reality, the world is not a perfectly smooth bowl: it is noisy, irregular, and full of uncertainty.
If we stay put at a point that only looks like the bottom, the results may get worse instead of better.
This is why randomness becomes essential, and why we introduce it into our models.
Models estimate likelihoods instead of certainties, and randomness helps them escape from the wrong valleys.
The book suggests that uncertainty is not failure but a condition for discovery. Researchers introduced stochastic gradient descent, which uses randomness to explore better paths, a mathematical form of human curiosity.
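A rough illustration of the idea, again a toy of my own rather than the book’s example: instead of the exact gradient over all the data, each step uses the gradient from one randomly chosen sample, a noisy but cheap estimate. Here the data follow `y = 2x` plus noise, and the model is a single weight `w`.

```python
import random

# Stochastic gradient descent: one noisy sample per step.
random.seed(0)
data = [(x, 2 * x + random.gauss(0, 0.1)) for x in [i / 10 for i in range(1, 21)]]

w = 0.0
learning_rate = 0.05

for epoch in range(200):
    random.shuffle(data)  # visit the samples in a random order
    for x, y in data:
        error = w * x - y                     # prediction error on one sample
        w -= learning_rate * 2 * error * x    # step along that sample's gradient

print(round(w, 1))  # ends close to the true slope, 2
```

Each individual step points in a slightly wrong direction, yet the walk as a whole still finds the bottom, and on bumpier surfaces that same jitter is what can shake the model out of a wrong valley.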
Intelligence may live in the tension between the two. Too much order and it freezes; too much randomness and it dissolves. We have to find a balance between stability and exploration.
## Closing
Even though researchers haven’t yet figured out how to let machines find that balance within themselves (and it is a hard question to solve!), we can begin to ask a deeper question:
how do they recognize balance in the world around them?
To see order is one thing; to know why two things belong together is another. Perhaps that is where the next story begins.