I always thought that human engineers would not be smart enough to conceive and design an intelligent machine. It will have to basically design itself through learning.
After weathering the AI winter and fixing the logic problem with backpropagation, we run into two new issues. First, how do we make a machine actually see things (like the internet’s favorite subject: cats)? Second, what happens when we make these machines ridiculously, unnecessarily huge?
Chapters 11 and 12 take us from “Oh, that’s clever design” to “Wait, why does this work?”

Forcing the machine to wear glasses
Chapter 11 is about Convolutional Neural Networks (CNNs), the technology behind almost all modern computer vision.
The author explains how we stopped treating images as flat lists of pixels and started respecting their structure. If you connect every pixel to every neuron, the model has to re-learn what a “cat” looks like every time the cat moves one inch to the right. That’s wildly inefficient. So researchers like Yann LeCun built convolution into the network: a small filter that slides over the image, reusing the same weights at every position.
The math here builds in translation invariance (strictly speaking, the convolution itself is translation-equivariant: shift the cat and its feature map shifts along with it; pooling is what turns that into invariance). It’s a fancy way of saying: “A cat in the top left is the same as a cat in the bottom right.”
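To make the sliding-filter idea concrete, here is a toy NumPy sketch (my own illustration, not code from the book): a 3x3 vertical-edge filter swept across a tiny 8x8 “image,” plus a check that shifting the bright blob just shifts the filter’s response without changing it.

```python
# A minimal sketch of the sliding-filter idea in plain NumPy.
# The 3x3 edge filter and the 8x8 toy image are illustrative choices of mine.
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image`, recording a dot product at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# A toy 8x8 "image" with a bright square (our stand-in for a cat).
image = np.zeros((8, 8))
image[2:5, 2:5] = 1.0

# A simple vertical-edge filter: the same nine weights are reused everywhere.
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

response = convolve2d(image, kernel)

# Shift the "cat" two pixels to the right: the response shifts by the same
# amount, but it is the same pattern -- the filter does not care where the cat is.
shifted = np.roll(image, 2, axis=1)
shifted_response = convolve2d(shifted, kernel)

print(np.allclose(response[:, :-2], shifted_response[:, 2:]))  # True
```

The same nine weights get reused at every position, which is exactly the glasses-wearing constraint: the network is not allowed to treat “cat on the left” and “cat on the right” as unrelated problems.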
I like this chapter because it feels like engineering. We, the humans, understood the world (objects have edges, shapes don’t change when they move), and we hard-coded that understanding into the network structure. We are essentially forcing the AI to see the world through human-designed glasses. It feels safe. It feels like we are in control.
…And then the curve went weird
Then Chapter 12 arrives and throws all that safety out the window. It talks about Double Descent.
In traditional statistics (and in every Intro to ML class we have sat through), there is a golden rule: Do not overfit. If your model is too complex, it will memorize the training data and fail on the test data. The test error curve should go down, and then shoot up if you keep adding parameters.
But in modern Deep Learning, if you keep making the model way bigger than it has any right to be, the error shoots up… and then, magically, goes down again? (Yes, this was shocking to me when I first read about it.)
Ananthaswamy calls this “Terra Incognita.” It defies classical intuition. It’s like studying for an exam:
Underfitting: You didn’t study. You fail.
Overfitting: You memorized the practice questions but don’t understand the concepts. You fail the real exam.
Double Descent: You memorized the entire textbook, every footnote, and the teacher’s diary. Suddenly… you pass?
This is the “interpolation regime.” The model is so big it fits every training point perfectly, yet of all the many ways it could do that, training tends to land on a smooth one, and that smooth fit generalizes surprisingly well.
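Double descent is surprisingly easy to reproduce in a few lines. Here is a rough sketch (my own toy setup, loosely in the spirit of Belkin et al.’s random-features experiments, not code from the book): fit 30 noisy points with random ReLU features and minimum-norm least squares, and watch the test error as the feature count grows past the number of training points.

```python
# A rough sketch of the double-descent curve using random ReLU features and
# minimum-norm least squares. All specific numbers here are my own toy choices.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, noise=0.1):
    """Noisy samples of a simple sine curve."""
    x = rng.uniform(-1, 1, size=(n, 1))
    y = np.sin(3 * x).ravel() + noise * rng.standard_normal(n)
    return x, y

def random_relu_features(x, n_features, seed=1):
    """Project 1-D inputs through a fixed set of random ReLU units."""
    r = np.random.default_rng(seed)
    w = r.standard_normal((1, n_features))
    b = r.standard_normal(n_features)
    return np.maximum(x @ w + b, 0.0)

x_train, y_train = make_data(n=30)
x_test, y_test = make_data(n=500)

print(f"{'features':>9} {'test MSE':>10}")
for p in [2, 5, 10, 20, 30, 40, 100, 300, 1000]:
    phi_train = random_relu_features(x_train, p)
    phi_test = random_relu_features(x_test, p)
    # The pseudoinverse gives the minimum-norm least-squares fit; once there
    # are at least as many features as training points, it interpolates the
    # training set exactly.
    w = np.linalg.pinv(phi_train) @ y_train
    test_mse = np.mean((phi_test @ w - y_test) ** 2)
    print(f"{p:>9} {test_mse:>10.3f}")
```

The exact numbers depend on the random seed, but the typical shape is the one the chapter describes: test error falls, spikes around features ≈ training points, then falls again in the absurdly overparameterized regime, because np.linalg.pinv picks the minimum-norm interpolant, i.e., the “smoothest” of all the perfect fits.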
Closing
With CNNs, we played the role of the intelligent designer. We crafted the lens, confident that we knew how the machine should see.
But with Double Descent, we stopped designing and started overwhelming. We are succeeding not by following the rules of statistics, but by making models so colossally huge that they simply crush the rules under their own weight.
It leaves me with a lingering question: Are we actually ‘solving’ intelligence, or are we just building a high-dimensional lookup table that is—quite literally—too big to fail? The math is elegant and the output is brilliant, sure. But standing here, watching the error curve dip for reasons we can barely explain, it feels less like we’ve discovered a formula, and more like we’ve successfully cast a spell. Let’s just hope we know the counter-spell.
Now, if you’ll excuse me, I need to go sacrifice another GPU to the AI gods.