It was just kind of by analogy: ‘Since we proved the simple nets can’t do it, forget it.’
In the previous chapters, we looked at how we can manipulate space (kernels) and borrow laws from physics (energy landscapes) to make machines learn. Everything seemed elegant. But history, apparently, is not a straight line.
Chapters 9 and 10 recount the most dramatic oscillation in AI history: the “Winter” caused by a mathematical proof, and the renaissance driven by a simple application of calculus. But was the “winter” really such a terrible thing? And how were neural networks reborn? Let’s find out together.

Being right, but at what cost?
Chapter 9 introduces Marvin Minsky and Seymour Papert. In 1969, they published Perceptrons. Their main point was mathematically undeniable: a simple, single-layer neural network cannot solve the XOR problem.
In plain English: a simple network can handle “A and B,” but it can’t represent “A or B, but not both.” XOR’s yes-cases and no-cases can’t be split by a single straight line (the problem isn’t linearly separable), and early AI couldn’t get past that.
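If you want to feel that wall for yourself, here’s a tiny sketch of my own (not from the book): a brute-force search over a single-layer perceptron’s weights. It quickly finds a setting that computes AND, but no setting on the grid ever produces XOR.

```python
import itertools

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [0, 0, 0, 1]
XOR = [0, 1, 1, 0]

def perceptron(w1, w2, bias, a, b):
    # A single-layer perceptron: fire (1) only if the weighted sum clears zero.
    return 1 if w1 * a + w2 * b + bias > 0 else 0

def solvable(targets):
    # Try every weight/bias combination on a coarse grid from -4.0 to 4.0.
    grid = [g / 2 for g in range(-8, 9)]
    for w1, w2, bias in itertools.product(grid, repeat=3):
        if [perceptron(w1, w2, bias, a, b) for a, b in inputs] == targets:
            return True
    return False

print("AND:", solvable(AND))  # True  (e.g. w1=1, w2=1, bias=-1.5)
print("XOR:", solvable(XOR))  # False (no single line separates the four points)
```

No amount of tuning helps; the failure is geometric, not a matter of effort.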
Minsky didn’t just point this out; he proved it with such rigorous authority that funding for neural networks basically evaporated. We call this the “AI Winter,” which makes it sound like a Game of Thrones season, but in reality it was just a lot of researchers losing faith that neural networks could ever amount to anything, writing them off the way physicists write off perpetual motion.
But as the author says in the book, this wasn’t Minsky’s fault; he simply showed that the simple machines of the time couldn’t solve the problem. He proved a limitation of simple networks, but people took it as a limitation of all networks.
The art of assigning blame
So, how did we thaw the winter? Chapter 10 introduces the solution: adding hidden layers and using Backpropagation.
The problem with hidden layers used to be: if the answer is wrong, who is responsible? Which neuron in the messy middle caused the error? Backpropagation answers this using the Chain Rule from calculus. It’s essentially a mathematical way of “passing the buck.” (Honestly, when I learned the chain rule in my calculus class, I never imagined it could be this powerful :D)
This, to me, is the most profound shift in the book so far. Backpropagation allows the system to look at the final error and mathematically distribute the “blame” backward through the layers. It asks each weight: “How much did you contribute to this mistake?” and adjusts it accordingly. The system looks at the final error and says, “Okay, we were off by 5. Layer 3 contributed 20% to this mess, so adjust its weights a lot. Layer 2 only contributed 1%, so leave it mostly alone.”
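To make that concrete, here’s a hand-worked toy (my own numbers, nothing from the book): one input, one hidden value, two weights, and the chain rule deciding how much blame each weight earns for a single bad prediction.

```python
# Toy two-layer chain:  x --(w1)--> h --(w2)--> y,  with h = w1 * x and y = w2 * h.
x, w1, w2 = 1.0, 0.5, 3.0
target = 2.0

h = w1 * x          # hidden value: 0.5
y = w2 * h          # prediction:   1.5
error = y - target  # we are off by -0.5

# With loss = 0.5 * error**2, dLoss/dy = error. The chain rule passes that
# blame backward, scaled by how much each weight influenced the output.
grad_w2 = error * h        # dLoss/dw2 = dLoss/dy * dy/dw2       = -0.25
grad_w1 = error * w2 * x   # dLoss/dw1 = dLoss/dy * dy/dh * dh/dw1 = -1.5

print(grad_w2, grad_w1)  # w1 earns more blame because w2 amplifies its effect downstream
```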
It turns out that “learning” is just the ability to efficiently distribute blame.
Once we figured out how to send the error signal backward through the layers, the “XOR problem” became trivial. The machine didn’t need to understand logic; it just needed to be scolded precisely enough, millions of times, until it stopped making mistakes.
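And here’s the thaw in code. A minimal sketch, assuming nothing beyond numpy: a tiny two-layer network trained by backpropagation until XOR falls out. The architecture, learning rate, and number of steps are my own illustrative choices, not anything prescribed by the book.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer is exactly what the single-layer perceptron was missing.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> 4 hidden units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
lr = 1.0

for step in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)      # hidden activations
    out = sigmoid(h @ W2 + b2)    # predictions

    # Backward pass: distribute the blame with the chain rule
    d_out = (out - y) * out * (1 - out)     # blame at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)      # blame passed back to the hidden layer

    # Nudge every weight in proportion to its share of the blame
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())  # approaches [0, 1, 1, 0]
```

The single-layer perceptron could never reach [0, 1, 1, 0]; give the network one hidden layer and a way to assign blame, and the “impossible” answer drops out after a few thousand rounds of precise scolding.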
Closing
These two chapters taught me that mathematical rigour is a double-edged sword. Minsky’s rigour froze the field because it demanded perfection too early. The solution wasn’t to find a flawless logic, but to accept a messy system that corrects itself, little by little.
It mirrors our own lives. We often get so obsessed with the ‘proper’ way to solve a problem—getting stuck in the messy middle—that we forget our original destination. Minsky stopped because the path wasn’t perfect. But sometimes, if the math says the door is locked, the answer isn’t to study the lock; it’s to change the route, or better yet, simply kick the door open.
But there is a cost to this shift. In the era of the perceptron, we knew exactly why the machine failed. In the era of deep learning and backpropagation, the machine succeeds, but the rationale is buried in millions of tiny adjustments to millions of weights. We have traded clarity for capability.
So now we have a powerful machine, but it’s a black box. What do we do? Naturally, we try to control it. In the next post (Chapters 11-12), we’ll discuss how engineers tried to force these networks to “see” the world like humans do.