2024 · Physics

Memory as a landscape: the physics behind machine learning

Q: In a Hopfield network, what does a stored memory correspond to?

A valley, a local minimum, in the energy landscape. Each stored pattern is written into the weights so that it sits at a local minimum of the network's energy. Recall is the network rolling downhill into the nearest such valley.

Q: You feed a Hopfield network a noisy, partial version of a stored pattern. What happens?

It flips nodes to lower its energy until it settles in the closest stored pattern. The update rule only ever lowers the total energy, so the state slides down to the nearest minimum, which is the stored pattern most similar to the noisy clue. That is how the network completes and cleans up the input.

Q: What did Hinton's Boltzmann machine add beyond the Hopfield network?

It used hidden units and statistical physics to learn patterns in data and generate new examples. The Boltzmann machine introduces hidden units and uses the Boltzmann distribution, training the weights until the network's own samples match the data. This lets it discover features on its own and generate new examples, not just store fixed memories.

Awarded to John J. Hopfield and Geoffrey Hinton “for foundational discoveries and inventions that enable machine learning with artificial neural networks”.

What was the 2024 Nobel Prize in Physics awarded for?

The 2024 Physics prize honours the physics that underpins machine learning. John Hopfield showed that a simple network of connected nodes can store a memory as the low point of an energy landscape and recall it from a noisy or partial clue. Geoffrey Hinton extended that idea into the Boltzmann machine, a network that learns the hidden patterns in data on its own and helped launch today's deep learning.

Predict first

A network is shown a blurry, half-erased photo of a face it has seen before. With no database lookup, it cleans the image up and returns the original. How can a web of simple on/off nodes do that?

It treats memory as a downhill roll. Each stored pattern sits at the bottom of a valley in an energy landscape. The blurry photo starts partway up a slope, and the network keeps flipping nodes to lower its energy, so the state slides down into the nearest valley. The valley it lands in is the stored pattern most similar to the clue, which is the cleaned-up face.

Predict first

You never tell the network what a cat is. You just show it thousands of pictures. Later it can sketch a brand new cat-like image on its own. What did it actually learn?

The statistics of the data, not a list of rules. A Boltzmann machine has hidden nodes that are not pinned to the picture. By adjusting its connection strengths until the patterns it tends to produce match the patterns it was shown, it captures the recurring features of cats. Because those features now live in its weights, it can generate fresh examples that share them.

A memory is a valley in the network's energy landscape. A noisy clue dropped on a slope rolls downhill into the nearest valley, recovering the stored pattern closest to it.

Imagine a hilly landscape with a few deep valleys. Roll a ball anywhere on it and the ball runs downhill until it settles at the bottom of the nearest valley.

John Hopfield showed that a network of tiny connected switches can work the same way. Each thing you want it to remember, like a picture, becomes its own valley. If you then hand the network a smudged or half-missing version of that picture, it acts like the rolling ball. It keeps adjusting its switches to move downhill until it reaches the nearest valley, which is the clean memory. That is how it fills in and completes the clue.

The big idea in one line

Remembering is rolling downhill

A memory is stored as the bottom of a valley. Give the network a noisy hint and it slides down to the closest valley, recalling the full pattern. No searching through a list is needed.

Geoffrey Hinton took this further. He built a network that studies many examples and quietly learns the patterns hiding inside them, so it can even make new examples of its own. These physics ideas about energy and chance are the seeds of the machine learning we use every day.

A Hopfield network is a single layer of nodes where every node connects to every other node. Each node holds a value of on or off, and each connection has a weight. Crucially, the weights are symmetric: the link from node i to node j is as strong as the link from j to i, and no node connects to itself.

Hopfield borrowed an idea from the physics of magnets. He gave the whole network a single number, its energy, computed from every node value and every connection strength. Stored patterns are arranged to sit at low-energy points. To recall a memory, the network visits its nodes one at a time and flips any node whose flip would lower the total energy. Step by step the energy falls and the state settles into the nearest low point, the stored pattern closest to the starting clue.

Storing a memory

Hebbian weights carve the valleys

Patterns are written into the weights with a Hebbian rule: nodes that should be on together get a positive connection, nodes that disagree get a negative one. Each stored pattern then becomes a local minimum of the energy, a valley the network can roll into. Add too many patterns and the valleys start to merge, which sets a limit on how much one network can remember.

Hinton asked a different question: instead of storing fixed patterns, could a network learn the structure of data by itself? His Boltzmann machine adds hidden nodes that are not tied to the input. Using the Boltzmann distribution from statistical physics, where low-energy states are the most likely, it adjusts its weights until the patterns it generates on its own match the patterns it was shown. It can then classify images or create new examples of the kind it learned.

1982

Hopfield publishes the associative-memory network, using just 30 nodes and fewer than 500 connections.

1985

Hinton and Terrence Sejnowski introduce the Boltzmann machine, a network that learns from examples.

1986

Hinton and colleagues popularise backpropagation, training deeper networks efficiently.

2012

Hinton's group wins the ImageNet contest with a deep network, sparking the modern boom.

2024

Hopfield and Hinton share the Nobel Prize in Physics.

The Hopfield network is a recurrent system of N binary units s_i taking values -1 or +1, with symmetric weights w_ij = w_ji and no self-connections (w_ii = 0). Its state is scored by an energy function taken straight from the Ising model of interacting spins: E = -1/2 Σ_{i,j} w_ij s_i s_j. Each asynchronous update sets one unit to the sign of its local field h_i = Σ_j w_ij s_j, a step that can never raise E, so the network slides down a fixed energy surface and halts at a local minimum, a state in which no single flip can lower E.
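A minimal sketch of the energy function and the asynchronous update rule, in plain Python. It uses hypothetical random symmetric weights rather than stored patterns, just to exercise the dynamics, and the names `energy`, `local_field` and `settle` are illustrative, not from any library. A unit whose field is exactly zero keeps its value, an assumption that guarantees the energy never rises.

```python
import random

def energy(w, s):
    """E = -1/2 * sum over i, j of w[i][j] * s[i] * s[j]; w is symmetric with a zero diagonal."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def local_field(w, s, i):
    """h_i = sum over j of w[i][j] * s[j]; w[i][i] is 0, so unit i ignores itself."""
    return sum(w[i][j] * s[j] for j in range(len(s)))

def settle(w, s, max_sweeps=1000):
    """Visit units one at a time and set each to the sign of its local field.
    A zero field leaves the unit unchanged. Returns the energy after every visit."""
    energies = [energy(w, s)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            h = local_field(w, s, i)
            new = 1 if h > 0 else -1 if h < 0 else s[i]
            if new != s[i]:
                s[i] = new
                changed = True
            energies.append(energy(w, s))
        if not changed:  # no unit wants to flip: a local minimum
            break
    return energies

rng = random.Random(0)
n = 10
# Hypothetical random symmetric weights with a zero diagonal
w = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        w[i][j] = w[j][i] = rng.uniform(-1.0, 1.0)

s = [rng.choice([-1, 1]) for _ in range(n)]
energies = settle(w, s)
print(f"start E = {energies[0]:.3f}, final E = {energies[-1]:.3f}")
```

Flipping unit i against a nonzero field changes the energy by exactly -2|h_i|, which is why the recorded energies never increase and the loop stops once a full sweep changes nothing.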

The learning rule

Hebbian storage and the 0.138N limit

To store P patterns ξ^μ, the weights are set by the Hebbian outer-product rule w_ij = (1/N) Σ_μ ξ_i^μ ξ_j^μ, summed over the patterns μ = 1 to P, with w_ii = 0. Each pattern becomes an attractor, a basin in the energy landscape. Because spurious minima appear and crosstalk between overlapping patterns accumulates, a standard Hopfield network reliably stores only about 0.138N patterns before recall breaks down, a capacity later lifted enormously by dense associative memories.
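A sketch of Hebbian storage followed by recall from a corrupted clue. The sizes are arbitrary assumptions for illustration: N = 100 units, three random ±1 patterns, and a clue made by flipping 15 units of the first pattern. Three patterns is far below the roughly 0.138 × 100 ≈ 14 that the capacity estimate allows, so the clue should sit inside the first pattern's basin.

```python
import random

def hebbian_weights(patterns):
    """Outer-product rule: w_ij = (1/N) * sum over patterns of xi_i * xi_j, with w_ii = 0."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for xi in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += xi[i] * xi[j] / n
    return w

def recall(w, clue, max_sweeps=100):
    """Asynchronous sign updates until a full sweep changes nothing."""
    s = list(clue)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            h = sum(w[i][j] * s[j] for j in range(len(s)))
            new = 1 if h > 0 else -1 if h < 0 else s[i]
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s

rng = random.Random(1)
N = 100
patterns = [[rng.choice([-1, 1]) for _ in range(N)] for _ in range(3)]
w = hebbian_weights(patterns)

# Noisy clue: pattern 0 with 15 randomly chosen units flipped
noisy = list(patterns[0])
for i in rng.sample(range(N), 15):
    noisy[i] = -noisy[i]

restored = recall(w, noisy)
matches = sum(a == b for a, b in zip(restored, patterns[0]))
print(f"{matches} of {N} units match the stored pattern")
```

Storing more random patterns, toward and past the 0.138N estimate, is a simple way to watch recall start to fail as crosstalk grows.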

Hopfield's contribution reframed memory as a dynamical-systems problem: content-addressable recall becomes a descent on E, and robustness to noise is the size of each attractor's basin. The same machinery links neural computation to spin glasses, which is why the prize sits in physics rather than computer science.

Hinton's Boltzmann machine turns the energy idea into a generative model. Units are split into visible v and hidden h, and a configuration has probability p(v,h) = exp(-E(v,h)) / Z, the Boltzmann distribution, with Z the partition function. Training maximises the likelihood of the data; the gradient is a difference between two correlations, one measured while the data is clamped on the visible units and one measured while the model runs freely. This contrast nudges the weights until the model's own samples resemble the training set.
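A brute-force sketch of the Boltzmann distribution and its learning gradient for a toy machine small enough to enumerate every state exactly. It assumes three visible and two hidden 0/1 units, weights only between visible and hidden units, and no bias terms; those choices, the toy data and the learning rate are assumptions made to keep the example short. A real Boltzmann machine estimates both correlations by sampling, because Z sums over exponentially many states.

```python
import itertools
import math

def energy(v, h, W):
    """E(v, h) = -sum over i, j of v_i * W[i][j] * h_j (biases omitted for brevity)."""
    return -sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h)))

def joint(W):
    """Exact p(v, h) = exp(-E(v, h)) / Z over every 0/1 configuration."""
    n_v, n_h = len(W), len(W[0])
    unnorm = {(v, h): math.exp(-energy(v, h, W))
              for v in itertools.product([0, 1], repeat=n_v)
              for h in itertools.product([0, 1], repeat=n_h)}
    Z = sum(unnorm.values())  # the partition function
    return {vh: x / Z for vh, x in unnorm.items()}

def log_likelihood(W, data):
    """Average log p(v) over the data, where p(v) sums p(v, h) over hidden states."""
    p = joint(W)
    return sum(math.log(sum(q for (vv, _), q in p.items() if vv == v))
               for v in data) / len(data)

def gradient(W, data):
    """d(log-likelihood)/dW_ij = <v_i h_j> with data clamped - <v_i h_j> running freely."""
    n_v, n_h = len(W), len(W[0])
    p = joint(W)
    grad = [[0.0] * n_h for _ in range(n_v)]
    for (v, h), q in p.items():  # free phase: subtract the model's own correlations
        for i in range(n_v):
            for j in range(n_h):
                grad[i][j] -= q * v[i] * h[j]
    for v in data:  # clamped phase: average h over p(h | v) for each data vector
        p_v = sum(q for (vv, _), q in p.items() if vv == v)
        for (vv, h), q in p.items():
            if vv == v:
                for i in range(n_v):
                    for j in range(n_h):
                        grad[i][j] += (q / p_v) * v[i] * h[j] / len(data)
    return grad

# Hypothetical toy data: three visible units, two hidden units
data = [(1, 1, 0), (1, 1, 0), (0, 0, 1)]
W = [[0.01 * (i + 2 * j) for j in range(2)] for i in range(3)]
before = log_likelihood(W, data)
for _ in range(200):
    g = gradient(W, data)
    W = [[W[i][j] + 0.1 * g[i][j] for j in range(2)] for i in range(3)]
after = log_likelihood(W, data)
print(f"log-likelihood: {before:.3f} -> {after:.3f}")
```

Each step adds the clamped correlation and subtracts the free-running one, the contrast described above, so the probability the model assigns to the toy data rises.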

Why it scaled

Restricted Boltzmann machines and deep learning

A general Boltzmann machine trains slowly. Restricting it so hidden units connect only to visible units, with no links inside a layer, gives the restricted Boltzmann machine: given the visible units, the hidden units are conditionally independent (and vice versa), so a whole layer can be sampled in one parallel step and learning becomes much faster. Stacking these layer by layer, then fine-tuning, was an early recipe for training deep networks and helped ignite the deep learning era.
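A sketch of why the restriction helps, assuming 0/1 units, an energy with visible biases a, hidden biases b and visible-to-hidden weights W, and random parameters. With no hidden-to-hidden links, p(h | v) factorises into independent sigmoids, p(h_j = 1 | v) = σ(b_j + Σ_i v_i W_ij), so the whole hidden layer can be sampled in one pass. The code checks that factorised form against brute-force enumeration; all function names are illustrative.

```python
import itertools
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def energy(v, h, W, a, b):
    """RBM energy: biases plus visible-hidden couplings, no links within a layer."""
    return (-sum(a[i] * v[i] for i in range(len(v)))
            - sum(b[j] * h[j] for j in range(len(h)))
            - sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h))))

def exact_conditional(v, W, a, b):
    """p(h | v) by enumerating all 2^n_h hidden states."""
    weights = {h: math.exp(-energy(v, h, W, a, b))
               for h in itertools.product([0, 1], repeat=len(b))}
    total = sum(weights.values())
    return {h: x / total for h, x in weights.items()}

def factorized_conditional(v, W, b):
    """p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij), computed per hidden unit."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]

def sample_hidden(v, W, b, rng):
    """One block-Gibbs step: every hidden unit is sampled independently given v."""
    return [1 if rng.random() < p else 0 for p in factorized_conditional(v, W, b)]

rng = random.Random(0)
n_v, n_h = 4, 3
W = [[rng.gauss(0, 1) for _ in range(n_h)] for _ in range(n_v)]
a = [rng.gauss(0, 1) for _ in range(n_v)]
b = [rng.gauss(0, 1) for _ in range(n_h)]
v = (1, 0, 1, 1)

exact = exact_conditional(v, W, a, b)
probs = factorized_conditional(v, W, b)
# Product of independent per-unit probabilities for every hidden configuration
product = {h: math.prod(p if hj else 1 - p for hj, p in zip(h, probs)) for h in exact}
max_gap = max(abs(exact[h] - product[h]) for h in exact)
h_sample = sample_hidden(v, W, b, rng)
print("largest difference:", max_gap)
print("sampled hidden state:", h_sample)
```

In a general Boltzmann machine the hidden units are also coupled to each other, so this shortcut is unavailable and they have to be updated one at a time.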

“Hopfield likened searching the network for a saved state to rolling a ball through a landscape of peaks and valleys, with friction that slows its movement.”
The Royal Swedish Academy of Sciences, 2024

Worth knowing

A trillion parameters grew from fewer than 500

Hopfield's original 1982 network had 30 nodes and fewer than 500 connections to adjust, simple enough to run on the computers of the day. The large language models built on the same basic idea now juggle more than a trillion parameters, a jump of more than a billionfold in just over four decades.
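Both figures can be checked with a line of arithmetic, assuming the 1982 network was fully connected with one symmetric weight per pair of its 30 nodes and no self-connections, and taking the bounds of under 500 connections and over a trillion parameters:

```latex
\binom{30}{2} = \frac{30 \times 29}{2} = 435 < 500,
\qquad
\frac{10^{12}}{500} = 2 \times 10^{9}
```

Because the connection count is below 500 and the parameter count is above a trillion, the true ratio exceeds 2 billion.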


Key terms

Hopfield network: A recurrent network where every node connects to every other with symmetric weights. It stores patterns as low-energy states and recalls them by settling into the nearest one.
Energy function: A single number for the whole network, E = -1/2 times the sum of w_ij s_i s_j over every ordered pair of distinct nodes i and j, borrowed from the physics of magnetic spins. Stored patterns sit at its local minima.
Associative memory: A memory addressed by content rather than by location. Given part of a pattern, it returns the complete stored pattern that best matches.
Hebbian learning: A rule for setting connection weights so that nodes which should be active together are linked positively. It carves each stored pattern into a valley of the energy landscape.
Boltzmann machine: A network with hidden units that learns the statistical structure of data using the Boltzmann distribution, and can generate new examples like those it was trained on.
Hidden units: Nodes in a Boltzmann machine that are not fixed to the input. They let the network represent features that are not directly given in the data.

The laureates

John J. Hopfield

Princeton University, Princeton, NJ, USA

Born in Chicago in 1933, Hopfield was a physicist who moved into biology at Caltech. In 1982 he showed that a network of simple connected nodes can act as an associative memory, storing patterns as low points of an energy landscape and recalling them from noisy or partial clues.


Geoffrey Hinton

University of Toronto, Toronto, Canada

Born in London in 1947, Hinton is a computer scientist long based at the University of Toronto. He built the Boltzmann machine on top of Hopfield's idea, using statistical physics to let a network learn the structure of data by itself, work that helped start the modern growth of machine learning.


Sources

Facts are pinned from the official Nobel Prize API. The explanations were written from these sources:
