Proteins: reading nature's shapes and writing new ones
Awarded one half to David Baker “for computational protein design” and the other half jointly to Demis Hassabis and John Jumper “for protein structure prediction”.
What was the 2024 Nobel Prize in Chemistry awarded for?
The 2024 Chemistry prize is about the shapes of proteins. Demis Hassabis and John Jumper built an AI called AlphaFold2 that predicts a protein's 3D shape from its chain of amino acids, cracking a problem that had stood for 50 years. David Baker did the reverse: he designs brand-new proteins that have never existed in nature.
You are handed the exact order of amino acids in a protein you have never seen. With plenty of computing power, why was it so hard for 50 years to work out the 3D shape it folds into?
AlphaFold2 predicts the shape of a natural protein from its sequence. David Baker won the other half of the prize for doing something different. What was it?
A protein is a tiny machine inside every living thing. It begins as a long chain of small parts called amino acids, strung together like beads on a thread. That thread folds up into one exact 3D shape, and the shape is what lets the protein do its job.
Scientists could easily read the order of the beads, but for 50 years they could not work out what shape the thread would fold into. A chain can fold in so many ways that trying them all would take longer than the universe has existed. This puzzle was called the protein folding problem.
Read the chain, see the shape
Demis Hassabis and John Jumper built an AI called AlphaFold2 that reads the chain of amino acids and predicts the folded 3D shape in minutes, with accuracy close to slow laboratory experiments.
David Baker went the other way. Instead of predicting the shape of a natural protein, he designs a shape he wants, then finds a brand-new chain that folds into it. These are proteins that have never existed in nature.
A protein is a chain of amino acids drawn from an alphabet of 20. The order of those amino acids is written in a gene, and once the chain is made it folds, usually on its own, into a particular 3D structure. That structure decides the function. Christian Anfinsen, whose work earned a share of the 1972 Nobel Prize in Chemistry, argued that the sequence alone should hold all the information needed to specify the fold, an idea now known as Anfinsen's dogma.
If the sequence sets the structure, you ought to be able to compute the structure from the sequence. The trouble is scale. In 1969 Cyrus Levinthal pointed out that a typical chain has an astronomical number of possible shapes, so many that sampling them one at a time would take longer than the age of the universe. Yet real proteins fold in fractions of a second. Predicting the final shape from sequence became the protein folding problem, and progress was scored every two years at a blind contest called CASP.
From good to near-experimental
Before deep learning, the best CASP predictions reached roughly 40 percent accuracy. DeepMind's first AlphaFold lifted that to about 60 percent at CASP13 in 2018. Two years later, at CASP14, AlphaFold2 scored above 90 on the global distance test (a 0 to 100 scale) for about two-thirds of the targets, good enough to rival experiment.
[Figure: Accuracy at CASP climbs with AI. Approximate score on the hardest targets, where 100 is a perfect match to the experimental structure.]
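To make a score on this 0 to 100 scale concrete, here is a minimal sketch of GDT_TS, the global distance test total score: the average, over cutoffs of 1, 2, 4 and 8 angstroms, of the fraction of residues whose predicted C-alpha atom lies within that cutoff of the experimental position. It assumes the two structures are already superimposed, whereas the real procedure searches over many superpositions, so treat it as a rough stand-in. The function name and the toy coordinates are illustrative, not taken from CASP.

```python
import math

def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two pre-superimposed lists of (x, y, z)
    C-alpha positions, in angstroms. Returns a value from 0 to 100."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Per-residue distance between predicted and experimental positions.
    distances = [math.dist(p, q) for p, q in zip(predicted, experimental)]
    # Fraction of residues within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Toy four-residue example (made-up coordinates): errors of 0.5, 1.5, 3.0 and 9.0 angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 9.0, 0.0)]
score = gdt_ts(predicted, experimental)
print(score)  # 56.25
```

In the toy case one residue is within 1 angstrom, two within 2, and three within both 4 and 8, so the score is the average of 25, 50, 75 and 75 percent.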
David Baker attacked the mirror-image problem. His program Rosetta searches for the lowest-energy fold of a sequence, and he realised the same machinery could run backward: choose a structure, then design a sequence to fit it. In 2003 his team built Top7, a 93-amino-acid protein with a fold unlike anything in nature, and its measured X-ray structure matched the design closely. Baker released Rosetta openly, and his group has since designed proteins that serve as vaccines, medicines, sensors, and nanomaterials.
The scientific premise is Anfinsen's thermodynamic hypothesis: for most small proteins the native fold is the global minimum of free energy, so the amino acid sequence encodes the structure. That is what makes prediction well posed in principle. It also dissolves Levinthal's paradox in practice, because predicting the endpoint of folding does not require simulating the kinetic pathway through the conformational landscape. The two halves of the 2024 prize sit on this premise from opposite directions: read structure from sequence, or write a sequence for a chosen structure.
Why brute force fails
Treat a 100-residue chain with only three states per residue and you already face 3^100, or about 5 × 10^47, conformations. At 10^-13 seconds per sample, exhaustive search runs to roughly 10^27 years, far beyond the age of the universe, yet proteins fold in milliseconds. Structure prediction therefore cannot enumerate. It must infer.
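The arithmetic behind that estimate is easy to check. The short sketch below reproduces it using the same assumptions as the text (100 residues, three states each, 10^-13 seconds per sample); the function name and the 365.25-day year are choices made here for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def exhaustive_search_years(residues=100, states_per_residue=3, seconds_per_sample=1e-13):
    """Return (number of conformations, years needed to sample each one once)."""
    conformations = states_per_residue ** residues  # exact Python integer
    years = conformations * seconds_per_sample / SECONDS_PER_YEAR
    return conformations, years

conformations, years = exhaustive_search_years()
print(f"{conformations:.2e} conformations")  # about 5.15e+47
print(f"{years:.2e} years")                  # about 1.63e+27
```

Changing `states_per_residue` or `residues` shows how quickly the count grows: the search time is exponential in chain length, which is why faster hardware alone does not help.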
AlphaFold2 infers by learning from evolution. It takes a multiple sequence alignment of related proteins plus a residue-residue pair representation and passes them through the Evoformer, a stack of attention blocks that lets the alignment and the pair map exchange information repeatedly. Coevolving residues, ones that mutate together across species, signal physical contact in the fold. A final structure module then places every atom directly, and the whole network is trained end to end from sequence to coordinates. At CASP14 in November 2020 this reached a median backbone accuracy of about 0.96 angstroms (C-alpha r.m.s.d. at 95 percent residue coverage, as reported in the AlphaFold2 paper) and topped 90 on the global distance test for most targets, the level organisers treated as comparable to experiment.
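The coevolution signal can be illustrated without any neural network. The sketch below is not how AlphaFold2 works internally (the Evoformer learns such patterns from data); it computes mutual information between two columns of an alignment, a classic statistic for spotting positions that vary together. The toy alignment and the function name are invented for illustration.

```python
import math
from collections import Counter

def column_mutual_information(alignment, i, j):
    """Mutual information, in bits, between columns i and j of a list of
    equal-length aligned sequences. Higher values mean the two positions
    vary together more than chance would predict."""
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi

# Toy alignment: columns 1 and 3 always change together (K with E, R with D),
# while column 2 varies independently of column 1.
toy_alignment = [
    "AKGEL",
    "AKSEL",
    "ARGDL",
    "ARSDL",
]
print(column_mutual_information(toy_alignment, 1, 3))  # 1.0
print(column_mutual_information(toy_alignment, 1, 2))  # 0.0
```

A high score for a pair of columns hints that the two residues may touch in the folded protein. Real alignments need thousands of sequences and corrections for shared ancestry before such a signal is trustworthy.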
Two hundred million structures
Hassabis and Jumper used AlphaFold2 to predict the structure of nearly every protein in the human body, then virtually all of the roughly 200 million proteins catalogued across known organisms, releasing them in an open database. More than two million researchers from 190 countries have since used the system, applying it to questions from antibiotic resistance to plastic-degrading enzymes.
David Baker's Rosetta works the energy landscape directly, scoring candidate structures with a physically grounded function and sampling toward the lowest-energy fold. Inverting that search turns prediction into design: fix a target backbone, then search sequence space for amino acids that make that backbone the energy minimum. The 2003 protein Top7, 93 residues with a topology absent from nature, validated the idea when its crystal structure matched the model to about 1.2 angstroms. Baker open-sourced Rosetta, and the field has since folded deep learning into design through tools such as RoseTTAFold, yielding designed binders, enzymes, nanomaterials, and SKYCovione, a COVID-19 vaccine and the first de novo protein approved as a human medicine.
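The inverted search can be sketched in a few lines, with heavy simplification. The example below is not Rosetta and does not use its energy function: the "target" is reduced to a hypothetical burial pattern plus a list of contacting positions, the alphabet is a ten-letter subset, and the score is a made-up toy. What it does show is the design loop itself: hold the target fixed, mutate the sequence, and accept or reject each change by a Metropolis rule so the sequence drifts toward lower energy.

```python
import math
import random

AMINO_ACIDS = "AVLIFKDENS"  # small illustrative subset of the 20
HYDROPHOBIC = set("AVLIF")

def toy_energy(sequence, buried, contacts):
    """Toy score, NOT Rosetta's energy function. Lower is better."""
    energy = 0.0
    for residue, is_buried in zip(sequence, buried):
        # Reward hydrophobic residues in the core and polar ones on the surface.
        energy += -1.0 if (residue in HYDROPHOBIC) == is_buried else 1.0
    for i, j in contacts:
        pair = {sequence[i], sequence[j]}
        if pair == {"K", "D"} or pair == {"K", "E"}:
            energy -= 0.5  # crude bonus for oppositely charged residues in contact
    return energy

def design_sequence(buried, contacts, steps=2000, temperature=0.5, seed=0):
    """Metropolis search over sequences for a fixed target description."""
    rng = random.Random(seed)
    current = [rng.choice(AMINO_ACIDS) for _ in buried]
    current_energy = toy_energy(current, buried, contacts)
    best, best_energy = list(current), current_energy
    for _ in range(steps):
        position = rng.randrange(len(current))
        old_residue = current[position]
        current[position] = rng.choice(AMINO_ACIDS)  # propose a point mutation
        new_energy = toy_energy(current, buried, contacts)
        delta = new_energy - current_energy
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_energy = new_energy  # accept the mutation
            if current_energy < best_energy:
                best, best_energy = list(current), current_energy
        else:
            current[position] = old_residue  # reject the mutation
    return "".join(best), best_energy

# Hypothetical 8-residue target: which positions are buried, and which pairs touch.
target_buried = [True, False, True, False, True, False, True, False]
target_contacts = [(1, 5), (3, 7)]
designed, energy = design_sequence(target_buried, target_contacts)
print(designed, energy)
```

For this toy target the lowest possible score is -9.0 (all eight burial preferences satisfied plus both charge pairs). A real design calculation also samples side-chain conformations and then checks, by prediction or experiment, that the designed sequence actually folds into the intended structure.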
A job that took years now takes minutes
Determining one protein's structure in the lab could cost a graduate student years of work. After confirming that AlphaFold2 worked, Hassabis and Jumper predicted the shapes of nearly all 200 million proteins known to science and released them freely, a near-complete atlas of life's molecules built in a fraction of the time.
Check yourself
What does AlphaFold2 take as its input, and what does it produce?
Why is the protein folding problem so hard to solve by brute-force calculation?
What was new about David Baker's 2003 protein Top7?
Key terms
- Amino acid: One of the 20 small building blocks that link in a chain to form a protein. Their order sets how the chain folds.
- Protein folding: The process by which a chain of amino acids settles into its specific 3D shape, which in turn determines what the protein does.
- Protein folding problem: The decades-old challenge of predicting a protein's 3D structure from its amino acid sequence alone.
- Anfinsen's dogma: The principle, from Christian Anfinsen's 1972 Nobel work, that a protein's amino acid sequence holds all the information needed to specify its folded structure.
- CASP: Critical Assessment of protein Structure Prediction, a blind contest held every two years since 1994 that scores how well methods predict structures not yet made public.
- AlphaFold2: The deep-learning system from Hassabis and Jumper that predicts protein structure from sequence, reaching near-experimental accuracy at CASP14 in 2020.
- De novo protein design: Building a protein from scratch by choosing a target shape and finding a new amino acid sequence that folds into it, as Baker did with Top7.
- Rosetta: David Baker's software that scores and searches protein structures by energy, used both to predict folds and to design entirely new proteins.
The laureates
Born in 1962 in the USA, Baker built the Rosetta software at the University of Washington to predict how proteins fold. He then ran the idea backward to design proteins from scratch. In 2003 his team made Top7, the first protein with a fold absent from nature, and he released Rosetta openly so a global community could keep building on it.
Born in 1976 in the United Kingdom, Hassabis co-founded the AI lab DeepMind, which entered the CASP structure-prediction contest in 2018. He co-led the team that built AlphaFold2, the model that predicts a protein's 3D structure from its amino acid sequence with accuracy close to laboratory experiment.
Born in 1985 in the USA, Jumper trained in theoretical physics and developed faster ways to simulate protein dynamics before joining DeepMind. His ideas reshaped the AI model, and he co-led the work with Hassabis that turned AlphaFold2 into a near-experimental structure predictor.
Sources
Facts are pinned from the official Nobel Prize API. The explanations were written from these sources: