AlphaFold
AlphaFold is a deep-learning system developed by Google DeepMind that predicts the three-dimensional structure of proteins from their amino-acid sequence. Its second version, AlphaFold 2, first demonstrated in late 2020, produced predictions for most proteins at accuracy approaching that of experimental methods such as X-ray crystallography and cryo-electron microscopy. This was widely regarded as a solution — or near-solution — to the protein folding problem, a fifty-year-old grand challenge of structural biology.[1][2]
In October 2024, Demis Hassabis and John Jumper shared half of the Nobel Prize in Chemistry "for protein structure prediction" using AlphaFold, with the other half awarded to David Baker for computational protein design.[3]
History
CASP and the protein folding problem
Biennial assessments of protein-structure prediction methods have been run since 1994 under the Critical Assessment of protein Structure Prediction (CASP) community experiment, in which groups predict the structures of proteins whose experimental structures have been determined but not yet published.[4] Accuracy is commonly reported as the global distance test total score (GDT_TS), which ranges from 0 to 100. Prior to AlphaFold, no method had reliably achieved median GDT_TS scores above roughly 40 on the hardest free-modelling targets; a GDT_TS of around 90 is informally considered competitive with experiment.
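For reference, GDT_TS is the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of Cα atoms in a prediction that lie within the cutoff of their position in the experimental structure. The sketch below is a simplified illustration only: it assumes the two structures are already superimposed, whereas the full measure searches over many superpositions to maximise each percentage. The function name and input layout are illustrative, not taken from any CASP tool.

```python
import math

def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two pre-superimposed lists of C-alpha (x, y, z) tuples.

    Assumption: no superposition search is performed, unlike the real metric.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("structures must be non-empty and of equal length")
    # Per-residue deviation between predicted and experimental C-alpha atoms.
    deviations = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues within each cutoff, averaged and scaled to 0-100.
    fractions = [sum(d <= c for d in deviations) / len(deviations) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)
```

Under this definition, a prediction in which every Cα atom lies within 1 Å of its experimental position scores 100.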
AlphaFold 1 (CASP13, 2018)
DeepMind entered CASP13 in December 2018 under the group name "A7D", winning the free-modelling category with a median GDT_TS of about 58.[5] The first AlphaFold used a deep residual network to predict, from a multiple sequence alignment, distributions over the distances between residue pairs and over backbone torsion angles. These were combined into a differentiable potential that was minimised by gradient descent. Although it did not solve the problem, it produced high-accuracy models for 24 of the 43 free-modelling domains, compared with 14 for the next-best method.[5]
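The idea of turning predicted distances into a potential and minimising it by gradient descent can be sketched in a few lines. This is a toy illustration under strong assumptions, not the published method: a harmonic well around a single target distance stands in for the potential derived from full predicted distributions, and Cartesian coordinates are optimised directly, whereas AlphaFold 1 is described as optimising backbone torsion angles. All function names here are hypothetical.

```python
import numpy as np

def potential_and_gradient(coords, target, weight):
    """Harmonic distance potential and its gradient with respect to coords.

    coords: (L, 3) positions; target, weight: symmetric (L, L) arrays.
    Assumption: a harmonic well stands in for the learned potential.
    """
    diff = coords[:, None, :] - coords[None, :, :]       # (L, L, 3) pair vectors
    dist = np.sqrt((diff ** 2).sum(axis=-1) + 1e-9)      # epsilon avoids 0/0 on the diagonal
    error = dist - target
    energy = 0.5 * (weight * error ** 2).sum()
    # Ordered pairs (i, j) and (j, i) both contribute, hence the factor 2.
    grad = 2.0 * ((weight * error / dist)[:, :, None] * diff).sum(axis=1)
    return energy, grad

def minimise(coords, target, weight, steps=1000, lr=0.05):
    """Plain gradient descent; returns final coordinates and the energy trace."""
    coords = coords.copy()
    energies = []
    for _ in range(steps):
        energy, grad = potential_and_gradient(coords, target, weight)
        energies.append(energy)
        coords -= lr * grad
    return coords, energies
```

Setting the diagonal of `weight` to zero excludes self-pairs. The fixed step size is adequate only for small toy systems; a quasi-Newton optimiser would be a more realistic choice for longer chains.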
AlphaFold 2 (CASP14, 2020)
At CASP14 in November 2020, an essentially new system called AlphaFold 2 achieved a median GDT_TS of 92.4 across all targets, a result the organisers described as having "largely solved" the single-domain structure prediction problem.[2] The full method was published in Nature in July 2021,[1] simultaneously with the release of open-source code under an Apache 2.0 licence on GitHub.
AlphaFold Protein Structure Database
Also in July 2021, DeepMind and the EMBL-EBI launched the AlphaFold Protein Structure Database, initially containing about 365,000 predictions including the entire human proteome.[6] A 2022 update expanded the database to over 200 million predicted structures, covering nearly every protein sequence catalogued in UniProt.
AlphaFold-Multimer (2021)
In October 2021, DeepMind released AlphaFold-Multimer, an extension trained to predict the structures of protein complexes with multiple chains.[7]
AlphaFold 3 (2024)
In May 2024, Isomorphic Labs and Google DeepMind published AlphaFold 3, which generalises the approach to complexes involving ligands, nucleic acids (DNA and RNA), ions and common post-translational modifications.[8] AlphaFold 3 replaces the AlphaFold 2 structure module with a diffusion-based generative process. At launch it was accessible only through the web-based AlphaFold Server, which imposed usage limits; this drew criticism from parts of the scientific community, who saw it as less reproducible than AlphaFold 2's full code release.[9] Inference code and weights for non-commercial use were released in November 2024.
Architecture
AlphaFold 2 takes as input a target amino-acid sequence and two derived objects built from database searches: a multiple sequence alignment (MSA) of evolutionarily related sequences, and a set of candidate "templates" — structurally similar proteins from the Protein Data Bank. These are processed by two main neural-network components.
Evoformer
The Evoformer is a 48-block transformer-style trunk that jointly refines two representations: an MSA representation of shape (sequences × residues × channels) and a pair representation of shape (residues × residues × channels).[1] Custom attention mechanisms operate along each axis of the MSA representation (row-wise across residues and column-wise across sequences) and along each axis of the pair representation. Information flows between the two in both directions: the pair representation enters the MSA row attention as a bias term, and the MSA representation updates the pair representation through an "outer-product mean". The pair representation can be interpreted as a graph of residue–residue relationships, with triangle-multiplicative and triangle-attention updates encouraging geometric consistency analogous to the triangle inequality.
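The tensor shapes involved in two of these updates can be illustrated with NumPy. The sketch below shows only the core contractions and rests on simplifying assumptions: the learned linear projections, gating and layer normalisation used in the published blocks are omitted, and the function names are illustrative rather than taken from the AlphaFold code base.

```python
import numpy as np

def outer_product_mean(msa):
    """MSA representation (S, R, C) -> pair update (R, R, C*C).

    Assumption: learned input/output projections and layer norm are omitted.
    """
    n_seq, n_res, n_chan = msa.shape
    # Outer product of the feature vectors at residues i and j, averaged over sequences.
    outer = np.einsum("sic,sjd->ijcd", msa, msa) / n_seq
    return outer.reshape(n_res, n_res, n_chan * n_chan)

def triangle_multiply_outgoing(left, right):
    """Update edge (i, j) from edges (i, k) and (j, k), summed over the third node k.

    left, right: (R, R, C) arrays. In the real block these would be gated
    linear projections of the pair representation (omitted here).
    """
    return np.einsum("ikc,jkc->ijc", left, right)
```

The sum over the third residue k is what lets every edge of a residue triangle influence the others, which is the sense in which these updates relate to the triangle inequality.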
Structure module
The structure module converts the refined pair and single representations into explicit 3-D atomic coordinates. Each residue is represented as an independent rigid body (the backbone N–Cα–C frame) together with a set of torsion angles for its side chain. Invariant point attention (IPA), an attention operation whose output is invariant under global rigid-body transformations of the input frames, updates these frames iteratively over eight layers with shared weights. Separately, the whole network is run several times in a process called recycling (three recycling iterations in the published model), with the previous pass's outputs fed back into the Evoformer.[1]
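A per-residue rigid frame can be built from the three backbone atoms by Gram-Schmidt orthogonalisation. The sketch below is one such construction; the choice of which bond defines the first axis is an assumption made for illustration, and the function names are hypothetical. It also shows the property that invariant point attention relies on: a point expressed in a residue's local frame has the same coordinates however the whole structure is rotated or translated.

```python
import numpy as np

def frame_from_backbone(n, ca, c):
    """Build a rotation (columns = local axes) and origin from N, C-alpha and C positions."""
    v1 = c - ca
    v2 = n - ca
    e1 = v1 / np.linalg.norm(v1)
    u2 = v2 - e1 * np.dot(e1, v2)      # remove the component of v2 along e1
    e2 = u2 / np.linalg.norm(u2)
    e3 = np.cross(e1, e2)              # right-handed third axis
    return np.stack([e1, e2, e3], axis=1), ca

def to_local(rotation, origin, point):
    """Express a global point in the residue's local frame."""
    return rotation.T @ (point - origin)
```

Applying the same rotation and translation to the backbone atoms and to the query point leaves the result of `to_local` unchanged, up to floating-point error.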
Confidence estimates
AlphaFold 2 emits two confidence measures. The predicted local distance difference test (pLDDT) is a per-residue score between 0 and 100 that correlates strongly with the true lDDT-Cα against experimental structures. Values above 90 indicate highly accurate backbone and side-chain placement, while values below 50 indicate low confidence and often correspond to disordered or flexible regions.[1] The predicted aligned error (PAE) is a per-residue-pair matrix giving the expected position error, in ångströms, of one residue when the prediction is aligned on another; it is useful for assessing relative domain orientation.
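In PDB-format output, AlphaFold stores pLDDT in the B-factor column, so per-residue confidence can be read with a few lines of standard-library Python. The sketch below assumes well-formed fixed-column ATOM records. The thresholds of 90 and 50 follow the text above, while the intermediate threshold of 70 and the band labels follow the colouring convention of the AlphaFold Protein Structure Database and are included here as an assumption. Function names are illustrative.

```python
def plddt_per_residue(pdb_lines):
    """Read per-residue pLDDT from the B-factor column of C-alpha ATOM records."""
    scores = []
    for line in pdb_lines:
        # Columns 13-16 hold the atom name, columns 61-66 the B-factor field.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores

def confidence_band(plddt):
    """Map a pLDDT value to a coarse band (labels and the 70 threshold are assumptions)."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt >= 50:
        return "low"
    return "very low"
```

A typical use is to mask out residues in the lowest band before structural comparison or docking, since their coordinates carry little information.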
Training
AlphaFold 2 was trained on about 170,000 experimentally determined structures from the Protein Data Bank, augmented with self-distillation on predictions for roughly 350,000 unlabelled sequences from Uniclust30. Training ran for about 11 days on 128 TPU v3 cores.[1]
Reception and impact
Scientific impact
By early 2024, the Jumper et al. 2021 Nature paper had accumulated over 25,000 citations, making it one of the most-cited papers in biology of the decade. AlphaFold predictions are routinely used as starting models for molecular replacement in X-ray crystallography, as priors in cryo-EM density interpretation, and as inputs to downstream tasks such as docking, protein design and virtual screening.
Uses of the AlphaFold database have been reported in studies of the structure of the nuclear pore complex,[10] in the identification of new antibiotic candidates, and in the annotation of the so-called "dark proteome", that is, proteins without experimental structures or close homologues.
2024 Nobel Prize in Chemistry
On 9 October 2024, the Royal Swedish Academy of Sciences awarded one half of the Nobel Prize in Chemistry jointly to Demis Hassabis and John Jumper "for protein structure prediction", citing AlphaFold 2 specifically.[3] The other half went to David Baker of the University of Washington for his work on computational protein design using Rosetta and, later, the RoseTTAFold and RFdiffusion systems.
Criticism
Criticism of AlphaFold has focused on several points. First, the system predicts a single static structure per input and does not natively model conformational ensembles, allostery, or the effect of point mutations on stability, although subsequent work has adapted it to these tasks. Second, accuracy for intrinsically disordered proteins, antibodies, de novo-designed proteins, and large multi-domain complexes is substantially lower than the headline CASP14 figures. Third, the release model of AlphaFold 3 — initially a web server with usage caps, without immediate code release — was seen by some researchers as a departure from AlphaFold 2's open-science precedent.[9]
See also
- RoseTTAFold
- ESMFold
- Protein Data Bank
- Deep learning
- Attention (machine learning)
- Transformer (machine learning)
References
- [1] Jumper, J. et al. (2021). "Highly accurate protein structure prediction with AlphaFold." Nature 596, 583–589. doi:10.1038/s41586-021-03819-2.
- [2] Kryshtafovych, A. et al. (2021). "Critical assessment of methods of protein structure prediction (CASP)—Round XIV." Proteins 89, 1607–1617. doi:10.1002/prot.26237.
- [3] Royal Swedish Academy of Sciences (9 October 2024). "The Nobel Prize in Chemistry 2024." Press release.
- [4] Moult, J. et al. (1995). "A large-scale experiment to assess protein structure prediction methods." Proteins 23, ii–v.
- [5] Senior, A. W. et al. (2020). "Improved protein structure prediction using potentials from deep learning." Nature 577, 706–710. doi:10.1038/s41586-019-1923-7.
- [6] Varadi, M. et al. (2022). "AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models." Nucleic Acids Research 50, D439–D444. doi:10.1093/nar/gkab1061.
- [7] Evans, R. et al. (2021). "Protein complex prediction with AlphaFold-Multimer." bioRxiv 2021.10.04.463034. doi:10.1101/2021.10.04.463034.
- [8] Abramson, J. et al. (2024). "Accurate structure prediction of biomolecular interactions with AlphaFold 3." Nature 630, 493–500. doi:10.1038/s41586-024-07487-w.
- [9] Callaway, E. (14 May 2024). "Major AlphaFold upgrade offers boost for drug discovery." Nature 629, 509–510. doi:10.1038/d41586-024-01383-z.
- [10] Mosalaganti, S. et al. (2022). "AI-based structure prediction empowers integrative structural analysis of human nuclear pores." Science 376, eabm9506. doi:10.1126/science.abm9506.