Proteins
- molecular machines essential to life
- many functions
- consist of a chain of amino acids that folds into a 3D structure
- strings over an alphabet of 20 letters
- The exact 3D shape is important for a protein’s function
AlphaFold 3 updates
Changes: accommodate more general chemical structures and improve the data efficiency of learning
Within the trunk
- MSA processing is substantially de-emphasized with a much smaller and simpler MSA embedding block (Supplementary Methods 3.3). Compared to the original Evoformer from AlphaFold 2, the number of blocks is reduced to four, the processing of the MSA representation uses an inexpensive pair-weighted averaging (a toy sketch follows this list), and only the pair representation is used for later processing steps.
- The “Pairformer” (Fig. 2a, Supplementary Methods 3.6) replaces the “Evoformer” of AlphaFold 2 as the dominant processing block. It operates only on the pair representation and the single representation; the MSA representation is not retained and all information passes via the pair representation. The pair processing and the number of blocks (48) are largely unchanged from AlphaFold 2.
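A minimal NumPy sketch of the pair-weighted averaging idea, assuming a single head and no gating or normalization (the real block uses multiple heads plus gating; all names and shapes here are illustrative): each residue of every MSA row pools values from the other residues, with weights derived from the pair representation.

```python
import numpy as np

def pair_weighted_averaging(msa, pair, w_proj, v_proj):
    """One-head sketch of pair-weighted averaging (no gating, no layer norm).

    msa:    (S, N, C)  MSA representation (S sequences, N residues)
    pair:   (N, N, Cz) pair representation
    w_proj: (Cz,)      projects each pair entry to a scalar attention logit
    v_proj: (C, C)     projects MSA entries to values
    """
    logits = pair @ w_proj                       # (N, N): one logit per residue pair
    weights = np.exp(logits - logits.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)    # softmax over the second residue index
    values = msa @ v_proj                        # (S, N, C)
    # every residue i pools values from all residues j, weighted by pair entry (i, j)
    return np.einsum("ij,sjc->sic", weights, values)

# toy usage
S, N, C, Cz = 4, 6, 8, 5
rng = np.random.default_rng(0)
out = pair_weighted_averaging(rng.normal(size=(S, N, C)),
                              rng.normal(size=(N, N, Cz)),
                              rng.normal(size=Cz),
                              rng.normal(size=(C, C)))
print(out.shape)  # (4, 6, 8)
```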
Diffusion Module
- directly predicts the raw atom coordinates with a Diffusion Module.
- The multiscale nature of the diffusion process (low noise levels induce the network to improve local structure) also allows us to eliminate stereochemical losses and most special handling of bonding patterns in the network, easily accommodating arbitrary chemical components.
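A toy sketch of what "predicting raw atom coordinates with a diffusion process" means, not AlphaFold 3's actual sampler: the stand-in denoiser and the noise schedule are made up for illustration, whereas the real model is a trained network conditioned on the trunk's outputs.

```python
import numpy as np

def toy_denoiser(x_noisy, sigma, x_target):
    """Stand-in for the trained network: nudges the noisy coordinates toward a
    fixed target, weakly at high noise and strongly at low noise (the
    "multiscale" behaviour mentioned above)."""
    return x_noisy + (x_target - x_noisy) * min(1.0, 1.0 / sigma)

def sample_structure(n_atoms, denoiser, sigmas, seed=0):
    """Sketch of diffusion-style sampling over raw atom coordinates: start from
    pure noise at the largest noise level, then alternate "denoise" and
    "re-noise to the next, smaller level"."""
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=sigmas[0], size=(n_atoms, 3))             # pure noise
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x_clean_hat = denoiser(x, sigma)                           # network's guess of the clean coords
        x = x_clean_hat + rng.normal(scale=sigma_next, size=x.shape)
    return denoiser(x, sigmas[-1])                                 # final denoising step

target = np.arange(12, dtype=float).reshape(4, 3)                  # toy "true" structure
sigmas = [8.0, 4.0, 2.0, 1.0, 0.5, 0.1]
pred = sample_structure(4, lambda x, s: toy_denoiser(x, s, target), sigmas)
print(np.round(pred, 2))                                           # close to `target`
```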
Full Form
- Diagram
AlphaFold
- inductive bias of the modules is important
- Physics and geometric insights are built into the network structure
- Inductive bias used:
- Sequence positions are de-emphasized (i.e. any amino acid can interact with any other amino acid because of folding, so the original string order is not the key signal)
- Instead, residues that are close in space need to communicate
- Iteratively learn a graph of which residues are close, while reasoning over this implicit graph as it is being built (it can be a soft graph, i.e. an attention matrix; see the sketch below)
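A small illustration of the "soft graph" point, assuming plain dot-product attention (names and shapes are illustrative): the attention matrix over residues is a learned, dense adjacency that does not depend on sequence order.

```python
import numpy as np

def soft_residue_graph(residue_feats, q_proj, k_proj):
    """Attention matrix over residues used as a soft adjacency graph.
    residue_feats: (N, C); q_proj, k_proj: (C, D)."""
    q = residue_feats @ q_proj
    k = residue_feats @ k_proj
    logits = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(logits - logits.max(-1, keepdims=True))
    return weights / weights.sum(-1, keepdims=True)  # row i = how much residue i "listens" to each j

rng = np.random.default_rng(0)
adj = soft_residue_graph(rng.normal(size=(5, 8)),
                         rng.normal(size=(8, 4)),
                         rng.normal(size=(8, 4)))
print(adj.shape, adj.sum(axis=1))  # (5, 5), each row sums to 1
```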
Inputs
Multiple Sequence Alignment (MSA representation)
- Use previous knowledge of structures
- for the same “function”, the structure should be fairly similar
- thus, given a sequence, they search for evolutionarily related sequences and use them to infer a good starting point for the structure (e.g. distances between residues)
- Intuition
- Co-evolution: residues in contact must mutate together
- if we see two residues mutate together, this gives us a prior that they are close (a toy sketch follows this list)
- Stability: if evolution conserves an amino acid at the same spot throughout multiple sequences, it usually means that this amino acid is on the “outside” or “inside” of the protein, e.g. hydrophilic amino acids tend to stay on the outside
- How to do it:
- Given a sequence, do a genetics database look-up
- this gives us a multiple sequence alignment (MSA), i.e. a stack of aligned sequences
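A toy illustration of the co-evolution signal using simple mutual information between MSA columns (real pipelines use much stronger statistics, and AlphaFold learns the signal end-to-end rather than computing it explicitly):

```python
import numpy as np
from collections import Counter

def mutual_information_contacts(msa):
    """Score every pair of MSA columns: columns that mutate together get a high
    score, a (very crude) prior that the corresponding residues are in contact.
    msa: list of equal-length aligned sequences (strings)."""
    n_seq, n_col = len(msa), len(msa[0])
    mi = np.zeros((n_col, n_col))
    for i in range(n_col):
        for j in range(i + 1, n_col):
            pair_counts = Counter((s[i], s[j]) for s in msa)
            pi = Counter(s[i] for s in msa)
            pj = Counter(s[j] for s in msa)
            score = 0.0
            for (a, b), c in pair_counts.items():
                p_ab = c / n_seq
                score += p_ab * np.log(p_ab / ((pi[a] / n_seq) * (pj[b] / n_seq)))
            mi[i, j] = mi[j, i] = score
    return mi

# toy MSA: columns 0 and 4 always mutate together, so that pair scores highest
msa = ["ACDKA", "GCDKG", "ACDKA", "GCDKG", "ACEKA", "GCEKG"]
print(np.round(mutual_information_contacts(msa), 2))
```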
Residue Pairs representation
- From the input sequence, we create a pairwise representation of all residues
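A minimal sketch of how such a pairwise representation can be initialised, assuming an outer sum of two per-residue projections plus a clipped relative-position embedding (dimensions and names are illustrative, not AlphaFold's exact scheme):

```python
import numpy as np

def init_pair_representation(residue_feats, left_proj, right_proj, relpos_emb, max_rel=4):
    """Build an initial (N, N, Cz) pair representation from per-residue features.
    residue_feats: (N, C); left_proj/right_proj: (C, Cz); relpos_emb: (2*max_rel+1, Cz)."""
    left = residue_feats @ left_proj              # (N, Cz)
    right = residue_feats @ right_proj            # (N, Cz)
    pair = left[:, None, :] + right[None, :, :]   # (N, N, Cz) "outer sum"
    idx = np.arange(residue_feats.shape[0])
    rel = np.clip(idx[None, :] - idx[:, None], -max_rel, max_rel) + max_rel
    return pair + relpos_emb[rel]                 # add clipped relative-position embedding

rng = np.random.default_rng(0)
N, C, Cz = 6, 8, 5
z = init_pair_representation(rng.normal(size=(N, C)),
                             rng.normal(size=(C, Cz)),
                             rng.normal(size=(C, Cz)),
                             rng.normal(size=(2 * 4 + 1, Cz)))
print(z.shape)  # (6, 6, 5)
```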
Architecture
Evoformer
Block
Triangular Attention
- Enforcing Euclidean geometry into the network
- Given 3 points A,B,C
- If distances AB and BC are known, there is a strong constraint on AC (triangle inequality)
- AC ≤ AB + BC
- Pair Embedding encodes spatial relations
- An update for pair AC should depend on AB and BC
- Given entry (i, j) of the pair representation,
- update it by summing (“weighted with attention”) over row i and column j
- different from the usual any-to-any attention (a toy sketch follows this list)
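A one-head NumPy sketch of triangular attention "around the starting node", without the gating, multiple heads, layer norms and the symmetric "ending node" variant used in the real model: the update for pair (i, j) attends over all k using pair (i, k), with a bias read from pair (j, k), so every update sees the third edge of the triangle.

```python
import numpy as np

def triangle_attention_starting_node(pair, q_proj, k_proj, v_proj, bias_proj):
    """One-head triangular attention sketch.
    pair: (N, N, C); q_proj/k_proj/v_proj: (C, D); bias_proj: (C,)."""
    q = pair @ q_proj                              # (N, N, D)
    k = pair @ k_proj                              # (N, N, D)
    v = pair @ v_proj                              # (N, N, D)
    bias = pair @ bias_proj                        # (N, N): scalar bias per pair
    # logits[i, j, k] = <q_ij, k_ik> / sqrt(D) + bias[j, k]
    logits = np.einsum("ijd,ikd->ijk", q, k) / np.sqrt(q.shape[-1]) + bias[None, :, :]
    w = np.exp(logits - logits.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    # out[i, j] = sum_k w[i, j, k] * v[i, k]
    return np.einsum("ijk,ikd->ijd", w, v)

rng = np.random.default_rng(0)
N, C, D = 5, 8, 4
out = triangle_attention_starting_node(rng.normal(size=(N, N, C)),
                                        rng.normal(size=(C, D)),
                                        rng.normal(size=(C, D)),
                                        rng.normal(size=(C, D)),
                                        rng.normal(size=C))
print(out.shape)  # (5, 5, 4)
```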
Structure Module
- Takes in the MSA representation & pair representation ⇒ outputs the 3D structure
- They don't treat the protein backbone as a chain
- They treat it as a gas of 3-D rigid bodies (each body described only by a rotation and a translation)
- Use a 3-D equivariant transformer architecture
- builds the side chains from torsion angles (toy sketch below)
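A toy sketch of the rigid-body/torsion idea (illustrative values only, not the real residue geometry): each residue carries a rotation and a translation, and a side-chain atom is placed by rotating an idealised local position about a bond axis by a predicted torsion angle, then mapping it into the global frame.

```python
import numpy as np

def rotation_about_axis(axis, angle):
    """Rotation matrix for `angle` radians about a unit `axis` (Rodrigues' formula)."""
    axis = axis / np.linalg.norm(axis)
    kx, ky, kz = axis
    K = np.array([[0, -kz, ky], [kz, 0, -kx], [-ky, kx, 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def place_atom(frame_R, frame_t, local_coords):
    """Map an atom from a residue's local (rigid-body) frame to global coordinates:
    x_global = R @ x_local + t."""
    return frame_R @ local_coords + frame_t

# one residue's frame: a predicted rotation + translation (the "gas of rigid bodies")
frame_R = rotation_about_axis(np.array([0.0, 0.0, 1.0]), np.pi / 6)
frame_t = np.array([10.0, 0.0, 0.0])

# place a side-chain atom: rotate an idealised local position about the local
# x-axis by a predicted torsion angle chi1, then move it into the global frame
ideal_local = np.array([1.5, 0.5, 0.0])
chi1 = np.deg2rad(60.0)
local = rotation_about_axis(np.array([1.0, 0.0, 0.0]), chi1) @ ideal_local
print(np.round(place_atom(frame_R, frame_t, local), 3))
```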
Recycling
- Do 3 forward passes, where the next forward pass is fed the final representations of the previous forward pass (final MSA, final pair representation, structure prediction)
- “Iterative refinement”
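A minimal sketch of the recycling loop with a stand-in model (the callable and its outputs are made up for illustration):

```python
import numpy as np

def run_with_recycling(model, inputs, n_cycles=3):
    """Run the network several times, feeding each pass the previous pass's
    final representations and structure.
    `model` is a callable: (inputs, prev) -> dict with keys "msa", "pair", "coords"."""
    prev = None
    for _ in range(n_cycles):
        prev = model(inputs, prev)
    return prev

def toy_model(inputs, prev):
    """Stand-in network: each pass pulls the coordinates halfway toward a fixed
    target, so recycling behaves like iterative refinement."""
    coords = inputs["init_coords"] if prev is None else prev["coords"]
    new_coords = coords + 0.5 * (inputs["target"] - coords)
    return {"msa": None, "pair": None, "coords": new_coords}

inputs = {"init_coords": np.zeros((4, 3)), "target": np.ones((4, 3))}
out = run_with_recycling(toy_model, inputs, n_cycles=3)
print(np.round(out["coords"], 3))  # 0.875 everywhere after 3 passes
```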
Noisy Student Distillation
- Not enough data to do supervised learning
- They iteratively build up a labelled dataset by adding confidently predicted structures from the previously trained model
- Kickstart on PDB data ⇒ iteratively enrich dataset
- They say it acts as a “regularization”
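A sketch of the self-distillation loop under stated assumptions (the callables, the confidence threshold and the number of rounds are all illustrative, not the paper's exact recipe):

```python
def noisy_student_distillation(train_model, predict, labelled, unlabelled,
                               confidence_threshold=0.9, n_rounds=3):
    """Kickstart on labelled data, then repeatedly add confidently predicted
    structures as pseudo-labels and retrain on the enlarged set."""
    dataset = list(labelled)                        # (sequence, structure) pairs
    pseudo_labelled = {}                            # sequence -> predicted structure
    model = train_model(dataset)
    for _ in range(n_rounds):
        for seq in unlabelled:
            structure, confidence = predict(model, seq)
            if confidence >= confidence_threshold:
                pseudo_labelled[seq] = structure    # keep the latest confident prediction
        model = train_model(dataset + list(pseudo_labelled.items()))
    return model

# toy usage with stand-in callables
model = noisy_student_distillation(
    train_model=lambda data: {"n_train": len(data)},
    predict=lambda m, seq: (f"structure_of_{seq}", 0.95),
    labelled=[("AAA", "helix")],
    unlabelled=["CCC", "GGG"],
)
print(model)  # {'n_train': 3}: the labelled example plus two pseudo-labels
```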
Outputs
Per-residue confidence (pLDDT)
- lDDT = local Distance Difference Test
- Tells us the per-residue error
- Train a small head to predict this metric during training (labels can be computed from the PDB ground-truth structures)
- Side chains are usually the hardest to be confident on
- Pitfalls
- On a domain scale (rigid body), pLDDT is a good metric
- To assess inter-domain confidence (how big bodies flop around with each other), pLDDT is not good because it is a local metric.
- That’s what “Predicted Aligned Error” is for!
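A hedged sketch of how a pLDDT-style value can be read out of a confidence head, assuming the head emits per-residue logits over binned lDDT values (the bin layout here is illustrative):

```python
import numpy as np

def plddt_from_logits(logits, n_bins=50):
    """Per-residue pLDDT as the expected value of a predicted distribution over
    binned lDDT values, scaled to 0-100.
    logits: (N_residues, n_bins)."""
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    bin_centres = np.arange(n_bins) / n_bins + 0.5 / n_bins  # midpoints of [0, 1] bins
    return 100.0 * probs @ bin_centres                       # per-residue pLDDT in [0, 100]

rng = np.random.default_rng(0)
print(np.round(plddt_from_logits(rng.normal(size=(3, 50))), 1))
```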
Pairwise confidence (Predicted Aligned Error)
- Gives us a pairwise matrix of errors
- = expected error in the position of residue i when the predicted and true structures are aligned on the reference frame of residue j
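A sketch of what this aligned-error matrix measures, assuming per-residue frames given as (rotation, translation) pairs; the network predicts (a binned version of) this quantity rather than computing it, so the code below only illustrates the target definition:

```python
import numpy as np

def aligned_error(pred_coords, true_coords, pred_frames, true_frames):
    """Entry (i, j) is the error in residue i's position when the predicted and
    true structures are each expressed in residue j's local frame.
    pred_coords/true_coords: (N, 3); frames: list of (R, t) per residue."""
    n = len(pred_coords)
    err = np.zeros((n, n))
    for j in range(n):
        Rp, tp = pred_frames[j]
        Rt, tt = true_frames[j]
        pred_local = (pred_coords - tp) @ Rp       # all residues in predicted frame j
        true_local = (true_coords - tt) @ Rt       # all residues in true frame j
        err[:, j] = np.linalg.norm(pred_local - true_local, axis=-1)
    return err

# toy usage: 3 residues, identity frames, one residue mis-predicted by 2 Å
true = np.array([[0.0, 0, 0], [3, 0, 0], [6, 0, 0]])
pred = true.copy(); pred[2, 1] += 2.0
frames = [(np.eye(3), c) for c in true]
print(np.round(aligned_error(pred, true, frames, frames), 2))  # third row is 2.0
```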
How AlphaFold understands proteins
- Computational structure prediction is typically underspecified
- context greatly influences structure
- The network implicitly models the missing context
- e.g. the presence of a zinc atom influencing the side chains