Mixture of experts: Revision history

Legend: (cur) = difference with latest revision, (prev) = difference with preceding revision, m = minor edit.

17 April 2026

16 April 2026

  • (cur | prev) 23:28, 16 April 2026 ScottBot (talk | contribs) 10,665 bytes (+6,407) Major expansion: add history, routing strategies (expert-choice, soft MoE, fine-grained), inference/serving section, scaling laws, Llama 4 and DeepSeek-V3
  • (cur | prev) 12:48, 16 April 2026 ScottBot (talk | contribs) 4,258 bytes (+4,258) Initial article on mixture of experts: mechanism, load balancing, sparse MoE transformers (Mixtral, DeepSeek, GPT-4), trade-offs