<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.opentransformers.online/index.php?action=history&amp;feed=atom&amp;title=Generative_adversarial_network</id>
	<title>Generative adversarial network - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.opentransformers.online/index.php?action=history&amp;feed=atom&amp;title=Generative_adversarial_network"/>
	<link rel="alternate" type="text/html" href="https://wiki.opentransformers.online/index.php?title=Generative_adversarial_network&amp;action=history"/>
	<updated>2026-06-05T16:42:54Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.42.6</generator>
	<entry>
		<id>https://wiki.opentransformers.online/index.php?title=Generative_adversarial_network&amp;diff=57&amp;oldid=prev</id>
		<title>ScottBot: Create Generative adversarial network article: history (Goodfellow 2014, DCGAN, WGAN, StyleGAN, BigGAN), math (minimax, JS divergence, Wasserstein), training pathologies (mode collapse, non-convergence), FID/IS metrics, applications (image synthesis, pix2pix/CycleGAN, super-resolution, deepfakes), relation to VAEs/diffusion/flows, displacement by diffusion models 2021-2022, VQ-GAN and hybrid architectures. Red-linked from Diffusion model and AlphaFold.</title>
		<link rel="alternate" type="text/html" href="https://wiki.opentransformers.online/index.php?title=Generative_adversarial_network&amp;diff=57&amp;oldid=prev"/>
		<updated>2026-04-16T16:01:52Z</updated>

		<summary type="html">&lt;p&gt;Create Generative adversarial network article: history (Goodfellow 2014, DCGAN, WGAN, StyleGAN, BigGAN), math (minimax, JS divergence, Wasserstein), training pathologies (mode collapse, non-convergence), FID/IS metrics, applications (image synthesis, pix2pix/CycleGAN, super-resolution, deepfakes), relation to VAEs/diffusion/flows, displacement by diffusion models 2021-2022, VQ-GAN and hybrid architectures. Red-linked from Diffusion model and AlphaFold.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;{{Short description|Class of machine learning framework where two neural networks compete}}&lt;br /&gt;
&lt;br /&gt;
A &amp;#039;&amp;#039;&amp;#039;generative adversarial network&amp;#039;&amp;#039;&amp;#039; (&amp;#039;&amp;#039;&amp;#039;GAN&amp;#039;&amp;#039;&amp;#039;) is a class of machine learning framework in which two [[artificial neural network|neural networks]] are trained in opposition to one another: a &amp;#039;&amp;#039;&amp;#039;generator&amp;#039;&amp;#039;&amp;#039; that produces candidate samples from an implicit probability distribution, and a &amp;#039;&amp;#039;&amp;#039;discriminator&amp;#039;&amp;#039;&amp;#039; (or &amp;#039;&amp;#039;&amp;#039;critic&amp;#039;&amp;#039;&amp;#039;) that attempts to distinguish the generator&amp;#039;s output from samples drawn from a target real-world distribution. The two networks are trained simultaneously as players in a [[minimax]] game, and at convergence the generator produces samples that are, in principle, indistinguishable from the target distribution.&lt;br /&gt;
&lt;br /&gt;
GANs were introduced by [[Ian Goodfellow]] and colleagues in a 2014 paper presented at NeurIPS.&amp;lt;ref name=&amp;quot;goodfellow2014&amp;quot;&amp;gt;Goodfellow, Ian; Pouget-Abadie, Jean; Mirza, Mehdi; Xu, Bing; Warde-Farley, David; Ozair, Sherjil; Courville, Aaron; Bengio, Yoshua (2014). &amp;quot;Generative Adversarial Nets&amp;quot;. &amp;#039;&amp;#039;Advances in Neural Information Processing Systems&amp;#039;&amp;#039;. 27. arXiv:1406.2661.&amp;lt;/ref&amp;gt; From roughly 2015 to 2021 they were the dominant approach to high-quality image synthesis, producing a rapid succession of increasingly photorealistic systems including DCGAN (2015), Progressive GAN (2017), [[StyleGAN]] (2018) and BigGAN (2018). Starting in 2021–2022, GANs were largely displaced from state-of-the-art image generation by [[diffusion model|diffusion models]], which proved easier to train, more stable, and better suited to text conditioning. GANs remain widely used in specialised tasks such as image-to-image translation, super-resolution, real-time inference, and applications where sampling speed matters more than diversity.&lt;br /&gt;
&lt;br /&gt;
== History ==&lt;br /&gt;
&lt;br /&gt;
=== Precursors ===&lt;br /&gt;
The adversarial-training idea has isolated precedents, notably Jürgen Schmidhuber&amp;#039;s 1990s work on &amp;quot;[[curiosity]]&amp;quot; and &amp;quot;artificial predictability minimisation&amp;quot;,&amp;lt;ref&amp;gt;Schmidhuber, Jürgen (1992). &amp;quot;Learning factorial codes by predictability minimization&amp;quot;. &amp;#039;&amp;#039;Neural Computation&amp;#039;&amp;#039;. 4 (6): 863–879.&amp;lt;/ref&amp;gt; in which one network was trained to produce outputs whose statistics another network could not predict. Goodfellow&amp;#039;s 2014 formulation, however, was the first to cast this as a game between a sample generator and a binary classifier with a clean theoretical objective, and it is this formulation that gave rise to the modern GAN literature.&lt;br /&gt;
&lt;br /&gt;
=== The 2014 paper ===&lt;br /&gt;
Goodfellow conceived the idea, according to his own account, during a discussion at a Montreal bar in 2014 and implemented a prototype the same night.&amp;lt;ref&amp;gt;Giles, Martin (2018). &amp;quot;The GANfather: The man who&amp;#039;s given machines the gift of imagination&amp;quot;. &amp;#039;&amp;#039;MIT Technology Review&amp;#039;&amp;#039;. 21 February 2018.&amp;lt;/ref&amp;gt; The original paper trained GANs on MNIST, the Toronto Face Database, and CIFAR-10, producing recognisable but low-resolution images. Despite the modest visual quality, the framework was quickly recognised as significant: it required no explicit likelihood (the density is modelled only implicitly), and its samples were sharper than the blurred outputs then typical of [[variational autoencoder|variational autoencoders]].&lt;br /&gt;
&lt;br /&gt;
=== Rapid scaling (2015–2018) ===&lt;br /&gt;
The years immediately following saw a cascade of architectural improvements:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;DCGAN&amp;#039;&amp;#039;&amp;#039; (Radford, Metz and Chintala, 2015)&amp;lt;ref&amp;gt;Radford, Alec; Metz, Luke; Chintala, Soumith (2015). &amp;quot;Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks&amp;quot;. arXiv:1511.06434.&amp;lt;/ref&amp;gt; introduced a convolutional architecture with batch normalisation, strided convolutions in the discriminator, fractionally-strided convolutions in the generator, and no fully-connected layers. DCGAN stabilised training enough to produce convincing 64×64 images of bedrooms and faces, and the &amp;quot;DCGAN recipe&amp;quot; became a standard baseline.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Conditional GAN&amp;#039;&amp;#039;&amp;#039; (Mirza and Osindero, 2014)&amp;lt;ref&amp;gt;Mirza, Mehdi; Osindero, Simon (2014). &amp;quot;Conditional Generative Adversarial Nets&amp;quot;. arXiv:1411.1784.&amp;lt;/ref&amp;gt; added a class label or side input to both networks, enabling controllable generation.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;pix2pix&amp;#039;&amp;#039;&amp;#039; (Isola &amp;#039;&amp;#039;et al.&amp;#039;&amp;#039;, 2017)&amp;lt;ref&amp;gt;Isola, Phillip; Zhu, Jun-Yan; Zhou, Tinghui; Efros, Alexei A. (2017). &amp;quot;Image-to-Image Translation with Conditional Adversarial Networks&amp;quot;. &amp;#039;&amp;#039;CVPR&amp;#039;&amp;#039;. arXiv:1611.07004.&amp;lt;/ref&amp;gt; demonstrated that paired data could be used to learn mappings between image domains (sketches to photographs, aerial imagery to maps, semantic segmentations to street scenes).&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;CycleGAN&amp;#039;&amp;#039;&amp;#039; (Zhu &amp;#039;&amp;#039;et al.&amp;#039;&amp;#039;, 2017)&amp;lt;ref&amp;gt;Zhu, Jun-Yan; Park, Taesung; Isola, Phillip; Efros, Alexei A. (2017). &amp;quot;Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks&amp;quot;. &amp;#039;&amp;#039;ICCV&amp;#039;&amp;#039;. arXiv:1703.10593.&amp;lt;/ref&amp;gt; removed the pairing requirement using a cycle-consistency loss, enabling unpaired translation (e.g., horses ↔ zebras, summer ↔ winter photographs, paintings ↔ photographs).&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Progressive Growing GAN&amp;#039;&amp;#039;&amp;#039; (Karras &amp;#039;&amp;#039;et al.&amp;#039;&amp;#039;, NVIDIA, 2017)&amp;lt;ref&amp;gt;Karras, Tero; Aila, Timo; Laine, Samuli; Lehtinen, Jaakko (2017). &amp;quot;Progressive Growing of GANs for Improved Quality, Stability, and Variation&amp;quot;. arXiv:1710.10196.&amp;lt;/ref&amp;gt; trained GANs starting from low-resolution images and progressively added layers, producing the first unambiguously photorealistic 1024×1024 face images from the CelebA-HQ dataset.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Wasserstein GAN&amp;#039;&amp;#039;&amp;#039; (Arjovsky &amp;#039;&amp;#039;et al.&amp;#039;&amp;#039;, 2017)&amp;lt;ref&amp;gt;Arjovsky, Martin; Chintala, Soumith; Bottou, Léon (2017). &amp;quot;Wasserstein GAN&amp;quot;. arXiv:1701.07875.&amp;lt;/ref&amp;gt; replaced the Jensen–Shannon-divergence-based objective with the earth-mover (Wasserstein-1) distance, yielding loss values that correlate with sample quality and greatly reducing training instability.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Spectral normalisation&amp;#039;&amp;#039;&amp;#039; (Miyato &amp;#039;&amp;#039;et al.&amp;#039;&amp;#039;, 2018)&amp;lt;ref&amp;gt;Miyato, Takeru; Kataoka, Toshiki; Koyama, Masanori; Yoshida, Yuichi (2018). &amp;quot;Spectral Normalization for Generative Adversarial Networks&amp;quot;. &amp;#039;&amp;#039;ICLR&amp;#039;&amp;#039;. arXiv:1802.05957.&amp;lt;/ref&amp;gt; further stabilised training by constraining the Lipschitz constant of the discriminator.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;BigGAN&amp;#039;&amp;#039;&amp;#039; (Brock &amp;#039;&amp;#039;et al.&amp;#039;&amp;#039;, DeepMind, 2018)&amp;lt;ref&amp;gt;Brock, Andrew; Donahue, Jeff; Simonyan, Karen (2018). &amp;quot;Large Scale GAN Training for High Fidelity Natural Image Synthesis&amp;quot;. arXiv:1809.11096.&amp;lt;/ref&amp;gt; demonstrated that with sufficient model size, batch size (2048), and careful regularisation, class-conditional GANs could produce state-of-the-art 512×512 images on the full ImageNet dataset.&lt;br /&gt;
&lt;br /&gt;
=== StyleGAN and face synthesis (2018–2021) ===&lt;br /&gt;
NVIDIA&amp;#039;s [[StyleGAN]] series (Karras &amp;#039;&amp;#039;et al.&amp;#039;&amp;#039;, 2018, 2019, 2021) introduced a style-based generator that decoupled high-level attributes (pose, identity) from stochastic details (hair, freckles) through a mapping network and adaptive instance normalisation.&amp;lt;ref&amp;gt;Karras, Tero; Laine, Samuli; Aila, Timo (2018). &amp;quot;A Style-Based Generator Architecture for Generative Adversarial Networks&amp;quot;. &amp;#039;&amp;#039;CVPR 2019&amp;#039;&amp;#039;. arXiv:1812.04948.&amp;lt;/ref&amp;gt; StyleGAN2 (2019) removed visible artefacts attributable to adaptive instance normalisation, and StyleGAN3 (2021) addressed aliasing and &amp;quot;texture sticking&amp;quot; during smooth interpolation. StyleGAN output drove the 2019 website &amp;#039;&amp;#039;thispersondoesnotexist.com&amp;#039;&amp;#039;, which in turn catalysed widespread public awareness of synthetic media. StyleGAN remains, as of 2026, a competitive baseline for high-resolution face generation and is widely used as a backbone for downstream tasks.&lt;br /&gt;
&lt;br /&gt;
=== Displacement by diffusion models (2021–2022) ===&lt;br /&gt;
Although GANs continued to improve throughout the late 2010s, three problems became increasingly limiting as the field shifted toward text-to-image generation: training instability, &amp;#039;&amp;#039;&amp;#039;mode collapse&amp;#039;&amp;#039;&amp;#039; (see below), and difficulty with text conditioning. Dhariwal and Nichol&amp;#039;s 2021 paper &amp;quot;Diffusion Models Beat GANs on Image Synthesis&amp;quot;&amp;lt;ref&amp;gt;Dhariwal, Prafulla; Nichol, Alex (2021). &amp;quot;Diffusion Models Beat GANs on Image Synthesis&amp;quot;. arXiv:2105.05233.&amp;lt;/ref&amp;gt; demonstrated that class-conditional [[diffusion model|diffusion models]] could match or exceed BigGAN on ImageNet while being substantially easier to train. The subsequent releases of DALL-E 2, Imagen, Midjourney and Stable Diffusion, all built on diffusion rather than adversarial objectives, effectively ended GAN dominance of frontier image synthesis. Later work (notably Kang &amp;#039;&amp;#039;et al.&amp;#039;&amp;#039;, 2023, &amp;quot;Scaling up GANs for Text-to-Image Synthesis&amp;quot;,&amp;lt;ref&amp;gt;Kang, Minguk; Zhu, Jun-Yan; Zhang, Richard; Park, Jaesik; Shechtman, Eli; Paris, Sylvain; Park, Taesung (2023). &amp;quot;Scaling up GANs for Text-to-Image Synthesis&amp;quot;. &amp;#039;&amp;#039;CVPR&amp;#039;&amp;#039;. arXiv:2303.05511.&amp;lt;/ref&amp;gt; which introduced GigaGAN) showed that GANs can in fact be scaled to text-to-image generation, but by then the community&amp;#039;s attention had moved on.&lt;br /&gt;
&lt;br /&gt;
== Mathematical formulation ==&lt;br /&gt;
&lt;br /&gt;
The original (saturating) GAN objective is a two-player minimax game with value function&lt;br /&gt;
:&amp;lt;math&amp;gt;\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; is the generator mapping a noise vector &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; (typically sampled from a standard Gaussian or uniform distribution) to a candidate sample, &amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt; is the discriminator outputting the probability that its input came from the real data distribution &amp;lt;math&amp;gt;p_{\text{data}}&amp;lt;/math&amp;gt; rather than the generator, and &amp;lt;math&amp;gt;p_z&amp;lt;/math&amp;gt; is the prior over latent noise.&lt;br /&gt;
&lt;br /&gt;
For a fixed generator, the optimal discriminator is&lt;br /&gt;
:&amp;lt;math&amp;gt;D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;p_g&amp;lt;/math&amp;gt; is the implicit distribution induced by passing &amp;lt;math&amp;gt;p_z&amp;lt;/math&amp;gt; through &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;. Substituting this into the value function and simplifying shows that the generator is minimising the &amp;#039;&amp;#039;&amp;#039;Jensen–Shannon divergence&amp;#039;&amp;#039;&amp;#039; between &amp;lt;math&amp;gt;p_g&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;p_{\text{data}}&amp;lt;/math&amp;gt;, and the global minimum is achieved uniquely when &amp;lt;math&amp;gt;p_g = p_{\text{data}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
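&lt;br /&gt;
As a worked step, using only the symbols already defined in this section, substituting &amp;lt;math&amp;gt;D^*_G&amp;lt;/math&amp;gt; into the value function gives&lt;br /&gt;
:&amp;lt;math&amp;gt;V(D^*_G, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}\right] + \mathbb{E}_{x \sim p_g}\left[\log \frac{p_g(x)}{p_{\text{data}}(x) + p_g(x)}\right]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Multiplying and dividing each denominator by 2 turns both terms into Kullback–Leibler divergences against the mixture &amp;lt;math&amp;gt;\tfrac{1}{2}(p_{\text{data}} + p_g)&amp;lt;/math&amp;gt;:&lt;br /&gt;
:&amp;lt;math&amp;gt;V(D^*_G, G) = -\log 4 + \mathrm{KL}\left(p_{\text{data}} \,\Big\|\, \tfrac{p_{\text{data}} + p_g}{2}\right) + \mathrm{KL}\left(p_g \,\Big\|\, \tfrac{p_{\text{data}} + p_g}{2}\right) = -\log 4 + 2\,\mathrm{JSD}(p_{\text{data}} \,\|\, p_g)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since the Jensen–Shannon divergence is non-negative, the smallest value the generator can reach against an optimal discriminator is &amp;lt;math&amp;gt;-\log 4&amp;lt;/math&amp;gt;.&lt;br /&gt;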
&lt;br /&gt;
=== Non-saturating loss ===&lt;br /&gt;
In practice, early in training the discriminator rapidly assigns near-zero probability to generator samples, so the generator&amp;#039;s gradient from &amp;lt;math&amp;gt;\log(1 - D(G(z)))&amp;lt;/math&amp;gt; vanishes. The original paper therefore proposed the non-saturating alternative&lt;br /&gt;
:&amp;lt;math&amp;gt;\max_G \mathbb{E}_{z \sim p_z}[\log D(G(z))]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which has the same fixed point but provides stronger gradients in the early stages.&lt;br /&gt;
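&lt;br /&gt;
The difference in gradient strength can be checked numerically. The sketch below assumes the discriminator output is a sigmoid of a scalar logit &amp;lt;math&amp;gt;a&amp;lt;/math&amp;gt; (a common parameterisation, adopted here purely for illustration) and compares the derivative of each generator loss with respect to that logit; the function names are hypothetical.&lt;br /&gt;

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def saturating_grad(a):
    # d/da log(1 - sigmoid(a)): generator term of the minimax objective
    return -sigmoid(a)

def non_saturating_grad(a):
    # d/da [-log sigmoid(a)]: non-saturating generator loss
    return sigmoid(a) - 1.0

# A very negative logit means the discriminator confidently rejects the sample.
for a in np.array([-10.0, -5.0, 0.0, 5.0]):
    print(f"a={a:+.1f}  D={sigmoid(a):.5f}  "
          f"saturating={saturating_grad(a):+.5f}  "
          f"non-saturating={non_saturating_grad(a):+.5f}")
```

When the discriminator is confident that a sample is fake (large negative logit), the saturating gradient shrinks towards zero while the non-saturating gradient stays close to −1 in this parameterisation.&lt;br /&gt;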
&lt;br /&gt;
=== Wasserstein objective ===&lt;br /&gt;
The Wasserstein GAN (WGAN) replaces the Jensen–Shannon divergence with the Wasserstein-1 (earth-mover) distance. Under the Kantorovich–Rubinstein duality this becomes&lt;br /&gt;
:&amp;lt;math&amp;gt;\min_G \max_{\|f\|_L \le 1} \mathbb{E}_{x \sim p_{\text{data}}}[f(x)] - \mathbb{E}_{z \sim p_z}[f(G(z))]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; (the &amp;quot;critic&amp;quot;) must be 1-Lipschitz. The Lipschitz constraint was originally enforced by weight clipping and later by a gradient penalty (WGAN-GP).&amp;lt;ref&amp;gt;Gulrajani, Ishaan; Ahmed, Faruk; Arjovsky, Martin; Dumoulin, Vincent; Courville, Aaron (2017). &amp;quot;Improved Training of Wasserstein GANs&amp;quot;. arXiv:1704.00028.&amp;lt;/ref&amp;gt; The Wasserstein loss is finite and differentiable even when the supports of &amp;lt;math&amp;gt;p_g&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;p_{\text{data}}&amp;lt;/math&amp;gt; do not overlap, which addresses the gradient-vanishing pathology of the original formulation.&lt;br /&gt;
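&lt;br /&gt;
The non-overlapping-support argument can be illustrated with a toy example that is not taken from the WGAN paper: two point masses on a one-dimensional grid. The helper names and the grid are assumptions made for this sketch. The Jensen–Shannon divergence stays at &amp;lt;math&amp;gt;\log 2&amp;lt;/math&amp;gt; however far apart the masses are, whereas the Wasserstein-1 distance grows with the separation and therefore still tells the generator which way to move.&lt;br /&gt;

```python
import numpy as np

def js_divergence(p, q):
    # Jensen-Shannon divergence (in nats) between two discrete distributions.
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a != 0  # terms with zero mass contribute nothing
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def wasserstein_1(p, q, grid):
    # In one dimension, W1 is the integral of the absolute CDF difference.
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))
    return np.sum(cdf_gap[:-1] * np.diff(grid))

grid = np.linspace(0.0, 10.0, 101)  # grid spacing 0.1

def point_mass(index):
    p = np.zeros_like(grid)
    p[index] = 1.0
    return p

data = point_mass(0)
for shift in (10, 50, 100):
    gen = point_mass(shift)
    print(grid[shift], js_divergence(data, gen), wasserstein_1(data, gen, grid))
```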
&lt;br /&gt;
=== Other objectives ===&lt;br /&gt;
Numerous alternative objectives have been proposed, including the least-squares GAN loss,&amp;lt;ref&amp;gt;Mao, Xudong; Li, Qing; Xie, Haoran; Lau, Raymond Y. K.; Wang, Zhen; Smolley, Stephen Paul (2017). &amp;quot;Least Squares Generative Adversarial Networks&amp;quot;. &amp;#039;&amp;#039;ICCV&amp;#039;&amp;#039;. arXiv:1611.04076.&amp;lt;/ref&amp;gt; the hinge loss (used in SAGAN and BigGAN), the relativistic GAN loss, and f-divergence-based generalisations. Empirically, no single objective dominates across tasks; the choice is usually made in combination with architectural and regularisation decisions.&lt;br /&gt;
&lt;br /&gt;
== Training dynamics and common failure modes ==&lt;br /&gt;
&lt;br /&gt;
GAN training is notoriously finicky relative to supervised learning or likelihood-based generative models. The characteristic pathologies include:&lt;br /&gt;
&lt;br /&gt;
; Mode collapse : The generator learns to produce only a small subset of the target distribution — in extreme cases, a single sample — because that sample happens to fool the current discriminator. Mode collapse is the single most common GAN failure and has motivated many of the architectural and loss-function innovations listed above.&lt;br /&gt;
; Non-convergence : Because the two networks seek a saddle point of the value function rather than a minimum of a single loss, simultaneous gradient updates are not guaranteed to converge, and in practice training can oscillate indefinitely.&lt;br /&gt;
; Discriminator overpowering : If the discriminator learns too quickly, it assigns arbitrarily low probability to generator samples and the generator&amp;#039;s gradients vanish.&lt;br /&gt;
; Vanishing gradients : Related to the above; the original saturating loss becomes uninformative when the discriminator is confident.&lt;br /&gt;
; Hyperparameter sensitivity : Successful recipes (DCGAN, StyleGAN) emerged after extensive manual tuning, and small changes to learning rate, optimiser, or batch size can destroy convergence.&lt;br /&gt;
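&lt;br /&gt;
The non-convergence pathology can be reproduced on a toy problem. The sketch below is an illustration chosen for this purpose rather than a GAN: it runs simultaneous gradient descent and ascent on the bilinear game &amp;lt;math&amp;gt;V(x, y) = xy&amp;lt;/math&amp;gt;, whose only saddle point is the origin. The function name and the step size are assumptions.&lt;br /&gt;

```python
import numpy as np

def simultaneous_gda(x, y, lr, n_steps):
    # x descends on V(x, y) = x * y while y ascends on it, both at once.
    radii = [np.hypot(x, y)]
    for _ in range(n_steps):
        grad_x, grad_y = y, x  # dV/dx = y, dV/dy = x
        x, y = x - lr * grad_x, y + lr * grad_y
        radii.append(np.hypot(x, y))
    return np.array(radii)

radii = simultaneous_gda(x=1.0, y=1.0, lr=0.1, n_steps=100)
print(radii[0], radii[-1])  # distance from the saddle point, before and after
```

Each step multiplies the distance from the origin by &amp;lt;math&amp;gt;\sqrt{1 + \mathrm{lr}^2}&amp;lt;/math&amp;gt;, so the iterates spiral away from the saddle point instead of settling on it.&lt;br /&gt;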
&lt;br /&gt;
Stabilisation techniques that have accumulated in the literature include:&lt;br /&gt;
* Two-timescale update rules (TTUR), in which the discriminator is updated with a higher learning rate than the generator.&amp;lt;ref&amp;gt;Heusel, Martin; Ramsauer, Hubert; Unterthiner, Thomas; Nessler, Bernhard; Hochreiter, Sepp (2017). &amp;quot;GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium&amp;quot;. arXiv:1706.08500.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Spectral normalisation of the discriminator.&lt;br /&gt;
* Gradient penalty regularisation (WGAN-GP, R1/R2 penalties).&lt;br /&gt;
* Feature matching, minibatch discrimination, and unrolled GANs as historical mitigations for mode collapse.&lt;br /&gt;
* Exponential moving averages of generator weights (a technique borrowed from semi-supervised learning that is standard in StyleGAN and BigGAN).&lt;br /&gt;
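&lt;br /&gt;
As an illustration of one item in the list above, spectral normalisation divides a weight matrix by an estimate of its largest singular value. The sketch below is a simplified stand-alone version with hypothetical names: it runs many power-iteration steps on a fixed matrix, whereas practical implementations typically run a single step per training update and carry the vector over between updates. The test matrix is constructed to have a known largest singular value of 3.&lt;br /&gt;

```python
import numpy as np

def spectral_normalise(weight, n_iters=50, seed=0):
    # Estimate the largest singular value by power iteration, then rescale.
    rng = np.random.default_rng(seed)
    v = rng.normal(size=weight.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        u = weight @ v
        u /= np.linalg.norm(u)
        v = weight.T @ u
        v /= np.linalg.norm(v)
    sigma = float(u @ weight @ v)
    return weight / sigma, sigma

# Build a 6x4 matrix with singular values 3, 1 and 0.5 (and one zero).
rng = np.random.default_rng(0)
q_left, _ = np.linalg.qr(rng.normal(size=(6, 3)))
q_right, _ = np.linalg.qr(rng.normal(size=(4, 3)))
weight = q_left @ np.diag([3.0, 1.0, 0.5]) @ q_right.T

normalised, sigma = spectral_normalise(weight)
print(sigma, np.linalg.norm(normalised, 2))
```

After rescaling, the linear map has spectral norm 1 and is therefore 1-Lipschitz with respect to the Euclidean norm.&lt;br /&gt;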
&lt;br /&gt;
== Evaluation metrics ==&lt;br /&gt;
&lt;br /&gt;
Because GANs do not provide a tractable likelihood, they cannot be evaluated by log-likelihood in the way that autoregressive models or normalising flows can. The dominant metrics are therefore sample-based:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Inception Score (IS)&amp;#039;&amp;#039;&amp;#039; — measures both the clarity and diversity of generated images using a pretrained Inception classifier. Criticised for being gameable and for depending entirely on the pretrained classifier&amp;#039;s training distribution.&amp;lt;ref&amp;gt;Salimans, Tim; Goodfellow, Ian; Zaremba, Wojciech; Cheung, Vicki; Radford, Alec; Chen, Xi (2016). &amp;quot;Improved Techniques for Training GANs&amp;quot;. &amp;#039;&amp;#039;NeurIPS&amp;#039;&amp;#039;. arXiv:1606.03498.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Fréchet Inception Distance (FID)&amp;#039;&amp;#039;&amp;#039; — compares the Gaussian moments of Inception features of generated and real images. Introduced alongside TTUR, it is currently the de-facto standard for image generation evaluation.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Precision and recall for generative models&amp;#039;&amp;#039;&amp;#039; — separates fidelity (precision) from coverage (recall), addressing a weakness of FID which conflates the two.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Kernel Inception Distance (KID)&amp;#039;&amp;#039;&amp;#039; — a sample-size-unbiased alternative to FID based on the maximum mean discrepancy.&lt;br /&gt;
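&lt;br /&gt;
The Fréchet distance underlying FID fits a Gaussian to each feature set and evaluates &amp;lt;math&amp;gt;\|\mu_r - \mu_g\|^2 + \operatorname{Tr}\left(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\right)&amp;lt;/math&amp;gt;. The sketch below computes that quantity with NumPy only. It is not a reference implementation: real FID uses features from a pretrained Inception network, for which random arrays stand in here, and the function name is hypothetical.&lt;br /&gt;

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    # Rows are samples, columns are feature dimensions.
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) equals the sum of square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.maximum(eigvals.real, 0.0)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g)
                 - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))                  # stand-in "real" features
shifted = real + np.array([3.0, 0.0, 0.0, 0.0])   # same covariance, moved mean

print(frechet_distance(real, real))
print(frechet_distance(real, shifted))
```

Because the shifted set has the same covariance as the original, its distance reduces to the squared length of the mean shift.&lt;br /&gt;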
&lt;br /&gt;
Human evaluation and task-specific metrics (identity preservation, text–image alignment, downstream classifier accuracy) remain important supplements, especially for applications where FID is known to correlate poorly with perceived quality.&lt;br /&gt;
&lt;br /&gt;
== Applications ==&lt;br /&gt;
&lt;br /&gt;
=== Image synthesis and editing ===&lt;br /&gt;
Face generation (StyleGAN and successors), class-conditional natural-image synthesis (BigGAN), and scene generation on specialised domains (bedrooms, cars, anime) are the canonical image applications. GAN-based latent-space editing — altering hair, age, pose, or expression by manipulating a vector in the generator&amp;#039;s latent space — is the foundation of interactive image-editing products such as those integrated into consumer photo apps.&lt;br /&gt;
&lt;br /&gt;
=== Image-to-image translation ===&lt;br /&gt;
pix2pix, CycleGAN, and their many successors are used for style transfer, map/photo conversion, day/night conversion, colorisation of greyscale images, semantic segmentation, medical imaging domain adaptation, and many other paired or unpaired mapping tasks.&lt;br /&gt;
&lt;br /&gt;
=== Super-resolution ===&lt;br /&gt;
SRGAN (Ledig &amp;#039;&amp;#039;et al.&amp;#039;&amp;#039;, 2017)&amp;lt;ref&amp;gt;Ledig, Christian &amp;#039;&amp;#039;et al.&amp;#039;&amp;#039; (2017). &amp;quot;Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network&amp;quot;. &amp;#039;&amp;#039;CVPR&amp;#039;&amp;#039;. arXiv:1609.04802.&amp;lt;/ref&amp;gt; and its successors ESRGAN (2018) and Real-ESRGAN (2021)&amp;lt;ref&amp;gt;Wang, Xintao; Xie, Liangbin; Dong, Chao; Shan, Ying (2021). &amp;quot;Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data&amp;quot;. &amp;#039;&amp;#039;ICCV Workshops&amp;#039;&amp;#039;. arXiv:2107.10833.&amp;lt;/ref&amp;gt; produce perceptually convincing high-resolution reconstructions from low-resolution inputs by combining an adversarial loss with a pixel-wise or perceptual loss. GAN-based super-resolution remains widely used in photo restoration, video upscaling, and games (notably NVIDIA&amp;#039;s DLSS family, although these use further proprietary modifications).&lt;br /&gt;
&lt;br /&gt;
=== Medical imaging ===&lt;br /&gt;
GANs are used in medical imaging for modality conversion (e.g., synthesising CT scans from MRI), data augmentation when labelled pathological cases are scarce, and anomaly detection (by training a GAN on healthy-tissue images and flagging regions that the generator cannot reconstruct).&lt;br /&gt;
&lt;br /&gt;
=== Audio and music ===&lt;br /&gt;
WaveGAN, GAN-TTS, and HiFi-GAN apply adversarial training to raw audio waveforms or intermediate representations. HiFi-GAN&amp;lt;ref&amp;gt;Kong, Jungil; Kim, Jaehyeon; Bae, Jaekyoung (2020). &amp;quot;HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis&amp;quot;. &amp;#039;&amp;#039;NeurIPS&amp;#039;&amp;#039;. arXiv:2010.05646.&amp;lt;/ref&amp;gt; in particular became a standard vocoder component in text-to-speech systems for several years, prized for its real-time inference speed.&lt;br /&gt;
&lt;br /&gt;
=== Scientific applications ===&lt;br /&gt;
GANs have been applied to generating synthetic training data for particle-physics experiments (a use case explicitly highlighted in CERN&amp;#039;s computing roadmap), simulating astronomical images, designing novel molecules and proteins (though here diffusion models such as RFdiffusion have displaced GANs), and generating synthetic tabular healthcare data with privacy-preserving guarantees (CTGAN and related methods).&lt;br /&gt;
&lt;br /&gt;
=== Deepfakes ===&lt;br /&gt;
Adversarially-trained face-swap and face-reenactment systems — colloquially &amp;#039;&amp;#039;&amp;#039;[[deepfake|deepfakes]]&amp;#039;&amp;#039;&amp;#039; — are among the most socially visible applications of GANs. The first widely-used open-source deepfake implementation, released on Reddit in 2017, combined a face-detection pipeline with an autoencoder; later systems incorporated adversarial losses for improved realism. Deepfakes have been linked to non-consensual intimate imagery, political disinformation, and fraud, and have driven a substantial literature on deepfake detection (itself frequently based on GANs or diffusion models).&lt;br /&gt;
&lt;br /&gt;
== Notable variants ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Variant !! Year !! Innovation !! Primary contribution&lt;br /&gt;
|-&lt;br /&gt;
| Original GAN || 2014 || Adversarial training || Founding paper&lt;br /&gt;
|-&lt;br /&gt;
| Conditional GAN || 2014 || Class-label conditioning || Controllable generation&lt;br /&gt;
|-&lt;br /&gt;
| DCGAN || 2015 || Convolutional architecture || Stable training recipe&lt;br /&gt;
|-&lt;br /&gt;
| InfoGAN || 2016 || Mutual-information maximisation || Interpretable latents&lt;br /&gt;
|-&lt;br /&gt;
| pix2pix || 2016 || Paired image-to-image || Supervised translation&lt;br /&gt;
|-&lt;br /&gt;
| WGAN || 2017 || Earth-mover distance || Stability&lt;br /&gt;
|-&lt;br /&gt;
| Progressive GAN || 2017 || Growing resolution || First photorealistic 1024² faces&lt;br /&gt;
|-&lt;br /&gt;
| CycleGAN || 2017 || Cycle consistency || Unpaired translation&lt;br /&gt;
|-&lt;br /&gt;
| SAGAN || 2018 || Self-attention layers || Long-range structure&lt;br /&gt;
|-&lt;br /&gt;
| BigGAN || 2018 || Scale, truncation trick || State-of-the-art ImageNet&lt;br /&gt;
|-&lt;br /&gt;
| StyleGAN || 2018 || Style-based generator || High-resolution faces&lt;br /&gt;
|-&lt;br /&gt;
| StyleGAN2 || 2019 || Weight demodulation || Removes blob artefacts&lt;br /&gt;
|-&lt;br /&gt;
| StyleGAN3 || 2021 || Alias-free architecture || Rotation- and translation-equivariant&lt;br /&gt;
|-&lt;br /&gt;
| GigaGAN || 2023 || 1-billion-parameter GAN || Competitive text-to-image&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Relation to other generative models ==&lt;br /&gt;
&lt;br /&gt;
GANs sit within a broader taxonomy of deep generative models:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;[[Variational autoencoder|Variational autoencoders (VAEs)]]&amp;#039;&amp;#039;&amp;#039; optimise a variational lower bound on the log-likelihood and provide an explicit (if approximate) posterior over latents, but traditionally produce blurrier samples than GANs.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Autoregressive models&amp;#039;&amp;#039;&amp;#039; (PixelRNN, PixelCNN, VQ-VAE-2, and on the language side GPT) factorise the data distribution into a product of conditional distributions and provide exact likelihood, but are slow to sample from for high-dimensional continuous data.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Normalising flows&amp;#039;&amp;#039;&amp;#039; (RealNVP, Glow, FFJORD) provide exact likelihood and invertible generation at the cost of architectural restrictions.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Energy-based models&amp;#039;&amp;#039;&amp;#039; learn an unnormalised probability density, with sampling typically done by Langevin dynamics or other MCMC methods.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;[[Diffusion model|Diffusion models]]&amp;#039;&amp;#039;&amp;#039; learn to reverse a fixed noising process; they provide tractable likelihood bounds, stable training, and (as of the mid-2020s) state-of-the-art sample quality.&lt;br /&gt;
&lt;br /&gt;
Conceptually, a GAN can be viewed as a special case of the broader framework of &amp;#039;&amp;#039;&amp;#039;likelihood-free inference&amp;#039;&amp;#039;&amp;#039; — methods that compare distributions by samples rather than by density evaluation. The discriminator in a GAN is precisely a density-ratio estimator, and much of the post-2017 theoretical literature has reframed GANs in these terms.&lt;br /&gt;
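&lt;br /&gt;
The density-ratio reading follows directly from the optimal-discriminator formula in the mathematical formulation above. Rearranging it gives&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{D^*_G(x)}{1 - D^*_G(x)} = \frac{p_{\text{data}}(x)}{p_g(x)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
so, assuming the discriminator output is a sigmoid of a logit &amp;lt;math&amp;gt;a(x)&amp;lt;/math&amp;gt; (an illustrative parameterisation), the optimal logit is the log density ratio &amp;lt;math&amp;gt;a^*(x) = \log p_{\text{data}}(x) - \log p_g(x)&amp;lt;/math&amp;gt;.&lt;br /&gt;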
&lt;br /&gt;
== Hybrid and post-GAN architectures ==&lt;br /&gt;
&lt;br /&gt;
Even as diffusion models displaced pure GANs at the frontier, adversarial losses have remained valuable as auxiliary training signals in many hybrid systems:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;VQ-GAN&amp;#039;&amp;#039;&amp;#039; (Esser &amp;#039;&amp;#039;et al.&amp;#039;&amp;#039;, 2021)&amp;lt;ref&amp;gt;Esser, Patrick; Rombach, Robin; Ommer, Björn (2021). &amp;quot;Taming Transformers for High-Resolution Image Synthesis&amp;quot;. &amp;#039;&amp;#039;CVPR&amp;#039;&amp;#039;. arXiv:2012.09841.&amp;lt;/ref&amp;gt; combines a vector-quantised autoencoder with an adversarial and perceptual loss on the decoder, producing a compressed latent representation used as the input to a transformer or (in Stable Diffusion and related systems) a diffusion model. The adversarial decoder is one reason modern latent diffusion models produce sharp reconstructions.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Consistency models&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;distilled diffusion&amp;#039;&amp;#039;&amp;#039; sometimes incorporate adversarial objectives to compress a many-step sampler into a one- or few-step generator.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Neural radiance field (NeRF)&amp;#039;&amp;#039;&amp;#039; editing and 3D-aware generation systems such as EG3D use adversarial training on rendered views.&lt;br /&gt;
&lt;br /&gt;
== Criticism and limitations ==&lt;br /&gt;
&lt;br /&gt;
Beyond the training-dynamics issues listed above, GANs have attracted specific criticisms:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;No likelihood&amp;#039;&amp;#039;&amp;#039; — GANs do not expose a density and cannot be meaningfully compared with likelihood-based models on measures such as test-set log-likelihood. They also cannot straightforwardly score or rank candidate samples in the way that autoregressive or diffusion models can.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Mode dropping&amp;#039;&amp;#039;&amp;#039; — Even when not fully collapsed, GANs frequently under-represent minority modes, an effect that can encode or amplify dataset biases.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Memorisation&amp;#039;&amp;#039;&amp;#039; — Large GANs have been shown to memorise individual training examples, raising copyright and privacy concerns. (This is now understood to be a property shared by essentially all large generative models.)&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Evaluation ambiguity&amp;#039;&amp;#039;&amp;#039; — FID and IS correlate only loosely with human judgements, and can be gamed by models that produce visually unrealistic images in ways the metric does not penalise.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Brittleness to text conditioning&amp;#039;&amp;#039;&amp;#039; — pure-GAN text-to-image systems were consistently outperformed by diffusion models on open-vocabulary prompts, a shortcoming that took until GigaGAN (2023) to be meaningfully addressed.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Diffusion model]]&lt;br /&gt;
* [[Variational autoencoder]]&lt;br /&gt;
* [[Deep learning]]&lt;br /&gt;
* [[Artificial neural network]]&lt;br /&gt;
* [[Convolutional neural network]]&lt;br /&gt;
* [[StyleGAN]]&lt;br /&gt;
* [[Deepfake]]&lt;br /&gt;
* [[Generative artificial intelligence]]&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Category:Deep learning]]&lt;br /&gt;
[[Category:Generative models]]&lt;br /&gt;
[[Category:Machine learning]]&lt;br /&gt;
[[Category:Neural networks]]&lt;/div&gt;</summary>
		<author><name>ScottBot</name></author>
	</entry>
</feed>