Few-shot Image Generation with Mixup-based Distance Learning
ECCV 2022

Abstract


Producing diverse and realistic images with generative models such as GANs typically requires large-scale training with vast amounts of images. GANs trained with limited data can easily memorize the few training samples and display undesirable properties such as a stairlike latent space, where interpolation in the latent space yields discontinuous transitions in the output space. In this work, we consider the challenging task of pretraining-free few-shot image synthesis and seek to train existing generative models with minimal overfitting and mode collapse. We propose a mixup-based distance regularization on the feature spaces of both the generator and the counterpart discriminator that encourages the two players to reason not only about the scarce observed data points but also about the relative distances in the feature space they reside in. Qualitative and quantitative evaluation on diverse datasets demonstrates that our method is generally applicable to existing models and enhances both fidelity and diversity under the few-shot setting.

Stairlike Latent Space and Mixup-based Distance Learning (MixDL)



Generative models trained with extremely few data samples (e.g., n=10) overfit strongly and can produce only a limited set of seen data points. This results in a stairlike latent space geometry, under which gradual transitions in the latent space yield discontinuous changes in the output (image) space. Obtaining sample diversity is therefore key to few-shot generative modeling, while fidelity is relatively low-hanging fruit.
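The snippet below is a minimal sketch, not part of the paper, of how stairlike latent geometry can be probed: interpolate between two latent codes and measure the perceptual (LPIPS) distance between consecutive outputs. The generator interface `G` and the step count are assumptions.

```python
# Minimal sketch (assumes a generator G mapping a latent batch to images in [-1, 1]).
# A smooth latent space gives roughly uniform LPIPS distances along the interpolation
# path; a stairlike one gives near-zero distances punctuated by a few large jumps.
import torch
import lpips  # pip install lpips

def interpolation_smoothness(G, z0, z1, steps=16):
    lpips_fn = lpips.LPIPS(net="vgg")
    alphas = torch.linspace(0.0, 1.0, steps)
    with torch.no_grad():
        imgs = [G(((1 - a) * z0 + a * z1).unsqueeze(0)) for a in alphas]
        # perceptual distance between consecutive frames along the path
        dists = [lpips_fn(imgs[i], imgs[i + 1]).item() for i in range(steps - 1)]
    return dists
```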

In this work, instead of directly tackling the seemingly insurmountable problem of memorization, we set a surrogate objective of smoothing the generative latent space to enable sampling of diverse, perceptually in-between samples. To that end, we intentionally generate an anchor sample by applying mixup in the latent space, and explicitly enforce the pairwise similarity distribution between the anchor and the normal samples to follow the mixup coefficients. Intuitively, if we have four normal samples and mixup coefficients of (0.4, 0.3, 0.2, 0.1), the normalized pairwise similarities between the mixup sample and the rest should also be roughly (0.4, 0.3, 0.2, 0.1). We formulate this with cosine similarities between intermediate features and the Kullback-Leibler divergence. Please refer to the main paper for more details.
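As a rough illustration, the distance regularizer could be written as in the PyTorch sketch below. The feature-extraction interface (`G.features`), tensor shapes, and temperature are assumptions for exposition, not the official implementation.

```python
# Sketch of the mixup-based distance regularization described above
# (hypothetical interfaces; see the official code for the exact formulation).
import torch
import torch.nn.functional as F

def mixdl_loss(features, mixed_feature, coeffs, temperature=1.0):
    """Encourage pairwise similarities between the mixup anchor and the
    normal samples to follow the mixup coefficients.

    features:      (N, D) intermediate features of N normal samples
    mixed_feature: (D,)   intermediate feature of the mixup anchor
    coeffs:        (N,)   convex mixup coefficients (sum to 1)
    """
    # Cosine similarities between the anchor and each normal sample
    sims = F.cosine_similarity(mixed_feature.unsqueeze(0), features, dim=1)
    # Turn similarities into a log-distribution and match it to the coefficients
    sim_dist = F.log_softmax(sims / temperature, dim=0)
    target = coeffs / coeffs.sum()
    return F.kl_div(sim_dist, target, reduction="sum")

# Hypothetical usage: sample latents, build a mixup anchor, regularize features.
# z      = torch.randn(4, 512)                      # normal latent codes
# coeffs = torch.distributions.Dirichlet(torch.ones(4)).sample()
# z_mix  = (coeffs.unsqueeze(1) * z).sum(dim=0)     # mixup anchor in latent space
# loss   = mixdl_loss(G.features(z), G.features(z_mix.unsqueeze(0)).squeeze(0), coeffs)
```

The same distribution-matching term can be applied on discriminator features, so that both players reason about relative distances rather than only the few observed points.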

Quantitative Results

We compare our method with various competitive baselines on diverse benchmarks. We mainly use Fréchet Inception Distance (FID), LPIPS, sFID, and Precision and Recall as evaluation metrics.
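For reference, the diversity-oriented LPIPS score is commonly reported as the average pairwise perceptual distance among generated samples; the sketch below shows that computation (FID, sFID, and Precision/Recall are typically computed with standard off-the-shelf packages). Image shapes and preprocessing here are assumptions.

```python
# Average pairwise LPIPS among generated images (a common diversity measure).
import itertools
import torch
import lpips  # pip install lpips

def average_pairwise_lpips(images):
    """images: tensor of shape (N, 3, H, W) with values in [-1, 1]."""
    lpips_fn = lpips.LPIPS(net="vgg")
    dists = []
    with torch.no_grad():
        for i, j in itertools.combinations(range(images.shape[0]), 2):
            dists.append(lpips_fn(images[i:i + 1], images[j:j + 1]).item())
    return sum(dists) / len(dists)  # higher means more diverse samples
```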



Qualitative Results

We present generated samples from our method and the baselines. Unlike the baselines, which strictly reproduce memorized samples, MixDL enables sampling of novel, unseen images.

