Instance Normalization: The Missing Ingredient for Fast Stylization

Andrea Vedaldi; Dmitry Ulyanov; Victor Lempitsky

arxiv: 1607.08022 · v3 · pith:CVLPCOVQnew · submitted 2016-07-27 · 💻 cs.CV

classification 💻 cs.CV

keywords: normalization, stylization, change, fast, github, instance, method, apply

In this paper we revisit the fast stylization method introduced in Ulyanov et al. (2016). We show how a small change in the stylization architecture results in a significant qualitative improvement in the generated images. The change is limited to swapping batch normalization with instance normalization, and applying the latter at both training and testing times. The resulting method can be used to train high-performance architectures for real-time image generation. The code is made available on GitHub at https://github.com/DmitryUlyanov/texture_nets. The full paper can be found at arXiv:1701.02096.
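To make the swap described in the abstract concrete, here is a minimal NumPy sketch of the two normalizations. It is an illustration under assumptions, not the code from the linked repository: the function names `instance_norm` and `batch_norm_train` are hypothetical, the input layout is assumed to be (N, C, H, W), and the learned scale and shift that normalization layers usually carry are omitted.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize every (sample, channel) feature map over its own spatial axes.

    x is assumed to have shape (N, C, H, W). The statistics never mix
    different images, so the same computation applies at training and test time.
    """
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm_train(x, eps=1e-5):
    """Training-mode batch normalization: statistics are pooled per channel
    over the whole batch as well as the spatial axes."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# Toy batch in which one image has much higher contrast than the others.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 8, 8))
x[0] *= 10.0

y_in = instance_norm(x)
y_bn = batch_norm_train(x)

# Instance normalization of the first image is the same whether or not the
# other images are present; batch normalization of it is not.
in_is_batch_independent = np.allclose(instance_norm(x[:1]), y_in[:1])
bn_is_batch_independent = np.allclose(batch_norm_train(x[:1]), y_bn[:1])
```

In this sketch the only difference between the two functions is the set of axes the statistics are taken over. Because the instance statistics depend on a single image, its normalized features do not change with the rest of the batch, which is consistent with the abstract's point that the operation is applied at both training and testing times.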

Forward citations

Cited by 54 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

GeoMix: Descriptor-Free Visual Localization via Global Context and Multi-Detector Training
cs.CV 2026-07 unverdicted novelty 7.0

GeoMix achieves new state-of-the-art results in descriptor-free 2D-3D matching by adding directional embeddings, learnable global context nodes, and multi-detector training, cutting rotation and translation errors by ...
SPACE: Unifying Symmetric and Asymmetric Routing Problems for Generalist Neural Solver
cs.AI 2026-05 unverdicted novelty 7.0

SPACE framework unifies symmetric and asymmetric VRPs via bidirectional Frechet representations and weight-decomposed decoding for zero-shot generalization across 110 variants.
Riemannian Networks over Full-Rank Correlation Matrices
cs.LG 2026-05 unverdicted novelty 7.0

Riemannian networks are introduced for the full-rank correlation matrix manifold by extending MLR, FC, and convolutional layers to five geometries with backpropagation methods for two, showing effectiveness over SPD a...
Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation
cs.LG 2026-05 unverdicted novelty 7.0

RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.
Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm
cs.LG 2026-05 conditional novelty 7.0

A framework to identify and convert foldable layer normalizations to RMSNorm for exact equivalence and faster inference in deep neural networks.
QuadNorm: Resolution-Robust Normalization for Neural Operators
cs.LG 2026-05 unverdicted novelty 7.0

QuadNorm uses quadrature-based moments instead of uniform averaging in normalization layers, achieving O(h²) consistency across resolutions and better cross-resolution transfer in neural operators.
Every Feedforward Neural Network Definable in an o-Minimal Structure Has Finite Sample Complexity
stat.ML 2026-05 unverdicted novelty 7.0

Every fixed finite feedforward neural network definable in an o-minimal structure has finite sample complexity in the agnostic PAC setting.
Normalization Equivariance for Arbitrary Backbones, with Application to Image Denoising
cs.CV 2026-05 unverdicted novelty 7.0

A parameter-free input-output wrapper exactly parameterizes all normalization-equivariant functions on arbitrary backbones and improves blind denoising robustness to noise mismatch with zero GPU overhead.
Normalization Equivariance for Arbitrary Backbones, with Application to Image Denoising
cs.CV 2026-05 unverdicted novelty 7.0

A normalize-process-denormalize wrapper enforces normalization equivariance on arbitrary backbones, improving robustness to distribution shift in image denoising with no overhead.
Normalization Equivariance for Arbitrary Backbones, with Application to Image Denoising
cs.CV 2026-05 unverdicted novelty 7.0

Any normalization-equivariant function factors exactly as normalize-arbitrary-backbone-denormalize, enabling efficient equivariance for standard CNNs and transformers in blind image denoising.
StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition
cs.GR 2026-04 unverdicted novelty 7.0

StyleID supplies human-perception-aligned benchmarks and fine-tuned encoders that improve facial identity recognition robustness across stylization types and strengths.
High-Speed Full-Color HDR Imaging via Unwrapping Modulo-Encoded Spike Streams
cs.CV 2026-04 unverdicted novelty 7.0

An exposure-decoupled modulo formulation and iteration-free diffusion-prior unwrapping enable 1000 FPS full-color HDR imaging on spike cameras while cutting bandwidth from 20 Gbps to 6 Gbps.
Deep Time Series Models: A Comprehensive Survey and Benchmark
cs.LG 2024-07 unverdicted novelty 7.0

This survey and benchmark of deep time series models using the released TSLib library finds that models with specific structures perform well only on distinct analysis tasks.
Cyclic 2.5D Perceptual Loss for Cross-Modal 3D Medical Image Synthesis: T1w MRI to Tau PET
eess.IV 2024-06 unverdicted novelty 7.0

Proposes a cyclic 2.5D perceptual loss with manufacturer SUVR standardization for T1w MRI to tau PET synthesis, reporting improved regional agreement on ADNI and SCAN cohorts across U-Net, UNETR, SwinUNETR, CycleGAN, ...
U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation
eess.IV 2024-01 unverdicted novelty 7.0

U-Mamba is a hybrid CNN-SSM architecture that outperforms prior CNN and Transformer networks on biomedical image segmentation tasks by efficiently modeling long-range dependencies.
A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
cs.LG 2022-11 conditional novelty 7.0

PatchTST uses subseries patching and channel-independent Transformers to deliver significantly better long-term multivariate time series forecasting and strong self-supervised transfer performance.
Switchable Normalization for Learning-to-Normalize Deep Representation
cs.CV 2019-07 unverdicted novelty 7.0

Switchable Normalization learns per-layer weights to combine channel, layer, and minibatch normalizers, claiming robustness to batch size and better results than fixed normalizers on ImageNet, COCO, CityScapes, ADE20K...
Half a Percent of Labels is Enough: Efficient Animal Detection in UAV Imagery using Deep CNNs and Active Learning
cs.CV 2019-07 unverdicted novelty 7.0

Transfer Sampling with Optimal Transport and window cropping finds nearly 80% of animals in new UAV datasets using under 0.5% of labels.
DiffusionBench: On Holistic Evaluation of Diffusion Transformers
cs.CV 2026-06 conditional novelty 6.0

NanoGen unifies DiT training on ImageNet and T2I, reveals negative Pearson correlations (-0.377 to -0.580) in method rankings across metrics from 21 models, and motivates DiffusionBench for holistic evaluation.
Geometry-Aware Style Transfer in 3D Gaussian Splatting
cs.CV 2026-06 unverdicted novelty 6.0

A decoupled optimization framework with geometry-aware contrastive feature matching transfers both appearance and structure in 3D Gaussian splatting scenes.
Forward-Only Convolutional Neural Networks with Learnable Channel-Class Assignment
cs.LG 2026-06 unverdicted novelty 6.0

Learnable channel-class assignment and adaptive layer weighting allow forward-only CNNs to reach new state-of-the-art results among FF models on CIFAR-10, CIFAR-100, and Tiny-ImageNet.
DOME: Learning Transferable Domain Variables from Sparse Supervision for Test-Time Adaptation
cs.CV 2026-06 unverdicted novelty 6.0

DOME learns sample-specific domain variables from sparse supervision via vision-language models and a sparse domain bank to improve test-time adaptation performance.
Samudra 2: Scaling Ocean Emulators across Resolutions
cs.CE 2026-05 unverdicted novelty 6.0

Samudra 2 scales autoregressive neural ocean emulators to finer resolutions with architectural tweaks and dynamic loss, raising upper-ocean temperature R² from 0.56 to 0.87 at 1° and recovering mesoscale features.
WLNO: Wavelet-Laplace Neural Operator for Solving Partial Differential Equations
cs.LG 2026-05 unverdicted novelty 6.0

WLNO augments LNO with a parallel Haar wavelet branch and learnable gate to capture multi-scale spatial features, outperforming LNO on five PDE benchmarks especially those with sharp structures.
Representation-Guided Discrete Molecular Graph Retrosynthesis
cs.LG 2026-05 unverdicted novelty 6.0

GRG achieves 58.6/77.2/83.4/87.1 top-1/3/5/10 accuracy and 15.5 diversity on USPTO-50k retrosynthesis, outperforming the base generator while reducing training time by 30%.
Rethinking Constraint Awareness for Efficient State Embedding of Neural Routing Solver
cs.AI 2026-05 unverdicted novelty 6.0

The CARM module boosts neural routing solvers by adaptively modulating embeddings with constraint variables, enabling better use of global observations and improved performance on constrained VRPs.
Linearizing Vision Transformer with Test-Time Training
cs.CV 2026-05 unverdicted novelty 6.0

Using Test-Time Training's structural match to Softmax attention plus key normalization and locality modules allows inheriting pretrained weights and fine-tuning Stable Diffusion 3.5 in one hour to match quality while...
Linearizing Vision Transformer with Test-Time Training
cs.CV 2026-05 unverdicted novelty 6.0

Converts pretrained Vision Transformers to linear-complexity TTT models via architectural and representational alignment, demonstrated by linearizing Stable Diffusion 3.5 with 1-hour fine-tuning to match quality at 1....
Are Natural-Domain Foundation Models Effective for Accelerated Cardiac MRI Reconstruction?
eess.IV 2026-04 unverdicted novelty 6.0

Natural-domain foundation models provide competitive and more robust priors than task-specific models for accelerated cardiac MRI reconstruction in cross-domain settings.
A Fast and Generic Energy-Shifting Transformer for Hybrid Monte Carlo Radiotherapy Calculation
physics.med-ph 2026-04 unverdicted novelty 6.0

A hybrid Transformer-UNet model with energy-shifting inputs generates 6 MV LINAC dose maps from monoenergetic data, achieving over 98% gamma passing rate (3%/3mm) versus full Monte Carlo for prostate radiotherapy.
Time-Domain Voice Identity Morphing (TD-VIM): A Signal-Level Approach to Morphing Attacks on Speaker Verification Systems
cs.SD 2026-04 unverdicted novelty 6.0

TD-VIM creates signal-level morphed voice samples that achieve G-MAP attack success rates up to 99.74% against deep-learning and commercial speaker verification systems.
GCGNet: Graph-Consistent Generative Network for Time Series Forecasting with Exogenous Variables
cs.LG 2026-03 unverdicted novelty 6.0

GCGNet uses a variational generator, graph structure aligner, and graph refiner to jointly capture temporal and channel correlations in time series forecasting with exogenous variables, outperforming baselines on 12 r...
Learning to accelerate distributed ADMM using graph neural networks
cs.LG 2025-09 conditional novelty 6.0

A GNN is trained to predict adaptive step sizes and weights for distributed ADMM by unrolling a fixed number of iterations and minimizing solution error on a problem class.
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
cs.LG 2021-04 accept novelty 6.0

Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.
Order Matters: Shuffling Sequence Generation for Video Prediction
cs.CV 2019-07 unverdicted novelty 6.0

SEE-Net improves video prediction by using frame shuffling to enforce learning of natural temporal order, reporting state-of-the-art results on three synthetic and real-world datasets.
Generative Modeling by Estimating Gradients of the Data Distribution
cs.LG 2019-07 unverdicted novelty 6.0

Score-based generative modeling via multi-noise-level score matching and annealed Langevin dynamics produces samples on par with GANs and sets a new inception score record on CIFAR-10.
A Convolutional Decoder for Point Clouds using Adaptive Instance Normalization
cs.CV 2019-06 unverdicted novelty 6.0

A point cloud decoder using Adaptive Instance Normalization outperforms prior methods in auto-encoding, upsampling, and single-view reconstruction tasks.
StateFlow: Dual-State Recurrent Modeling for Long-Horizon Time Series Forecasting
cs.LG 2026-06 unverdicted novelty 5.0

StateFlow extends VARNN with dual hidden and residual-memory states plus a chunk decoder and two-stage training to enable competitive long-horizon time series forecasting while retaining a compact recurrent design.
PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training
cs.LG 2026-06 unverdicted novelty 5.0

A polynomial preconditioning layer controls singular value spectra of transformer weights to stabilize pre-training, shown effective on Llama-1B and supported by convergence theory for deep linear networks.
Rank-Aware Quantile Activation for Motion-Robust Crop Segmentation in UAV Imagery
cs.CV 2026-05 unverdicted novelty 5.0

QAct uses instance-level rank normalization instead of magnitude gating to deliver consistent mIoU gains over ReLU on rare texture-dependent classes in Agriculture-Vision 2021 under zero-shot and blur-supervised regimes.
Mitigating Content Shift and Hallucination in GenAI Image Editing via Structural Refinement
cs.CV 2026-05 unverdicted novelty 5.0

Introduces a structure-preserving GenAI fusion framework that fuses input images with GenAI outputs via coarse correspondences to transfer enhancements while suppressing hallucinations.
SegGuidedNet: Sub-Region-Aware Attention Supervision for Interpretable Brain Tumor Segmentation
cs.CV 2026-05 unverdicted novelty 5.0

SegGuidedNet achieves Dice scores of 0.905 on BraTS2021 and 0.897 on BraTS2023 with sub-region attention supervision that adds under 0.2% parameters and provides free spatial interpretability.
USEMA: a Scalable Efficient Mamba Like Attention for Medical Image Segmentation
cs.CV 2026-05 unverdicted novelty 5.0

USEMA is a hybrid UNet architecture merging CNNs with scalable Mamba-like attention (SEMA) that achieves better efficiency than transformers and superior segmentation accuracy than pure CNN or Mamba models across medi...
Style-Based Neural Architectures for Real-Time Weather Classification
cs.CV 2026-04 unverdicted novelty 5.0

Three style-based neural architectures are proposed for real-time weather classification from images, with two truncated ResNet variants claimed to outperform prior methods and generalize across public datasets.
Reversible Residual Normalization Alleviates Spatio-Temporal Distribution Shift
cs.LG 2026-04 unverdicted novelty 5.0

Reversible Residual Normalization (RRN) introduces spatially-aware invertible residual blocks that combine center normalization with spectral-constrained graph convolutions to mitigate spatio-temporal distribution shi...
TimePre: Bridging Accuracy, Efficiency, and Stability in Probabilistic Time-Series Forecasting
cs.LG 2025-11 unverdicted novelty 5.0

TimePre unifies MLP speed and MCL distributional power via Stabilized Instance Normalization to deliver SOTA probabilistic accuracy, orders-of-magnitude faster inference, and improved stability over prior MCL methods.
Annotation-Free Cardiac Vessel Segmentation via Knowledge Transfer from Retinal Images
eess.IV 2019-07 unverdicted novelty 5.0

SC-GAN performs annotation-free coronary artery segmentation by transferring shape-consistent knowledge from retinal vessel annotations via a GAN trained on 1092 DSA images.
High-throughput Onboard Hyperspectral Image Compression with Ground-based CNN Reconstruction
eess.IV 2019-07 unverdicted novelty 5.0

Prequantization-based lossless predictive compression onboard hyperspectral images with CNN ground reconstruction recovers the entire SNR drop at 2 bpp.
Disentangled Makeup Transfer with Generative Adversarial Network
cs.CV 2019-07 unverdicted novelty 5.0

DMT uses identity and makeup encoders in a GAN to enable controllable makeup transfer from references and sampling of new styles from a prior distribution.
Learning Adversarial Augmentation Policies for Robust Garlic Seedling Detection
cs.CV 2026-06 unverdicted novelty 4.0

A new outdoor garlic seedling dataset and adversarial augmentation policy learning improve detection AP50 to 91.6% and missing-seedling F1 to 67% under variable illumination.
A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence
cs.LG 2026-04 unverdicted novelty 4.0

A conditional Wasserstein GAN generates plausible future SWI drought trajectories for French insurance risk management under climate change.
Adapted Center and Scale Prediction: More Stable and More Accurate
cs.CV 2020-02 unverdicted novelty 4.0

Adaptations to CSP including compressing width prediction achieve 9.3% MR on CityPersons reasonable set, showing anchor-free one-stage detectors can reach high accuracy.
Mean Spectral Normalization of Deep Neural Networks for Embedded Automation
cs.LG 2019-07 unverdicted novelty 4.0

Proposes MSN reparameterization to address mean-drift in SN, claiming ~16% faster inference than BN with fewer parameters on CNNs and GANs.
Fast Universal Style Transfer for Artistic and Photorealistic Rendering
cs.CV 2019-07 unverdicted novelty 4.0

ArtNet and PhotoNet enable one-pass fast universal style transfer with fewer artifacts, better detail preservation, and 3-100x speedup over prior AE-based methods.