Instance Normalization: The Missing Ingredient for Fast Stylization
In this paper we revisit the fast stylization method introduced in Ulyanov et al. (2016). We show how a small change in the stylization architecture results in a significant qualitative improvement in the generated images. The change is limited to swapping batch normalization with instance normalization, and applying the latter at both training and testing time. The resulting method can be used to train high-performance architectures for real-time image generation. The code is available on GitHub at https://github.com/DmitryUlyanov/texture_nets. The full paper can be found at arXiv:1701.02096.
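The core change the abstract describes is normalizing each image's feature maps with that image's own statistics instead of statistics pooled over the batch. It can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from the texture_nets repository: the helper names `instance_norm` and `batch_norm`, the list-of-lists layout, and `eps=1e-5` are hypothetical choices, and the learned per-channel scale and shift are omitted. Because instance normalization takes its statistics from the single input, the same computation applies at training and test time, with no running averages.

```python
import math
from typing import List, Tuple

# Assumed layout (illustrative only): batch[n][c] is a flat list holding the
# H*W activations of channel c in image n.
Batch = List[List[List[float]]]


def _mean_var(values: List[float]) -> Tuple[float, float]:
    """Return the mean and (biased) variance of a list of numbers."""
    mean = sum(values) / len(values)
    var = sum((x - mean) ** 2 for x in values) / len(values)
    return mean, var


def instance_norm(batch: Batch, eps: float = 1e-5) -> Batch:
    """Normalize every (image, channel) pair with its own spatial statistics.

    No batch statistics or running averages are involved, so the same
    computation applies at training and test time. Learned scale/shift omitted.
    """
    out: Batch = []
    for image in batch:
        normed_image = []
        for channel in image:
            mean, var = _mean_var(channel)
            std = math.sqrt(var + eps)
            normed_image.append([(x - mean) / std for x in channel])
        out.append(normed_image)
    return out


def batch_norm(batch: Batch, eps: float = 1e-5) -> Batch:
    """Training-mode batch normalization: each channel is normalized with
    statistics pooled over all images and all spatial positions."""
    num_channels = len(batch[0])
    stats = []
    for c in range(num_channels):
        pooled = [x for image in batch for x in image[c]]
        mean, var = _mean_var(pooled)
        stats.append((mean, math.sqrt(var + eps)))
    return [
        [[(x - stats[c][0]) / stats[c][1] for x in image[c]] for c in range(num_channels)]
        for image in batch
    ]


# Usage: image b is image a with doubled contrast (one channel, four pixels each).
a = [[1.0, 2.0, 3.0, 4.0]]
b = [[2.0, 4.0, 6.0, 8.0]]
inorm = instance_norm([a, b])  # inorm[0] and inorm[1] agree up to the eps term
bnorm = batch_norm([a, b])     # bnorm[0] and bnorm[1] differ
print(inorm)
print(bnorm)
```

In this toy example, instance normalization maps the original image and its contrast-doubled copy to essentially the same output, while batch normalization does not, since its statistics are shared across the batch. This per-image contrast removal is one way to see why the swap helps stylization.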
Forward citations
Cited by 54 Pith papers
-
GeoMix: Descriptor-Free Visual Localization via Global Context and Multi-Detector Training
GeoMix achieves new state-of-the-art results in descriptor-free 2D-3D matching by adding directional embeddings, learnable global context nodes, and multi-detector training, cutting rotation and translation errors by ...
-
SPACE: Unifying Symmetric and Asymmetric Routing Problems for Generalist Neural Solver
SPACE framework unifies symmetric and asymmetric VRPs via bidirectional Frechet representations and weight-decomposed decoding for zero-shot generalization across 110 variants.
-
Riemannian Networks over Full-Rank Correlation Matrices
Riemannian networks are introduced for the full-rank correlation matrix manifold by extending MLR, FC, and convolutional layers to five geometries with backpropagation methods for two, showing effectiveness over SPD a...
-
Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation
RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.
-
Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm
A framework to identify and convert foldable layer normalizations to RMSNorm for exact equivalence and faster inference in deep neural networks.
-
QuadNorm: Resolution-Robust Normalization for Neural Operators
QuadNorm uses quadrature-based moments instead of uniform averaging in normalization layers, achieving O(h²) consistency across resolutions and better cross-resolution transfer in neural operators.
-
Every Feedforward Neural Network Definable in an o-Minimal Structure Has Finite Sample Complexity
Every fixed finite feedforward neural network definable in an o-minimal structure has finite sample complexity in the agnostic PAC setting.
-
Normalization Equivariance for Arbitrary Backbones, with Application to Image Denoising
A parameter-free input-output wrapper exactly parameterizes all normalization-equivariant functions on arbitrary backbones and improves blind denoising robustness to noise mismatch with zero GPU overhead.
-
Normalization Equivariance for Arbitrary Backbones, with Application to Image Denoising
A normalize-process-denormalize wrapper enforces normalization equivariance on arbitrary backbones, improving robustness to distribution shift in image denoising with no overhead.
-
Normalization Equivariance for Arbitrary Backbones, with Application to Image Denoising
Any normalization-equivariant function factors exactly as normalize-arbitrary-backbone-denormalize, enabling efficient equivariance for standard CNNs and transformers in blind image denoising.
-
StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition
StyleID supplies human-perception-aligned benchmarks and fine-tuned encoders that improve facial identity recognition robustness across stylization types and strengths.
-
High-Speed Full-Color HDR Imaging via Unwrapping Modulo-Encoded Spike Streams
An exposure-decoupled modulo formulation and iteration-free diffusion-prior unwrapping enable 1000 FPS full-color HDR imaging on spike cameras while cutting bandwidth from 20 Gbps to 6 Gbps.
-
Deep Time Series Models: A Comprehensive Survey and Benchmark
This survey and benchmark of deep time series models using the released TSLib library finds that models with specific structures perform well only on distinct analysis tasks.
-
Cyclic 2.5D Perceptual Loss for Cross-Modal 3D Medical Image Synthesis: T1w MRI to Tau PET
Proposes a cyclic 2.5D perceptual loss with manufacturer SUVR standardization for T1w MRI to tau PET synthesis, reporting improved regional agreement on ADNI and SCAN cohorts across U-Net, UNETR, SwinUNETR, CycleGAN, ...
-
U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation
U-Mamba is a hybrid CNN-SSM architecture that outperforms prior CNN and Transformer networks on biomedical image segmentation tasks by efficiently modeling long-range dependencies.
-
A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
PatchTST uses subseries patching and channel-independent Transformers to deliver significantly better long-term multivariate time series forecasting and strong self-supervised transfer performance.
-
Switchable Normalization for Learning-to-Normalize Deep Representation
Switchable Normalization learns per-layer weights to combine channel, layer, and minibatch normalizers, claiming robustness to batch size and better results than fixed normalizers on ImageNet, COCO, CityScapes, ADE20K...
-
Half a Percent of Labels is Enough: Efficient Animal Detection in UAV Imagery using Deep CNNs and Active Learning
Transfer Sampling with Optimal Transport and window cropping finds nearly 80% of animals in new UAV datasets using under 0.5% of labels.
-
DiffusionBench: On Holistic Evaluation of Diffusion Transformers
NanoGen unifies DiT training on ImageNet and T2I, reveals negative Pearson correlations (-0.377 to -0.580) in method rankings across metrics from 21 models, and motivates DiffusionBench for holistic evaluation.
-
Geometry-Aware Style Transfer in 3D Gaussian Splatting
A decoupled optimization framework with geometry-aware contrastive feature matching transfers both appearance and structure in 3D Gaussian splatting scenes.
-
Forward-Only Convolutional Neural Networks with Learnable Channel-Class Assignment
Learnable channel-class assignment and adaptive layer weighting allow forward-only CNNs to reach new state-of-the-art results among FF models on CIFAR-10, CIFAR-100, and Tiny-ImageNet.
-
DOME: Learning Transferable Domain Variables from Sparse Supervision for Test-Time Adaptation
DOME learns sample-specific domain variables from sparse supervision via vision-language models and a sparse domain bank to improve test-time adaptation performance.
-
Samudra 2: Scaling Ocean Emulators across Resolutions
Samudra 2 scales autoregressive neural ocean emulators to finer resolutions with architectural tweaks and dynamic loss, raising upper-ocean temperature R² from 0.56 to 0.87 at 1° and recovering mesoscale features.
-
WLNO: Wavelet-Laplace Neural Operator for Solving Partial Differential Equations
WLNO augments LNO with a parallel Haar wavelet branch and learnable gate to capture multi-scale spatial features, outperforming LNO on five PDE benchmarks especially those with sharp structures.
-
Representation-Guided Discrete Molecular Graph Retrosynthesis
GRG achieves 58.6/77.2/83.4/87.1 top-1/3/5/10 accuracy and 15.5 diversity on USPTO-50k retrosynthesis, outperforming the base generator while reducing training time by 30%.
-
Rethinking Constraint Awareness for Efficient State Embedding of Neural Routing Solver
The CARM module boosts neural routing solvers by adaptively modulating embeddings with constraint variables, enabling better use of global observations and improved performance on constrained VRPs.
-
Linearizing Vision Transformer with Test-Time Training
Using Test-Time Training's structural match to Softmax attention plus key normalization and locality modules allows inheriting pretrained weights and fine-tuning Stable Diffusion 3.5 in one hour to match quality while...
-
Linearizing Vision Transformer with Test-Time Training
Converts pretrained Vision Transformers to linear-complexity TTT models via architectural and representational alignment, demonstrated by linearizing Stable Diffusion 3.5 with 1-hour fine-tuning to match quality at 1....
-
Are Natural-Domain Foundation Models Effective for Accelerated Cardiac MRI Reconstruction?
Natural-domain foundation models provide competitive and more robust priors than task-specific models for accelerated cardiac MRI reconstruction in cross-domain settings.
-
A Fast and Generic Energy-Shifting Transformer for Hybrid Monte Carlo Radiotherapy Calculation
A hybrid Transformer-UNet model with energy-shifting inputs generates 6 MV LINAC dose maps from monoenergetic data, achieving over 98% gamma passing rate (3%/3mm) versus full Monte Carlo for prostate radiotherapy.
-
Time-Domain Voice Identity Morphing (TD-VIM): A Signal-Level Approach to Morphing Attacks on Speaker Verification Systems
TD-VIM creates signal-level morphed voice samples that achieve G-MAP attack success rates up to 99.74% against deep-learning and commercial speaker verification systems.
-
GCGNet: Graph-Consistent Generative Network for Time Series Forecasting with Exogenous Variables
GCGNet uses a variational generator, graph structure aligner, and graph refiner to jointly capture temporal and channel correlations in time series forecasting with exogenous variables, outperforming baselines on 12 r...
-
Learning to accelerate distributed ADMM using graph neural networks
A GNN is trained to predict adaptive step sizes and weights for distributed ADMM by unrolling a fixed number of iterations and minimizing solution error on a problem class.
-
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.
-
Order Matters: Shuffling Sequence Generation for Video Prediction
SEE-Net improves video prediction by using frame shuffling to enforce learning of natural temporal order, reporting state-of-the-art results on three synthetic and real-world datasets.
-
Generative Modeling by Estimating Gradients of the Data Distribution
Score-based generative modeling via multi-noise-level score matching and annealed Langevin dynamics produces samples on par with GANs and sets a new inception score record on CIFAR-10.
-
A Convolutional Decoder for Point Clouds using Adaptive Instance Normalization
A point cloud decoder using Adaptive Instance Normalization outperforms prior methods in auto-encoding, upsampling, and single-view reconstruction tasks.
-
StateFlow: Dual-State Recurrent Modeling for Long-Horizon Time Series Forecasting
StateFlow extends VARNN with dual hidden and residual-memory states plus a chunk decoder and two-stage training to enable competitive long-horizon time series forecasting while retaining a compact recurrent design.
-
PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training
A polynomial preconditioning layer controls singular value spectra of transformer weights to stabilize pre-training, shown effective on Llama-1B and supported by convergence theory for deep linear networks.
-
Rank-Aware Quantile Activation for Motion-Robust Crop Segmentation in UAV Imagery
QAct uses instance-level rank normalization instead of magnitude gating to deliver consistent mIoU gains over ReLU on rare texture-dependent classes in Agriculture-Vision 2021 under zero-shot and blur-supervised regimes.
-
Mitigating Content Shift and Hallucination in GenAI Image Editing via Structural Refinement
Introduces a structure-preserving GenAI fusion framework that fuses input images with GenAI outputs via coarse correspondences to transfer enhancements while suppressing hallucinations.
-
SegGuidedNet: Sub-Region-Aware Attention Supervision for Interpretable Brain Tumor Segmentation
SegGuidedNet achieves Dice scores of 0.905 on BraTS2021 and 0.897 on BraTS2023 with sub-region attention supervision that adds under 0.2% parameters and provides free spatial interpretability.
-
USEMA: a Scalable Efficient Mamba Like Attention for Medical Image Segmentation
USEMA is a hybrid UNet architecture merging CNNs with scalable Mamba-like attention (SEMA) that achieves better efficiency than transformers and superior segmentation accuracy than pure CNN or Mamba models across medi...
-
Style-Based Neural Architectures for Real-Time Weather Classification
Three style-based neural architectures are proposed for real-time weather classification from images, with two truncated ResNet variants claimed to outperform prior methods and generalize across public datasets.
-
Reversible Residual Normalization Alleviates Spatio-Temporal Distribution Shift
Reversible Residual Normalization (RRN) introduces spatially-aware invertible residual blocks that combine center normalization with spectral-constrained graph convolutions to mitigate spatio-temporal distribution shi...
-
TimePre: Bridging Accuracy, Efficiency, and Stability in Probabilistic Time-Series Forecasting
TimePre unifies MLP speed and MCL distributional power via Stabilized Instance Normalization to deliver SOTA probabilistic accuracy, orders-of-magnitude faster inference, and improved stability over prior MCL methods.
-
Annotation-Free Cardiac Vessel Segmentation via Knowledge Transfer from Retinal Images
SC-GAN performs annotation-free coronary artery segmentation by transferring shape-consistent knowledge from retinal vessel annotations via a GAN trained on 1092 DSA images.
-
High-throughput Onboard Hyperspectral Image Compression with Ground-based CNN Reconstruction
Onboard prequantization-based lossless predictive compression of hyperspectral images, combined with CNN reconstruction on the ground, recovers the entire SNR drop at 2 bpp.
-
Disentangled Makeup Transfer with Generative Adversarial Network
DMT uses identity and makeup encoders in a GAN to enable controllable makeup transfer from references and sampling of new styles from a prior distribution.
-
Learning Adversarial Augmentation Policies for Robust Garlic Seedling Detection
A new outdoor garlic seedling dataset and adversarial augmentation policy learning improve detection AP50 to 91.6% and missing-seedling F1 to 67% under variable illumination.
-
A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence
A conditional Wasserstein GAN generates plausible future SWI drought trajectories for French insurance risk management under climate change.
-
Adapted Center and Scale Prediction: More Stable and More Accurate
Adaptations to CSP including compressing width prediction achieve 9.3% MR on CityPersons reasonable set, showing anchor-free one-stage detectors can reach high accuracy.
-
Mean Spectral Normalization of Deep Neural Networks for Embedded Automation
Proposes MSN reparameterization to address mean-drift in SN, claiming ~16% faster inference than BN with fewer parameters on CNNs and GANs.
-
Fast Universal Style Transfer for Artistic and Photorealistic Rendering
ArtNet and PhotoNet enable one-pass fast universal style transfer with fewer artifacts, better detail preservation, and 3-100x speedup over prior AE-based methods.