hub Mixed citations

Multi-Scale Context Aggregation by Dilated Convolutions

· 2015 · cs.CV · arXiv 1511.07122

Mixed citation behavior. Most common role is background (67%).

38 Pith papers citing it

Background 67% of classified citations

abstract

State-of-the-art models for semantic segmentation are based on adaptations of convolutional networks that had originally been designed for image classification. However, dense prediction and image classification are structurally different. In this work, we develop a new convolutional network module that is specifically designed for dense prediction. The presented module uses dilated convolutions to systematically aggregate multi-scale contextual information without losing resolution. The architecture is based on the fact that dilated convolutions support exponential expansion of the receptive field without loss of resolution or coverage. We show that the presented context module increases the accuracy of state-of-the-art semantic segmentation systems. In addition, we examine the adaptation of image classification networks to dense prediction and show that simplifying the adapted network can increase accuracy.
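The abstract's claim about exponential receptive-field growth can be checked with a little arithmetic. The sketch below is illustrative only and is not the paper's code. It assumes 3-tap kernels, stride 1, and dilations that double per layer (1, 2, 4, ...); the helper names `receptive_field` and `dilated_conv1d` are hypothetical. It works in 1-D and without padding, so the output shrinks, whereas a padded layer would keep the input resolution.

```python
def receptive_field(kernel_size, dilations):
    """Receptive-field width of a stack of stride-1 dilated convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def dilated_conv1d(signal, kernel, dilation):
    """Unpadded 1-D dilated correlation.

    out[t] = sum_j kernel[j] * signal[t + j * dilation]
    """
    span = (len(kernel) - 1) * dilation
    out_len = len(signal) - span
    return [
        sum(w * signal[t + j * dilation] for j, w in enumerate(kernel))
        for t in range(out_len)
    ]


# Doubling dilations (assumed schedule) versus ordinary convolutions (dilation 1).
dilated = [receptive_field(3, [2 ** i for i in range(n)]) for n in range(1, 5)]
plain = [receptive_field(3, [1] * n) for n in range(1, 5)]

print(dilated)  # [3, 7, 15, 31]: grows as 2**(n + 1) - 1
print(plain)    # [3, 5, 7, 9]: grows linearly
print(dilated_conv1d([0, 1, 2, 3, 4, 5, 6], [1, 1, 1], 2))  # [6, 9, 12]
```

With the same number of parameters per layer, the dilated stack covers 31 input positions after four layers while the undilated stack covers 9, which is the sense in which context is aggregated without downsampling.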

citation-role summary

background: 3 · method: 3

citation-polarity summary

background: 4 · use method: 2

representative citing papers

WaveNet: A Generative Model for Raw Audio

cs.SD · 2016-09-12 · accept · novelty 9.0

WaveNet generates realistic raw audio using an autoregressive neural network with dilated convolutions, achieving state-of-the-art naturalness in speech synthesis for English and Mandarin.

Density estimation using Real NVP

cs.LG · 2016-05-27 · accept · novelty 8.0

Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.

Sampling the Schwinger Model with Gauge-Equivariant Diffusion

hep-lat · 2026-06-25 · unverdicted · novelty 7.0

A gauge-equivariant diffusion model samples Schwinger model configurations, yielding unbiased observables matching MCMC and qualitatively less topological freezing than HMC.

WD-FQDet: Multispectral Detection Transformer via Wavelet Decomposition and Frequency-aware Query Learning

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

WD-FQDet decouples modality-shared and modality-specific features in infrared-visible images via wavelet-based frequency decomposition and frequency-aware query selection to achieve state-of-the-art detection performance.

Graph-based Semantic Calibration Network for Unaligned UAV RGBT Image Semantic Segmentation and A Large-scale Benchmark

cs.CV · 2026-04-29 · unverdicted · novelty 7.0

GSCNet with FDAM and SGCM modules plus the URTF benchmark improves fine-grained semantic segmentation on unaligned UAV RGBT images.

KAConvNet: Kolmogorov-Arnold Convolutional Networks for Vision Recognition

cs.CV · 2026-04-25 · unverdicted · novelty 7.0

KAConvNet introduces a Kolmogorov-Arnold Convolutional Layer to build networks competitive with ViTs and CNNs while offering stronger theoretical interpretability.

Cross-Stage Attention Propagation for Efficient Semantic Segmentation

cs.CV · 2026-04-07 · unverdicted · novelty 7.0

CSAP computes attention at the deepest scale and propagates the maps to shallower stages, bypassing per-scale query-key computations to cut decoder FLOPs while preserving multi-scale performance and beating SegNeXt-Tiny on ADE20K, Cityscapes, and COCO-Stuff.

Spatio-Temporal Retrieval-based Priors for Adaptive Computational Teaching in Driving

cs.RO · 2026-06-23 · unverdicted · novelty 6.0

An encoder-decoder imitation learning model with nearest-neighbor retrieval and cross-attention priors shows consistent gains over non-adaptive and other adaptive baselines on a semi-synthetic Waymo-based dataset and a small real simulator coaching dataset.

From Spatial to Spectral: An Efficient, Frequency-Guided Feature Representation Learner for Small Object Detection

cs.CV · 2026-06-22 · unverdicted · novelty 6.0

Proposes DERNet with Decompose-Enhance-Reconstruct operator and three plug-and-play modules to shift small object detection from spatial to spectral feature processing, claiming better performance than YOLOv11 with 1/6 the parameters.

From Coarse to Fine: Managing Temporal Granularity in Spatio-Temporal Data for Fine-Grained Traffic Prediction

cs.AI · 2026-06-08 · unverdicted · novelty 6.0

STRP is a granularity-aware model that predicts fine-grained spatio-temporal traffic from coarse inputs via tree convolution and inverse dilated convolution, outperforming baselines on six datasets in window-based and duration-based settings.

IV-Net: A neural network for elliptic PDEs with random and highly varying coefficients

math.NA · 2026-05-24 · unverdicted · novelty 6.0

IV-Net is a multigrid-inspired convolutional neural operator that approximates solutions to linear elliptic PDEs with high-contrast coefficients and shows better accuracy than POD and other neural operators on heterogeneous coercive problems.

Single Level Feature-to-Feature Forecasting with Deformable Convolutions

cs.CV · 2019-07-26 · unverdicted · novelty 6.0

Single-level feature-to-feature forecasting with deformable convolutions on coarse abstract features from a segmentation backbone achieves state-of-the-art results for nine-timestep future semantic segmentation on Cityscapes validation.

VRSTC: Occlusion-Free Video Person Re-Identification

cs.CV · 2019-07-19 · unverdicted · novelty 6.0

STCnet recovers occluded parts in video person re-ID using spatio-temporal cues to form the VRSTC framework, outperforming prior methods on three datasets.

ASCNet: Adaptive-Scale Convolutional Neural Networks for Multi-Scale Feature Learning

cs.CV · 2019-07-07 · unverdicted · novelty 6.0

ASCNet learns per-pixel adaptive dilation rates via a 3-layer convolution structure to produce scale-appropriate receptive fields, yielding higher segmentation accuracy than fixed dilated CNNs on two medical image datasets.

Jointly Aligning and Predicting Continuous Emotion Annotations

cs.LG · 2019-07-05 · unverdicted · novelty 6.0

A multi-delay sinc network jointly aligns speech signals with delayed continuous emotion labels and predicts arousal/valence, claiming state-of-the-art speech-only results on RECOLA and SEWA.

Modeling Embedding Dimension Correlations via Convolutional Neural Collaborative Filtering

cs.IR · 2019-06-26 · unverdicted · novelty 6.0

ConvNCF improves neural collaborative filtering by explicitly modeling pairwise correlations via outer product and high-order correlations via CNN on user-item embeddings.

Rethink the Role of Neural Decoders in Quantum Error Correction

quant-ph · 2026-05-12 · unverdicted · novelty 6.0

Neural decoders for surface-code QEC achieve practical microsecond FPGA latency when trained on large datasets with appropriate inductive biases and INT4 quantization, rather than relying on architectural complexity.

YOTOnet: Zero-Shot Cross-Domain Fault Diagnosis via Domain-Conditioned Mixture of Experts

cs.LG · 2026-05-06 · unverdicted · novelty 6.0

YOTOnet achieves improved zero-shot cross-domain fault diagnosis on bearing datasets by combining a physics-aware invariant feature distiller with domain-conditioned sparse experts, showing performance scaling as more training domains are added.

DSFNet: Learning Dual-Domain Spectral Operators for Multi-Modality Spatio-Temporal Forecasting in Urban Transportation Systems

cs.LG · 2026-06-05 · unverdicted · novelty 5.0

DSFNet introduces dual-domain spectral operators to explicitly model cross-modality couplings in multi-modality spatio-temporal traffic forecasting and reports 3.21%-10.16% MAE reductions versus baselines on five real-world datasets.

Cross Attention Network for Semantic Segmentation

cs.CV · 2019-07-25 · unverdicted · novelty 5.0

Cross Attention Network fuses spatial and contextual features via a cross attention module to improve semantic segmentation performance and speed on Cityscapes and CamVid.

Attentive CT Lesion Detection Using Deep Pyramid Inference with Multi-Scale Booster

cs.CV · 2019-07-09 · unverdicted · novelty 5.0

A Multi-Scale Booster with hierarchical dilated convolutions and attention modules integrated into FPN improves lesion detection accuracy on the DeepLesion dataset over prior methods.

The Ethical Dilemma when (not) Setting up Cost-based Decision Rules in Semantic Segmentation

cs.CV · 2019-07-02 · unverdicted · novelty 5.0

Defining egoistic and altruistic cost functions for class confusions in semantic segmentation changes precision, recall, and segment-wise error rates relative to standard MAP decisions.

An Efficient Solution for Breast Tumor Segmentation and Classification in Ultrasound Images Using Deep Adversarial Learning

eess.IV · 2019-07-01 · unverdicted · novelty 5.0

cGAN with atrous convolutions and channel weighting segments breast tumors in ultrasound at 93.76% Dice and 88.82% IoU, then classifies benign vs malignant at 85% accuracy using boundary shape features.

Boosting the rule-out accuracy of deep disease detection using class weight modifiers

eess.IV · 2019-06-21 · unverdicted · novelty 5.0

Class weight modifiers applied to 'no mention' cases in the loss function improve rule-out performance for negated disease findings in deep classifiers trained on chest X-ray data derived from clinical notes.

citing papers explorer

Showing 38 of 38 citing papers.

WaveNet: A Generative Model for Raw Audio · cs.SD · 2016-09-12 · accept · none · ref 55
WaveNet generates realistic raw audio using an autoregressive neural network with dilated convolutions, achieving state-of-the-art naturalness in speech synthesis for English and Mandarin.
Density estimation using Real NVP · cs.LG · 2016-05-27 · accept · none · ref 69
Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.
Sampling the Schwinger Model with Gauge-Equivariant Diffusion · hep-lat · 2026-06-25 · unverdicted · none · ref 31 · internal anchor
A gauge-equivariant diffusion model samples Schwinger model configurations, yielding unbiased observables matching MCMC and qualitatively less topological freezing than HMC.
WD-FQDet: Multispectral Detection Transformer via Wavelet Decomposition and Frequency-aware Query Learning · cs.CV · 2026-05-13 · unverdicted · none · ref 46 · internal anchor
WD-FQDet decouples modality-shared and modality-specific features in infrared-visible images via wavelet-based frequency decomposition and frequency-aware query selection to achieve state-of-the-art detection performance.
Graph-based Semantic Calibration Network for Unaligned UAV RGBT Image Semantic Segmentation and A Large-scale Benchmark · cs.CV · 2026-04-29 · unverdicted · none · ref 25
GSCNet with FDAM and SGCM modules plus the URTF benchmark improves fine-grained semantic segmentation on unaligned UAV RGBT images.
KAConvNet: Kolmogorov-Arnold Convolutional Networks for Vision Recognition · cs.CV · 2026-04-25 · unverdicted · none · ref 23
KAConvNet introduces a Kolmogorov-Arnold Convolutional Layer to build networks competitive with ViTs and CNNs while offering stronger theoretical interpretability.
Cross-Stage Attention Propagation for Efficient Semantic Segmentation · cs.CV · 2026-04-07 · unverdicted · none · ref 30
CSAP computes attention at the deepest scale and propagates the maps to shallower stages, bypassing per-scale query-key computations to cut decoder FLOPs while preserving multi-scale performance and beating SegNeXt-Tiny on ADE20K, Cityscapes, and COCO-Stuff.
Spatio-Temporal Retrieval-based Priors for Adaptive Computational Teaching in Driving · cs.RO · 2026-06-23 · unverdicted · none · ref 31 · internal anchor
An encoder-decoder imitation learning model with nearest-neighbor retrieval and cross-attention priors shows consistent gains over non-adaptive and other adaptive baselines on a semi-synthetic Waymo-based dataset and a small real simulator coaching dataset.
From Spatial to Spectral: An Efficient, Frequency-Guided Feature Representation Learner for Small Object Detection · cs.CV · 2026-06-22 · unverdicted · none · ref 42 · internal anchor
Proposes DERNet with Decompose-Enhance-Reconstruct operator and three plug-and-play modules to shift small object detection from spatial to spectral feature processing, claiming better performance than YOLOv11 with 1/6 the parameters.
From Coarse to Fine: Managing Temporal Granularity in Spatio-Temporal Data for Fine-Grained Traffic Prediction · cs.AI · 2026-06-08 · unverdicted · none · ref 52 · internal anchor
STRP is a granularity-aware model that predicts fine-grained spatio-temporal traffic from coarse inputs via tree convolution and inverse dilated convolution, outperforming baselines on six datasets in window-based and duration-based settings.
IV-Net: A neural network for elliptic PDEs with random and highly varying coefficients · math.NA · 2026-05-24 · unverdicted · none · ref 27 · internal anchor
IV-Net is a multigrid-inspired convolutional neural operator that approximates solutions to linear elliptic PDEs with high-contrast coefficients and shows better accuracy than POD and other neural operators on heterogeneous coercive problems.
Single Level Feature-to-Feature Forecasting with Deformable Convolutions · cs.CV · 2019-07-26 · unverdicted · none · ref 29 · internal anchor
Single-level feature-to-feature forecasting with deformable convolutions on coarse abstract features from a segmentation backbone achieves state-of-the-art results for nine-timestep future semantic segmentation on Cityscapes validation.
VRSTC: Occlusion-Free Video Person Re-Identification · cs.CV · 2019-07-19 · unverdicted · none · ref 35 · internal anchor
STCnet recovers occluded parts in video person re-ID using spatio-temporal cues to form the VRSTC framework, outperforming prior methods on three datasets.
ASCNet: Adaptive-Scale Convolutional Neural Networks for Multi-Scale Feature Learning · cs.CV · 2019-07-07 · unverdicted · none · ref 11 · internal anchor
ASCNet learns per-pixel adaptive dilation rates via a 3-layer convolution structure to produce scale-appropriate receptive fields, yielding higher segmentation accuracy than fixed dilated CNNs on two medical image datasets.
Jointly Aligning and Predicting Continuous Emotion Annotations · cs.LG · 2019-07-05 · unverdicted · none · ref 39 · internal anchor
A multi-delay sinc network jointly aligns speech signals with delayed continuous emotion labels and predicts arousal/valence, claiming state-of-the-art speech-only results on RECOLA and SEWA.
Modeling Embedding Dimension Correlations via Convolutional Neural Collaborative Filtering · cs.IR · 2019-06-26 · unverdicted · none · ref 59 · internal anchor
ConvNCF improves neural collaborative filtering by explicitly modeling pairwise correlations via outer product and high-order correlations via CNN on user-item embeddings.
Rethink the Role of Neural Decoders in Quantum Error Correction · quant-ph · 2026-05-12 · unverdicted · none · ref 2
Neural decoders for surface-code QEC achieve practical microsecond FPGA latency when trained on large datasets with appropriate inductive biases and INT4 quantization, rather than relying on architectural complexity.
YOTOnet: Zero-Shot Cross-Domain Fault Diagnosis via Domain-Conditioned Mixture of Experts · cs.LG · 2026-05-06 · unverdicted · none · ref 15
YOTOnet achieves improved zero-shot cross-domain fault diagnosis on bearing datasets by combining a physics-aware invariant feature distiller with domain-conditioned sparse experts, showing performance scaling as more training domains are added.
DSFNet: Learning Dual-Domain Spectral Operators for Multi-Modality Spatio-Temporal Forecasting in Urban Transportation Systems · cs.LG · 2026-06-05 · unverdicted · none · ref 14 · internal anchor
DSFNet introduces dual-domain spectral operators to explicitly model cross-modality couplings in multi-modality spatio-temporal traffic forecasting and reports 3.21%-10.16% MAE reductions versus baselines on five real-world datasets.
Cross Attention Network for Semantic Segmentation · cs.CV · 2019-07-25 · unverdicted · none · ref 12 · internal anchor
Cross Attention Network fuses spatial and contextual features via a cross attention module to improve semantic segmentation performance and speed on Cityscapes and CamVid.
Attentive CT Lesion Detection Using Deep Pyramid Inference with Multi-Scale Booster · cs.CV · 2019-07-09 · unverdicted · none · ref 15 · internal anchor
A Multi-Scale Booster with hierarchical dilated convolutions and attention modules integrated into FPN improves lesion detection accuracy on the DeepLesion dataset over prior methods.
The Ethical Dilemma when (not) Setting up Cost-based Decision Rules in Semantic Segmentation · cs.CV · 2019-07-02 · unverdicted · none · ref 25 · internal anchor
Defining egoistic and altruistic cost functions for class confusions in semantic segmentation changes precision, recall, and segment-wise error rates relative to standard MAP decisions.
An Efficient Solution for Breast Tumor Segmentation and Classification in Ultrasound Images Using Deep Adversarial Learning · eess.IV · 2019-07-01 · unverdicted · none · ref 18 · internal anchor
cGAN with atrous convolutions and channel weighting segments breast tumors in ultrasound at 93.76% Dice and 88.82% IoU, then classifies benign vs malignant at 85% accuracy using boundary shape features.
Boosting the rule-out accuracy of deep disease detection using class weight modifiers · eess.IV · 2019-06-21 · unverdicted · none · ref 19 · internal anchor
Class weight modifiers applied to 'no mention' cases in the loss function improve rule-out performance for negated disease findings in deep classifiers trained on chest X-ray data derived from clinical notes.
AI-Empowered Low-Altitude Economy: Cooperative Sensing With Fixed Wireless Access · eess.SP · 2026-05-08 · unverdicted · none · ref 37
Cooperative AI sensing with FWA CPEs using CSI features, attention, and Transformer achieves 0.63% missed detection and 6.5m positioning error for UAVs.
Machine Learning Enhanced Laser Spectroscopy for Multi-Species Gas Detection in Complex and Harsh Environments · physics.optics · 2026-05-02 · unverdicted · none · ref 142
Machine learning methods including denoising autoencoders, unsupervised interference mitigation, blind source separation, and certifiable classification are developed and experimentally validated to improve multi-species laser spectroscopy under complex conditions.
Breaking the Resource Wall: Geometry-Guided Sequence Modeling for Efficient Semantic Segmentation · cs.CV · 2026-04-25 · unverdicted · none · ref 47
DGM-Net reaches 82.3% mIoU on Cityscapes and 45.24% on ADE20K using directional geometric guidance inside a linear-complexity Mamba backbone, without heavy pretraining or large models.
SpineContextResUNet: A Computationally Efficient Residual UNet for Spine CT Segmentation · cs.CV · 2026-05-20 · unverdicted · none · ref 21 · internal anchor
SpineContextResUNet achieves Dice scores of 88.17% on VerSe2020 and 88.13% on CTSpine1K while using ~1.7M parameters and running inference on commodity hardware with 8GB RAM.
Generalized SAM: Efficient Fine-Tuning of SAM for Variable Input Image Sizes · cs.CV · 2024-08-22 · unverdicted · none · ref 30 · internal anchor
GSAM applies random cropping to enable variable input sizes for efficient SAM fine-tuning, claiming lower compute with comparable or higher accuracy on varied datasets.
Efficient Segmentation: Learning Downsampling Near Semantic Boundaries · cs.CV · 2019-07-16 · unverdicted · none · ref 55 · internal anchor
Learned adaptive downsampling for semantic segmentation that prioritizes locations near semantic boundaries to improve the accuracy-efficiency trade-off over uniform sampling.
Improving Semantic Segmentation via Dilated Affinity · cs.CV · 2019-07-16 · unverdicted · none · ref 26 · internal anchor
Dilated affinity is jointly predicted with segmentation labels to strengthen features and support efficient label propagation refinement on benchmark datasets.
Multi-level Wavelet Convolutional Neural Networks · cs.CV · 2019-07-06 · unverdicted · none · ref 8 · internal anchor
MWCNN integrates wavelet transforms into CNNs for image restoration tasks like denoising and super-resolution by using wavelet downsampling and inverse transforms to maintain resolution and expand context.
ESNet: An Efficient Symmetric Network for Real-time Semantic Segmentation · cs.CV · 2019-06-24 · unverdicted · none · ref 24 · internal anchor
ESNet is a lightweight symmetric CNN using factorized residual units and parallel dilated convolutions that reaches over 62 FPS semantic segmentation on Cityscapes with 1.6M parameters.
CNNs for Vis-NIR Chemometrics: From Contradiction to Conditional Design · cs.LG · 2026-05-04 · unverdicted · none · ref 50
Contradictions across CNN studies for Vis-NIR chemometrics are expected outcomes of uncontrolled variables in spectral physics and validation design, motivating a conditional rather than universal design framework.
Cross-subject Muscle Fatigue Detection via Adversarial and Supervised Contrastive Learning with Inception-Attention Network · cs.LG · 2026-04-03 · unverdicted · none · ref 11
A model using Inception-attention, adversarial domain adaptation, and contrastive learning reaches 93.54% accuracy in three-class cross-subject muscle fatigue detection from sEMG signals.
A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera · cs.CV · 2019-07-16 · unverdicted · none · ref 30 · internal anchor
A multitask framework lifts 2D keypoints to 3D poses via a two-stream network then applies ENAS to model spatio-temporal pose evolution for action recognition on Human3.6M, MSR Action3D and SBU datasets.
Image Classification via Random Dilated Convolution with Multi-Branch Feature Extraction and Context Excitation · cs.CV · 2026-04-28 · unverdicted · none · ref 56
RDCNet reports state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof by combining random dilated convolutions with multi-branch and attention modules.
Understanding Deep Learning Techniques for Image Segmentation · cs.CV · 2019-07-13 · unverdicted · none · ref 211 · internal anchor
A 2019 survey that categorizes and intuitively explains major deep learning techniques for image segmentation, progressing from classical methods to modern neural architectures.
