DIODE: A Dense Indoor and Outdoor DEpth Dataset

Andrea F. Daniele; Falcon Z. Dai; Gregory Shakhnarovich; Haochen Wang; Igor Vasiljevic; Matthew R. Walter; Mohammadreza Mostajabi; Nick Kolkin; Ruotian Luo; Shanyi Zhang

arxiv: 1908.00463 · v2 · pith:OFPQFK5Cnew · submitted 2019-08-01 · 💻 cs.CV

Igor Vasiljevic, Nick Kolkin, Shanyi Zhang, Ruotian Luo, Haochen Wang, Falcon Z. Dai, Andrea F. Daniele, Mohammadreza Mostajabi, Steven Basart, Matthew R. Walter, Gregory Shakhnarovich

keywords: dataset, dense, depth, diode, indoor, outdoor, images, accurate

We introduce DIODE, a dataset that contains thousands of diverse high resolution color images with accurate, dense, long-range depth measurements. DIODE (Dense Indoor/Outdoor DEpth) is the first public dataset to include RGBD images of indoor and outdoor scenes obtained with one sensor suite. This is in contrast to existing datasets that focus on just one domain/scene type and employ different sensors, making generalization across domains difficult. The dataset is available for download at http://diode-dataset.org

Forward citations

Cited by 27 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MUSE: Unlocking Timestep as Native Task Steering for One-Step Dense Prediction
cs.CV 2026-06 unverdicted novelty 7.0

MUSE shows that the native timestep embedding in diffusion models acts as a parameter-free steering signal for multi-task monocular depth and normal estimation via manifold decoupling in latent space.
DepthMaster: Unified Monocular Depth Estimation for Perspective and Panoramic Images
cs.CV 2026-06 unverdicted novelty 7.0

DepthMaster unifies metric monocular depth estimation for perspective and panoramic images by patching panoramas into perspective views, adding a consistency loss and virtual cameras, and training mostly on perspectiv...
Honey, I Shrunk the Arc de Triomphe!
cs.CV 2026-06 unverdicted novelty 7.0

Introduces MetricScenes dataset with metric grounding from geo-tags and stereo, plus Poisson depth completion, showing fine-tuned MoGe-2 reduces scale-collapse in open scenes.
Honey, I Shrunk the Arc de Triomphe!
cs.CV 2026-06 unverdicted novelty 7.0

MetricScenes dataset from web photos and stereo imagery, plus a two-stage Poisson depth completion method, allows fine-tuning MoGe-2 to mitigate scale-collapse in metric monocular geometry while preserving benchmark p...
SurGe: Improved Surface Geometry in Point Maps
cs.CV 2026-05 unverdicted novelty 7.0

SurGe improves local surface geometry in feedforward point maps via gradient matching loss and Neighborhood Attention Decoder, topping average rank on eight zero-shot monocular geometry benchmarks for global AbsRel wh...
Depth2Pose: A Pose-Based Benchmark for Monocular Depth Estimation without Ground-Truth Depth
cs.CV 2026-05 unverdicted novelty 7.0

Depth2Pose is a new evaluation framework for monocular depth estimators that uses relative camera pose accuracy as a task-driven proxy and introduces the D2P dataset of challenging out-of-distribution scenes.
DBMSolver: A Training-free Diffusion Bridge Sampler for High-Quality Image-to-Image Translation
cs.CV 2026-05 unverdicted novelty 7.0

DBMSolver is a new training-free sampler using exponential integrators that reduces NFEs by up to 5x and improves quality in diffusion bridge model-based image-to-image translation tasks.
Image Generators are Generalist Vision Learners
cs.CV 2026-04 conditional novelty 7.0

An image generator is instruction-tuned to perform diverse vision tasks by representing task outputs as RGB images, achieving SOTA on segmentation and depth estimation.
ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth
cs.CV 2023-02 accept novelty 7.0

ZoeDepth combines relative depth pre-training on many datasets with metric depth fine-tuning and automatic head routing to achieve strong zero-shot generalization while preserving metric scale.
Adding Conditional Control to Text-to-Image Diffusion Models
cs.CV 2023-02 conditional novelty 7.0

ControlNet adds spatial conditioning controls to pretrained text-to-image diffusion models via zero convolutions for stable fine-tuning on small or large datasets.
PointDiT: Pixel-Space Diffusion for Monocular Geometry Estimation
cs.CV 2026-07 unverdicted novelty 6.0

PointDiT is a from-scratch pixel-space Diffusion Transformer for monocular 3D point map estimation that outperforms latent diffusion models in sharpness and ambiguous regions while using a simpler architecture.
AerialMetric: Benchmarking and Adapting UAV Monocular Metric Depth Estimation in the Real World
cs.CV 2026-06 unverdicted novelty 6.0

AerialMetric is a new benchmark dataset and evaluation suite for adapting monocular metric depth estimation models to real-world UAV aerial views.
Modality Forcing for Scalable Spatial Generation
cs.CV 2026-06 unverdicted novelty 6.0

Modality Forcing lets a single DiT produce image and depth outputs in any order after training on sparse real-world depth, with larger image-pretrained models yielding better depth accuracy and a 57% AbsRel reduction ...
Open-Source Image Editing Models Are Zero-Shot Vision Learners
cs.CV 2026-05 unverdicted novelty 6.0

Open-source image-editing models show competitive zero-shot performance on monocular depth, surface normals, and semantic segmentation, sometimes matching tuned models.
Image Generators are Generalist Vision Learners
cs.CV 2026-04 conditional novelty 6.0

Image generation pretraining builds generalist vision models that reach SOTA on 2D and 3D perception tasks by reframing them as RGB image outputs.
Image Generators are Generalist Vision Learners
cs.CV 2026-04 unverdicted novelty 6.0

Image generation pretraining produces generalist vision models that reframe perception tasks as image synthesis and reach SOTA results on segmentation, depth estimation, and other 2D/3D tasks.
SceneScribe-1M: A Large-Scale Video Dataset with Comprehensive Geometric and Semantic Annotations
cs.CV 2026-04 unverdicted novelty 6.0

SceneScribe-1M is a new dataset of 1 million videos with semantic text, camera parameters, dense depth, and consistent 3D point tracks to support monocular depth estimation, scene reconstruction, point tracking, and t...
Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model
cs.CV 2025-11 unverdicted novelty 6.0

Lotus-2 is a two-stage deterministic adaptation of diffusion priors that achieves state-of-the-art monocular depth estimation with only 59K training samples.
Depth Anything 3: Recovering the Visual Space from Any Views
cs.CV 2025-11 unverdicted novelty 6.0

DA3 recovers consistent visual geometry from arbitrary views via a vanilla DINO transformer and depth-ray target, setting new SOTA on a visual geometry benchmark while outperforming DA2 on monocular depth.
Depth Anything V2
cs.CV 2024-06 unverdicted novelty 6.0

Depth Anything V2 delivers finer, more robust monocular depth predictions by replacing real labeled images with synthetic data, scaling the teacher model, and using large-scale pseudo-labeled real images for student training.
JetViT: Efficient High-Resolution Vision Transformer with Post-Training Attention Search
cs.CV 2026-05 unverdicted novelty 5.0

JetViT uses post-training attention search to hybridize full-attention ViTs with linear and window attention blocks, achieving up to 1.79x throughput gains on high-res images while preserving accuracy on DINOv3 and De...
The Midas Touch for Metric Depth
cs.CV 2026-05 unverdicted novelty 5.0

MTD turns relative depth into metric depth via segment-wise sparse graph optimization and discontinuity-aware geodesic pixel refinement, claiming better accuracy and generalization than prior depth methods.
Qwen-Image Technical Report
cs.CV 2025-08 unverdicted novelty 5.0

Qwen-Image is a foundation model that reaches state-of-the-art results in image generation and editing by combining a large-scale text-focused data pipeline with curriculum learning and dual semantic-reconstructive en...
MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details
cs.CV 2025-07 unverdicted novelty 5.0

MoGe-2 recovers metric-scale 3D point maps with fine details from single images via data refinement and extension of affine-invariant predictions.
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
cs.CV 2025-01 unverdicted novelty 5.0

DepthMaster proposes a single-step diffusion model with Feature Alignment and Fourier Enhancement modules in a two-stage training process to improve generalization and detail preservation in monocular depth estimation...
Large Depth Completion Model from Sparse Observations
cs.CV 2026-05 unverdicted novelty 4.0

LDCM achieves state-of-the-art metric depth completion from sparse observations by combining foundation-model initialization with a point-map regression head that removes the need for camera intrinsics.
Depth Completion in Unseen Field Robotics Environments Using Extremely Sparse Depth Measurements
cs.RO 2026-02 unverdicted novelty 4.0

A depth completion network trained on synthetic field-robotics scenes predicts dense metric depth from extremely sparse real measurements and runs in real time on embedded hardware in unseen outdoor environments.