pith. sign in

arxiv: 2501.12202 · v5 · submitted 2025-01-21 · 💻 cs.CV

Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

Pith reviewed 2026-05-23 04:51 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D asset generationdiffusion modelstexture synthesisshape generationdiffusion transformertext-to-3Dhigh-resolution 3D
0
0 comments X

The pith

Hunyuan3D 2.0 scales flow-based diffusion transformers to generate high-resolution textured 3D assets that outperform prior models in geometry, alignment, and texture.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors present Hunyuan3D 2.0 as a system built around two large models: one that creates 3D shapes aligned to input images and another that adds detailed textures to those shapes or existing meshes. The shape model uses a scalable flow-based diffusion transformer, while the texture model draws on geometric and diffusion priors to produce vibrant high-resolution maps. Systematic tests show gains over both open and closed prior systems, and the full models plus code are released publicly to support further work in the open-source 3D community.

Core claim

Hunyuan3D 2.0 consists of Hunyuan3D-DiT, a scalable flow-based diffusion transformer that generates geometry aligned with a condition image, and Hunyuan3D-Paint, a texture synthesis model that produces high-resolution vibrant texture maps for generated or hand-crafted meshes; together they deliver measurable improvements in geometry details, condition alignment, and texture quality over previous state-of-the-art systems.

What carries the argument

The scalable flow-based diffusion transformer (Hunyuan3D-DiT) for condition-aligned shape generation paired with the geometric-prior texture model (Hunyuan3D-Paint).

If this is right

  • Shapes produced by the system align more closely with given condition images than earlier generators.
  • Texture maps reach higher resolution and vibrancy on both new and existing meshes.
  • The accompanying Hunyuan3D-Studio platform lets users manipulate and animate meshes with less manual effort.
  • Public release of weights and code supplies the open-source community with large-scale 3D foundation models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Production pipelines in games or film could shorten iteration cycles by adopting the released models for initial asset creation.
  • Combining the shape and texture stages into a single end-to-end pipeline might become feasible once training resources grow.
  • The same scaling approach could be tested on related tasks such as 3D animation or scene-level generation.

Load-bearing premise

The reported performance gains rest on fair comparisons that do not hide differences in training data scale, compute, or evaluation choices.

What would settle it

An independent side-by-side test in which another model matches or exceeds Hunyuan3D 2.0 scores on geometry detail, condition alignment, and texture quality when both systems are evaluated with matching disclosed data and compute budgets.

Figures

Figures reproduced from arXiv: 2501.12202 by Biwen Lei, Changrong Hu, Chao Zhang, Chongqing Zhao, Chunchao Guo (refer to the report for detailed contributions), Di Wang, Fan Yang, Haohan Weng, Haolin Liu, Hao Zhang, Haozhao Kuang, Huiwen Shi, Jiaao Yu, Jianbing Peng, Jianchen Zhu, Jian Liu, Jie Jiang, Jie Xiao, Jihong Zhang, Jinbao Xue, Jingwei Huang, Jing Xu, Junta Wu, Kai Liu, Lei Qin, Liang Dong, Lifu Wang, Lin Niu, Lixin Xu, Meng Chen, Minghui Chen, Mingxin Yang, Paige Wang, Peng He, Qingxiang Lin, Ruining Tang, Runzhou Wu, Shaoxiong Yang, Sheng Zhang, Shuhui Yang, Sicong Liu, Song Zhang, Tian Liu, Tianyu Huang, Weihao Zhuang, Xianghui Yang, Xinhai Liu, Xinming Wu, Xinzhou Wang, Xipeng Zhang, Xuhui Zuo, Xu Zheng, Yang Liu, Yangyu Tao, Yifei Feng, Yihang Lian, Yiling Zhu, Yingkai Wang, YingPing He, Yiwen Jia, Yixuan Tang, Yonghao Tan, Yong Yang, Yuhong Liu, Yulin Cai, Yunfei Zhao, Zebin He, Zeqiang Lai, Zhan Li, Zheng Ye, Zhichao Hu, Zhongyi Fan, Zhuo Chen, Zibo Zhao.

Figure 1
Figure 1. Figure 1: An overall of Hunyuan3D 2.0 system. 2 [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: An overall of Hunyuan3D 2.0 architecture for 3D generation. It consists of two main components: Hunyuan3D-DiT for generating bare mesh from a given input image and Hunyuan3D￾Paint for generating a textured map for the generated bare mesh. Hunyuan3D-Paint takes geometry conditions – normal maps and position maps of generated mesh as inputs and generates multi-view images for texture baking. 3 Generative 3D … view at source ↗
Figure 3
Figure 3. Figure 3: The overall architecture of Hunyuan3D-ShapeVAE. Instead of only using uniform sampling on mesh surface, We have developed an importance sampling strategy to extract high-frequency detail information from the input mesh surface, such as edges and cor￾ners. This allows the model to better capture and represent the intricate details of 3D shapes. Note that during the point query construction, the Farthest Poi… view at source ↗
Figure 4
Figure 4. Figure 4: Overview of Hunyuan3D-DiT. It adopts a transformer architecture with both double- and [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Overview of Hunyuan3D-Paint. We leverage an image delighting module to convert the [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Visual comparisons. We illustrate the reconstructed mesh (blue paint aims to show more [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Visual comparisons. We display the input image and the generated bare mesh (blue paint [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Visual comparisons. We demonstrate several generated texture maps on different bare [PITH_FULL_IMAGE:figures/full_fig_p014_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Visual results. We generate different texture maps for two meshes, and the results validate [PITH_FULL_IMAGE:figures/full_fig_p015_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: The results of user study. 15 [PITH_FULL_IMAGE:figures/full_fig_p015_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Visual comparisons. The first case reflects that [PITH_FULL_IMAGE:figures/full_fig_p016_11.png] view at source ↗
read the original abstract

We present Hunyuan3D 2.0, an advanced large-scale 3D synthesis system for generating high-resolution textured 3D assets. This system includes two foundation components: a large-scale shape generation model -- Hunyuan3D-DiT, and a large-scale texture synthesis model -- Hunyuan3D-Paint. The shape generative model, built on a scalable flow-based diffusion transformer, aims to create geometry that properly aligns with a given condition image, laying a solid foundation for downstream applications. The texture synthesis model, benefiting from strong geometric and diffusion priors, produces high-resolution and vibrant texture maps for either generated or hand-crafted meshes. Furthermore, we build Hunyuan3D-Studio -- a versatile, user-friendly production platform that simplifies the re-creation process of 3D assets. It allows both professional and amateur users to manipulate or even animate their meshes efficiently. We systematically evaluate our models, showing that Hunyuan3D 2.0 outperforms previous state-of-the-art models, including the open-source models and closed-source models in geometry details, condition alignment, texture quality, and etc. Hunyuan3D 2.0 is publicly released in order to fill the gaps in the open-source 3D community for large-scale foundation generative models. The code and pre-trained weights of our models are available at: https://github.com/Tencent/Hunyuan3D-2

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents Hunyuan3D 2.0, a large-scale 3D synthesis system with two core components: Hunyuan3D-DiT, a scalable flow-based diffusion transformer for generating geometry aligned to condition images, and Hunyuan3D-Paint, a texture synthesis model leveraging geometric and diffusion priors to produce high-resolution textures on generated or hand-crafted meshes. It also introduces Hunyuan3D-Studio, a user-friendly platform for mesh manipulation and animation. The authors claim systematic evaluations demonstrate outperformance over prior SOTA open- and closed-source models in geometry details, condition alignment, texture quality, and related metrics, and publicly release the code and pre-trained weights.

Significance. If the outperformance claims hold under reproducible conditions, this would be a significant contribution to the open-source 3D generation community by providing large-scale foundation models for shape and texture. The public release of code, weights, and the studio platform is a clear strength that supports independent verification and broader adoption.

major comments (2)
  1. [§4 (Experiments)] The central claim of outperformance (abstract and §4) rests on systematic evaluations, yet the manuscript provides insufficient detail on evaluation protocols, exact datasets, baseline implementations (especially for closed-source models), metric definitions, and statistical analysis; this directly affects verifiability of superiority in geometry, alignment, and texture.
  2. [Abstract and §4] Comparisons with closed-source models (abstract) do not specify access methods, version matching, or compute equalization, which is load-bearing for the fairness of the outperformance claim.
minor comments (2)
  1. [§3 (Method)] The description of Hunyuan3D-DiT and Hunyuan3D-Paint would benefit from explicit statements on model scale (parameters, training data volume) to contextualize the scaling claims.
  2. [§4] Figure captions and tables in the experiments section should include error bars or variance measures where quantitative results are reported.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments regarding the verifiability of our experimental claims. We address each major comment below and will make revisions to improve transparency and reproducibility.

read point-by-point responses
  1. Referee: [§4 (Experiments)] The central claim of outperformance (abstract and §4) rests on systematic evaluations, yet the manuscript provides insufficient detail on evaluation protocols, exact datasets, baseline implementations (especially for closed-source models), metric definitions, and statistical analysis; this directly affects verifiability of superiority in geometry, alignment, and texture.

    Authors: We agree that the current level of detail in Section 4 is insufficient for full independent verification. In the revised manuscript we will expand the experimental section with explicit descriptions of the evaluation protocols, the precise datasets and test splits employed, implementation details or references for all baselines (including closed-source models), formal definitions of each metric, and any statistical procedures such as variance across runs or significance testing. revision: yes

  2. Referee: [Abstract and §4] Comparisons with closed-source models (abstract) do not specify access methods, version matching, or compute equalization, which is load-bearing for the fairness of the outperformance claim.

    Authors: We acknowledge that the manuscript does not currently provide these specifics. We will revise both the abstract and Section 4 to document the access methods (official APIs or interfaces), the dates or versions used for each closed-source model, and the steps taken to equalize compute where feasible. For proprietary systems we will note the inherent limitations on exact version matching while supplying the best available information. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical release with public code

full rationale

The paper describes an empirical 3D generation system (Hunyuan3D-DiT shape model and Hunyuan3D-Paint texture model) built on standard diffusion transformers, with systematic benchmark evaluations and public code/weights release. No derivation chain, fitted parameters presented as predictions, or load-bearing self-citations appear in the provided text. Claims of outperformance rest on external comparisons rather than internal reductions, rendering the work self-contained against benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the models are standard diffusion transformers scaled up, with all details deferred to the unreviewed full text.

pith-pipeline@v0.9.0 · 6077 in / 1108 out tokens · 25067 ms · 2026-05-23T04:51:16.495249+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. CORGI: Consistency-Aware 3D Dog Reconstruction from a Single Image in the Wild

    cs.CV 2026-07 unverdicted novelty 7.0

    A new pipeline using canonical LoRAs for view synthesis, deformable 3D Gaussian splatting anchored on D-SMAL, and generative repair to produce animatable 3D dogs from single wild images without 3D supervision.

  2. WarpHammer: Densifying Scene Warps with 3D Object Priors for Extreme View Synthesis

    cs.CV 2026-06 unverdicted novelty 7.0

    WarpHammer densifies scene warps with 3D object priors from generative models and fuses pose-unknown auxiliary views via multi-view geometry to enable stable extreme novel view synthesis.

  3. UnfoldArt: Zero-Shot Recovery of Full Articulated 3D Objects from Text or Image

    cs.CV 2026-06 unverdicted novelty 7.0

    UnfoldArt uses multi-agent debate grounded in vision-language and video models to infer articulation parameters and reconstruct full 3D objects including occluded parts from text or image inputs.

  4. UnfoldArt: Zero-Shot Recovery of Full Articulated 3D Objects from Text or Image

    cs.CV 2026-06 unverdicted novelty 7.0

    UnfoldArt uses a two-round structured debate between high-level semantic agents and low-level parameter agents, grounded in generated video, to infer articulation and reconstruct full articulated 3D objects including ...

  5. GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction

    cs.CV 2026-05 unverdicted novelty 7.0

    GenRecon lifts object-level generative priors to scene-scale reconstruction by chunking scenes and using projection-based conditioning on multi-view features, claiming 16% better results than prior methods.

  6. Stream3D: Sequential Multi-View 3D Generation via Evidential Memory

    cs.CV 2026-05 unverdicted novelty 7.0

    Stream3D is a training-free method that maintains temporal consistency in 3D generation from monocular streams by dynamically caching a fixed number of informative historical frames using an evidence score.

  7. CelloCut: Constructive Watertight Remeshing via Tetrahedral Cell Cuts

    cs.GR 2026-05 unverdicted novelty 7.0

    CelloCut formulates watertight remeshing as binary labeling on a Delaunay tetrahedral partition solved by graph-cut minimization with one-sided constraints to guarantee volumetrically consistent solids.

  8. QuadLink: Autoregressive Quad-Dominant Mesh Generation via Point-Relation Learning

    cs.GR 2026-05 unverdicted novelty 7.0

    QuadLink generates anisotropic quad-dominant meshes from point clouds via anchor prediction, centroid-conditioned linking, and quad-first assembly, supporting hybrid n-gon topology.

  9. OmniFit: Multi-modal 3D Body Fitting via Scale-agnostic Dense Landmark Prediction

    cs.CV 2026-04 unverdicted novelty 7.0

    OmniFit uses a conditional transformer decoder to predict dense body landmarks from multi-modal inputs for scale-agnostic SMPL-X fitting, outperforming prior methods by 57-81% and reaching millimeter accuracy on CAPE ...

  10. Geometrically Consistent Multi-View Scene Generation from Freehand Sketches

    cs.CV 2026-04 unverdicted novelty 7.0

    A framework generates consistent multi-view scenes from one freehand sketch via a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and Sparse Correspondence Supervision Loss, outperforming baselines in re...

  11. Towards Realistic and Consistent Orbital Video Generation via 3D Foundation Priors

    cs.CV 2026-04 unverdicted novelty 7.0

    A video generation approach conditions a base model with multi-scale 3D latent features and a cross-attention adapter to produce geometrically realistic and consistent orbital videos from one image.

  12. Any 3D Scene is Worth 1K Tokens: 3D-Grounded Representation for Scene Generation at Scale

    cs.CV 2026-04 unverdicted novelty 7.0

    A 3D-grounded autoencoder and diffusion transformer allow direct generation of 3D scenes in an implicit latent space using a fixed 1K-token representation for arbitrary views and resolutions.

  13. GenLCA: 3D Diffusion for Full-Body Avatars from In-the-Wild Videos

    cs.CV 2026-04 unverdicted novelty 7.0

    GenLCA enables scalable training of a 3D diffusion model for photorealistic, animatable full-body avatars by tokenizing large-scale real-world videos with a pretrained reconstructor and applying visibility-aware diffu...

  14. InverseDraping: Recovering Sewing Patterns from 3D Garment Surfaces via BoxMesh Bridging

    cs.CV 2026-04 unverdicted novelty 7.0

    A two-stage autoregressive framework centered on BoxMesh recovers parametric sewing patterns from 3D garment surfaces, claiming state-of-the-art results on benchmarks and generalization to real scans and single-view images.

  15. PerpetualWonder: Long-Horizon Action-Conditioned 4D Scene Generation

    cs.CV 2026-02 unverdicted novelty 7.0

    PerpetualWonder introduces a closed-loop generative simulator with a unified physical-visual representation for long-horizon action-conditioned 4D scene generation from one image.

  16. VecSet-Edit: Unleashing Pre-trained LRM for Mesh Editing from Single Image

    cs.CV 2026-02 unverdicted novelty 7.0

    VecSet-Edit is the first method to perform high-fidelity mesh editing from a single image by analyzing and manipulating spatial token subsets in a pre-trained VecSet LRM.

  17. ATATA: One Algorithm to Align Them All

    cs.CV 2026-01 unverdicted novelty 7.0

    ATATA enables fast joint inference of structurally aligned pairs using Rectified Flow models via segment transport, improving state-of-the-art for image and video generation while matching 3D quality at much higher speed.

  18. LangDriveCTRL: Natural Language Controllable Driving Scene Editing with Multi-modal Agents

    cs.CV 2025-12 unverdicted novelty 7.0

    LangDriveCTRL decomposes driving videos into 3D scene graphs and uses an agentic pipeline with specialized multi-modal agents to perform language-controlled object and behavior edits, achieving nearly 2x higher instru...

  19. PacTure: Efficient PBR Texture Generation on Packed Views with Visual Autoregressive Models

    cs.CV 2025-05 unverdicted novelty 7.0

    PacTure uses view packing and next-scale autoregressive prediction to generate consistent multi-view PBR textures faster than prior sequential or cross-attention methods.

  20. Ink3D: Sculpting 3D Assets with Extremely Complex Textures via Video Generative Models

    cs.CV 2026-07 unverdicted novelty 6.0

    Ink3D decouples geometry from texture by generating dense orbit videos with a conditional video model and baking them via a neural optimizer to produce complex 3D textures.

  21. PointSplat: Compact Gaussian Splatting via Human-Centric Prediction

    cs.CV 2026-06 unverdicted novelty 6.0

    PointSplat infers compact Gaussian splats directly in 3D space from input point sets via ray casting and Point-Image Transformer to reduce inter-view redundancy and improve novel-view quality for humans.

  22. Mesh BDF: Barycentric Dominance Field for 3D Native Mesh Generation

    cs.CV 2026-06 unverdicted novelty 6.0

    Barycentric Dominance Field converts discrete mesh connectivity into a continuous surface signal that diffusion models can use directly for higher-quality native 3D mesh generation.

  23. DualBrep: A Dual-Field Continuous Representation for B-rep Modelling

    cs.GR 2026-06 unverdicted novelty 6.0

    DualBrep encodes B-rep models as dual scalar fields (SDF geometry + UDF topology) compressed into a shared latent space for flow-matching generation and neural B-rep extraction.

  24. HandMade: Spatial Prompting for Generative 3D Creation with Part-Labeled VR Sketches

    cs.HC 2026-06 unverdicted novelty 6.0

    HandMade converts segmented VR strokes into multi-view part guidance and structured prompts so generative 3D models better preserve user-specified spatial scaffolds than text-only or sketch baselines.

  25. SubdivAR: Autoregressive Next-Scale Prediction for Neural Mesh Subdivision

    cs.CV 2026-06 unverdicted novelty 6.0

    SubdivAR reformulates neural mesh subdivision as autoregressive next-scale coordinate prediction with a topology-aware transformer and reports 18.8% and 14.2% reductions in Hausdorff and Chamfer distance over baseline...

  26. HiFiVe: High-Fidelity Vehicle Generation Leveraging Auto-Regressive 2D Generative Priors

    cs.CV 2026-06 unverdicted novelty 6.0

    HiFiVe is a training-free framework using an auto-regressive texture refinement pipeline with depth-based warping, multi-view fusion, and symmetry to enhance both texture and geometry fidelity in vehicle generation fr...

  27. Artic-O: End-to-End Articulated Object Reconstruction via Latent Geometry Learning

    cs.CV 2026-06 unverdicted novelty 6.0

    Artic-O introduces an end-to-end feed-forward model that reconstructs articulated objects from sparse multi-state images via latent geometry learning and an image-grounded part-reasoning module, outperforming staged m...

  28. Surflo: Consistent 3D Surface Flow Model with Global State

    cs.CV 2026-06 unverdicted novelty 6.0

    Surflo compresses unposed RGB views into K global latent tokens and uses flow matching with photometric guidance to decode consistent arbitrary-resolution 3D surface points in one forward pass.

  29. DeepJEB++: Foundation Model-Driven Large-Scale 3D Engineering Dataset via 2D Latent Space Augmentation

    cs.LG 2026-06 unverdicted novelty 6.0

    DeepJEB++ expands a small seed set of jet engine brackets into 15,360 labeled 3D designs via 2D latent diffusion augmentation, VLM filtering, generative 3D lifting, and automated finite-element labeling.

  30. MORPHOS: Autoregressive 4D Generation with Temporal Structured Latents

    cs.CV 2026-06 unverdicted novelty 6.0

    MORPHOS introduces an autoregressive 4D generation method with Temporal Structured Latents (T-SLAT) that produces dynamic 3D assets from videos while handling topological changes and long sequences.

  31. FoundObj: Self-supervised Foundation Models as Rewards for Label-free 3D Object Segmentation

    cs.CV 2026-05 unverdicted novelty 6.0

    FoundObj uses foundation-model priors as RL rewards to discover multi-class 3D objects from point clouds without scene-level labels.

  32. Helix4D: Complex 4D Mesh Generation

    cs.CV 2026-05 unverdicted novelty 6.0

    Helix4D generates high-quality dynamic 4D meshes from videos by extending Trellis2 with sliding-window cross-frame attention anchored on the first frame and a repurposed 4D temporal encoding.

  33. Fishbone: From One 3D Asset to a Million Controllable Edits

    cs.CV 2026-05 unverdicted novelty 6.0

    Fishbone introduces a unified rib-spine representation computed via adaptive heat method, iso-contour ribs, and geometry-aware spine that enables real-time parametric deformation, reduced-space simulation, and animati...

  34. Stream3D: Sequential Multi-View 3D Generation via Evidential Memory

    cs.CV 2026-05 unverdicted novelty 6.0

    Stream3D is a training-free method that maintains a fixed-size evidential memory of past frames to convert frozen view-conditioned 3D generators into consistent streaming generators.

  35. ROAR-3D: Routing Arbitrary Views for High-Fidelity 3D Generation

    cs.CV 2026-05 unverdicted novelty 6.0

    ROAR-3D adds a token-wise view router and dual-stream attention to pretrained single-view 3D generators so they can use arbitrary unposed images for higher-fidelity output.

  36. QuadLink: Autoregressive Quad-Dominant Mesh Generation via Point-Relation Learning

    cs.GR 2026-05 unverdicted novelty 6.0

    QuadLink generates anisotropic quad-dominant meshes from point clouds via a hybrid centroid-conditioned vertex linking model and a Tri-to-Quad data conversion operator.

  37. TOPOS: High-Fidelity and Efficient Industry-Grade 3D Head Generation

    cs.CV 2026-05 unverdicted novelty 6.0

    TOPOS creates high-fidelity 3D heads with fixed industry topology from single images via a specialized VAE with Perceiver Resampler and a rectified flow transformer.

  38. Real2Sim in HOI: Toward Physically Plausible HOI Reconstruction from Monocular Videos

    cs.CV 2026-05 unverdicted novelty 6.0

    HA-HOI produces physically plausible 4D HOI animations from monocular videos by anchoring object reconstruction to human motion and refining the result in a physics-based humanoid-object simulator.

  39. Pixal3D: Pixel-Aligned 3D Generation from Images

    cs.CV 2026-05 unverdicted novelty 6.0

    Pixal3D performs pixel-aligned 3D generation from images via back-projected multi-scale feature volumes, achieving fidelity close to reconstruction while supporting multi-view and scene synthesis.

  40. GenMed: A Pairwise Generative Reformulation of Medical Diagnostic Tasks

    cs.CV 2026-05 unverdicted novelty 6.0

    GenMed uses diffusion models to capture P(X,Y) for medical tasks and performs inference via gradient-based test-time optimization, supporting arbitrary observation combinations without retraining.

  41. DVD: Discrete Voxel Diffusion for 3D Generation and Editing

    cs.CV 2026-05 unverdicted novelty 6.0

    DVD applies discrete diffusion directly to voxel occupancy for 3D generation, uncertainty estimation via entropy, and single-round editing via block perturbation fine-tuning.

  42. DVD: Discrete Voxel Diffusion for 3D Generation and Editing

    cs.CV 2026-05 unverdicted novelty 6.0

    DVD treats voxel occupancy as a discrete variable in a diffusion framework to generate, assess, and edit sparse 3D voxels without continuous thresholding.

  43. Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion

    cs.CV 2026-05 unverdicted novelty 6.0

    DiLAST optimizes 3D latents via guidance from a 2D diffusion model to enable generalizable style transfer for OOD styles in 3D asset generation.

  44. CADFit: Precise Mesh-to-CAD Program Generation with Hybrid Optimization

    cs.CV 2026-05 unverdicted novelty 6.0

    CADFit recovers complex editable CAD construction sequences from meshes via IoU-driven hybrid optimization over structured programs, outperforming prior methods on volumetric IoU, Chamfer Distance, and invalid ratio.

  45. CADFit: Precise Mesh-to-CAD Program Generation with Hybrid Optimization

    cs.CV 2026-05 unverdicted novelty 6.0

    CADFit recovers complex editable CAD construction sequences from meshes via IoU-driven hybrid optimization and outperforms prior mesh-to-CAD methods on volumetric IoU, Chamfer Distance, and invalid program ratio.

  46. MeshReGen: A Unified 3D Geometry Regeneration Framework

    cs.CV 2026-04 unverdicted novelty 6.0

    MeshReGen introduces a conditioned 3D geometry regenerator with VecSet that learns a regeneration prior via self-supervision and reports state-of-the-art results on controllable generation tasks.

  47. MeshReGen: A Unified 3D Geometry Regeneration Framework

    cs.CV 2026-04 unverdicted novelty 6.0

    3D-ReGen is a conditioned 3D regenerator using VecSet that learns a regeneration prior from unlabeled 3D datasets via self-supervised tasks and achieves state-of-the-art results on controllable 3D geometry tasks.

  48. Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers

    cs.CV 2026-04 unverdicted novelty 6.0

    Sculpt4D generates temporally coherent 4D shapes by integrating a block sparse attention mechanism with time-decaying mask into a pretrained 3D diffusion transformer, achieving SOTA results with 56% less computation.

  49. MetaEarth3D: Unlocking World-scale 3D Generation with Spatially Scalable Generative Modeling

    cs.CV 2026-04 unverdicted novelty 6.0

    MetaEarth3D is the first generative foundation model for spatially consistent, unbounded 3D scene generation at planetary scale using optical Earth observation data.

  50. PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios

    cs.CV 2026-04 unverdicted novelty 6.0

    PostureObjectStitch generates assembly-aware anomaly images by decoupling multi-view features into high-frequency, texture and RGB components, modulating them temporally in a diffusion model, and applying conditional ...

  51. Beyond Voxel 3D Editing: Learning from 3D Masks and Self-Constructed Data

    cs.CV 2026-04 unverdicted novelty 6.0

    BVE framework enables text-guided 3D editing beyond voxel limits by combining self-constructed data, lightweight semantic injection, and annotation-free masking to preserve local invariance.

  52. Pair2Scene: Learning Local Object Relations for Procedural Scene Generation

    cs.CV 2026-04 unverdicted novelty 6.0

    Pair2Scene generates complex 3D scenes beyond training data by recursively applying a learned model of local support and functional object-pair relations inside hierarchies, using collision-aware rejection sampling fo...

  53. Pair2Scene: Learning Local Object Relations for Procedural Scene Generation

    cs.CV 2026-04 unverdicted novelty 6.0

    Pair2Scene generates complex 3D scenes beyond training data by training a network on local object-pair placement rules and applying them recursively with collision-aware sampling.

  54. AssemLM: A Spatial Reasoning Multimodal Large Language Model for Robotic Assembly

    cs.RO 2026-04 unverdicted novelty 6.0

    AssemLM uses a specialized point cloud encoder inside a multimodal LLM to reach state-of-the-art 6D pose prediction for assembly tasks, backed by a new 900K-sample benchmark called AssemBench.

  55. UniRecGen: Unifying Multi-View 3D Reconstruction and Generation

    cs.CV 2026-04 unverdicted novelty 6.0

    UniRecGen unifies reconstruction and generation via shared canonical space and disentangled cooperative learning to produce complete, consistent 3D models from sparse views.

  56. MV-SAM3D: Adaptive Multi-View Fusion for Layout-Aware 3D Generation

    cs.CV 2026-03 unverdicted novelty 6.0

    MV-SAM3D adds multi-view fusion via multi-diffusion with attention-entropy and visibility weighting plus physics-aware optimization to improve fidelity and physical plausibility in layout-aware 3D generation.

  57. Restore3D: Breathing Life into Broken Objects with Shape and Texture Restoration

    cs.CV 2026-07 unverdicted novelty 5.0

    Restore3D restores shape and texture of broken 3D objects via multi-view image refinement with a Mask Self-Perceiver and coarse-to-fine mesh reconstruction, outperforming baselines on synthetic and real benchmarks.

  58. Vitality-Aware Compression for Efficient Image-to-Shape Diffusion Transformers

    cs.CV 2026-07 unverdicted novelty 5.0

    Introduces vitality-aware compression for image-to-3D DiT models via structured pruning, adaptive quantization, and fine-tuning, claiming 66% size reduction with comparable fidelity.

  59. JacobianAvatar: Temporally Consistent Semi-rigid Avatar Reconstruction from a Monocular Video

    cs.CV 2026-06 unverdicted novelty 5.0

    JacobianAvatar uses neural Jacobian fields with a constrained Poisson solver, signed distance regularization, and deformation-guided flow loss to produce temporally consistent avatars from monocular video.

  60. HiFiVe: High-Fidelity Vehicle Generation Leveraging Auto-Regressive 2D Generative Priors

    cs.CV 2026-06 unverdicted novelty 5.0

    HiFiVe generates high-fidelity 3D vehicles by anchoring auto-regressive 2D texture synthesis to coarse geometry via depth warping and symmetry, then refining mesh with normal maps.

Reference graph

Works this paper leans on

122 extracted references · 122 canonical work pages · cited by 74 Pith papers · 16 internal anchors

  1. [1]

    Doodle your 3d: From abstract freehand sketches to precise 3d shapes

    Hmrishav Bandyopadhyay, Subhadeep Koley, Ayan Das, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, and Yi-Zhe Song. Doodle your 3d: From abstract freehand sketches to precise 3d shapes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9795–9805, 2024

  2. [2]

    Meta 3d texturegen: Fast and consistent texture generation for 3d objects.arXiv preprint arXiv:2407.02430,

    Raphael Bensadoun, Yanir Kleiman, Idan Azuri, Omri Harosh, Andrea Vedaldi, Natalia Neverova, and Oran Gafni. Meta 3d texturegen: Fast and consistent texture generation for 3d objects. arXiv preprint arXiv:2407.02430, 2024

  3. [3]

    Improving image generation with better captions

    James Betker, Gabriel Goh, Li Jing, Tim Brooks, Jianfeng Wang, Linjie Li, Long Ouyang, Juntang Zhuang, Joyce Lee, Yufei Guo, et al. Improving image generation with better captions. Computer Science. https://cdn. openai. com/papers/dall-e-3. pdf, 2(3):8, 2023

  4. [4]

    DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

    Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, et al. Deepseek llm: Scaling open-source language models with longtermism. arXiv preprint arXiv:2401.02954, 2024

  5. [5]

    Mesh2tex: Generating mesh textures from image queries

    Alexey Bokhovkin, Shubham Tulsiani, and Angela Dai. Mesh2tex: Generating mesh textures from image queries. In IEEE International Conference on Computer Vision (ICCV), October 2023

  6. [6]

    InstructPix2Pix: Learning to follow image editing instructions.arXiv preprint arXiv:2211.09800, 2022

    Tim Brooks, Aleksander Holynski, and Alexei A Efros. Instructpix2pix: Learning to follow image editing instructions. arXiv preprint arXiv:2211.09800, 2022

  7. [7]

    Efficient geometry-aware 3d generative adversarial networks

    Eric R Chan, Connor Z Lin, Matthew A Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas J Guibas, Jonathan Tremblay, Sameh Khamis, et al. Efficient geometry-aware 3d generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16123–16133, 2022

  8. [8]

    ShapeNet: An Information-Rich 3D Model Repository

    Angel X Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012, 2015

  9. [9]

    Text2tex: Text-driven texture synthesis via diffusion models

    Dave Zhenyu Chen, Yawar Siddiqui, Hsin-Ying Lee, Sergey Tulyakov, and Matthias Nießner. Text2tex: Text-driven texture synthesis via diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 18558–18568, 2023

  10. [10]

    Shaddr: Interactive example-based geometry and texture generation via 3d shape detailization and differentiable rendering

    Qimin Chen, Zhiqin Chen, Hang Zhou, and Hao Zhang. Shaddr: Interactive example-based geometry and texture generation via 3d shape detailization and differentiable rendering. In SIGGRAPH Asia 2023 Conference Papers, 2023

  11. [11]

    Dora: Sampling and benchmarking for 3d shape variational auto-encoders

    Rui Chen, Jianfeng Zhang, Yixun Liang, Guan Luo, Weiyu Li, Jiarui Liu, Xiu Li, Xiaoxiao Long, Jiashi Feng, and Ping Tan. Dora: Sampling and benchmarking for 3d shape variational auto-encoders. arXiv preprint arXiv:2412.17808, 2024

  12. [12]

    Meshanything v2: Artist-created mesh generation with adjacent mesh tokenization,

    Yiwen Chen, Yikai Wang, Yihao Luo, Zhengyi Wang, Zilong Chen, Jun Zhu, Chi Zhang, and Guosheng Lin. Meshanything v2: Artist-created mesh generation with adjacent mesh tokenization. arXiv preprint arXiv:2408.02555, 2024

  13. [13]

    Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks

    Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, et al. Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24185–24198, 2024

  14. [14]

    Learning implicit fields for generative shape modeling

    Zhiqin Chen and Hao Zhang. Learning implicit fields for generative shape modeling. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5939–5948, 2019

  15. [15]

    Mvpaint: Synchronized multi-view diffusion for painting anything 3d

    Wei Cheng, Juncheng Mu, Xianfang Zeng, Xin Chen, Anqi Pang, Chi Zhang, Zhibin Wang, Bin Fu, Gang Yu, Ziwei Liu, et al. Mvpaint: Synchronized multi-view diffusion for painting anything 3d. arXiv preprint arXiv:2411.02336, 2024

  16. [16]

    Sdfusion: Multimodal 3d shape completion, reconstruction, and generation

    Yen-Chi Cheng, Hsin-Ying Lee, Sergey Tulyakov, Alexander G Schwing, and Liang-Yan Gui. Sdfusion: Multimodal 3d shape completion, reconstruction, and generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4456–4465, 2023

  17. [17]

    Abo: Dataset and benchmarks for real-world 3d object understanding

    Jasmine Collins, Shubham Goel, Kenan Deng, Achleshwar Luthra, Leon Xu, Erhan Gundogdu, Xi Zhang, Tomas F Yago Vicente, Thomas Dideriksen, Himanshu Arora, Matthieu Guillaumin, and Jitendra Malik. Abo: Dataset and benchmarks for real-world 3d object understanding. CVPR, 2022. 22

  18. [18]

    A volumetric method for building complex models from range images

    Brian Curless and Marc Levoy. A volumetric method for building complex models from range images. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, pages 303–312, 1996

  19. [19]

    Objaverse-XL: A Universe of 10M+ 3D Objects

    Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram V oleti, Samir Yitzhak Gadre, Eli VanderBilt, Aniruddha Kembhavi, Carl V ondrick, Georgia Gkioxari, Kiana Ehsani, Ludwig Schmidt, and Ali Farhadi. Objaverse-xl: A universe of 10m+ 3d objects. arXiv preprint arXiv:2307.05663, 2023

  20. [20]

    Objaverse: A universe of annotated 3d objects

    Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3d objects. arXiv preprint arXiv:2212.08051, 2022

  21. [21]

    McHugh, and Vincent Vanhoucke

    Laura Downs, Anthony Francis, Nate Koenig, Brandon Kinman, Ryan Hickman, Krista Reymann, Thomas B. McHugh, and Vincent Vanhoucke. Google scanned objects: A high-quality dataset of 3d scanned household items, 2022

  22. [22]

    The Llama 3 Herd of Models

    Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024

  23. [23]

    Fine detailed texture learning for 3d meshes with generative models

    Aysegul Dundar, Jun Gao, Andrew Tao, and Bryan Catanzaro. Fine detailed texture learning for 3d meshes with generative models. IEEE Trans. Pattern Anal. Mach. Intell., 2023

  24. [24]

    Scaling rectified flow transformers for high-resolution image synthesis

    Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. In Forty-first International Conference on Machine Learning, 2024

  25. [25]

    3d-front: 3d furnished rooms with layouts and semantics

    Huan Fu, Bowen Cai, Lin Gao, Ling-Xiao Zhang, Jiaming Wang, Cao Li, Qixun Zeng, Chengyue Sun, Rongfei Jia, Binqiang Zhao, et al. 3d-front: 3d furnished rooms with layouts and semantics. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10933–10942, 2021

  26. [26]

    Get3d: A generative model of high quality 3d textured shapes learned from images

    Jun Gao, Tianchang Shen, Zian Wang, Wenzheng Chen, Kangxue Yin, Daiqing Li, Or Litany, Zan Gojcic, and Sanja Fidler. Get3d: A generative model of high quality 3d textured shapes learned from images. Advances In Neural Information Processing Systems, 35:31841–31854, 2022

  27. [27]

    Tm-net: Deep generative networks for textured meshes

    Lin Gao, Tong Wu, Yu-Jie Yuan, Ming-Xian Lin, Yu-Kun Lai, and Hao Zhang. Tm-net: Deep generative networks for textured meshes. ACM Trans. Graph., 40(6):1–15, 2021

  28. [28]

    Surface simplification using quadric error metrics

    Michael Garland and Paul S Heckbert. Surface simplification using quadric error metrics. In Proceedings of the 24th annual conference on Computer graphics and interactive techniques, pages 209–216, 1997

  29. [29]

    Generative adversarial nets

    Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in neural information processing systems, 27, 2014

  30. [30]

    Sparse 3D convolutional neural networks

    Ben Graham. Sparse 3d convolutional neural networks. arXiv preprint arXiv:1505.02890, 2015

  31. [31]

    Deep autoregressive networks

    Karol Gregor, Ivo Danihelka, Andriy Mnih, Charles Blundell, and Daan Wierstra. Deep autoregressive networks. In International Conference on Machine Learning, pages 1242–1250. PMLR, 2014

  32. [32]

    Sketch2mesh: Reconstructing and editing 3d shapes from sketches

    Benoit Guillard, Edoardo Remelli, Pierre Yvernay, and Pascal Fua. Sketch2mesh: Reconstructing and editing 3d shapes from sketches. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 13023–13032, 2021

  33. [33]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020

  34. [34]

    LRM: Large reconstruction model for single image to 3d

    Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. LRM: Large reconstruction model for single image to 3d. In The Twelfth International Conference on Learning Representations, 2024

  35. [35]

    New quadric metric for simplifying meshes with appearance attributes

    Hugues Hoppe. New quadric metric for simplifying meshes with appearance attributes. In Proceedings Visualization’99 (Cat. No. 99CB37067), pages 59–510. IEEE, 1999

  36. [36]

    Mv-adapter: Multi-view consistent image generation made easy.arXiv preprint arXiv:2412.03632, 2024

    Zehuan Huang, Yuanchen Guo, Haoran Wang, Ran Yi, Lizhuang Ma, Yan-Pei Cao, and Lu Sheng. Mv-adapter: Multi-view consistent image generation made easy. arXiv preprint arXiv:2412.03632, 2024. 23

  37. [37]

    Make-a-shape: a ten-million-scale 3d shape model

    Ka-Hei Hui, Aditya Sanghi, Arianna Rampini, Kamal Rahimi Malekshan, Zhengzhe Liu, Hooman Shayani, and Chi-Wing Fu. Make-a-shape: a ten-million-scale 3d shape model. InForty-first International Conference on Machine Learning, 2024

  38. [38]

    Rethinking fid: Towards a better evaluation metric for image generation

    Sadeep Jayasumana, Srikumar Ramalingam, Andreas Veit, Daniel Glasner, Ayan Chakrabarti, and Sanjiv Kumar. Rethinking fid: Towards a better evaluation metric for image generation. In IEEE Computer Vision and Pattern Recognition (CVPR), pages 9307–9315, 2024

  39. [39]

    Flexitex: Enhancing texture generation with visual guidance

    DaDong Jiang, Xianghui Yang, Zibo Zhao, Sheng Zhang, Jiaao Yu, Zeqiang Lai, Shaoxiong Yang, Chunchao Guo, Xiaobo Zhou, and Zhihui Ke. Flexitex: Enhancing texture generation with visual guidance. arXiv preprint arXiv:2409.12431, 2024

  40. [40]

    3d gaussian splatting for real-time radiance field rendering

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4):139–1, 2023

  41. [41]

    Auto-Encoding Variational Bayes

    Diederik P Kingma. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013

  42. [42]

    EASI-Tex: Edge-aware mesh texturing from single-image

    Perla Sai Raj Kishore, Yizhi Wang, Ali Mahdavi-Amiri, and Hao Zhang. EASI-Tex: Edge-aware mesh texturing from single-image. ACM Transactions on Graphics (Special Issue of SIGGRAPH), 43(4), 2024

  43. [43]

    HunyuanVideo: A Systematic Framework For Large Video Generative Models

    Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, et al. Hunyuanvideo: A systematic framework for large video generative models. arXiv preprint arXiv:2412.03603, 2024

  44. [44]

    Fitting smooth surfaces to dense polygon meshes

    Venkat Krishnamurthy and Marc Levoy. Fitting smooth surfaces to dense polygon meshes. InProceedings of the 23rd annual conference on Computer graphics and interactive techniques, pages 313–324, 1996

  45. [45]

    Black Forest Labs. Flux. https://github.com/black-forest-labs/flux, 2024

  46. [46]

    Ln3diff: Scalable latent neural fields diffusion for speedy 3D generation

    Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, and Chen Change Loy. Ln3diff: Scalable latent neural fields diffusion for speedy 3D generation. In European Conference on Computer Vision (ECCV), 2024

  47. [47]

    Gaussiananything: Interactive point cloud latent diffusion for 3d generation

    Yushi Lan, Shangchen Zhou, Zhaoyang Lyu, Fangzhou Hong, Shuai Yang, Bo Dai, Xingang Pan, and Chen Change Loy. Gaussiananything: Interactive point cloud latent diffusion for 3d generation. In ICLR, 2025

  48. [48]

    Era3d: High-resolution multiview diffusion using efficient row-wise attention

    Peng Li, Yuan Liu, Xiaoxiao Long, Feihu Zhang, Cheng Lin, Mengfei Li, Xingqun Qi, Shanghang Zhang, Wenhan Luo, Ping Tan, et al. Era3d: High-resolution multiview diffusion using efficient row-wise attention. arXiv preprint arXiv:2405.11616, 2024

  49. [49]

    Craftsman: High-fidelity mesh generation with 3d native generation and interactive geometry refiner, 2024

    Weiyu Li, Jiarui Liu, Hongyu Yan, Rui Chen, Yixun Liang, Xuelin Chen, Ping Tan, and Xiaoxiao Long. Craftsman: High-fidelity mesh generation with 3d native generation and interactive geometry refiner, 2024

  50. [50]

    Hunyuan-dit: A powerful multi-resolution diffusion transformer with fine-grained chinese understanding, 2024

    Zhimin Li, Jianwei Zhang, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, Dayou Chen, Jiajun He, Jiahao Li, Wenyue Li, Chen Zhang, Rongwei Quan, Jianxiang Lu, Jiabin Huang, Xiaoyan Yuan, Xiaoxiao Zheng, Yixuan Li, Jihong Zhang, Chao Zhang, Meng Chen, Jie Liu, Zheng Fang, Weiyan Wang, Jinbao Xue,...

  51. [51]

    Magic3d: High-resolution text-to-3d content creation

    Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3d: High-resolution text-to-3d content creation. In IEEE Computer Vision and Pattern Recognition (CVPR), 2023

  52. [52]

    Common diffusion noise schedules and sample steps are flawed

    Shanchuan Lin, Bingchen Liu, Jiashi Li, and Xiao Yang. Common diffusion noise schedules and sample steps are flawed. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pages 5404–5411, 2024

  53. [53]

    Flow Matching for Generative Modeling

    Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747, 2022

  54. [54]

    Yaron Lipman, Marton Havasi, Peter Holderrieth, Neta Shaul, Matt Le, Brian Karrer, Ricky T. Q. Chen, David Lopez-Paz, Heli Ben-Hamu, and Itai Gat. Flow matching guide and code, 2024

  55. [55]

    Zero-1-to-3: Zero-shot one image to 3d object

    Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, and Carl V ondrick. Zero-1-to-3: Zero-shot one image to 3d object. In Proceedings of the IEEE/CVF international conference on computer vision, pages 9298–9309, 2023. 24

  56. [56]

    Vcd-texture: Variance alignment based 3d-2d co-denoising for text-guided texturing

    Shang Liu, Chaohui Yu, Chenjie Cao, Wen Qian, and Fan Wang. Vcd-texture: Variance alignment based 3d-2d co-denoising for text-guided texturing. In European Conference on Computer Vision, pages 373–389. Springer, 2025

  57. [57]

    Liquid warping gan: A unified framework for human motion imitation, appearance transfer and novel view synthesis

    Wen Liu, Zhixin Piao, Jie Min, Wenhan Luo, Lin Ma, and Shenghua Gao. Liquid warping gan: A unified framework for human motion imitation, appearance transfer and novel view synthesis. In Proceedings of the IEEE/CVF international conference on computer vision, pages 5904–5913, 2019

  58. [58]

    Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

    Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003, 2022

  59. [59]

    Text-guided texturing by synchronized multi-view diffusion

    Yuxin Liu, Minshan Xie, Hanyuan Liu, and Tien-Tsin Wong. Text-guided texturing by synchronized multi-view diffusion. In SIGGRAPH Asia 2024 Conference Papers, pages 1–11, 2024

  60. [60]

    Wonder3d: Single image to 3d using cross-domain diffusion

    Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt, et al. Wonder3d: Single image to 3d using cross-domain diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9970–9980, 2024

  61. [61]

    Genesistex2: Stable, consistent and high-quality text-to-texture generation

    Jiawei Lu, Yingpeng Zhang, Zengjun Zhao, He Wang, Kun Zhou, and Tianjia Shao. Genesistex2: Stable, consistent and high-quality text-to-texture generation. arXiv preprint arXiv:2409.18401, 2024

  62. [62]

    Latent-nerf for shape- guided generation of 3d shapes and textures

    Gal Metzer, Elad Richardson, Or Patashnik, Raja Giryes, and Daniel Cohen-Or. Latent-nerf for shape- guided generation of 3d shapes and textures. In IEEE Computer Vision and Pattern Recognition (CVPR), pages 12663–12673, 2023

  63. [63]

    Autosdf: Shape priors for 3d completion, reconstruction and generation

    Paritosh Mittal, Yen-Chi Cheng, Maneesh Singh, and Shubham Tulsiani. Autosdf: Shape priors for 3d completion, reconstruction and generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 306–315, 2022

  64. [64]

    Maxime Oquab, Timothée Darcet, Theo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Russell Howes, Po-Yao Huang, Hu Xu, Vasu Sharma, Shang-Wen Li, Wojciech Galuba, Mike Rabbat, Mido Assran, Nicolas Ballas, Gabriel Synnaeve, Ishan Misra, Herve Jegou, Julien Mairal, Patrick Laba...

  65. [65]

    Normalizing flows for probabilistic modeling and inference

    George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshmi- narayanan. Normalizing flows for probabilistic modeling and inference. Journal of Machine Learning Research, 22(57):1–64, 2021

  66. [66]

    On aliased resizing and surprising subtleties in gan evaluation

    Gaurav Parmar, Richard Zhang, and Jun-Yan Zhu. On aliased resizing and surprising subtleties in gan evaluation. In IEEE Computer Vision and Pattern Recognition (CVPR), 2022

  67. [67]

    Scalable diffusion models with transformers

    William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4195–4205, 2023

  68. [68]

    Convolutional occupancy networks

    Songyou Peng, Michael Niemeyer, Lars Mescheder, Marc Pollefeys, and Andreas Geiger. Convolutional occupancy networks. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part III 16, pages 523–540. Springer, 2020

  69. [69]

    SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

    Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. Sdxl: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952, 2023

  70. [70]

    Barron, and Ben Mildenhall

    Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. In The Eleventh International Conference on Learning Representations, 2023

  71. [71]

    Learning transferable visual models from natural language supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021

  72. [72]

    Xcube: Large-scale 3d generative modeling using sparse voxel hierarchies

    Xuanchi Ren, Jiahui Huang, Xiaohui Zeng, Ken Museth, Sanja Fidler, and Francis Williams. Xcube: Large-scale 3d generative modeling using sparse voxel hierarchies. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4209–4219, 2024

  73. [73]

    Texture: Text-guided texturing of 3d shapes

    Elad Richardson, Gal Metzer, Yuval Alaluf, Raja Giryes, and Daniel Cohen-Or. Texture: Text-guided texturing of 3d shapes. In ACM SIGGRAPH 2023 conference proceedings, pages 1–11, 2023. 25

  74. [74]

    High-resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022

  75. [75]

    Dream- booth: Fine tuning text-to-image diffusion models for subject-driven generation

    Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. Dream- booth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 22500–22510, 2023

  76. [76]

    Clip-forge: Towards zero-shot text-to-shape generation

    Aditya Sanghi, Hang Chu, Joseph G Lambourne, Ye Wang, Chin-Yi Cheng, Marco Fumero, and Ka- mal Rahimi Malekshan. Clip-forge: Towards zero-shot text-to-shape generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18603–18613, 2022

  77. [77]

    Buildingnet: Learning to label 3d buildings

    Pratheba Selvaraju, Mohamed Nabail, Marios Loizou, Maria Maslioukova, Melinos Averkiou, Andreas Andreou, Siddhartha Chaudhuri, and Evangelos Kalogerakis. Buildingnet: Learning to label 3d buildings. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10397– 10407, October 2021

  78. [78]

    Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model

    Ruoxi Shi, Hansheng Chen, Zhuoyang Zhang, Minghua Liu, Chao Xu, Xinyue Wei, Linghao Chen, Chong Zeng, and Hao Su. Zero123++: a single image to consistent multi-view diffusion base model. arXiv preprint arXiv:2310.15110, 2023

  79. [79]

    Mvdream: Multi-view diffusion for 3d generation

    Yichun Shi, Peng Wang, Jianglong Ye, Long Mai, Kejie Li, and Xiao Yang. Mvdream: Multi-view diffusion for 3d generation. In The Twelfth International Conference on Learning Representations, 2023

  80. [80]

    Texturify: Generating textures on 3d shape surfaces

    Yawar Siddiqui, Justus Thies, Fangchang Ma, Qi Shan, Matthias Nießner, and Angela Dai. Texturify: Generating textures on 3d shape surfaces. In European Conference on Computer Vision (ECCV), pages 72–88. Springer, 2022

Showing first 80 references.