ConsDreamer: Advancing Multi-View Consistency for Zero-Shot Text-to-3D Generation

arXiv:2504.02316
Authors
  • Yuan Zhou
  • Shilong Jin
  • Litao Hua
  • Wanjun Lv
  • Haoran Duan
  • Jungong Han
Affiliations
  • School of Artificial Intelligence, Nanjing University of Information Science and Technology, Jiangsu 210044, China
  • Lenovo, Beijing, China
  • Department of Automation, Tsinghua University, Beijing, China
Abstract
Recent advances in zero-shot text-to-3D generation have revolutionized 3D content creation by enabling direct synthesis from textual descriptions. While state-of-the-art methods leverage 3D Gaussian Splatting with score distillation to enhance multi-view rendering through pre-trained text-to-image (T2I) models, they suffer from inherent view biases in T2I priors. These biases lead to inconsistent 3D generation, particularly manifesting as the multi-face Janus problem, where objects exhibit conflicting features across views. To address this fundamental challenge, we propose ConsDreamer, a novel framework that mitigates view bias by refining both the conditional and unconditional terms in the score distillation process: (1) a View Disentanglement Module (VDM) that eliminates viewpoint biases in conditional prompts by decoupling irrelevant view components and injecting precise camera parameters; and (2) a similarity-based partial order loss that enforces geometric consistency in the unconditional term by aligning cosine similarities with azimuth relationships. Extensive experiments demonstrate that ConsDreamer effectively mitigates the multi-face Janus problem in text-to-3D generation, outperforming existing methods in both visual quality and consistency.