ConsDreamer: Advancing Multi-View Consistency for Zero-Shot Text-to-3D Generation
- Authors
  - Yuan Zhou
  - Shilong Jin
  - Litao Hua
  - Wanjun Lv
  - Haoran Duan
  - Jungong Han
- Affiliations
  - School of Artificial Intelligence, Nanjing University of Information Science and Technology, Jiangsu 210044, China
  - Lenovo, Beijing, China
  - Department of Automation, Tsinghua University, Beijing, China
Recent advances in zero-shot text-to-3D generation have revolutionized 3D content creation by enabling direct synthesis from textual descriptions. While state-of-the-art methods leverage 3D Gaussian Splatting with score distillation to enhance multi-view rendering through pre-trained text-to-image (T2I) models, they suffer from inherent view biases in T2I priors. These biases lead to inconsistent 3D generation, particularly manifesting as the multi-face Janus problem, where objects exhibit conflicting features across views. To address this fundamental challenge, we propose ConsDreamer, a novel framework that mitigates view bias by refining both the conditional and unconditional terms in the score distillation process: (1) a View Disentanglement Module (VDM) that eliminates viewpoint biases in conditional prompts by decoupling irrelevant view components and injecting precise camera parameters; and (2) a similarity-based partial order loss that enforces geometric consistency in the unconditional term by aligning cosine similarities with azimuth relationships. Extensive experiments demonstrate that ConsDreamer effectively mitigates the multi-face Janus problem in text-to-3D generation, outperforming existing methods in both visual quality and consistency.
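The abstract does not specify the exact form of the similarity-based partial order loss. As a rough illustrative sketch only (not the paper's implementation), one way to enforce that view pairs with smaller azimuth gaps are at least as similar as pairs with larger gaps is a pairwise hinge penalty over cosine similarities; the function name, margin, and feature representation below are all assumptions:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def partial_order_loss(feats, azimuths, margin=0.1):
    """Hypothetical sketch of a similarity-based partial order loss.

    feats:    list of per-view feature vectors (np.ndarray)
    azimuths: list of azimuth angles in degrees, one per view

    For every two view pairs, if pair A has a smaller azimuth gap than
    pair B but a lower cosine similarity, add a hinge penalty. This
    encodes "closer views should look more alike" as a partial order.
    """
    n = len(feats)
    pairs = []  # (azimuth gap, cosine similarity) for each view pair
    for i in range(n):
        for j in range(i + 1, n):
            gap = abs(azimuths[i] - azimuths[j])
            gap = min(gap, 360.0 - gap)  # wrap-around azimuth distance
            pairs.append((gap, cosine_sim(feats[i], feats[j])))
    loss = 0.0
    for g1, s1 in pairs:
        for g2, s2 in pairs:
            if g1 < g2:  # the closer pair should be at least as similar
                loss += max(0.0, margin + s2 - s1)
    return loss
```

A configuration whose similarities already respect azimuth ordering incurs little or no penalty, while one where distant views look more alike than nearby views (a Janus-like inconsistency) is penalized heavily.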