Despite recent advancements in high-fidelity human reconstruction techniques, the requirement for densely captured images or time-consuming per-instance optimization significantly hinders their application in broader scenarios. To tackle these issues, we present HumanSplat, which predicts the 3D Gaussian Splatting properties of any human from a single input image in a generalizable manner. In particular, HumanSplat comprises a 2D multi-view diffusion model and a latent reconstruction transformer with human structure priors, which adeptly integrates geometric priors and semantic features within a unified framework. A hierarchical loss that incorporates human semantic information is further designed to achieve high-fidelity texture modeling and to better constrain the estimated multiple views. Comprehensive experiments on standard benchmarks and in-the-wild images demonstrate that HumanSplat surpasses existing state-of-the-art methods in achieving photorealistic novel-view synthesis.
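To make the pipeline concrete, the sketch below illustrates one plausible shape of the latent reconstruction transformer: latents from the multi-view diffusion model are fused with human-structure tokens (e.g., derived from a projected SMPL prior) via self-attention, and each pixel-aligned token is regressed to the parameters of a 3D Gaussian. All module names, dimensions, and the token interface are assumptions for illustration, not the authors' implementation.

```python
# Minimal PyTorch sketch of a HumanSplat-style latent reconstruction transformer.
# Illustrative only: layer counts, dimensions, and the structure-token interface
# are assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class LatentReconstructionTransformer(nn.Module):
    """Fuses multi-view diffusion latents with human-structure tokens
    and regresses per-pixel 3D Gaussian Splatting parameters."""
    def __init__(self, dim=256, depth=4, heads=8, gaussian_dim=14):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        # 14 = 3 (position) + 4 (rotation quaternion) + 3 (scale)
        #    + 1 (opacity) + 3 (color); a common Gaussian parameterization.
        self.to_gaussians = nn.Linear(dim, gaussian_dim)

    def forward(self, view_latents, structure_tokens):
        # view_latents:     (B, N_views * H * W, dim) latents from the 2D
        #                   multi-view diffusion model.
        # structure_tokens: (B, N_struct, dim) tokens encoding the human
        #                   geometric prior.
        tokens = torch.cat([view_latents, structure_tokens], dim=1)
        fused = self.encoder(tokens)
        # Keep only the pixel-aligned tokens for Gaussian prediction.
        return self.to_gaussians(fused[:, : view_latents.shape[1]])

# Toy shapes: 4 generated views of an 8x8 latent grid, 32 structure tokens.
B, V, HW, S, D = 1, 4, 64, 32, 256
model = LatentReconstructionTransformer(dim=D)
gaussians = model(torch.randn(B, V * HW, D), torch.randn(B, S, D))
print(gaussians.shape)  # (1, 256, 14): one Gaussian per latent pixel
```

Under this reading, feed-forward prediction of Gaussians replaces per-instance optimization, which is what makes single-image, generalizable reconstruction feasible; the structure tokens are the channel through which the human geometric prior constrains the otherwise unconstrained novel views.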