Open sourcing 3D avatar reconstruction from a single image
This README focuses on how to run the code; for more detailed information, please read the report.
The Phorhum paper from Google showed astonishing results in reconstructing 3D avatars from a single image. This work tries to implement their proposed architecture with TensorFlow in an open-source fashion. It features a dataset creator, the reimplemented network architecture, a trained model, and a point cloud viewer. While the results are far from those Google showed, this could be used as a starting point to build upon. This was part of a two-month research internship at the Human-Computer Interaction department of the University of Tübingen.
To solve the task of predicting a surface from a single image, an image dataset showing a human and a signed distance field (SDF) point cloud dataset with color and normal information are needed. We built the datasets from scratch since there aren't any ready-to-use datasets available, using the Microsoft Rocketbox avatar dataset as a starting point. The image dataset was constructed by rendering the avatar models from the avatar dataset within environments that are lit with HDRs. For the SDF dataset we collected one million points per avatar, split into 500k near points sampled on the mesh and 500k far points sampled around the mesh and within the unit sphere.
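The exporter scripts take care of this step; as a rough illustration only, such near/far SDF samples could be generated with `trimesh` along these lines (the file path, noise scale, and sign convention are assumptions, and the actual exporter may work differently):

```python
import numpy as np
import trimesh

mesh = trimesh.load("dataset/datasets/avatars/adult_01.obj", force="mesh")  # hypothetical path

# "Near" points: surface samples perturbed with small Gaussian noise
surface_points, _ = trimesh.sample.sample_surface(mesh, 500_000)
near_points = surface_points + np.random.normal(scale=0.01, size=surface_points.shape)

# "Far" points: uniform samples inside the unit sphere
directions = np.random.normal(size=(500_000, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
radii = np.random.uniform(0.0, 1.0, size=(500_000, 1)) ** (1.0 / 3.0)
far_points = directions * radii

# Signed distances (trimesh returns positive inside; negate if positive-outside is wanted)
near_sdf = -trimesh.proximity.signed_distance(mesh, near_points)
far_sdf = -trimesh.proximity.signed_distance(mesh, far_points)
```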
.
├── dataset
│   ├── exporter                    # Tools for dataset creation
│   │   ├── ... .py                 # Python scripts that create the dataset
│   │   └── ... .sh                 # Corresponding bash scripts that call the Python scripts
│   │
│   ├── loader                      # Tools for dataset loading
│   │   ├── imageDatasetLoader.py   # Image loader with plotting
│   │   └── avatarDatasetLoader.py  # Avatar OBJ and SDF loader with plotting
│   │
│   └── datasets
│       ├── hdrs                    # HDR dataset
│       ├── sdfs                    # SDF dataset
│       ├── images                  # Image dataset
│       ├── avatars                 # Avatar dataset
│       └── environments            # Environment dataset
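Independent of the project's own loaders in `dataset/loader`, and assuming the rendered images are stored as PNG files under `dataset/datasets/images`, the image dataset could for example be streamed with `tf.data` like this:

```python
import tensorflow as tf

def decode(path):
    data = tf.io.read_file(path)
    image = tf.image.decode_png(data, channels=3)
    return tf.image.convert_image_dtype(image, tf.float32)

image_ds = (
    tf.data.Dataset.list_files("dataset/datasets/images/*/*.png", shuffle=True)
    .map(decode, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(8)
    .prefetch(tf.data.AUTOTUNE)
)
```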
A step-by-step guide to create all the needed datasets (HDRs, SDFs, images, avatars, and environments). The final datasets `sdfs` and `images` are needed to train the model. Contact me if you want my dataset (it was too big to upload to GitHub).
- Download the Rocketbox avatar dataset
- Copy `Adults`, `Children`, and `Professions` into `avatars`
- Create the avatar OBJ dataset by running the modified Mesh2 library via `createOBJDataset.sh`
- Find the SDF dataset of `near` and `far` points of each avatar in `sdfs`
- Download HDRs from Poly Haven and copy them into the `hdrs` directory
- Find environments of 3D photogrammetry-scanned scenes on Sketchfab and download them
- Preprocess the environments by creating a Blender scene for each environment with the center of the floor at (0, 0, 0)
- Store the Blender scenes in `environments` with a subdirectory for each scene variation
- (Optional) Change the avatar and camera augmentation settings in `avatarImageExporter.py` (see the rendering sketch after this list)
- Run the avatar image dataset creation with `createImageDataset.sh`
- Find the avatar image dataset in `images`
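For orientation, here is a heavily simplified sketch of what the rendering step conceptually does. It is a Blender `bpy` script; the file paths, camera placement, and omission of the environment geometry are assumptions, and the real `avatarImageExporter.py` additionally handles avatar and camera augmentation:

```python
import math
import random
import bpy

# Import an avatar (Blender <=2.9x operator; Blender 3.x uses bpy.ops.wm.obj_import)
bpy.ops.import_scene.obj(filepath="dataset/datasets/avatars/adult_01.obj")

# Light the scene with an HDR environment map
world = bpy.context.scene.world
world.use_nodes = True
env = world.node_tree.nodes.new("ShaderNodeTexEnvironment")
env.image = bpy.data.images.load("dataset/datasets/hdrs/studio.hdr")
world.node_tree.links.new(env.outputs["Color"],
                          world.node_tree.nodes["Background"].inputs["Color"])

# Place the camera at a random azimuth around the avatar
# (aiming it at the avatar via a Track To constraint is omitted for brevity)
angle = random.uniform(0.0, 2.0 * math.pi)
bpy.context.scene.camera.location = (2.5 * math.cos(angle), 2.5 * math.sin(angle), 1.0)

# Render the image
bpy.context.scene.render.filepath = "dataset/datasets/images/adult_01_000.png"
bpy.ops.render.render(write_still=True)
```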
To solve the task of inferring the surface and its color from a single image, we use an end-to-end learnable neural network model that is inspired by Phorhum. Given the time and computational constraints of the project, we could not reproduce the full model and implemented only subsets of the proposed architecture. In the following you find our implementation with modifications such as the surface projection loss. In the report we propose an attention lookup, which you can find in the `previous` directory.
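To illustrate the overall idea only (this is not the actual architecture; layer sizes and names below are made up), the feature extractor G turns the image into pixel-aligned features, and the geometry network f maps a 3D point together with its feature to a signed distance and an albedo color:

```python
import tensorflow as tf

image = tf.zeros([1, 512, 512, 3])            # input image
points = tf.zeros([1, 1024, 3])               # 3D query points
pixels = tf.zeros([1, 1024, 2], tf.int32)     # their (row, col) projections into the image

G = tf.keras.Sequential([                     # stand-in for featureExtractorNetwork.py
    tf.keras.layers.Conv2D(64, 3, 2, "same", activation="relu"),
    tf.keras.layers.Conv2D(256, 3, 2, "same", activation="relu"),
])
f = tf.keras.Sequential([                     # stand-in for geomertyNetwork.py
    tf.keras.layers.Dense(512, activation="softplus"),
    tf.keras.layers.Dense(512, activation="softplus"),
    tf.keras.layers.Dense(1 + 3),             # signed distance + RGB albedo
])

feature_map = G(image)                                             # [1, 128, 128, 256]
features = tf.gather_nd(feature_map, pixels // 4, batch_dims=1)    # pixel-aligned lookup
sdf, albedo = tf.split(f(tf.concat([points, features], axis=-1)), [1, 3], axis=-1)
```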
.
├── network
│   ├── customLayer                 # Custom 3rd-party Keras layers
│   ├── previous                    # Previous network implementations (including attention)
│   ├── tests                       # Tests for losses and custom layers
│   ├── featureExtractorNetwork.py  # Implementation of the feature extractor network G
│   ├── geomertyNetwork.py          # Implementation of the geometry network f
│   ├── loss.py                     # Custom losses (including surface projection)
│   └── network.py                  # End-to-end network with training and inference
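As a sketch of the surface projection idea (the exact formulation in `loss.py` and in the report may differ): off-surface samples are projected onto the predicted zero level set using the SDF value and its gradient, and the projected points are compared against ground-truth surface points.

```python
import tensorflow as tf

def surface_projection_loss(sdf_fn, points, gt_surface_points):
    """sdf_fn maps [B, N, 3] points to [B, N, 1] signed distances."""
    with tf.GradientTape() as tape:
        tape.watch(points)
        d = sdf_fn(points)
    grad = tape.gradient(d, points)                               # gradient of the SDF
    n = grad / (tf.norm(grad, axis=-1, keepdims=True) + 1e-8)     # unit normals
    projected = points - d * n                                    # x' = x - d(x) * n(x)
    return tf.reduce_mean(tf.norm(projected - gt_surface_points, axis=-1))
```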
Although the training results are far from the results Google provides, the network does learn some kind of 3D avatar structure. Sadly, color and detailed geometry cannot be reconstructed. Examining the results more closely suggests an issue within the feature extractor network: the network is not able to infer color and geometry information from the images.
.
├── train
│   ├── logs          # TensorBoard logs
│   │
│   ├── models        # Previously trained models
│   │   ├── f         # Models for feature extractor network
│   │   └── g         # Models for geometry network
│   │
│   ├── train.ipynb   # Jupyter notebook to configure and start training
│   └── train.py      # Python script to configure and start training
The network can be trained by executing either `train.ipynb` or `train.py`.
We trained the network on a machine with 45 GiB of RAM, 8 CPUs, and an A6000 GPU with 48 GiB of VRAM for roughly two hours, which corresponds to about 6,200 steps.
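As a hypothetical illustration of what a single training step boils down to (the actual models, losses, and data pipeline are configured in `train.py`; the batch keys and model signature below are assumptions):

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)

@tf.function
def train_step(model, batch):
    """One optimization step on a batch with assumed keys 'image', 'points', 'sdf', 'color'."""
    with tf.GradientTape() as tape:
        sdf, albedo = model([batch["image"], batch["points"]])
        loss = (tf.reduce_mean(tf.abs(sdf - batch["sdf"]))
                + tf.reduce_mean(tf.abs(albedo - batch["color"])))
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss
```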
For visualization purposes, a custom real-time 3D viewer was built that renders millions of points efficiently and enables the developer to better identify prediction errors. A client-server architecture was chosen, with the server running a Flask application that directly interacts with the React Three Fiber client.
.
├── viewer
│   ├── react         # React Three Fiber client
│   └── app.py        # Flask server
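A minimal sketch of the server side (the real `app.py` selects a trained model and runs inference instead of returning random data; the route name is an assumption):

```python
from flask import Flask, jsonify
import numpy as np

app = Flask(__name__)

@app.route("/pointcloud")
def pointcloud():
    # Placeholder data; the real server would run the trained network here
    points = np.random.uniform(-1.0, 1.0, size=(10_000, 3))
    colors = np.random.uniform(0.0, 1.0, size=(10_000, 3))
    return jsonify({"points": points.tolist(), "colors": colors.tolist()})
```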
- Choose the correct model in `app.py`
- Start the Flask server with `flask run` in the directory `viewer`
- Run the React Three Fiber client by calling `yarn dev` in the directory `react`
This project was part of a research internship at the Human-Computer Interaction department of the University of Tübingen. Big thanks to Efe Bozkir for his help and mentorship throughout the project, and to Thiemo Alldieck and his colleagues for their amazing work on Phorhum.