
OSA

Open sourcing 3D avatar reconstruction from a single image

OSA.Overview.mp4

This README focuses on how to run the code. For more detailed information, please read the report.

The Phorhum paper from Google showed astonishing results in reconstructing 3D avatars from a single image. This work tries to implement their proposed architecture with TensorFlow in an open-source fashion. It features a dataset creator, the reimplemented network architecture, a trained model, and a point cloud viewer. While the results are far from the ones Google showed, this could be used as a starting point to build upon. This was part of a two-month research internship at the Human-Computer Interaction department of the University of Tübingen.

Dataset

imageDataset

To solve the task of predicting the surface from a single image, we need an image dataset containing a human and a signed distance field (SDF) point cloud dataset with color and normal information. We built the datasets from scratch since there aren't any ready-to-use datasets available. We use the Microsoft Rocketbox avatar dataset as a starting point. The image dataset was constructed by rendering the avatar models from the avatar dataset within environments that are lit with HDRs. For the SDF dataset we collected 1 million points per avatar, split into 500k near points sampled on the mesh and 500k far points sampled around the mesh and within the unit sphere.
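
The following is a minimal sketch of how such a near/far split could be sampled. It is an illustration only: it assumes trimesh, an avatar mesh already normalized to the unit sphere, and it omits the color lookup; the actual exporter scripts in dataset/exporter may differ.

```python
# Hypothetical sketch of the near/far SDF sampling described above (not the actual exporter).
import numpy as np
import trimesh

def sample_sdf_points(obj_path, n_near=500_000, n_far=500_000, noise=0.01):
    # Assumes the avatar mesh has already been normalized to fit the unit sphere.
    mesh = trimesh.load(obj_path, force='mesh')

    # Near points: sampled on the surface and perturbed slightly.
    surface_pts, face_idx = trimesh.sample.sample_surface(mesh, n_near)
    near_pts = surface_pts + np.random.normal(scale=noise, size=surface_pts.shape)
    near_normals = mesh.face_normals[face_idx]

    # Far points: sampled uniformly inside the unit sphere around the avatar.
    dirs = np.random.normal(size=(n_far, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    radii = np.random.uniform(0.0, 1.0, size=(n_far, 1)) ** (1 / 3)
    far_pts = dirs * radii

    # Signed distances (trimesh convention: positive inside the mesh, negative outside).
    near_sdf = trimesh.proximity.signed_distance(mesh, near_pts)
    far_sdf = trimesh.proximity.signed_distance(mesh, far_pts)
    return (near_pts, near_sdf, near_normals), (far_pts, far_sdf)
```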

Dataset Content

.
├── dataset
│   ├── exporter                     # Tools for dataset creation
│   │   ├── ... .py                  # Python scripts that create the dataset
│   │   └── ... .sh                  # Corresponding bash scripts that call the Python scripts
│   │
│   ├── loader                       # Tools for dataset loading
│   │   ├── imageDatasetLoader.py    # Image loader with plotting
│   │   └── avatarDatasetLoader.py   # Avatar OBJ and SDF loader with plotting
│   │
│   └── datasets
│       ├── hdrs                     # HDR dataset
│       ├── sdfs                     # SDF dataset
│       ├── images                   # Image dataset
│       ├── avatars                  # Avatar dataset
│       └── environments             # Environment dataset
.

Build Dataset

A step-by-step guide to create all the needed datasets (HDRs, SDFs, images, avatars, and environments). The final datasets sdfs and images are needed to train the model. Contact me if you want to have my dataset (it was too big to upload to GitHub).

General

  • Download the Rocketbox avatar dataset.
  • Copy Adults, Children, and Professions into avatars

SDF Dataset

  • Create the avatar OBJ dataset with the modified Mesh2 library by running createOBJDataset.sh
  • Find the SDF dataset of near and far points for each avatar in sdfs

Image Dataset

  • Download HDRs from Poly Haven and copy them into the hdrs directory
  • Find environments of 3D photogrammetry-scanned scenes on Sketchfab and download them
  • Preprocess the environments by creating a Blender scene for each environment with the center of the floor at (0,0,0)
  • Store the Blender scenes in environments with a subdirectory for each scene variation
  • (Optional) Change the avatar and camera augmentation settings in avatarImageExporter.py (see the sketch after this list)
  • Run the avatar image dataset creation with createImageDataset.sh
  • Find the avatar image dataset in images
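
To illustrate what the camera augmentation can look like, here is a hypothetical Blender (bpy) sketch that places the camera at a random position around an avatar standing at the origin and renders a still. The function name, ranges, and scene setup are assumptions; the actual logic and settings live in avatarImageExporter.py.

```python
# Hypothetical sketch of randomized camera placement inside Blender; not the actual exporter.
import math
import random
import bpy

def render_random_view(output_path, radius_range=(2.0, 4.0), height_range=(1.0, 2.0)):
    camera = bpy.data.objects['Camera']

    # Place the camera on a random point of a circle around the avatar at (0, 0, 0).
    angle = random.uniform(0.0, 2.0 * math.pi)
    radius = random.uniform(*radius_range)
    height = random.uniform(*height_range)
    camera.location = (radius * math.cos(angle), radius * math.sin(angle), height)

    # Point the camera at an empty placed at the origin via a TRACK_TO constraint.
    target = bpy.data.objects.new('Target', None)
    bpy.context.collection.objects.link(target)
    constraint = camera.constraints.new(type='TRACK_TO')
    constraint.target = target
    constraint.track_axis = 'TRACK_NEGATIVE_Z'
    constraint.up_axis = 'UP_Y'

    bpy.context.scene.render.filepath = output_path
    bpy.ops.render.render(write_still=True)
```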

Network

architecture

For solving the task of inferring the surface and its color from a single image, we use an end-to-end learnable neural network model that is inspired by Phorhum. Given the time and computational constraints of the project, we couldn't reproduce the full model and used subsets of their implementation. In the following you find our implementation with modifications such as the surface projection loss. In the report we propose an attention lookup that you can find in the previous directory.
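
To make the data flow concrete, here is a rough sketch of the end-to-end forward pass through the feature extractor network G and the geometry network f. Layer counts, sizes, and the nearest-pixel feature lookup are placeholders, not the real architecture; the actual pixel-aligned sampling, losses, and networks live in the files listed below.

```python
# Rough sketch of the forward pass: feature extractor G + geometry network f.
# Layer sizes and the nearest-pixel feature lookup are illustrative placeholders.
import tensorflow as tf

class OSASketch(tf.keras.Model):
    def __init__(self, feature_dim=256):
        super().__init__()
        # G: maps the input image to a pixel-aligned feature map.
        self.G = tf.keras.Sequential([
            tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
            tf.keras.layers.Conv2D(feature_dim, 3, padding='same', activation='relu'),
        ])
        # f: maps a 3D point plus its pixel-aligned feature to a signed distance and an RGB color.
        self.f = tf.keras.Sequential([
            tf.keras.layers.Dense(256, activation='relu'),
            tf.keras.layers.Dense(256, activation='relu'),
            tf.keras.layers.Dense(1 + 3),
        ])

    def call(self, image, points, uv):
        # image: (B, H, W, 3), points: (B, N, 3), uv: projected 2D locations in [0, 1], (B, N, 2)
        features = self.G(image)
        h = tf.cast(tf.shape(features)[1] - 1, tf.float32)
        w = tf.cast(tf.shape(features)[2] - 1, tf.float32)
        idx = tf.cast(tf.round(uv * tf.stack([h, w])), tf.int32)
        point_features = tf.gather_nd(features, idx, batch_dims=1)  # (B, N, feature_dim)
        out = self.f(tf.concat([points, point_features], axis=-1))
        distance, color = out[..., :1], out[..., 1:]
        return distance, color
```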

Network Content

.
├── network
│   ├── customLayer                  # Custom third-party Keras layers
│   ├── previous                     # Previous network implementations (including attention)
│   ├── tests                        # Tests for losses and custom layers
│   ├── featureExtractorNetwork.py   # Implementation of the feature extractor network G
│   ├── geomertyNetwork.py           # Implementation of the geometry network f
│   ├── loss.py                      # Custom losses (including surface projection)
│   └── network.py                   # End-to-end network with training and inference
.

Train

random_points

Although the training results are far from the results Google provides, the network does learn some kind of 3D avatar structure. Sadly, color and detailed geometry cannot be reconstructed. Examining the results more closely suggests that there is an issue within the feature extractor network and that the network is not able to infer color and geometry information from the images.

Train Content

.
├── train
│   ├── logs                         # Tensorboard logs
│   │
│   ├── models                       # Previously trained models
│   │   ├── f                        # Models for feature extractor network
│   │   └── g                        # Models for geometry network
│   │
│   ├── train.ipynb                  # Start and configure training Jupyter notebook
│   └── train.py                     # Start and configure training Python script
.

Train Network

The network can be trained by executing either train.ipynb or train.py. We trained the network on a machine with 45 GiB of RAM, 8 CPUs, and an A6000 GPU with 48 GiB of VRAM for roughly 2 hours, for about 6200 steps.
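
A single training step could be sketched roughly as below. The loss terms shown are simplified placeholders; the actual losses (including the surface projection loss), weights, and loop live in network/loss.py and train/train.py.

```python
# Simplified sketch of one training step; not the actual losses or loop.
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)

@tf.function
def train_step(model, image, points, uv, gt_distance, gt_color):
    with tf.GradientTape() as tape:
        pred_distance, pred_color = model(image, points, uv)
        # Placeholder L1 terms standing in for the real geometry/color/projection losses.
        geometry_loss = tf.reduce_mean(tf.abs(pred_distance - gt_distance))
        color_loss = tf.reduce_mean(tf.abs(pred_color - gt_color))
        loss = geometry_loss + color_loss
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss
```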

Viewer

viewer

For visualization purposes, a custom real-time 3D viewer was built that renders millions of points efficiently and enables the developer to better identify prediction errors. A client-server architecture was chosen, with the server running a Flask application that directly interacts with the React Three Fiber client.
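
A minimal sketch of the server side could look like the following; the /pointcloud route and the prediction.npz layout are made up for illustration and differ from the actual viewer/app.py. The client can then fetch the JSON and fill a points buffer with the positions and colors.

```python
# Minimal Flask sketch serving a predicted point cloud to the client as JSON.
# The /pointcloud route and the prediction.npz layout are illustrative assumptions.
import numpy as np
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/pointcloud')
def pointcloud():
    data = np.load('prediction.npz')  # assumed to contain 'points' (N, 3) and 'colors' (N, 3)
    return jsonify({
        'positions': data['points'].tolist(),
        'colors': data['colors'].tolist(),
    })
```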

Viewer Content

.
├── viewer
│   ├── react                         # React Three Fiber client
│   └── app.py                        # Flask server
.

Run Viewer

  • Choose the correct model in app.py
  • Start the Flask server with flask run in the viewer directory
  • Run the React Three Fiber client by calling yarn dev in the react directory

Misc

This project was part of a research internship at the Human-Computer Interaction department of the University of Tübingen. Big thanks to Efe Bozkir for his help and mentorship throughout the project, and to Thiemo Alldieck and his colleagues for their amazing work on Phorhum.