This is based on work in my Hugging Face repository "HeadsNet".
The Hugging Face repository hosts the full project, including the dataset, and goes into a little more detail; this GitHub repository just holds the bare code.
It all started with a rough idea I had after spending a lot of time looking into Neural Radiance Fields (NeRF) for generative 3D, which can be viewed here: "PT-NePC".
The training data is a synthetic dataset I generated by pulling StyleGAN2 face images from ThisPersonDoesNotExist.com and then feeding those 2D images into TripoSR to turn them into 3D heads. The dataset is on Hugging Face here: "FaceTo3D".
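For flavour, here is a minimal sketch of the 2D half of that pipeline: each request to ThisPersonDoesNotExist.com returns a new StyleGAN2 face. The output directory, sample count, and file naming are placeholders of mine, and the TripoSR step is only indicated in a comment since the exact invocation depends on the TripoSR release used.

```python
# Hypothetical sketch of the 2D scraping step: each GET to
# thispersondoesnotexist.com returns a freshly generated StyleGAN2 face.
import requests
from pathlib import Path

OUT_DIR = Path("faces")          # placeholder output directory
OUT_DIR.mkdir(exist_ok=True)

for i in range(1000):            # sample count is an assumption
    resp = requests.get("https://thispersondoesnotexist.com",
                        headers={"User-Agent": "Mozilla/5.0"})
    resp.raise_for_status()
    (OUT_DIR / f"face_{i:04d}.jpg").write_bytes(resp.content)

# Each saved image would then be passed through TripoSR to produce a
# 3D head; the exact TripoSR invocation depends on the release used,
# so it is not reproduced here.
```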
The first attempt was my PT-NePC approach in "headsnet"; HeadsNet was the highest-quality attempt. It took a simple two-vector input and produced a random full-color 3D point cloud of a head (sketched below). The repository includes the scraper, a viewer for the scraped models, the dataset generator, and the training and prediction code.
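As a rough illustration only: assuming "two-vector input" means two latent vectors concatenated, and that the output is N points of (x, y, z, r, g, b), the formulation looks something like the following. Every dimension here is a placeholder of mine, not HeadsNet's actual configuration.

```python
# Hypothetical sketch of the HeadsNet formulation: two latent vectors
# in, one colored point cloud out. All sizes are assumptions.
import torch
import torch.nn as nn

LATENT, N_POINTS = 16, 4096      # both dimensions are assumptions

decoder = nn.Sequential(
    nn.Linear(2 * LATENT, 512),
    nn.ReLU(),
    nn.Linear(512, N_POINTS * 6),   # xyz + rgb per point
)

# Two random latent vectors concatenated produce one random head.
z = torch.cat([torch.randn(1, LATENT), torch.randn(1, LATENT)], dim=1)
cloud = decoder(z).view(N_POINTS, 6)
print(cloud.shape)  # torch.Size([4096, 6])
```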
The second attempt simplified the problem: produce a 32^3 grayscale voxel volume of a head from a 32x32 grayscale input image.
- facenet1 has the dataset generation code and the first attempt at FaceToVoxel: training one large FNN/MLP on the whole problem (see the first sketch after this list).
- facenet2 requires the dataset generated by facenet1 and splits the problem across 32^3 individual networks, each with a single output (the grayscale value for one voxel). This allows better parallelisation, both over multiple machines in a network and over multiple CPU cores in a single computer system (see the second sketch after this list).
- facenet3 is the successor model, a simplified version of facenet1.
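To make the facenet1/facenet3 formulation concrete, here is a minimal PyTorch sketch of one large MLP mapping a flattened 32x32 image to a flattened 32^3 volume. The input and output sizes come from the description above; the hidden widths, activation, optimiser, and loss are my assumptions, not the repository's actual hyperparameters.

```python
# Minimal sketch of the "one large MLP" approach (facenet1/facenet3).
import torch
import torch.nn as nn

IN_DIM = 32 * 32          # flattened 32x32 grayscale image
OUT_DIM = 32 ** 3         # flattened 32^3 grayscale voxel volume

model = nn.Sequential(
    nn.Linear(IN_DIM, 2048),   # hidden widths are assumptions
    nn.ReLU(),
    nn.Linear(2048, 2048),
    nn.ReLU(),
    nn.Linear(2048, OUT_DIM),
    nn.Sigmoid(),              # voxel intensities normalised to [0, 1]
)

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(images, volumes):
    """images: (B, 1024) in [0,1]; volumes: (B, 32768) in [0,1]."""
    opt.zero_grad()
    loss = loss_fn(model(images), volumes)
    loss.backward()
    opt.step()
    return loss.item()

# Smoke test on random tensors standing in for the real dataset.
x = torch.rand(8, IN_DIM)
y = torch.rand(8, OUT_DIM)
print(train_step(x, y))
```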
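And here is a sketch of the facenet2 idea: 32^3 tiny independent networks, one per voxel, which can be farmed out across CPU cores with a process pool (or, by the same token, across machines). The tiny-net size, training loop, and job layout are all my assumptions for illustration.

```python
# Sketch of one-tiny-network-per-voxel training (facenet2), with the
# independent per-voxel jobs spread over CPU cores via a process pool.
import numpy as np
from multiprocessing import Pool

IN_DIM, HIDDEN, EPOCHS, LR = 32 * 32, 16, 100, 0.01  # all assumptions

def train_voxel(args):
    """Fit one voxel's scalar output with a 1-hidden-layer net (SGD on MSE)."""
    voxel_idx, X, y = args                      # X: (N, 1024), y: (N,)
    rng = np.random.default_rng(voxel_idx)
    W1 = rng.normal(0, 0.1, (IN_DIM, HIDDEN))
    b1 = np.zeros(HIDDEN)
    W2 = rng.normal(0, 0.1, HIDDEN)
    b2 = 0.0
    for _ in range(EPOCHS):
        h = np.maximum(X @ W1 + b1, 0.0)        # ReLU hidden layer
        err = (h @ W2 + b2) - y                 # prediction error
        gW2 = h.T @ err / len(y)                # MSE gradients
        gb2 = err.mean()
        dh = np.outer(err, W2) * (h > 0)
        gW1 = X.T @ dh / len(y)
        gb1 = dh.mean(axis=0)
        W1 -= LR * gW1; b1 -= LR * gb1
        W2 -= LR * gW2; b2 -= LR * gb2
    return voxel_idx, (W1, b1, W2, b2)

if __name__ == "__main__":
    X = np.random.rand(64, IN_DIM)              # stand-in dataset
    Y = np.random.rand(64, 32 ** 3)
    jobs = [(v, X, Y[:, v]) for v in range(8)]  # 8 of the 32768 voxels
    with Pool() as pool:
        for v, weights in pool.imap_unordered(train_voxel, jobs):
            print("trained voxel", v)
```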
This project deliberately focuses on MLPs while ignoring VAEs, which would be the more traditional choice for this task.
Training was done on a single HPE ProLiant DL580 Gen9 with Intel® Xeon® E7-8880 v4 CPUs, although I could have done with a few more of these for facenet2, to be honest! 32 of them would cut the training process from one or more weeks down to a few hours or days. Faster iteration lets one home in on a working, quality model much sooner (it's hard to say whether more processing power would have produced a better-quality model; I assume not, but nobody knows until it is actually attempted).
An example of the ground-truth outputs that facenet is trained on is provided in facenet_ground_truth.7z.