NeRF stands for Neural Radiance Fields. It solves the view interpolation problem: given a sparse set of input views of a scene, synthesize novel views of that same scene. Existing RGB volume rendering models are easy to optimize but require extensive storage space (1-10 GB). One side benefit of NeRF is that the weights of the trained neural network are $\sim$6000$\times$ smaller in size than the original images.
Rasterization: Computer graphics traditionally uses this technique to display a 3D object on a 2D screen. Objects are modeled as meshes of virtual triangles/polygons, and the computer converts these triangles into pixels, each of which is assigned a color. Overall, this is a computationally intensive process.
Ray Tracing: In the real world, the 3D objects we see are illuminated by light, which may be blocked, reflected, or refracted. Ray tracing captures those effects. It is also computationally intensive, but it produces more realistic results. Ray: A ray is a line cast from the camera center, determined by the camera position parameters, in a particular direction determined by the camera angle.
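As a concrete illustration, here is a minimal NumPy sketch of how a ray's origin and direction can be computed for one pixel, assuming a pinhole camera with focal length focal and a 4x4 camera-to-world pose matrix cam2world (the function and variable names are illustrative, not from the paper):

import numpy as np

def get_ray(cam2world, x_pix, y_pix, focal, width, height):
    # Direction through pixel (x_pix, y_pix) in camera coordinates
    # (camera looks down the -z axis; OpenGL-style convention assumed)
    d_cam = np.array([(x_pix - 0.5 * width) / focal,
                      -(y_pix - 0.5 * height) / focal,
                      -1.0])
    d_world = cam2world[:3, :3] @ d_cam       # rotate into world coordinates
    origin = cam2world[:3, 3]                 # camera center is the ray origin
    return origin, d_world / np.linalg.norm(d_world)

Every pixel of a target image corresponds to one such ray, and NeRF forms the image by accumulating color and density along each ray.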
NeRF uses ray tracing rather than rasterization for its models.
Neural Rendering: As of 2020/2021, this terminology is used when a neural network is a black box that models the geometry of the world and a graphics engine renders it. Other commonly used terms are scene representations and, less frequently, implicit representations. In this case, the neural network is just a flexible function approximator, and the rendering machine does not learn at all.
A continuous scene is represented as a function whose input is a 3D location $\mathbf{x} = (x, y, z)$ and a 2D viewing direction $(\theta, \phi)$, and whose output is an emitted color $\mathbf{c} = (r, g, b)$ and a volume density $\sigma$.
This neural network is wrapped into volumetric ray tracing: you start at the back of the ray (furthest from you) and walk toward the camera, querying the color and density along the way. The expected color of a camera ray $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$ with near and far bounds $t_n$ and $t_f$ is
$$C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt, \quad \text{where } T(t) = \exp\!\left(-\int_{t_n}^{t}\sigma(\mathbf{r}(s))\,ds\right)$$
is the accumulated transmittance, i.e., the probability that the ray travels from $t_n$ to $t$ without being stopped.
To actually calculate this, the authors use a stratified sampling approach: they partition $[t_n, t_f]$ into $N$ evenly spaced bins and draw one sample uniformly at random from within each bin,
$$t_i \sim \mathcal{U}\!\left[t_n + \tfrac{i-1}{N}(t_f - t_n),\; t_n + \tfrac{i}{N}(t_f - t_n)\right].$$
The integral is then estimated with the quadrature rule
$$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i\left(1 - \exp(-\sigma_i\delta_i)\right)\mathbf{c}_i, \quad \text{where } T_i = \exp\!\left(-\sum_{j=1}^{i-1}\sigma_j\delta_j\right)$$
and $\delta_i = t_{i+1} - t_i$ is the distance between adjacent samples.
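This quadrature is straightforward to implement. Below is a minimal NumPy sketch (function and variable names are illustrative, not the authors' code) of stratified sampling along one ray and the discrete rendering sum:

import numpy as np

def stratified_samples(t_near, t_far, n_samples, rng):
    # One uniform random sample per evenly spaced bin of [t_near, t_far]
    edges = np.linspace(t_near, t_far, n_samples + 1)
    return edges[:-1] + rng.uniform(size=n_samples) * (edges[1:] - edges[:-1])

def render_ray(rgb, sigma, t_vals):
    # rgb: (N, 3) colors, sigma: (N,) densities, t_vals: (N,) sample distances
    delta = np.diff(t_vals, append=1e10)                          # delta_i = t_{i+1} - t_i
    alpha = 1.0 - np.exp(-sigma * delta)                          # per-segment opacity
    T = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))     # accumulated transmittance T_i
    weights = T * alpha
    return (weights[:, None] * rgb).sum(axis=0)                   # expected color of the ray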
In practice, the viewing direction is expressed as a 3D Cartesian unit vector $\mathbf{d}$. You can approximate this representation with an MLP $F_{\Theta} : (\mathbf{x}, \mathbf{d}) \rightarrow (\mathbf{c}, \sigma)$ and optimize its weights $\Theta$.
Why does NeRF use an MLP rather than a CNN? A multilayer perceptron (MLP) is a feed-forward neural network. The model maps individual coordinates to color and density and does not need to preserve spatial feature maps, so a CNN is not necessary.
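To make the shape of this function concrete, here is a toy Flax sketch of such an MLP. It is a simplified stand-in, not the authors' implementation: the real model uses 8 layers of width 256, predicts density from position only, and injects the viewing direction late. The layer sizes and names below are illustrative.

import jax.numpy as jnp
from flax import linen as nn

class TinyRadianceField(nn.Module):
    # Maps a 3D position and a 3D viewing direction to (RGB color, density).
    @nn.compact
    def __call__(self, x, d):
        h = jnp.concatenate([x, d], axis=-1)
        for _ in range(4):
            h = nn.relu(nn.Dense(128)(h))
        sigma = nn.softplus(nn.Dense(1)(h))   # density must be non-negative
        rgb = nn.sigmoid(nn.Dense(3)(h))      # colors constrained to [0, 1]
        return rgb, sigma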
The naive implementation of a neural radiance field produces blurry results. To fix this, the 5D coordinates are transformed with a positional encoding (terminology borrowed from the transformer literature) before being fed to the MLP.
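For reference, the encoding from the paper is applied to each coordinate $p$ of the (normalized) position and viewing direction:
$$\gamma(p) = \left(\sin(2^{0}\pi p), \cos(2^{0}\pi p), \ldots, \sin(2^{L-1}\pi p), \cos(2^{L-1}\pi p)\right)$$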
L determines how many frequency levels there are in the positional encoding, and it can be used to regularize NeRF (low L = smooth). The encoding is also known as a Fourier feature, and it turns the MLP into an interpolation tool. Another way of looking at this is that a Fourier-feature-based neural network is just a tiny lookup table with extremely high resolution. Here is an example of applying Fourier features in code:
# Inside the model: project inputs with a fixed random matrix B, then featurize.
B = SCALE * np.random.normal(size=(input_dims, NUM_FEATURES))   # random Fourier projection
x = np.concatenate([np.sin(x @ B), np.cos(x @ B)], axis=-1)     # Fourier features of the input
x = nn.Dense(features=256)(x)                                   # first MLP layer (Flax linen style)
NeRF also uses hierarchical volume sampling with two networks: a coarse network and a fine network. This lets NeRF run more efficiently by deprioritizing regions along the camera ray that are free space or occluded. The coarse network's rendered color is written as a weighted sum of the sampled colors:
$$\hat{C}_c(\mathbf{r}) = \sum_{i=1}^{N_c} w_i c_i, \qquad w_i = T_i\left(1 - \exp(-\sigma_i\delta_i)\right)$$
A second set of $N_f$ locations is then sampled from this distribution using inverse transform sampling (after normalizing the weights, $\hat{w}_i = w_i / \sum_{j=1}^{N_c} w_j$, to form a piecewise-constant PDF along the ray). The fine network is evaluated at the union of both sets of samples to compute the final rendered color $\hat{C}_f(\mathbf{r})$.
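A minimal sketch of that inverse-transform step, assuming t_mid holds the midpoints of the coarse bins and weights the coarse $w_i$ values (names are illustrative, and the paper samples within bin edges rather than at midpoints):

import numpy as np

def sample_fine(t_mid, weights, n_fine, rng):
    # Draw extra samples along the ray in proportion to the coarse weights.
    pdf = weights / (weights.sum() + 1e-10)
    cdf = np.cumsum(pdf)
    u = rng.uniform(size=n_fine)          # uniform samples in [0, 1)
    return np.interp(u, cdf, t_mid)       # map through the (approximate) inverse CDF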
The paper goes in depth on quantitative measures of the results, on which NeRF outperforms existing models. A visual comparison is shared below:
What's the difference between ray tracing and rasterization?
Self-explanatory title; an excellent write-up helping the reader differentiate between the two concepts.
Matthew Tancik: NeRF ECCV 2020 Oral
Videos showcasing NeRF-produced images.
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
A simple, alternative explanation of NeRF.
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (arXiv paper)
CS 231n Spring 2021: Jon Barron Guest Lecture