Loci is an unsupervised, disentangled LOCation and Identity tracking system. It excels on the CATER object-tracking challenge and related tasks, showing emergent object permanence and stable entity disentanglement learned fully unsupervised (the sketch below illustrates the core what/where split).
Paper: "Learning What and Where - Unsupervised Disentangling Location and Identity Tracking" | arXiv
(Video: CATER.Snitch.Tracking.Challenge.mp4)
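The core idea is a per-slot factorization of every entity into a "where" code (position) and a "what" code (gestalt). Below is a minimal, illustrative sketch of that split in plain PyTorch; all names and sizes (split_slot_latent, GESTALT_BITS, POSITION_DIMS) are assumptions for illustration, not the repository's actual API.

import torch

# Illustrative only: each slot carries a continuous position code ("where")
# and a near-binary gestalt code ("what") describing one entity. Names and
# sizes here are assumed, not taken from the model package.
GESTALT_BITS = 96   # assumed gestalt-code width
POSITION_DIMS = 3   # assumed: x, y plus a size/priority scalar

def split_slot_latent(latent: torch.Tensor):
    """Split one slot's latent vector into its position and gestalt parts."""
    position = latent[..., :POSITION_DIMS]
    gestalt = torch.sigmoid(latent[..., POSITION_DIMS:])  # pushed toward {0, 1}
    return position, gestalt

position, gestalt = split_slot_latent(torch.randn(POSITION_DIMS + GESTALT_BITS))
print(position.shape, gestalt.shape)  # torch.Size([3]) torch.Size([96])

Flipping gestalt bits and moving the position code is exactly what the GUI below exposes interactively.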
A suitable conda environment named loci can be created and activated with:
conda env create -f environment.yaml
conda activate loci
A preprocessed CATER dataset, together with the five trained networks from the paper, can be found here
The dataset folder (CATER) needs to be copied to data/data/video/
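As a quick sanity check that the commands below will find the data, the following sketch only tests paths that appear elsewhere in this README:

from pathlib import Path

# Verify the dataset folder was copied into place; the GUI command further
# down references data/data/video/CATER/background.jpg directly.
cater = Path("data/data/video/CATER")
assert cater.is_dir(), f"missing dataset folder: {cater}"
assert (cater / "background.jpg").is_file(), f"missing {cater / 'background.jpg'}"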
(Video: Loci-Latent-GUI.mp4)
We provide an interactive GUI to explore the model's learned representations. The GUI loads the extracted latent state for one slot. In the top-left grid, the bits of the gestalt code can be flipped, while in the top-right image the position can be changed (by clicking or scrolling). The bottom half of the GUI shows the composition of the background with the reconstructed slot content, as well as the entity's RGB representation and mask.
Run the GUI (extracted latent states can be found here):
python -m model.scripts.playground -cfg model/cater.json \
-background data/data/video/CATER/background.jpg -load net2.pt \
-latent latent-states/net2/latent-0000-07.pickle
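To peek at one of these latent-state files outside the GUI, the following minimal sketch assumes only that the .pickle files are standard Python pickles, and prints their structure rather than assuming it:

import pickle

# Inspect a stored latent state without starting the GUI. Torch tensors
# inside the pickle load fine as long as torch is installed (it is, in the
# loci environment above).
with open("latent-states/net2/latent-0000-07.pickle", "rb") as f:
    state = pickle.load(f)

print(type(state))
if isinstance(state, dict):
    for key, value in state.items():
        print(key, getattr(value, "shape", type(value)))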
Training can be started with:
python -m model.main -train -cfg model/cater-stage1.json
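The stage1 suffix suggests a staged training schedule; presumably a later stage resumes from the stage-1 checkpoint with the full config, along the lines of the following invocation, which is an assumption that merely combines flags already shown in this README:

python -m model.main -train -cfg model/cater.json -load <stage1-checkpoint>.pt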
A trained model can be evaluated with:
python -m model.main -eval -testset -cfg model/cater.json -load net1.pt
Images and latent states can be generated using:
python -m model.main -save -testset -cfg model/cater.json -load net1.pt
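Presumably the latent states saved by this command are in the same pickle format that the GUI's -latent flag loads, so newly generated states can be explored interactively as described above.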