This is a Python/TensorFlow implementation of Dynamic Routing Between Capsules.

Requires Python 3.6.

```shell
pip install -r requirment.txt
python main.py --mode=train --model=cap
```
Keras:
- XifengGuo/CapsNet-Keras (I referred to some functions in this repository.)

TensorFlow:
- naturomics/CapsNet-Tensorflow (I referred to some functions in this repository.)
- InnerPeace-Wu/CapsNet-tensorflow
- chrislybaer/capsules-tensorflow
To see the training results:

```shell
tensorboard --logdir=train_log/ --host=0.0.0.0 --port=8080
tensorboard --logdir=test_log/ --host=0.0.0.0 --port=6060
```
The training results compare CapsNet (orange) against a CNN baseline (blue). The cost of CapsNet is the margin loss plus L2 regularization; the cost of the CNN baseline is the sum of cross-entropy and L2 loss. Notice that the cross-entropy loss is noisier than the margin loss: the tensor-neuron (i.e., capsule) loss appears more stable, and it also supports the presence of multiple classes in one image (one of the purposes of this paper). This CapsNet trains about 3x faster than the CNN baseline, partly due to a simpler implementation that takes advantage of TensorFlow's reshape mechanism.
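For reference, the margin loss mentioned above can be sketched as follows. This is a plain NumPy sketch following the formula and constants in the paper (m+ = 0.9, m- = 0.1, lambda = 0.5), not the code from this repository; the L2 regularization term would be added separately.

```python
import numpy as np

def margin_loss(v_lengths, labels, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Margin loss from the paper, summed over classes and averaged over the batch.

    v_lengths: (batch, num_classes) capsule output lengths ||v_k||
    labels:    (batch, num_classes) one-hot targets T_k
    """
    # T_k * max(0, m+ - ||v_k||)^2 penalizes a short vector for a present class.
    present = labels * np.maximum(0.0, m_plus - v_lengths) ** 2
    # lambda * (1 - T_k) * max(0, ||v_k|| - m-)^2 penalizes a long vector for an absent class.
    absent = lam * (1.0 - labels) * np.maximum(0.0, v_lengths - m_minus) ** 2
    return np.sum(present + absent, axis=-1).mean()
```

Because each class has its own term, several classes can be "present" at once, which is what makes the margin loss compatible with multi-class existence.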
The figures below show a side-by-side comparison between CapsNet + Recon (red) and CapsNet (orange).
TBD
According to the paper:

> One very special property is the existence of the instantiated entity in the image. An obvious way to represent existence is by using a separate logistic unit whose output is the probability that the entity exists. In this paper we explore an interesting alternative which is to use the overall length of the vector of instantiation parameters to represent the existence of the entity and to force the orientation of the vector to represent the properties of the entity. We ensure that the length of the vector output of a capsule cannot exceed 1 by applying a non-linearity that leaves the orientation of the vector unchanged but scales down its magnitude.
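The non-linearity the paper refers to is its "squash" function, v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||). A minimal NumPy sketch (the `eps` guard against division by zero is my addition, not from the paper):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squash non-linearity: preserves orientation, scales length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    # ||s||^2 / (1 + ||s||^2) shrinks short vectors toward 0 and caps long ones below 1.
    scale = sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps)
    return scale * s
```

For example, `squash(np.array([3.0, 4.0]))` keeps the direction (0.6, 0.8) but shrinks the length from 5 to 25/26, just under 1.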
As the follow-up paper, MATRIX CAPSULES WITH EM ROUTING, states, CapsNet has the following defects:
- It uses the length of the pose vector to represent the probability that the entity represented by a capsule is present. To keep the length less than 1 requires an unprincipled non-linearity that prevents there from being any sensible objective function that is minimized by the iterative routing procedure.
- It uses the cosine of the angle between two pose vectors to measure their agreement. Unlike the log variance of a Gaussian cluster, the cosine is not good at distinguishing between quite good agreement and very good agreement.
- It uses a vector of length n rather than a matrix with n elements to represent a pose, so its transformation matrices have n² parameters rather than just n.
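For context, the iterative routing procedure these points criticize can be sketched as below. This is a NumPy sketch of routing-by-agreement under assumed shapes (variable names like `u_hat` follow the paper's notation, not this repository's code):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement over prediction vectors.

    u_hat: (num_in, num_out, dim) predictions u_hat_{j|i} from lower capsules
    returns v: (num_out, dim) upper-capsule outputs
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits b_ij, initialized to 0
    for _ in range(num_iters):
        # c_ij = softmax over output capsules j of b_ij
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = (c[..., None] * u_hat).sum(axis=0)      # weighted sum s_j
        v = squash(s)                               # squashed output v_j
        # Agreement update: b_ij += u_hat_{j|i} . v_j (the dot-product
        # "cosine-like" agreement the follow-up paper criticizes).
        b = b + (u_hat * v[None]).sum(axis=-1)
    return v
```

The dot-product update in the last line is exactly the agreement measure the second bullet argues is too coarse.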
TBD
Kendrick Tan: (English) Capsule Networks Explained
SIY.Z: (Chinese) "How to view Hinton's paper Dynamic Routing Between Capsules?" (SIY.Z's answer on Zhihu)