In this work we implement algorithms based on the preprint "Determining whether two datasets cluster similarly without determining the clusters" by Van Eeghem et al. [1].
This work was an initial part of a research project by Maxence Giraud on "higher order clustering", supervised by Remy Boyer.
import dataset_similarity_tensor as dst
# Load 2 datasets V,W
## 1. Using Kronecker product
VV = dst.tensorize_kr(V)
WW = dst.tensorize_kr(W)
## 2. Using third-order moment
VV = dst.tensorize_thirdordermoment(V).reshape(V.shape[1],-1) # Reshape because the principal angles are computed on the unfolded tensor (i.e. a matrix)
WW = dst.tensorize_thirdordermoment(W).reshape(W.shape[1],-1)
## Compute principal angle
angle = dst.principal_angles_tensors(VV,WW)
The algorithms compute a principal angle, so the output lies between 0 and π/2: the closer this value is to 0, the more similar the two datasets are.
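To illustrate why the output falls in [0, π/2], here is a minimal NumPy sketch of the standard QR-then-SVD way of computing principal angles between two subspaces. It is not the code behind `dst.principal_angles_tensors`: the function name `principal_angles` is hypothetical, the choice of comparing column spaces is an assumption, and both inputs are assumed to have full column rank.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spaces of A and B.

    Hypothetical helper for illustration; assumes A and B have full column rank.
    """
    # Orthonormal bases of the two column spaces.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip to guard against round-off pushing values slightly outside [0, 1].
    return np.arccos(np.clip(cosines, 0.0, 1.0))

# Two bases of the same 2-D subspace of R^4: all angles are (numerically) 0.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
B = A @ np.array([[1.0, 1.0], [0.0, 2.0]])
same = principal_angles(A, B)

# An orthogonal 2-D subspace: all angles are pi/2.
C = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
orthogonal = principal_angles(A, C)
```

Because the singular values of a product of two matrices with orthonormal columns lie in [0, 1], their arccosines always lie in [0, π/2], which matches the range stated above.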
[1] Van Eeghem F., De Lathauwer L. (2020). Determining whether two datasets cluster similarly without determining the clusters.