Uniform Manifold Approximation with Two-phase Optimization (UMATO) is a dimensionality reduction technique, which can preserve the global as well as the local structure of high-dimensional data. Most existing dimensionality reduction algorithms focus on either of the two aspects, however, such insufficiency can lead to overlooking or misinterpreting important global patterns in the data. Moreover, the existing algorithms suffer from instability. To address these issues, UMATO proposes a two-phase optimization: global optimization and local optimization. First, we obtain the global structure by selecting and optimizing the hub points. Next, we initialize and optimize other points using the nearest neighbor graph. Our experiments with one synthetic and three real world datasets show that UMATO can outperform the baseline algorithms, such as PCA, t-SNE, Isomap, UMAP, LAMP and PacMAP, in terms of accuracy, stability, and scalability.
- Python 3.9 or greater
- scikit-learn
- numpy
- scipy
- numba
- pandas (to read csv files)
UMATO is available via pip.
pip install umato
import umato
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
emb = umato.UMATO(hub_num=50).fit_transform(X)
For detailed information on the algorithm and parameter usage, check the API listed under Wiki.
Detailed statistical data supporting our findings in accuracy and scalability analyses are presented in the figures below:
The average scores that nine DR techniques obtain in the accuracy analysis. For each quality metric, DR techniques ranked in the first--fourth place are highlighted in blue, where we assign higher opacity to the better techniques. Similarly, techniques ranked in the six--ninth place are highlighted in red, where worse techniques have higher opacity. UMATO substantially outperforms baselines in terms of global metrics with a slight sacrifice in local metric scores. Note that we standardize both the original data and projections to minimize the impact of scaling.
The subset of the projections generated in our accuracy analysis. Colors depict the class label of each dataset. The analysis results verified that UMATO outperforms competitors in terms of accurately preserving global structure while maintaining competitive performance in depicting local structure.
UMATO can be cited as follows:
@inproceedings{jeon2022vis,
title={Uniform Manifold Approximation with Two-phase Optimization},
author={Jeon, Hyeon and Ko, Hyung-Kwon and Lee, Soohyun and Jo, Jaemin and Seo, Jinwook},
booktitle={2022 IEEE Visualization and Visual Analytics (VIS)},
pages={80--84},
year={2022},
organization={IEEE}
}
Jeon, H., Ko, H. K., Lee, S., Jo, J., & Seo, J. (2022, October). Uniform Manifold Approximation with Two-phase Optimization. In 2022 IEEE Visualization and Visual Analytics (VIS) (pp. 80-84). IEEE.