A Julia package for Cluster Validity Indices (CVI) algorithms.
Please read the documentation for detailed usage and tutorials.
Cluster Validity Indices (CVIs) are designed to be metrics of performance for unsupervised clustering algorithms. In the absence of supervisory labels (i.e., ground truth), clustering algorithms, like any truly unsupervised learning algorithm, have no way to definitively know the stability of their learning or the accuracy of their performance. CVIs therefore provide metrics of partitioning stability/validity using only the original data samples and the cluster labels prescribed by the clustering algorithm.
This Julia package contains an outline of the conceptual usage of CVIs along with many example scripts in the documentation. This outline begins with a list of CVIs that are implemented in the latest version of the project. Quickstart provides an overview of how to use this project, while Structure outlines the project file structure, giving context to the locations of every component of the project. Usage outlines the general syntax and workflow of the CVIs/ICVIs.
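To make the idea concrete, the following is a minimal sketch of one such index, the Davies-Bouldin index, written from its textbook definition using only the Julia standard library. The helper name `davies_bouldin` is hypothetical and is not part of this package, and the package's own DB implementation may differ in detail (for example, in the distance measure used). For this index, lower values indicate more compact and better-separated clusters.

```julia
using LinearAlgebra: norm
using Statistics: mean

# Textbook Davies-Bouldin index (hypothetical helper, not the package API).
# Columns of `data` are samples; `labels` holds one integer per sample.
function davies_bouldin(data::AbstractMatrix{<:Real}, labels::AbstractVector{<:Integer})
    clusters = sort(unique(labels))
    k = length(clusters)
    # Centroid of each cluster: the mean over that cluster's columns
    centroids = [vec(mean(data[:, labels .== c], dims=2)) for c in clusters]
    # Scatter of each cluster: mean distance from its members to its centroid
    scatters = [
        mean(norm(data[:, j] - centroids[i]) for j in findall(==(c), labels))
        for (i, c) in enumerate(clusters)
    ]
    # For each cluster, the largest similarity ratio against any other cluster
    worst = [
        maximum(
            (scatters[i] + scatters[j]) / norm(centroids[i] - centroids[j])
            for j in 1:k if j != i
        )
        for i in 1:k
    ]
    return mean(worst)
end

# Two tight, well-separated clusters in two dimensions
points = [0.0 0.0 1.0 1.0 10.0 10.0 11.0 11.0;
          0.0 1.0 0.0 1.0 10.0 11.0 10.0 11.0]

# Labels that match the structure of the data
good = davies_bouldin(points, [1, 1, 1, 1, 2, 2, 2, 2])

# Labels that ignore the structure of the data
bad = davies_bouldin(points, [1, 2, 1, 2, 1, 2, 1, 2])
```

Here `good` is smaller than `bad`, which reflects that the first labeling is the more valid partition, and neither computation needed ground-truth labels.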
This project is distributed as a Julia package and hosted on JuliaHub, Julia's package manager repository. As such, this package's usage follows the usual Julia package installation procedure, interactively:
julia> ]
(@v1.8) pkg> add ClusterValidityIndices
or programmatically:
julia> using Pkg
julia> Pkg.add("ClusterValidityIndices")
You may also add the package directly from GitHub to get the latest changes between releases:
julia> ]
(@v1.8) pkg> add https://github.com/AP6YC/ClusterValidityIndices.jl
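The GitHub installation can also be done programmatically. This is a short sketch using the `url` keyword of `Pkg.add`; the variable name `repo_url` is only illustrative.

```julia
using Pkg

# Same repository URL as in the interactive command above
repo_url = "https://github.com/AP6YC/ClusterValidityIndices.jl"

# Install the current state of the repository rather than a registered release
Pkg.add(url=repo_url)
```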
This section provides a quick overview of how to use the project. For more detailed code usage, please see the Detailed Usage section of the documentation.
First, import the package with:
# Import the package
using ClusterValidityIndices
CVI objects are instantiated with empty constructors:
# Create a Davies-Bouldin (DB) CVI object
my_cvi = DB()
All CVIs are implemented with acronyms of their literature names. A list of all of these is found in the Implemented CVIs/ICVIs section.
Next, get data from a clustering process. This is a set of samples (feature vectors) together with the cluster labels that the clustering algorithm prescribed to them.
Note
The ClusterValidityIndices.jl package assumes data to be in the form of Float matrices where columns are samples and rows are features. An individual sample is a single vector of features. Labels are vectors of integers where each number corresponds to its own cluster.
# Random data as an example; 10 samples with feature dimension 3
dim = 3
n_samples = 10
data = rand(dim, n_samples)
# One label per sample: the first half in cluster 1, the second half in cluster 2
labels = repeat(1:2, inner=div(n_samples, 2))
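Data from other tools is often stored with samples as rows and with non-integer labels. The following sketch shows one way to bring such data into the layout described in the note above, using only base Julia. All names and values here (`samples_as_rows`, `raw_labels`, `features`, `int_labels`) are hypothetical and chosen for illustration.

```julia
# Hypothetical input: 4 samples stored as rows, each with 3 features
samples_as_rows = [
    1 2 3;
    4 5 6;
    7 8 9;
    10 11 12
]
# Hypothetical labels stored as strings
raw_labels = ["a", "b", "a", "b"]

# Swap rows and columns so that columns are samples, then convert to floats
features = Matrix{Float64}(permutedims(samples_as_rows))

# Map each distinct label to an integer in order of first appearance
label_ids = Dict(name => i for (i, name) in enumerate(unique(raw_labels)))
int_labels = [label_ids[name] for name in raw_labels]
```

After this conversion, `features[:, i]` is the feature vector of sample `i` and `int_labels[i]` is its integer cluster label.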
The outputs of CVIs are called criterion values, and they can be computed both incrementally and in batch with get_cvi!.
Compute in batch by providing a matrix of samples and a vector of labels:
criterion_value = get_cvi!(my_cvi, data, labels)
or incrementally with the same function by passing one sample and label at a time:
# Create a fresh CVI object for incremental evaluation
my_icvi = DB()
# Create a container for the values and iterate
criterion_values = zeros(n_samples)
for i = 1:n_samples
criterion_values[i] = get_cvi!(my_icvi, data[:, i], labels[i])
end
Note
Each module has a batch and incremental implementation, but ClusterValidityIndices.jl does not yet support switching between batch and incremental modes with the same CVI object.
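Because of this restriction, comparing the two modes on the same data requires two separate objects. The sketch below uses only the calls shown in this section (`DB()` and `get_cvi!`); the variable names are illustrative, and no particular numerical outcome is implied.

```julia
using ClusterValidityIndices

# Example data: 3 features, 10 samples, two clusters of 5 samples each
dim = 3
n_samples = 10
data = rand(dim, n_samples)
labels = repeat(1:2, inner=div(n_samples, 2))

# One object per mode, since a single object cannot switch between them
batch_cvi = DB()
incremental_cvi = DB()

# Batch: all samples and labels at once
batch_value = get_cvi!(batch_cvi, data, labels)

# Incremental: one sample and label at a time, collecting every value
incremental_values = [
    get_cvi!(incremental_cvi, data[:, i], labels[i]) for i in 1:n_samples
]

# Difference between the batch value and the final incremental value
difference = abs(batch_value - incremental_values[end])
```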
This project has implementations of the following CVIs in both batch and incremental variants:
- CH: Calinski-Harabasz.
- cSIL: Centroid-based Silhouette.
- DB: Davies-Bouldin.
- GD43: Generalized Dunn's Index 43.
- GD53: Generalized Dunn's Index 53.
- PS: Partition Separation.
- rCIP: (Renyi's) representative Cross Information Potential.
- WB: WB-index.
- XB: Xie-Beni.
The exported constant CVI_MODULES also contains a list of these CVIs for convenient iteration.
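As a sketch of that iteration, the following computes a batch criterion value with every entry of `CVI_MODULES` on the same random data. It assumes that each entry can be called with no arguments to create a CVI object, in the same way as `DB()`; the `results` dictionary is only an illustrative container.

```julia
using ClusterValidityIndices

# Example data: 3 features, 10 samples, two clusters of 5 samples each
dim = 3
n_samples = 10
data = rand(dim, n_samples)
labels = repeat(1:2, inner=div(n_samples, 2))

# Assumption: each entry of CVI_MODULES is an empty constructor, like DB.
# A fresh object is created per CVI and evaluated in batch mode.
results = Dict(
    string(cvi) => get_cvi!(cvi(), data, labels) for cvi in CVI_MODULES
)

# Print each CVI name with its criterion value
for (name, value) in results
    println(name, ": ", value)
end
```

Note that the criterion values of different CVIs are not on a common scale, so they are best compared across partitions of the same data rather than against each other.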
A basic example in the documentation illustrates top-down usage of the package.
Furthermore, the Examples section of the documentation contains examples for a variety of use cases of the project.
Each of these is made using the DemoCards.jl package and can be opened, saved, and run as a Julia notebook.
If you have a question or concern, please raise an issue. For more details on how to work with the project, propose changes, or even contribute code, please see the Developer Notes in the project's documentation.
In summary:
- Questions and requested changes should all be made in the issues page. These are preferred because they are publicly viewable and could assist or educate others with similar issues or questions.
- For changes, this project accepts pull requests (PRs) from feature/<my-feature> branches onto the develop branch using the GitFlow methodology. If unit tests pass and the changes are beneficial, these PRs are merged into develop and eventually folded into versioned releases.
- The project follows the Semantic Versioning convention of major.minor.patch incremental versioning numbers. Patch versions are for bug fixes, minor versions are for backward-compatible changes, and major versions are for new and incompatible usage changes.
This package is developed and maintained by Sasha Petrenko with sponsorship by the Applied Computational Intelligence Laboratory (ACIL). This project is supported by grants from the Night Vision Electronic Sensors Directorate, the DARPA Lifelong Learning Machines (L2M) program, Teledyne Technologies, and the National Science Foundation. The material, findings, and conclusions here do not necessarily reflect the views of these entities.
The users @rMassimiliano and @malmaud have graciously contributed their time with reviews and feedback that have greatly improved the project.
This software is openly maintained by the ACIL of the Missouri University of Science and Technology under the MIT License.
This project has a citation file that generates citation information for the package and corresponding JOSS paper, which can be accessed with the "Cite this repository" button under the "About" section of the GitHub page.
You may also cite this repository with the following BibTeX entry:
@article{Petrenko2022,
doi = {10.21105/joss.03527},
url = {https://doi.org/10.21105/joss.03527},
year = {2022},
publisher = {The Open Journal},
volume = {7},
number = {79},
pages = {3527},
author = {Sasha Petrenko and Donald C. Wunsch},
title = {ClusterValidityIndices.jl: Batch and Incremental Metrics for Unsupervised Learning},
journal = {Journal of Open Source Software}
}