GitHub - hypper-team/hypper: Hypergraph-based data mining for binary classification

Hypper is a data-mining Python library for binary classification. It uses hypergraph-based methods to explore datasets for the purpose of undersampling, feature selection and binary classification.

Hypper provides an easy-to-use interface modeled on the well-known scikit-learn API.

The primary goal of this library is to provide a tool for handling datasets that consist mainly of categorical features. The novel hypergraph-based methods proposed in Hypper were benchmarked against alternative solutions on undersampling and feature selection tasks and achieved satisfactory results. More details can be found in the paper listed in the Citation section below.
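
To make the idea of a hypergraph view of categorical data more concrete, here is a minimal sketch in plain Python. It does not use Hypper's API. It assumes, for illustration only, that each (feature, value) pair is a node and each sample is a hyperedge joining the nodes it contains. The toy rows, the labels, and the helper names `build_hypergraph` and `positive_share` are all hypothetical, and the per-node statistic is a naive label share, not the Hypergraph-based Importance rating described in the paper.

```python
from collections import defaultdict

# Toy categorical dataset (made up for illustration): each row is one sample.
rows = [
    {"job": "admin", "marital": "single", "housing": "yes"},
    {"job": "admin", "marital": "married", "housing": "no"},
    {"job": "technician", "marital": "single", "housing": "yes"},
]
labels = [1, 0, 1]  # binary classification labels, one per sample


def build_hypergraph(rows):
    """Map each (feature, value) node to the set of sample ids (hyperedges) containing it."""
    incidence = defaultdict(set)
    for sample_id, row in enumerate(rows):
        for feature, value in row.items():
            incidence[(feature, value)].add(sample_id)
    return dict(incidence)


def positive_share(incidence, labels):
    """Naive statistic: share of positive-label samples among the hyperedges touching each node."""
    return {
        node: sum(labels[s] for s in samples) / len(samples)
        for node, samples in incidence.items()
    }


hypergraph = build_hypergraph(rows)
shares = positive_share(hypergraph, labels)

# Print every node with the samples it belongs to and its positive-label share.
for node, samples in sorted(hypergraph.items()):
    print(node, sorted(samples), round(shares[node], 2))
```

Because a single sample connects all of its feature values at once, this structure keeps co-occurrences of more than two values together, which an ordinary pairwise graph would have to split into separate edges.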

Installation

pip install hypper

Local installations

pip install -e ".[documentation]" # documentation
pip install -e ".[develop]" # development (with testing)
pip install -e ".[benchmarking]" # benchmarking scripts
pip install -e ".[all]" # install everything

Tutorials

1. Introduction to data mining with Hypper

Testing

pytest

Important links

Source code - https://github.com/hypper-team/hypper
Documentation - https://hypper-team.github.io/hypper.html

Citation

@ARTICLE{Misiorek2022-ru,
  title     = "Hypergraph-based importance assessment for binary classification
               data",
  author    = "Misiorek, Pawel and Janowski, Szymon",
  abstract  = "We present a novel hypergraph-based framework enabling
               an assessment of the importance of binary classification data
               elements. Specifically, we apply the hypergraph model to rate
               data samples' and categorical feature values' relevance to
               classification labels. The proposed Hypergraph-based Importance
               ratings are theoretically grounded on the hypergraph cut
               conductance minimization concept. As a result of using
               hypergraph representation, which is a lossless representation
               from the perspective of higher-order relationships in data, our
               approach allows for more precise exploitation of the information
               on feature and sample coincidences. The solution was tested
               using two scenarios: undersampling for imbalanced classification
               data and feature selection. The experimentation results have
               proven the good quality of the new approach when compared with
               other state-of-the-art and baseline methods for both scenarios
               measured using the average precision evaluation metric.",
  journal   = "Knowl. Inf. Syst.",
  publisher = "Springer Science and Business Media LLC",
  month     =  dec,
  year      =  2022,
  copyright = "https://creativecommons.org/licenses/by/4.0",
  language  = "en"
}

