From 59a6817d5c865f8a7228d852726f5c6c62de6321 Mon Sep 17 00:00:00 2001
From: "Martindale, Nathan"
Date: Tue, 15 Aug 2023 11:46:46 -0400
Subject: [PATCH] Update README

---
 README.md | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index f84307e..6ea263a 100644
--- a/README.md
+++ b/README.md
@@ -16,11 +16,42 @@
 [![PyPI version](https://badge.fury.io/py/icat-iml.svg)](https://badge.fury.io/py/icat-iml)
 [![tests](https://github.com/ORNL/icat/actions/workflows/tests.yml/badge.svg?branch=main)](https://github.com/ORNL/icat/actions/workflows/tests.yml)
 
-The Interactive Corpus Analysis Tool (ICAT) is a program that can be used to explore textual corpora and train intelligent filters to identify specific documents of interest.
+The Interactive Corpus Analysis Tool (ICAT) is an interactive machine learning (IML) dashboard for unlabeled text datasets that allows a user to iteratively and visually define features, explore and label instances of their dataset, and train a logistic regression model on the fly as they do so to assist in filtering, searching, and labeling tasks.
+
+
+
+ICAT is implemented using holoviz's [panel](https://panel.holoviz.org/) library, so it can either directly be rendered like a widget in a jupyter lab/notebook instance, or incorporated as part of a standalone panel website.
+
 
 ## Installation
 
+ICAT can be installed via `pip` with:
+
 ```
 pip install icat-iml
 ```
+
+
+
+## Visualization
+
+The primary ring visualization is called AnchorViz, a technique from IML literature. (See Chen, Nan-Chen, et al. "[AnchorViz: Facilitating classifier error discovery through interactive semantic data exploration](https://dl.acm.org/doi/abs/10.1145/3172944.3172950)")
+
+We implemented an ipywidget version of this visualization and use it in this project, it can be found separately at [https://github.com/ORNL/ipyanchorviz](https://github.com/ORNL/ipyanchorviz)
+
+
+
+## Citation
+
+To cite usage of ICAT, please use the following bibtex:
+
+```bibtex
+@misc{doecode_105653,
+ title = {Interactive Corpus Analysis Tool},
+ author = {Martindale, Nathan and Stewart, Scott},
+ abstractNote = {The Interactive Corpus Analysis Tool (ICAT) is an interactive machine learning dashboard for unlabeled text/natural language processing datasets that allows a user to iteratively and visually define features, explore and label instances of their dataset, and simultaneously train a logistic regression model. ICAT was created to allow subject matter experts in a specific domain to directly train their own models for unlabeled datasets visually, without needing to be a machine learning expert or needing to know how to code the models themselves. This approach allows users to directly leverage the power of machine learning, but critically, also involves the user in the development of the machine learning model.},
+ year = {2023},
+ month = {apr}
+}
+```