Data-Purifier-Dataset

Data repository for Data-Purifier examples

This repository exists only to give the datapurifier.load_dataset function a convenient place to download sample datasets from. It lets the datapurifier documentation focus on the library itself rather than on loading and munging data. The datasets may change or be removed at any time if they are no longer useful for the documentation, and some have been modified from their canonical sources.

Data is sourced from Kaggle.

Get Started

Install the packages

pip install data-purifier
python -m spacy download en_core_web_sm

Load the module

import datapurifier as dp
from datapurifier import Mleda, Nleda, Nlpurifier

print(dp.__version__)

Get the list of example datasets

print(dp.get_dataset_names()) # to get all dataset names
print(dp.get_text_dataset_names()) # to get all text dataset names

Load an example dataset by passing one of the names from the example list as the argument.

df = dp.load_dataset("womens_clothing_e-commerce_reviews")
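
Because datasets in this repository may change or be removed, it can help to confirm that a name is still listed before loading it. The following is a minimal sketch, not part of datapurifier: `load_if_available` is a hypothetical helper, and it assumes that `dp.get_dataset_names()` returns an iterable of name strings (the README only shows it being printed). The listing and loading functions are passed in as arguments so the helper makes no other assumptions about the library.

```python
def load_if_available(name, list_names, load):
    """Load `name` with `load` only if `list_names()` currently includes it.

    Hypothetical helper, not part of datapurifier.
    Assumes `list_names()` returns an iterable of dataset-name strings.
    """
    available = list(list_names())
    if name not in available:
        raise ValueError(
            f"Dataset {name!r} is not listed; available names: {available}"
        )
    return load(name)


# Intended usage with datapurifier (requires the package and network access):
#
#   import datapurifier as dp
#   df = load_if_available(
#       "womens_clothing_e-commerce_reviews",
#       dp.get_dataset_names,
#       dp.load_dataset,
#   )
```

If the name is missing, the error message lists the names that are currently available, which makes it easier to spot a renamed or removed dataset.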

Example:

Colab Notebook

Official Documentation: https://cutt.ly/CbFT5Dw

Python Package: https://pypi.org/project/data-purifier/