Panda-monium

Inactively Maintained

Panda-monium lets you serialize and compress Pandas DataFrames. It uses CSV to serialize DataFrames and Goose to compress them. Until now, the usual ways to save a DataFrame were pickle (which takes up a lot of disk space) and CSV files (which can create the annoying Unnamed: 0 column when read back).
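For context, here is a small sketch of where the Unnamed: 0 column comes from when a DataFrame goes through a plain CSV round trip. It uses only standard pandas calls and says nothing about Panda-monium's internals; the column names and values are made up for illustration.

```python
import io
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# to_csv() writes the index as a first column with an empty header.
csv_text = df.to_csv()

# Reading it back naively turns the index into a data column.
naive = pd.read_csv(io.StringIO(csv_text))

# Telling pandas that the first column is the index avoids the extra column.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(naive.columns))     # ['Unnamed: 0', 'a', 'b']
print(list(restored.columns))  # ['a', 'b']
```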

Tutorial

Example:

import pandas as pd
import Pandamonium as pm  # Keep the uppercase
data = pd.DataFrame({ ... })  # Add your data
file = "data.pdc"

# Compress
pm.compress(data, file)  # Should return "Success!"

# Decompress
loaded = pm.decompress(file)

How it works

Panda-monium works by converting a DataFrame to CSV text and replacing common substrings (such as a comma next to a number) with a single character. The larger the data, the more such substrings there are to replace, so the more likely compression is to save space. The annoying Unnamed: 0 column is dropped during decompression.
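The substitution idea can be sketched roughly as follows. This is not Panda-monium's actual code: the `PAIRS` table, the choice of Unicode Private Use Area characters, and the `pack`/`unpack` names are all assumptions made for illustration, and the sketch works on CSV text directly rather than on a DataFrame.

```python
# Hypothetical substitution table (not Panda-monium's real one): each
# two-character ",<digit>" pair maps to one Private Use Area character.
PAIRS = {f",{d}": chr(0xE000 + d) for d in range(10)}

def pack(csv_text: str) -> str:
    """Replace every ",<digit>" pair with its one-character token."""
    for pair, token in PAIRS.items():
        csv_text = csv_text.replace(pair, token)
    return csv_text

def unpack(packed: str) -> str:
    """Reverse pack() by expanding each token back into its pair."""
    for pair, token in PAIRS.items():
        packed = packed.replace(token, pair)
    return packed

# CSV text of the kind DataFrame.to_csv() produces for a small numeric frame.
csv_text = ",a,b\n0,1,10\n1,2,20\n2,3,30\n"
packed = pack(csv_text)

print(len(csv_text), len(packed))  # 26 20
print(unpack(packed) == csv_text)  # True
```

Each replaced pair saves one character, which is why text with many numeric columns and rows shrinks the most.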

Collisions

A collision happens when one of the DataFrame's strings contains a Panda-monium "keyword" or one of the replacement characters used during compression. Panda-monium has been designed to prevent collisions by replacing a comma next to any number with a character that isn't on the keyboard, which makes collisions unlikely.

A collision can corrupt the decompressed DataFrame, for example by giving it the wrong dimensions, which can cause errors.
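To make the failure mode concrete, here is a minimal sketch of a collision under an assumed scheme in which the pair `,1` is replaced by a single token character. The `TOKEN` value and the `pack`/`unpack` helpers are hypothetical and only stand in for whatever Panda-monium really does.

```python
import csv
import io

TOKEN = "\ue001"  # hypothetical replacement character standing in for ",1"

def pack(text: str) -> str:
    return text.replace(",1", TOKEN)

def unpack(text: str) -> str:
    return text.replace(TOKEN, ",1")

# A data value that already contains the replacement character.
original = f"name,score\nab{TOKEN}cd,5\n"
round_tripped = unpack(pack(original))

rows_before = list(csv.reader(io.StringIO(original)))
rows_after = list(csv.reader(io.StringIO(round_tripped)))

print(len(rows_before[1]))  # 2
print(len(rows_after[1]))   # 3
```

The token inside the data is expanded into `,1` on the way back, so the row gains a field and no longer matches the header.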

Preventing Collisions

Collisions are made less likely by using replacement characters that most systems can't display (such as �).

However, there is still a small chance that one of these symbols (which GitHub can't display) ends up in the DataFrame. The solution is escape characters, which will be added in a future version.
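One way escape characters could work is sketched below. This is a guess at a possible design, not the planned implementation: the `ESC` marker, the token table, and the `pack`/`unpack` names are all hypothetical. Any reserved character found in the data is prefixed with `ESC` during compression, and decompression treats whatever follows `ESC` as literal text.

```python
ESC = "\ue0ff"  # hypothetical escape marker
PAIR_TO_TOKEN = {f",{d}": chr(0xE000 + d) for d in range(10)}
TOKEN_TO_PAIR = {token: pair for pair, token in PAIR_TO_TOKEN.items()}

def pack(text: str) -> str:
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        pair = text[i:i + 2]
        if ch == ESC or ch in TOKEN_TO_PAIR:
            out.append(ESC + ch)             # reserved character in the data: escape it
            i += 1
        elif pair in PAIR_TO_TOKEN:
            out.append(PAIR_TO_TOKEN[pair])  # ",<digit>" pair: substitute one token
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

def unpack(packed: str) -> str:
    out, i = [], 0
    while i < len(packed):
        ch = packed[i]
        if ch == ESC:
            out.append(packed[i + 1])        # escaped character: copy it literally
            i += 2
        elif ch in TOKEN_TO_PAIR:
            out.append(TOKEN_TO_PAIR[ch])    # token: expand back into its pair
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

# Data that contains both a token character and the escape marker itself.
tricky = f"name,score\nab{chr(0xE001)}cd,5\nx{ESC}y,12\n"
print(unpack(pack(tricky)) == tricky)  # True
```

The cost is one extra character for each reserved character that appears in the data, which should be rare if the reserved characters are chosen well.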