Updated Jan 7, 2023 - Python
data-deduplication
Here are 15 public repositories matching this topic...
Fellow is a package for creating people that can be unified by their shared values via a singleton list on the class
Updated Jun 16, 2024 - TypeScript
Practical backups. The Unix toolkit way.
Updated Jan 14, 2018 - Shell
General deduping engine for JDBC sources with output to JDBC/csv targets
Updated Dec 21, 2020 - Kotlin
This repository contains SQL scripts and documentation for cleaning and standardizing data in the NashvilleHousing table within the sqlproject2 database. The project aims to prepare the dataset for analysis by addressing inconsistencies, filling missing values, standardizing formats, and removing duplicates.
Updated Jun 17, 2024
A calculator for the storage and transmission of deduplicated data, with results presented in charts and tables
Updated Sep 26, 2023
A data deduplication implementation based on a client-server architecture
Updated May 14, 2019 - C++
Optimal distributed data deduplication and supervised learning pipeline using Apache Spark
Updated Aug 19, 2020 - Scala
PolyDeDupe: Multi-Lingual Data Deduplication
Updated Sep 16, 2024 - Python
A Java project that splits data into blocks using hashing techniques and removes duplicate blocks to save cloud storage. The project also uses the CloudSim framework for cloud storage simulation.
Updated Jan 6, 2021 - Java
A Python-based tool for preprocessing, cleaning, and analyzing text datasets, designed to filter, deduplicate, sort data, and generate statistical insights.
Updated Sep 16, 2024 - Python
Self-contained C# library for data deduplication using SQLite
Updated Apr 7, 2023 - C#
Fast and efficient content-defined chunking for data deduplication. A Java implementation of FastCDC, packaged as a library.
Updated Sep 21, 2023 - Java
🚢 Data Toolkit for Sailor Language Models
Updated Jul 11, 2024 - Python
Data deduplication engine, supporting optional compression and public key encryption.
Updated Aug 25, 2022 - Rust