CSV Fusion is a Python-powered tool for efficiently merging 1,000+ large CSV files. It processes data in chunks so that big datasets do not exhaust memory, adds metadata for source tracking, and keeps headers consistent across all inputs. Whether you're dealing with massive datasets or small collections, it simplifies your workflow and keeps the merged output accurate.
- Effortless Merging: Combine multiple CSV files from a specified directory into a single consolidated output.
- File Metadata Tracking: Automatically adds a metadata column to track the source file for each row of data.
- Chunk Processing for Large Files: Supports efficient chunk-by-chunk processing to handle large datasets without memory overload.
- Customizable Header Management: Ensures consistent headers across all files, with options for correcting or overriding mismatched headers.
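The features above can be approximated with pandas alone. The function below is a minimal, hypothetical sketch, not CSV Fusion's actual implementation: the name `merge_csv_directory`, its parameters (`source_column`, `chunksize`, `expected_header`, `header_aliases`) and the default `source_file` column name are all assumptions. It streams each file with `pd.read_csv(..., chunksize=...)`, renames misnamed columns, aligns every chunk to one canonical header, tags each row with its file name, and appends everything to a single output file.

```python
from pathlib import Path

import pandas as pd


def merge_csv_directory(
    input_dir,
    output_path,
    source_column="source_file",
    chunksize=100_000,
    expected_header=None,
    header_aliases=None,
):
    """Merge every *.csv file in input_dir into output_path, one chunk at a time.

    Hypothetical helper for illustration; the name and parameters are assumptions,
    not CSV Fusion's real API.
    """
    input_dir = Path(input_dir)
    output_path = Path(output_path)
    header_aliases = header_aliases or {}

    # Collect inputs up front, skipping the output file if it lives in the same directory.
    files = sorted(
        p for p in input_dir.glob("*.csv") if p.resolve() != output_path.resolve()
    )
    if not files:
        raise FileNotFoundError(f"No CSV files found in {input_dir}")

    # Canonical header: the caller's choice, or the first file's header (aliases applied).
    if expected_header is None:
        first_columns = pd.read_csv(files[0], nrows=0).columns
        expected_header = [header_aliases.get(c, c) for c in first_columns]

    first_write = True
    for path in files:
        # Stream the file so only `chunksize` rows are held in memory at once.
        for chunk in pd.read_csv(path, chunksize=chunksize):
            # Correct known misnamed headers, e.g. {"ID": "id"}.
            chunk = chunk.rename(columns=header_aliases)
            # Align to the canonical header: missing columns become NaN, extras are dropped.
            chunk = chunk.reindex(columns=expected_header)
            # Metadata column recording which file each row came from.
            chunk[source_column] = path.name
            # Write the header once, then append subsequent chunks without it.
            chunk.to_csv(
                output_path,
                mode="w" if first_write else "a",
                header=first_write,
                index=False,
            )
            first_write = False

    return output_path
```

For example, `merge_csv_directory("data/", "merged.csv", header_aliases={"ID": "id"})` would merge every CSV in `data/`. Appending chunk by chunk keeps memory use bounded by `chunksize`; the trade-off is that columns outside the canonical header are dropped, so pass `expected_header` explicitly if the first file is not representative.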
- Python: Core language for processing and automation.
- pandas: High-performance data analysis and manipulation library.
- Data Preparation: Ideal for preparing datasets for machine learning models or business intelligence tools.
- File Consolidation: Simplify workflows involving data spread across multiple CSV files.
- Metadata Management: Enhance data traceability by appending source file information.
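Because every merged row carries its source file name, tracing a data-quality issue back to its origin becomes a simple filter. The sketch below assumes the metadata column is named `source_file` (a hypothetical default; check the column name in your own output) and counts, per source file, how many rows contain at least one missing value.

```python
import pandas as pd


def sources_with_missing_values(
    merged: pd.DataFrame, source_column: str = "source_file"
) -> pd.Series:
    """Count, per source file, the rows that have at least one missing value.

    `source_column` is an assumed name for the metadata column.
    """
    # Check only the data columns, not the metadata column itself.
    data_columns = merged.columns.drop(source_column)
    has_missing = merged[data_columns].isna().any(axis=1)
    # Group the offending rows by the file they came from.
    return merged.loc[has_missing, source_column].value_counts().sort_index()
```

Files that appear in the result are the ones worth inspecting first; files with no incomplete rows are simply absent from it.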
Contributions are welcome! To get started:
- Fork this repository.
- Create a new branch for your feature or bug fix.
- Submit a pull request for review.
This project is licensed under the MIT License. See the LICENSE file for details.