DB2IXF Parser

DB2IXF Parser is an open-source Python package that simplifies parsing and processing IBM Integration eXchange Format (IXF) files. IXF is a file format used by IBM's DB2 database system for data import and export operations. The package extracts data from IXF files and converts it to several formats: JSON, JSONLINE, CSV, Parquet and Deltalake.

Features

Parse IXF files: The package parses IXF files and extracts the rows of data stored in them.
Convert to multiple formats: Parsed data can be converted to JSON, JSONLINE, CSV, Parquet, or Deltalake for further analysis and integration with other systems.
Support for file-like objects: The parser accepts file-like objects as input, so IXF data can be parsed directly from an open file object without intermediate file storage, which is convenient for large datasets.
Minimal dependencies: The package has few dependencies (ebcdic, pyarrow, deltalake, chardet, typer), which are installed automatically alongside the package.
CLI: A command-line tool called db2ixf ships with the package. It does not support the Deltalake format.

Hypothesis

1-to-1: one IXF file contains one table.

Getting Started

Installation

You can install DB2 IXF Parser using pip:

pip install db2ixf

Usage

Here are some examples of how to use DB2 IXF Parser:

CLI

Start with this:

db2ixf --help

Result:

 Usage: db2ixf [OPTIONS] COMMAND [ARGS]...

 A command-line tool (CLI) for parsing and converting IXF (IBM DB2 
 Import/Export Format) files to various formats such as JSON, JSONLINE, CSV and 
 Parquet. Easily parse and convert IXF files to meet your data processing needs.

+- Options -------------------------------------------------------------------+
| --version             -v        Show the version of the CLI.                |
| --install-completion            Install completion for the current shell.   |
| --show-completion               Show completion for the current shell, to   |
|                                 copy it or customize the installation.      |
| --help                          Show this message and exit.                 |
+-----------------------------------------------------------------------------+
+- Commands ------------------------------------------------------------------+
| csv      Parse ixf FILE and convert it to a csv OUTPUT.                     |
| json     Parse ixf FILE and convert it to a json OUTPUT.                    |
| jsonline Parse ixf FILE and convert it to a jsonline OUTPUT.                |
| parquet  Parse ixf FILE and convert it to a parquet OUTPUT.                 |
+-----------------------------------------------------------------------------+

 Made with heart :D

Parsing an IXF file

# coding=utf-8
from pathlib import Path
from db2ixf import IXFParser

path = Path('path/to/IXF/file.XXX.IXF')
with open(path, mode='rb') as f:
    parser = IXFParser(f)
    # rows = parser.parse()  # Deprecated!
    rows = parser.get_row()  # Generator: yields rows one at a time
    for row in rows:
        print(row)

with open(path, mode='rb') as f:
    parser = IXFParser(f)
    rows = parser.get_all_rows()  # Loads all rows into memory!
    for row in rows:
        print(row)
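
Because get_row() is a generator, a few rows can be inspected without reading the whole file. The sketch below illustrates the idea with itertools.islice. The names preview and fake_rows are hypothetical: fake_rows is only a stand-in for parser.get_row() so that the example runs without an IXF file.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def preview(rows: Iterable[Dict], limit: int = 5) -> List[Dict]:
    """Return at most `limit` rows, pulling no more than that from `rows`."""
    return list(islice(rows, limit))


def fake_rows() -> Iterator[Dict]:
    """Hypothetical stand-in for parser.get_row(): yields one row at a time."""
    for i in range(1_000_000):
        yield {"ID": i, "NAME": f"row-{i}"}


# Only three rows are ever produced; the remaining ones are never generated.
first = preview(fake_rows(), limit=3)
print(first)
```

With the real parser, the same call would be preview(parser.get_row()), made while the input file is still open.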

Converting to JSON

# coding=utf-8
from pathlib import Path
from db2ixf import IXFParser

path = Path('path/to/IXF/file.XXX.IXF')
with open(path, mode='rb') as f:
    parser = IXFParser(f)
    output_path = Path('path/to/output/file.json')
    with open(output_path, mode='w', encoding='utf-8') as output_file:
        parser.to_json(output_file)

Converting to JSONLINE

# coding=utf-8
from pathlib import Path
from db2ixf import IXFParser

path = Path('path/to/IXF/file.XXX.IXF')
with open(path, mode='rb') as f:
    parser = IXFParser(f)
    output_path = Path('path/to/output/file.jsonl')
    with open(output_path, mode='w', encoding='utf-8') as output_file:
        parser.to_jsonline(output_file)

Converting to CSV

# coding=utf-8
from pathlib import Path
from db2ixf import IXFParser

path = Path('path/to/IXF/file.XXX.IXF')
with open(path, mode='rb') as f:
    parser = IXFParser(f)
    output_path = Path('path/to/output/file.csv')
    with open(output_path, mode='w', encoding='utf-8') as output_file:
        parser.to_csv(output_file)

Converting to Parquet

# coding=utf-8
from pathlib import Path
from db2ixf import IXFParser

path = Path('path/to/IXF/file.XXX.IXF')
with open(path, mode='rb') as f:
    parser = IXFParser(f)
    output_path = Path('path/to/output/file.parquet')
    with open(output_path, mode='wb') as output_file:
        parser.to_parquet(output_file)

Converting to Deltalake

# coding=utf-8
from pathlib import Path
from db2ixf import IXFParser

path = Path('path/to/IXF/file.XXX.IXF')
with open(path, mode='rb') as f:
    parser = IXFParser(f)
    output_path = 'path/to/output/'
    parser.to_deltalake(output_path)

For more details and usage examples, please refer to the documentation.

Precautions

Parsing can fail in some cases, and in others it can lead to data loss:

Completely corrupted IXF file: this is usually an extraction issue.
Partially corrupted IXF file: the file contains some corrupted rows/lines that the parser cannot parse.
1. The parser calculates the rate of corrupted rows and compares it to an accepted rate, which you can set with the environment variable DB2IXF_ACCEPTED_CORRUPTION_RATE (integer percentage, default 1).
2. If the rate of corrupted rows is greater than the accepted rate, the parser raises an exception.
Unsupported data type: please contact the owners, maintainers, or contributors for help; pull requests are also welcome.
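
The corruption-rate check described above can be sketched as follows. This is an illustration of the rule, not the package's actual code: the function check_corruption_rate and the exception CorruptionRateExceeded are hypothetical names, and only the environment variable DB2IXF_ACCEPTED_CORRUPTION_RATE and its default of 1 come from this README.

```python
import os
from typing import Optional


class CorruptionRateExceeded(Exception):
    """Hypothetical exception; the package's real exception may differ."""


def check_corruption_rate(
    total_rows: int,
    corrupted_rows: int,
    accepted_rate: Optional[int] = None,
) -> float:
    """Return the corruption rate in percent, or raise if it is too high."""
    if accepted_rate is None:
        # Assumed lookup: integer percentage read from the environment.
        accepted_rate = int(
            os.environ.get("DB2IXF_ACCEPTED_CORRUPTION_RATE", "1")
        )
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    rate = 100.0 * corrupted_rows / total_rows
    # Strictly greater than: a rate equal to the threshold is accepted.
    if rate > accepted_rate:
        raise CorruptionRateExceeded(
            f"{rate:.2f}% of rows are corrupted (accepted: {accepted_rate}%)"
        )
    return rate
```

For example, 1 corrupted row out of 200 gives 0.5%, which passes with the default threshold, while 5 out of 100 gives 5% and raises.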

Case: encoding issues

Parsing can lead to data loss when the found or detected encoding cannot
decode some of the extracted fields/columns.

The parser tries to decode using:

    1. The found encoding (found in the column record)
    2. Other encodings like cp437
    3. The encoding detected by a third-party package (chardet)
    4. Encodings like utf-8 and utf-32
    5. Ignoring errors, which can lead to data loss!

Before using the package in production, test it in debug mode so you can
detect data loss.
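
The decoding fallback listed above can be sketched as a chain of attempts. This is a minimal sketch, assuming the steps are tried in the listed order; decode_with_fallback and its parameters are hypothetical names, not the package's API. The detector is passed in as a callable so the example runs without chardet, and the final step uses utf-8 with errors ignored, which is also an assumption.

```python
from typing import Callable, Optional, Sequence, Tuple


def decode_with_fallback(
    raw: bytes,
    found_encoding: str,
    others: Sequence[str] = ("cp437",),
    detect: Optional[Callable[[bytes], Optional[str]]] = None,
    last_resort: Sequence[str] = ("utf-8", "utf-32"),
) -> Tuple[str, bool]:
    """Return (text, lossy); lossy is True only if errors were ignored."""
    # Steps 1 and 2: the found encoding, then other known encodings.
    candidates = [found_encoding, *others]
    # Step 3: an optional detector, e.g. one built on chardet.
    if detect is not None:
        detected = detect(raw)
        if detected:
            candidates.append(detected)
    # Step 4: general-purpose encodings.
    candidates.extend(last_resort)
    for encoding in candidates:
        try:
            return raw.decode(encoding), False
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding, try the next one
    # Step 5: drop undecodable bytes; this is where data can be lost.
    return raw.decode("utf-8", errors="ignore"), True
```

To plug in chardet, pass detect=lambda b: chardet.detect(b)["encoding"]. Note that cp437 assigns a character to every byte value, so with the default arguments the second step accepts any input, possibly producing wrong characters rather than an error. The lossy flag therefore only reports bytes dropped in the last step, which is one reason to compare decoded output against the source data when testing.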

Contributing

IXF Parser is actively seeking contributions to enhance its features and reliability. Your participation is valuable in shaping the future of the project.

We appreciate your feedback, bug reports, and feature requests. If you encounter any issues or have ideas for improvement, please open an issue on the GitHub repository.

For any questions or assistance during the contribution process, feel free to reach out by opening an issue on the GitHub repository.

Thank you for considering contributing to IXF Parser. Let's work together to create a powerful and dependable tool for working with DB2's IXF files.

Todo

License

IXF Parser is released under the AGPL-3.0 License.

Support

If you encounter any issues or have questions about using IXF Parser, please open an issue on the GitHub repository. We will do our best to address them promptly.

Conclusion

IXF Parser offers a convenient solution for parsing and processing IBM DB2's IXF files. With its ease of use and support for various output formats, it provides a valuable tool for working with DB2 data. We hope that you find this package useful in your data analysis and integration workflows.

Give it a try and let us know your feedback. Happy parsing!
