The oor_mapper.py Python script converts the Open Ownership Register into JSON files ready to load into Senzing.
The Open Ownership download page is at https://register.openownership.org/download. Select the latest date and download the file. It should be named something like "statements.yyyy-mm-ddThh_mm_ssZ.jsonl.gz".
Usage:
python oor_mapper.py --help
usage: oor_mapper.py [-h] [-i INPUT_FILE] [-o OUTPUT_FILE] [-l LOG_FILE]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input_file INPUT_FILE
                        the name of the input file
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        the name of the output file
  -l LOG_FILE, --log_file LOG_FILE
                        optional name of the statistics log file
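For readers who want to see how the options in the help text fit together, the following is a minimal sketch of an equivalent argparse parser. It is reconstructed from the help output only and is not taken from oor_mapper.py itself; the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser that mirrors the help text shown above.

    This is an illustration only; the real oor_mapper.py may define
    its arguments differently.
    """
    parser = argparse.ArgumentParser(prog="oor_mapper.py")
    parser.add_argument("-i", "--input_file",
                        help="the name of the input file")
    parser.add_argument("-o", "--output_file",
                        help="the name of the output file")
    parser.add_argument("-l", "--log_file",
                        help="optional name of the statistics log file")
    return parser
```

Calling `build_parser().parse_args(["-i", "in.jsonl.gz", "-o", "out.jsonl"])` yields a namespace whose `log_file` attribute is `None`, which reflects the statistics log being optional.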
- Python 3.6 or higher
- Senzing API version 3.1 or higher
Place the following files in a directory of your choice ...
Note: This only needs to be performed once. In fact, you may want to add these configuration updates to a master configuration file for all your data sources.
Loading the Open Ownership Register into Senzing only requires registering the data source. No additional features or attributes are required. This configuration is contained in the oor_config_updates.g2c file. To apply it, from your Senzing project's python directory type ...
python3 G2ConfigTool.py <path-to-file>/oor_config_updates.g2c
Download the Open Ownership Register file from https://register.openownership.org/download. Just select the latest date and download the file. It should be named something like "statements.yyyy-mm-ddThh_mm_ssZ.jsonl.gz"
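Before running the mapper, it can help to confirm that the download is a readable gzip-compressed JSON Lines file. The sketch below is an optional check, not part of the mapper; the helper name `peek_statements` is hypothetical, and it assumes each non-empty line of the file is one JSON document.

```python
import gzip
import json
from itertools import islice


def peek_statements(path, count=3):
    """Return the first `count` lines of a gzip-compressed JSON Lines
    file, parsed as JSON (blank lines among them are skipped)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [json.loads(line)
                for line in islice(handle, count)
                if line.strip()]
```

Printing the keys of the returned records gives a quick look at the structure of the statements without decompressing the whole file.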
Then in a terminal session, navigate to where you downloaded this mapper and type ...
python3 oor_mapper.py -i /download_path/statements.yyyy-mm-ddThh_mm_ssZ.jsonl.gz -o /output_path/sz_oor_register.yyyy-mm-dd.jsonl.gz
- If the output file name ends with ".gz", the output file will be compressed
- Add the -l (--log_file) argument to generate a mapping statistics file
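The ".gz" rule above can be pictured with a short sketch. This illustrates the general technique of choosing a writer from the file extension; it is not the mapper's actual code, and the helper name `open_output` is hypothetical.

```python
import gzip


def open_output(path):
    """Open `path` for text writing.

    The file is gzip-compressed when the name ends with ".gz" and
    written as plain text otherwise.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "wt", encoding="utf-8")
    return open(path, "w", encoding="utf-8")
```

Both branches return a text-mode file object, so the code that writes records does not need to know whether compression is in use.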
If you use the G2Loader program to load your data, from the /opt/senzing/g2/python directory type ...
python3 G2Loader.py -f /output_path/sz_oor_register.yyyy-mm-dd.jsonl.gz
This data set currently contains about 18 million entities and owners and may take several hours to load, depending on your hardware.
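To gauge the size of a particular output file before starting a long load, you can count its records. The following is a minimal sketch that assumes one JSON record per non-empty line; the helper name `count_records` is hypothetical and is not part of the mapper or of Senzing.

```python
import gzip


def count_records(path):
    """Count the non-empty lines in a JSON Lines file, reading through
    gzip when the file name ends with ".gz"."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())
```

The file is streamed line by line, so memory use stays small even for a file with millions of records.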