Merge columns filter

TODO: update to include Python and the HXL Proxy as well.

The hxlmerge command-line tool creates a new HXL dataset that combines two other datasets. It starts with a primary dataset, then adds fields from another based on shared keys. For example, you could extract the #adm1 field from a second dataset listing administrative subdivisions, and add it the a 3W dataset, based on a shared #adm1_id (e.g. a p-code).

Along with hxlcount (command), this is one of the most difficult of the tools to learn, but also one of the most powerful, because it lets you create something entirely new.

Usage

usage: hxlmerge [-h] -m filename -k tag,tag... -t tag,tag... [-r] [-O]
                [infile] [outfile]

Merge part of one HXL dataset into another.

positional arguments:
  infile                HXL file to read (if omitted, use standard input).
  outfile               HXL file to write (if omitted, use standard output).

optional arguments:
  -h, --help            show this help message and exit
  -m filename, --merge filename
                        HXL file to write (if omitted, use standard output).
  -k tag,tag..., --keys tag,tag...
                        HXL tag(s) to use as a shared key.
  -t tag,tag..., --tags tag,tag...
                        Comma-separated list of column tags to include from
                        the merge dataset.
  -r, --replace         Replace empty values in existing columns (when
                        available) instead of adding new ones.
  -O, --overwrite       Used with --replace, overwrite existing values.

Keys

The command will take the first value for each tag provided in the --keys (or -k) argument to match the datasets. In most cases, you'll provide just one key, but it's possible to use several, e.g.

--keys sector,subsector

This would include values from the merge dataset where both the #sector and #subsector tags have the same value in the primary dataset.

Overwriting

By default, this command adds new columns to the right side of a dataset. There are, however, two options that allow replacing values in existing columns.

By itself, the --replace (or -r) option will fill in empty values, but leave non-empty values alone. For example, in a row that has a value for #adm1, nothing will happen; in a row that has a blank cell for #adm1, the command will fill in the value from the merge dataset. This is useful for a multi-stage merge (try merge dataset 1, then merge dataset 2 if nothing found, etc.).

With the --overwrite (or -O) option, this command will always overwrite the value in an existing cell, even if the value from the merge dataset is empty.

Examples

Input file: 3w-in.csv

#org	#sector	#adm1_id
OXFAM	WASH	003
CARE	WASH	001
Save the Children	Education	001
Red Cross	Protection	002
UNHCR	Shelter	001

Merge file: adm.csv

#adm1_id	#adm1	#people_num
001	Coastal Region	300000
002	Mountain Region	120000
003	Capital Region	650000

Command:

hxlmerge --merge adm.csv --keys adm1_id --tags adm1,people_num \
  <3w-in.csv >3w-out.csv

Output file: 3w-out.csv

#org	#sector	#adm1_id	#adm1	#people_num
OXFAM	WASH	003	Capital Region	650000
CARE	WASH	001	Coastal Region	300000
Save the Children	Education	001	Coastal Region	300000
Red Cross	Protection	002	Mountain Region	120000
UNHCR	Shelter	001	Coastal Region	300000

Standard: http://hxlstandard.org | Mailing list: hxlproject@googlegroups.com

Home

For everyone
- Installation
- Command-line tools
For coders
Building blocks
HXL cookbook

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge columns filter

Usage

Keys

Overwriting

Examples

Clone this wiki locally