David Megginson edited this page Aug 25, 2015 · 3 revisions

The hxlvalidate command-line tool validates a HXL dataset against a simple, spreadsheet-style schema. This article describes the schema format.

Schema hashtags

The schema is itself a HXL dataset, using the following hashtags:

#x_tag (Required) The name of the tag, including the "#" character (e.g. "#sector").
#x_minoccur_num The minimum number of times a non-empty value for the tag must appear in each row of the dataset. Defaults to no minimum.
#x_maxoccur_num The maximum number of times a non-empty value for the tag may appear in each row of the dataset. Defaults to no maximum.
#x_datatype The type of data expected in the column under the HXL tag. Currently allowed values are "text", "number", "url", "email", and "phone" ("date" is coming soon). Defaults to no type checking.
#x_minvalue_num The minimum value allowed when #x_datatype is "number". Defaults to no minimum value. Ignored for non-numeric datatypes.
#x_maxvalue_num The maximum value allowed when #x_datatype is "number". Defaults to no maximum value. Ignored for non-numeric datatypes.
#x_pattern A regular expression that each value must match (e.g. "^([0-9])(,[0-9])*$").
#x_enumeration A list of allowed values, separated by "|" (e.g. "female|male").
#x_caseinsenstive "1" if pattern and enumeration matching should be case-insensitive.
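The value-level rules above can be sketched in Python roughly as follows. This is not hxlvalidate's actual code: `check_value` and the rule format (a dict keyed by the schema hashtags, with string values as they would be read from the schema file) are hypothetical. Only the "number" datatype is checked here, and patterns are applied with Python's `re.search`, which is an assumption about how the tool interprets regular expressions.

```python
import re

def check_value(rule, value):
    """Return error messages for one non-empty value (hypothetical helper)."""
    errors = []
    ignore_case = rule.get("#x_caseinsenstive", "") == "1"

    # #x_datatype: this sketch only checks "number"; other types are skipped.
    if rule.get("#x_datatype") == "number":
        try:
            number = float(value)
        except ValueError:
            errors.append(f"{value!r} is not a number")
        else:
            # Range limits apply only when the datatype is numeric.
            low = rule.get("#x_minvalue_num")
            high = rule.get("#x_maxvalue_num")
            if low and number < float(low):
                errors.append(f"{value!r} is below the minimum {low}")
            if high and number > float(high):
                errors.append(f"{value!r} is above the maximum {high}")

    # #x_pattern: a regular expression the value must match.
    pattern = rule.get("#x_pattern")
    if pattern:
        flags = re.IGNORECASE if ignore_case else 0
        if not re.search(pattern, value, flags):
            errors.append(f"{value!r} does not match {pattern!r}")

    # #x_enumeration: allowed values separated by "|".
    enumeration = rule.get("#x_enumeration")
    if enumeration:
        allowed = enumeration.split("|")
        if ignore_case:
            ok = value.casefold() in {a.casefold() for a in allowed}
        else:
            ok = value in allowed
        if not ok:
            errors.append(f"{value!r} is not one of {enumeration!r}")

    return errors

sector_rule = {"#x_tag": "#sector", "#x_enumeration": "WASH|Health|Education",
               "#x_caseinsenstive": "1"}
print(check_value(sector_rule, "health"))  # [] because matching ignores case
```

Empty values are deliberately left out of `check_value`: in the schema, whether a value must be present is governed by the occurrence columns, not by the per-value checks.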

Sample schema

Here is a simple sample schema:

#x_tag,#x_minoccur_num,#x_maxoccur_num,#x_datatype,#x_enumeration
#org,1,,text,
#sector,1,1,text,WASH|Health|Education|CCCM|Protection
#subsector,,,text,
#country,1,1,text,Guinea|Liberia|Sierra Leone
#adm1,,,text,
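To make the occurrence columns concrete, here is a minimal Python sketch that reads the sample schema (written out as CSV) and checks #x_minoccur_num and #x_maxoccur_num for a single data row. `load_schema` and `check_occurrences` are hypothetical helpers, not part of hxlvalidate; the sketch assumes the data columns' hashtags carry no attributes and that only non-empty values are counted.

```python
import csv
import io

# The sample schema from this page, written out as CSV (assumed layout).
SAMPLE_SCHEMA = """\
#x_tag,#x_minoccur_num,#x_maxoccur_num,#x_datatype,#x_enumeration
#org,1,,text,
#sector,1,1,text,WASH|Health|Education|CCCM|Protection
#subsector,,,text,
#country,1,1,text,Guinea|Liberia|Sierra Leone
#adm1,,,text,
"""

def load_schema(text):
    """Map each #x_tag value to its schema row (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["#x_tag"]: row for row in reader}

def check_occurrences(rules, tags, values):
    """Check the min/max occurrence rules for one data row.

    `tags` holds the hashtag of each column; a tag may repeat across columns.
    """
    errors = []
    for tag, rule in rules.items():
        # Count only the non-empty values under this tag in this row.
        count = sum(1 for t, v in zip(tags, values) if t == tag and v.strip())
        low, high = rule["#x_minoccur_num"], rule["#x_maxoccur_num"]
        if low and count < int(low):
            errors.append(f"{tag}: expected at least {low}, found {count}")
        if high and count > int(high):
            errors.append(f"{tag}: expected at most {high}, found {count}")
    return errors

rules = load_schema(SAMPLE_SCHEMA)
tags = ["#org", "#sector", "#sector", "#country"]
print(check_occurrences(rules, tags, ["UNICEF", "WASH", "Health", "Liberia"]))
# ['#sector: expected at most 1, found 2']
```

Because a HXL row can use the same hashtag in several columns, the count is taken across every column with that tag: here the two non-empty #sector columns exceed the schema's maximum of 1.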

Usage

To validate a dataset against this schema and save the error report to errors.txt, use the following command:

hxlvalidate --schema MYSCHEMA.csv MYDATASET.csv > errors.txt