David Megginson edited this page Feb 18, 2016 · 3 revisions

The hxlvalidate command-line tool validates a HXL dataset against a simple, spreadsheet-style schema. This article describes the schema format.

Schema hashtags

The schema is itself a HXL dataset, using the following hashtags:

#valid_tag (Required) The name of the tag, including the "#" character (e.g. "#sector").
#valid_required Without the +min or +max attributes, a truthy value (such as "1") simply means that a value for the tag is required.
#valid_required+min The minimum number of times a non-empty value for the tag must appear in each row of the dataset. Defaults to no minimum.
#valid_required+max The maximum number of times a non-empty value for the tag may appear in each row of the dataset. Defaults to no maximum.
#valid_datatype The type of data expected in the column under the HXL tag. Currently allowed values are "text", "number", "url", "email", and "phone" ("date" coming soon). Defaults to no type checking.
#valid_value+min The minimum value allowed when #valid_datatype is "number". Defaults to no minimum value. Ignored for non-numeric datatypes.
#valid_value+max The maximum value allowed when #valid_datatype is "number". Defaults to no maximum value. Ignored for non-numeric datatypes.
#valid_value+regex A regular expression pattern that the value must match (e.g. "^([0-9])(,[0-9])*$").
#valid_value+list A list of allowed values, separated by "|" (e.g. "female|male").
#valid_value+case Set to "1" to make matching against #valid_value+regex patterns and #valid_value+list values case-insensitive.
#valid_severity The severity of the error, for user feedback. Allowed values are "info", "warning", and "error" (the default).
#description A human-readable description of the error, to provide user feedback.
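The per-value rules above can be sketched in Python as a single check function. This is a minimal illustration, not hxlvalidate's own code: the `Rule` class and `check_value` function are hypothetical, the url/email/phone patterns are simple stand-ins for whatever checks hxlvalidate actually applies, and regular expressions are applied with a search, so a pattern needs "^" and "$" (as in the #valid_value+regex example) to match the whole value.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, deliberately simple patterns for the non-numeric datatypes.
# hxlvalidate's real checks may be stricter or looser.
DATATYPE_PATTERNS = {
    "url": re.compile(r"^https?://\S+$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?[0-9 ()\-.]{6,}$"),
}


@dataclass
class Rule:
    """One schema row's per-value constraints (hypothetical class)."""
    tag: str                            # #valid_tag
    datatype: Optional[str] = None      # #valid_datatype
    min_value: Optional[float] = None   # #valid_value+min
    max_value: Optional[float] = None   # #valid_value+max
    regex: Optional[str] = None         # #valid_value+regex
    allowed: Optional[str] = None       # #valid_value+list, "|"-separated
    case_insensitive: bool = False      # #valid_value+case == "1"
    severity: str = "error"             # #valid_severity (default "error")


def check_value(rule: Rule, value: str) -> List[str]:
    """Return a list of problems found in one non-empty cell value."""
    problems: List[str] = []
    value = value.strip()

    if rule.datatype == "number":
        try:
            number = float(value)
        except ValueError:
            return [f"{value!r} is not a number"]
        # Range limits only apply to the "number" datatype.
        if rule.min_value is not None and number < rule.min_value:
            problems.append(f"{number} is below the minimum {rule.min_value}")
        if rule.max_value is not None and number > rule.max_value:
            problems.append(f"{number} is above the maximum {rule.max_value}")
    elif rule.datatype in DATATYPE_PATTERNS:
        if not DATATYPE_PATTERNS[rule.datatype].match(value):
            problems.append(f"{value!r} is not a valid {rule.datatype}")
    # "text" (or no datatype at all) needs no type check.

    flags = re.IGNORECASE if rule.case_insensitive else 0
    if rule.regex is not None and not re.search(rule.regex, value, flags):
        problems.append(f"{value!r} does not match {rule.regex!r}")

    if rule.allowed is not None:
        options = rule.allowed.split("|")
        if rule.case_insensitive:
            ok = value.lower() in [option.lower() for option in options]
        else:
            ok = value in options
        if not ok:
            problems.append(f"{value!r} is not one of {options}")

    return problems
```

Counting how many values a row has for a tag (#valid_required+min and +max) is a row-level check rather than a per-value one, so it is left out of this sketch; a validator would call `check_value` for each non-empty cell whose column matches `rule.tag` and report any problems at `rule.severity`.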

Sample schema

Here is a simple sample schema:

#valid_tag,#valid_severity,#valid_required+min,#valid_required+max,#valid_datatype,#valid_value+list,#description
#org,error,1,,text,,You must provide the name of the organisation doing the work.
#sector,error,1,1,text,WASH|Health|Education|CCCM|Protection,You must provide the primary cluster for the activity.
#subsector,info,,,text,,Adding a subsector allows better aid coordination.
#country,error,1,1,text,Guinea|Liberia|Sierra Leone,You must specify the country where the work is taking place.
#adm1,warning,1,,text,,We strongly encourage specifying the administrative subdivision as well as the country.
#adm1 warning 1 text We strongly encourage specifying the administrative subdivision as well as the country.

Usage

To validate a dataset against this schema, run the following command, which redirects the validation output to errors.txt:

hxlvalidate --schema MYSCHEMA.csv MYDATASET.csv > errors.txt
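If you need to run the same validation from a Python script, a thin wrapper around the command is enough. This is a sketch with hypothetical function names; it passes only the --schema option shown above and writes standard output to a file, like the shell redirect. The meaning of hxlvalidate's exit codes is not described on this page, so the wrapper simply returns whatever code the process exits with.

```python
import subprocess
from typing import List


def hxlvalidate_argv(schema_path: str, dataset_path: str) -> List[str]:
    """Build the command line shown above (only the documented --schema option)."""
    return ["hxlvalidate", "--schema", schema_path, dataset_path]


def run_hxlvalidate(schema_path: str, dataset_path: str, report_path: str) -> int:
    """Run hxlvalidate, sending its standard output to report_path.

    This is the equivalent of '> errors.txt' in the shell. Returns the
    process exit code without interpreting it.
    """
    with open(report_path, "w", encoding="utf-8") as report:
        completed = subprocess.run(
            hxlvalidate_argv(schema_path, dataset_path),
            stdout=report,
            check=False,
        )
    return completed.returncode
```

Calling `run_hxlvalidate("MYSCHEMA.csv", "MYDATASET.csv", "errors.txt")` mirrors the command above; it requires hxlvalidate to be installed and on the PATH.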