HXL schemas
David Megginson edited this page Aug 25, 2015 · 3 revisions
The hxlvalidate command-line tool validates a HXL dataset against a simple, spreadsheet-style schema. This article describes the schema format.
The schema is itself a HXL dataset, using the following hashtags:
Hashtag | Description |
---|---|
#x_tag | (Required) The name of the tag, including the "#" character (e.g. "#sector"). |
#x_minoccur_num | The minimum number of times a non-empty value for the tag must appear in each row of the dataset. Defaults to no minimum. |
#x_maxoccur_num | The maximum number of times a non-empty value for the tag may appear in each row of the dataset. Defaults to no maximum. |
#x_datatype | The type of data expected in the column under the HXL tag. Currently allowed values are "text", "number", "url", "email", and "phone" ("date" coming soon). Defaults to no type checking. |
#x_minvalue_num | The minimum value allowed when #x_datatype is "number". Defaults to no minimum value. Ignored for non-numeric datatypes. |
#x_maxvalue_num | The maximum value allowed when #x_datatype is "number". Defaults to no maximum value. Ignored for non-numeric datatypes. |
#x_pattern | A regular expression pattern that each value must match (e.g. "^([0-9])(,[0-9])*$"). |
#x_enumeration | A list of allowed values, separated by "\|" (e.g. "female\|male"). |
#x_caseinsenstive | "1" if matches for patterns and enumerations should be case-insensitive. |
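
To make the value-level rules above concrete, here is a minimal Python sketch of how a single cell might be checked against one schema row. It is not the hxlvalidate implementation: the `check_value` function and the dict keys (`datatype`, `minvalue`, `maxvalue`, `pattern`, `enumeration`, `caseinsensitive`) are hypothetical names chosen for illustration. Only the "number" datatype is checked, because this page does not say how "url", "email", or "phone" values are recognised, and the sketch assumes enumeration entries are compared without trimming whitespace.

```python
import re


def check_value(value, rule):
    """Return a list of error messages for one cell value under one rule.

    `rule` is a plain dict with hypothetical keys; it is not a libhxl object.
    """
    errors = []
    if value == "":
        # Empty cells are left to the occurrence checks, not validated here.
        return errors

    # Numeric range checks apply only when the datatype is "number".
    if rule.get("datatype") == "number":
        try:
            number = float(value)
        except ValueError:
            errors.append(f"{value!r} is not a number")
        else:
            minvalue = rule.get("minvalue")
            maxvalue = rule.get("maxvalue")
            if minvalue is not None and number < minvalue:
                errors.append(f"{value!r} is below the minimum {minvalue}")
            if maxvalue is not None and number > maxvalue:
                errors.append(f"{value!r} is above the maximum {maxvalue}")

    ignore_case = bool(rule.get("caseinsensitive"))

    # Pattern check: the value must match the regular expression.
    pattern = rule.get("pattern")
    if pattern:
        flags = re.IGNORECASE if ignore_case else 0
        if re.search(pattern, value, flags) is None:
            errors.append(f"{value!r} does not match {pattern!r}")

    # Enumeration check: allowed values are separated by "|".
    enumeration = rule.get("enumeration")
    if enumeration:
        allowed = enumeration.split("|")
        candidate = value
        if ignore_case:
            allowed = [item.lower() for item in allowed]
            candidate = value.lower()
        if candidate not in allowed:
            errors.append(f"{value!r} is not one of {enumeration!r}")

    return errors
```

For example, `check_value("Male", {"enumeration": "female|male"})` reports one error, while adding `"caseinsensitive": True` to the rule makes the same value pass.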
Here is a simple sample schema:
#x_tag | #x_minoccur_num | #x_maxoccur_num | #x_datatype | #x_enumeration |
---|---|---|---|---|
#org | 1 | | text | |
#sector | 1 | 1 | text | WASH\|Health\|Education\|CCCM\|Protection |
#subsector | | | text | |
#country | 1 | 1 | text | Guinea\|Liberia\|Sierra Leone |
#adm1 | | | text | |
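
As an illustration of the occurrence rules in this sample, the following Python sketch writes the schema out as CSV text and counts non-empty values per tag in a single data row. It is a simplified model, not what hxlvalidate does internally: `load_rules` and `check_occurrences` are hypothetical helpers, the blank cells are placed on the assumption that #org has a minimum of 1 and no maximum, and the sketch ignores tag attributes and any text header row above the hashtag row.

```python
import csv
import io

# The sample schema as CSV text (blank cells mean "no constraint").
SCHEMA_CSV = """#x_tag,#x_minoccur_num,#x_maxoccur_num,#x_datatype,#x_enumeration
#org,1,,text,
#sector,1,1,text,WASH|Health|Education|CCCM|Protection
#subsector,,,text,
#country,1,1,text,Guinea|Liberia|Sierra Leone
#adm1,,,text,
"""


def load_rules(text):
    """Parse schema CSV text into {tag: {"min": int or None, "max": int or None}}."""
    rules = {}
    for row in csv.DictReader(io.StringIO(text)):
        minoccur = row["#x_minoccur_num"]
        maxoccur = row["#x_maxoccur_num"]
        rules[row["#x_tag"]] = {
            "min": int(minoccur) if minoccur else None,
            "max": int(maxoccur) if maxoccur else None,
        }
    return rules


def check_occurrences(tags, values, rules):
    """Check one data row: `tags` is the hashtag row, `values` the cells under it."""
    errors = []
    for tag, rule in rules.items():
        # Count non-empty cells in this row that sit under the tag.
        count = sum(
            1 for t, v in zip(tags, values) if t == tag and v.strip() != ""
        )
        if rule["min"] is not None and count < rule["min"]:
            errors.append(f"{tag}: found {count}, expected at least {rule['min']}")
        if rule["max"] is not None and count > rule["max"]:
            errors.append(f"{tag}: found {count}, expected at most {rule['max']}")
    return errors
```

Under these assumptions, a row with two #org columns filled in passes, while a row with two non-empty #sector values is reported because #sector allows at most one.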
You could validate a dataset against this schema using the following command:
hxlvalidate --schema MYSCHEMA.csv MYDATASET.csv > errors.txt
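
If you would rather run the same command from a script, a minimal Python sketch might look like the following. It assumes hxlvalidate is installed and on your PATH; `build_command` and `run_validation` are hypothetical helper names, and only the `--schema` option shown above is used. The meaning of the exit code is not described on this page, so the sketch simply returns it without interpreting it.

```python
import subprocess


def build_command(schema_path, dataset_path):
    """Build the argument list for the hxlvalidate call shown above."""
    return ["hxlvalidate", "--schema", schema_path, dataset_path]


def run_validation(schema_path, dataset_path, report_path):
    """Run hxlvalidate and write its standard output to `report_path`."""
    with open(report_path, "w", encoding="utf-8") as report:
        completed = subprocess.run(
            build_command(schema_path, dataset_path),
            stdout=report,  # same effect as "> errors.txt" in the shell
            check=False,    # do not raise on a non-zero exit code
        )
    return completed.returncode


# Example call (requires hxlvalidate to be installed):
# run_validation("MYSCHEMA.csv", "MYDATASET.csv", "errors.txt")
```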
Standard: http://hxlstandard.org | Mailing list: hxlproject@googlegroups.com