HXL schemas
David Megginson edited this page Feb 18, 2016 · 3 revisions
The hxlvalidate command-line tool validates a HXL dataset against a simple, spreadsheet-style schema. This article describes the schema format.
The schema is itself a HXL dataset, using the following hashtags:
Hashtag | Description |
---|---|
#valid_tag | (Required) The name of the tag, including the "#" character (e.g. "#sector"). |
#valid_required | Without the +min or +max attributes, a truthy value (like "1") means simply that the value is required. |
#valid_required+min | The minimum number of times a non-empty value for the tag must appear in each row of the dataset. Defaults to no minimum. |
#valid_required+max | The maximum number of times a non-empty value for the tag may appear in each row of the dataset. Defaults to no maximum. |
#valid_datatype | The type of data expected in the column under the HXL tag. Currently allowed values are "text", "number", "url", "email", and "phone" ("date" coming soon). Defaults to no type checking. |
#valid_value+min | The minimum value allowed when #valid_datatype is "number". Defaults to no minimum value. Ignored for non-numeric datatypes. |
#valid_value+max | The maximum value allowed when #valid_datatype is "number". Defaults to no maximum value. Ignored for non-numeric datatypes. |
#valid_value+regex | A regular expression pattern that the value must match (e.g. "^([0-9])(,[0-9])*$"). |
#valid_value+list | A list of allowed values, separated by "\|" (e.g. "female\|male"). |
#valid_value+case | "1" if matches for patterns and enumerations should be case-insensitive. |
#valid_severity | The severity of the error, for user feedback. Allowed values are "info", "warning", or "error" (the default). |
#description | A human-readable description of the error, to provide user feedback. |
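To make the value-level hashtags more concrete, here is a minimal Python sketch of how one cell value might be checked against one schema row. It is an illustration only, not the code that hxlvalidate runs: the function name `check_value`, the dict representation of a schema row, and the problem messages are all assumptions made for this example, and only the "number" datatype is handled.

```python
import re


def check_value(rule, value):
    """Return a list of problems for one cell value under one schema rule.

    `rule` is a dict keyed by the schema hashtags described above
    (a hypothetical representation chosen for this sketch).
    """
    problems = []
    if value.strip() == "":
        # Empty cells are a matter for #valid_required, not for value checks.
        return problems

    if rule.get("#valid_datatype", "") == "number":
        try:
            number = float(value)
        except ValueError:
            return ["not a number"]
        minimum = rule.get("#valid_value+min", "")
        maximum = rule.get("#valid_value+max", "")
        if minimum != "" and number < float(minimum):
            problems.append("below minimum")
        if maximum != "" and number > float(maximum):
            problems.append("above maximum")

    # "1" in #valid_value+case asks for case-insensitive matching.
    ignore_case = rule.get("#valid_value+case", "") == "1"

    pattern = rule.get("#valid_value+regex", "")
    if pattern:
        flags = re.IGNORECASE if ignore_case else 0
        if not re.search(pattern, value, flags):
            problems.append("does not match pattern")

    allowed = rule.get("#valid_value+list", "")
    if allowed:
        # Allowed values are separated by "|" inside a single cell.
        options = [item.strip() for item in allowed.split("|")]
        candidate = value.strip()
        if ignore_case:
            options = [item.lower() for item in options]
            candidate = candidate.lower()
        if candidate not in options:
            problems.append("not in list of allowed values")

    return problems
```

With this sketch, `check_value({"#valid_value+list": "female|male"}, "Male")` reports that the value is not in the list, while adding `"#valid_value+case": "1"` to the rule makes the same value pass.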
Here is a simple sample schema:
#valid_tag | #valid_severity | #valid_required+min | #valid_required+max | #valid_datatype | #valid_value+list | #description |
---|---|---|---|---|---|---|
#org | error | 1 | | text | | You must provide the name of the organisation doing the work. |
#sector | error | 1 | 1 | text | WASH\|Health\|Education\|CCCM\|Protection | You must provide the primary cluster for the activity. |
#subsector | info | | | text | | Adding a subsector allows better aid coordination. |
#country | error | 1 | 1 | text | Guinea\|Liberia\|Sierra Leone | You must specify the country where the work is taking place. |
#adm1 | warning | 1 | | text | | We strongly encourage specifying the administrative subdivision as well as the country. |
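The #valid_required+min and #valid_required+max limits apply per row, counting non-empty values across every column that carries the tag. The Python sketch below illustrates that counting using the #sector rule from the sample schema. The function name `check_occurrences`, the dict form of the rule, and the messages are hypothetical, and the sketch compares hashtags as plain strings, which ignores any attribute handling a real validator might do.

```python
SECTOR_RULE = {
    "#valid_tag": "#sector",
    "#valid_severity": "error",
    "#valid_required+min": "1",
    "#valid_required+max": "1",
}


def check_occurrences(column_tags, row, rule):
    """Check one data row against the +min/+max limits of one schema rule.

    column_tags: the hashtag of each column, e.g. ["#org", "#sector", "#sector"]
    row: the cell values of one data row, in the same column order
    Returns (severity, message), or None if the row is within the limits.
    """
    tag = rule["#valid_tag"]
    # Count only non-empty cells in columns whose hashtag equals the rule's tag.
    count = sum(
        1
        for column_tag, value in zip(column_tags, row)
        if column_tag == tag and value.strip() != ""
    )
    # "error" is the default severity when the schema leaves the cell empty.
    severity = rule.get("#valid_severity") or "error"
    minimum = rule.get("#valid_required+min", "")
    maximum = rule.get("#valid_required+max", "")
    if minimum != "" and count < int(minimum):
        return (severity, "too few values for " + tag)
    if maximum != "" and count > int(maximum):
        return (severity, "too many values for " + tag)
    return None


# Illustrative dataset layout with two #sector columns (made-up values).
COLUMN_TAGS = ["#org", "#sector", "#sector", "#country"]
```

For example, `check_occurrences(COLUMN_TAGS, ["Org A", "WASH", "Health", "Guinea"], SECTOR_RULE)` flags the row because two non-empty #sector values exceed the maximum of 1, while a row with only one of the two #sector cells filled passes.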
You could validate a dataset against this schema using the following command:
hxlvalidate --schema MYSCHEMA.csv MYDATASET.csv > errors.txt
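If you prefer to drive this from a script, the following Python sketch writes the first two rules of the sample schema to a CSV file and then calls the same command. The helper names `write_schema` and `run_hxlvalidate` are made up for this example, the sketch assumes hxlvalidate is installed and on your PATH, and it makes no assumption about what the tool's exit code means.

```python
import csv
import subprocess

# First two rules of the sample schema, with empty cells written out explicitly.
# The "|"-separated list of allowed values stays inside a single CSV cell.
SCHEMA_ROWS = [
    ["#valid_tag", "#valid_severity", "#valid_required+min", "#valid_required+max",
     "#valid_datatype", "#valid_value+list", "#description"],
    ["#org", "error", "1", "", "text", "",
     "You must provide the name of the organisation doing the work."],
    ["#sector", "error", "1", "1", "text", "WASH|Health|Education|CCCM|Protection",
     "You must provide the primary cluster for the activity."],
]


def write_schema(path, rows=SCHEMA_ROWS):
    """Write the schema rows to a CSV file (hypothetical helper)."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)


def run_hxlvalidate(schema_path, dataset_path, report_path):
    """Run the command shown above, sending its standard output to report_path."""
    with open(report_path, "w", encoding="utf-8") as report:
        result = subprocess.run(
            ["hxlvalidate", "--schema", schema_path, dataset_path],
            stdout=report,
        )
    return result.returncode
```

A typical call would be `write_schema("MYSCHEMA.csv")` followed by `run_hxlvalidate("MYSCHEMA.csv", "MYDATASET.csv", "errors.txt")`.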
Standard: http://hxlstandard.org | Mailing list: hxlproject@googlegroups.com