fge edited this page Sep 10, 2012 · 83 revisions

Note

This page describes the status of the latest version, i.e., master.

What is supported

All of section 5 of the draft is supported, apart from the limitations mentioned below. Supported features include:

  • union types (in type as well as in disallow),
  • full dependencies (i.e., property dependencies as well as schema dependencies),
  • "multiple extends" (i.e., an array of schemas),
  • tuple/non-tuple validation for arrays,
  • $ref with loop detection,
  • formats (however, see below),
  • enums,
  • etc.

Limitations

Unsupported keywords

First, a reminder: this library is about validation only. As such, keywords not related to validation per se are not supported. These include:

  • default,
  • links,
  • description,
  • name.

One actual validation keyword which is not supported yet is $schema: right now, this library only supports draft v3.

Strict JSON input required

Even though Jackson is able to parse many malformed JSON documents, this project asks Jackson to obey the specification to the letter (which is its default behavior anyway):

  • no comments are allowed;
  • strings must be surrounded by double quotes, not single quotes;
  • there is no type inference: "true" is a JSON string, not the JSON boolean true; "42" is a string, whereas 42 is an integer; "null" is... you get the picture.

There is one deviation from Jackson's default behavior: Jackson is configured to store decimal numbers as BigDecimal instead of double. This is on purpose; see below.
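To see why this matters, here is a minimal JDK-only sketch. It does not touch Jackson (with Jackson 2.x, the relevant switch is DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS), and the class and method names are hypothetical. It shows that the JSON number 0.1 survives intact as a BigDecimal, but becomes a slightly different binary value once parsed into a double.

```java
import java.math.BigDecimal;

// Hypothetical demo class: exact decimal parsing versus parsing through a double.
final class DecimalStorageDemo {

    // Keeps the JSON number text exactly, digit for digit.
    static BigDecimal parseExact(String jsonNumber) {
        return new BigDecimal(jsonNumber);
    }

    // Parses through a double first, then exposes the exact binary value the double holds.
    static BigDecimal parseViaDouble(String jsonNumber) {
        return new BigDecimal(Double.parseDouble(jsonNumber));
    }

    public static void main(String[] args) {
        System.out.println(parseExact("0.1"));     // prints 0.1
        System.out.println(parseViaDouble("0.1")); // prints 0.1000000000000000055511151231257827021181583404541015625
    }
}
```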

URI dereferencing

The only natively supported schemes are http, file, ftp and jar (these are all the schemes natively supported by Java's URL class, save for https). The API makes it easy to add other schemes, though. There is also a custom scheme named resource, used to access Java resources.
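The extension point can be pictured as a map from scheme to fetcher. The sketch below is hypothetical: Fetcher, SchemeDispatcher and register() are illustrative names, not this library's API. It routes http, file, ftp and jar through java.net.URL and, as an assumption about how such a scheme could work, maps resource:/some/path to a classpath lookup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: fetchers keyed by URI scheme, with an extension point.
final class SchemeDispatcher {

    // Hypothetical abstraction: opens the content behind a URI.
    interface Fetcher {
        InputStream fetch(URI uri) throws IOException;
    }

    private final Map<String, Fetcher> fetchers = new HashMap<>();

    SchemeDispatcher() {
        // http, file, ftp and jar go through java.net.URL; https is deliberately not registered.
        Fetcher viaUrl = uri -> uri.toURL().openStream();
        for (String scheme : new String[] {"http", "file", "ftp", "jar"}) {
            fetchers.put(scheme, viaUrl);
        }
        // Assumed behavior for "resource": the path of resource:/some/path is looked up on the classpath.
        fetchers.put("resource", uri -> {
            String path = uri.getPath();
            InputStream in = path == null ? null : SchemeDispatcher.class.getResourceAsStream(path);
            if (in == null) {
                throw new IOException("resource not found: " + uri);
            }
            return in;
        });
    }

    // Extension point: support an additional scheme.
    void register(String scheme, Fetcher fetcher) {
        fetchers.put(scheme, fetcher);
    }

    boolean supports(URI uri) {
        return uri.getScheme() != null && fetchers.containsKey(uri.getScheme());
    }

    InputStream fetch(URI uri) throws IOException {
        Fetcher fetcher = fetchers.get(uri.getScheme());
        if (fetcher == null) {
            throw new IOException("unsupported scheme in " + uri);
        }
        return fetcher.fetch(uri);
    }
}
```

With this design, adding https support is a single register("https", ...) call, which is the kind of extension the text refers to.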

There are other important points to consider:

  • If you register a schema which has an id field, then this id will be considered as this schema's URI if and only if it is absolute, and has no fragment or an empty fragment. Any id field found which does not obey these requirements will be discarded and the schema will be considered to be "rootless", or anonymous (that is, its id is #).
  • Any attempt to dereference a relative URI will fail if it cannot be resolved to an absolute URI. This does not apply to "fragment only" URIs, since these are resolved against the current schema, even if it is anonymous. You can, however, add a root namespace (for instance, if you have a set of schemas on the filesystem which reference one another).
  • If, during the process of $ref resolution, a schema is successfully downloaded via a URI, this schema's id field, if any, will be discarded.
  • Fetching external resources can of course fail, but (with resources available over the network in particular) it can also hang. The implementation cannot protect you against that. Be sure to only support the schemes you know will actually work, or register all your schemas in advance.
  • Schemas fetched from URIs are cached at runtime and never evicted until your application shuts down. This may or may not be a problem for you, but it means, for instance, that you cannot rely on external, dynamically generated schemas.
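The id and resolution rules above can be sketched with java.net.URI alone. All names here (SchemaIds, isUsableId, resolveRef) are hypothetical, and the root namespace is simply passed in as the base URI; this illustrates the rules, it is not the library's code.

```java
import java.net.URI;

// Hypothetical helper illustrating the id and $ref resolution rules described above.
final class SchemaIds {

    // The URI given to "rootless" (anonymous) schemas.
    static final URI ANONYMOUS = URI.create("#");

    // An id is only kept if it is absolute and has no fragment, or an empty one.
    static boolean isUsableId(URI id) {
        String fragment = id.getFragment();
        return id.isAbsolute() && (fragment == null || fragment.isEmpty());
    }

    // The URI a registered schema ends up known under.
    static URI effectiveId(URI declaredId) {
        return declaredId != null && isUsableId(declaredId) ? declaredId : ANONYMOUS;
    }

    // A "fragment only" reference, such as "#/properties/p1".
    static boolean isFragmentOnly(URI ref) {
        String path = ref.getRawPath();
        return ref.getScheme() == null && ref.getRawAuthority() == null
            && (path == null || path.isEmpty()) && ref.getRawQuery() == null
            && ref.getRawFragment() != null;
    }

    // Resolves a $ref against the URI of the enclosing schema (or against a root namespace).
    static URI resolveRef(URI base, String ref) {
        URI refUri = URI.create(ref);
        if (refUri.isAbsolute()) {
            return refUri;
        }
        if (!base.isAbsolute() && !isFragmentOnly(refUri)) {
            // Relative reference and no absolute base: dereferencing must fail.
            throw new IllegalArgumentException("cannot resolve " + ref + " against " + base);
        }
        return base.resolve(refUri);
    }
}
```

For example, resolving "a.json" against a root namespace of http://example.com/schemas/ yields http://example.com/schemas/a.json, while "#/properties/p1" resolves even inside an anonymous schema.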

format

This package only implements a limited set of format specifiers:

  • date-time;
  • email;
  • host-name;
  • ip-address;
  • ipv6;
  • regex;
  • and finally, probably the most important of all: uri.

All other format specifiers (except for color and style), plus a few other ones which are not in the specification, are in a separate project:

https://github.com/fge/json-schema-formats
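A format specifier boils down to a predicate over a string. The hypothetical registry below (names are illustrative, not this library's API; Java 8+ for java.time) checks two of the formats listed above with plain JDK classes: date-time, using the draft v3 form YYYY-MM-DDThh:mm:ssZ, and uri, using java.net.URI's parser. Letting unknown formats pass is an assumption of this sketch.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical registry mapping format names to checks.
final class FormatRegistry {

    // Draft v3 date-time: YYYY-MM-DDThh:mm:ssZ; STRICT rejects impossible dates such as February 30.
    private static final DateTimeFormatter DATE_TIME =
        DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss'Z'")
                         .withResolverStyle(ResolverStyle.STRICT);

    private final Map<String, Predicate<String>> checkers = new HashMap<>();

    FormatRegistry() {
        checkers.put("date-time", FormatRegistry::isDateTime);
        checkers.put("uri", FormatRegistry::isUri);
    }

    // Extension point, e.g. for formats provided by a separate project.
    void register(String format, Predicate<String> checker) {
        checkers.put(format, checker);
    }

    // Assumption of this sketch: formats without a registered checker always pass.
    boolean check(String format, String value) {
        Predicate<String> checker = checkers.get(format);
        return checker == null || checker.test(value);
    }

    private static boolean isDateTime(String value) {
        try {
            LocalDateTime.parse(value, DATE_TIME);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    private static boolean isUri(String value) {
        try {
            new URI(value);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```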

Limits on m{in,ax}Length and m{in,ax}Items

These keywords enforce, respectively, the minimum/maximum length of a string instance and the minimum/maximum number of items of an array instance. Since this project is written in Java, the implementation won't accept values for these keywords which are greater than Integer.MAX_VALUE, that is, 2^31 - 1. You don't have JSON documents that big, do you? Well, OK, some modern NoSQL databases may hold JSON data that large, if not larger.
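The Integer.MAX_VALUE ceiling mirrors the JDK itself: String.length() and List.size() both return an int. A hypothetical conversion step (the names are illustrative, and other syntax checks are assumed to happen elsewhere) might look like this:

```java
import java.math.BigInteger;
import java.util.List;

// Hypothetical helper: converts a keyword value read from a schema into an int bound.
final class LengthBounds {

    private static final BigInteger INT_MAX = BigInteger.valueOf(Integer.MAX_VALUE);

    // Rejects values above Integer.MAX_VALUE (2^31 - 1).
    static int toBound(String keyword, BigInteger value) {
        if (value.compareTo(INT_MAX) > 0) {
            throw new IllegalArgumentException(keyword + " is too large: " + value);
        }
        return value.intValue();
    }

    // Once converted, the check is a plain int comparison, since List.size() is an int.
    static boolean satisfiesMaxItems(List<?> array, int maxItems) {
        return array.size() <= maxItems;
    }
}
```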

What the draft doesn't say explicitly, but which is implicit, and is implemented

(for some definition of "implicit")

Unknown keywords in schemas

Unknown keywords in schemas are purely and simply ignored. Beware of spelling mistakes!
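Since typos are silently ignored, a caller may want to spot them before validation. The hypothetical helper below, which is not part of this library, compares the keys of a schema (modelled here as a plain Map) against the keywords defined in section 5 of draft v3:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of this library: reports keys that are not draft v3 keywords.
final class UnknownKeywords {

    // The keywords defined in section 5 of draft v3.
    private static final Set<String> KNOWN = new HashSet<>(Arrays.asList(
        "type", "properties", "patternProperties", "additionalProperties",
        "items", "additionalItems", "required", "dependencies",
        "minimum", "maximum", "exclusiveMinimum", "exclusiveMaximum",
        "minItems", "maxItems", "uniqueItems", "pattern", "minLength", "maxLength",
        "enum", "default", "title", "description", "format", "divisibleBy",
        "disallow", "extends", "id", "$ref", "$schema"));

    // Returns the unknown keys, sorted; a validator that ignores them would never complain.
    static Set<String> unknownKeywords(Map<String, ?> schema) {
        Set<String> unknown = new TreeSet<>(schema.keySet());
        unknown.removeAll(KNOWN);
        return unknown;
    }
}
```

A schema containing "minLenght": 10 would be reported here, whereas the validator would simply skip the misspelled keyword.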

properties and patternProperties

If a property of an object instance being validated exactly matches a field defined in properties, then this property will be validated against the corresponding schema. So far, so good.

However, nothing says that this property should be validated against this schema only. In fact, the implementation also goes through patternProperties to see whether the property name happens to match a regex there too (see below about regexes). Only if the property matches neither properties nor patternProperties is additionalProperties taken into account (and if additionalProperties is false, such a property makes the instance invalid).

As an example, consider this schema:

{
    "type": "object",
    "properties": {
        "p1": { "type": "string" }
    },
    "patternProperties": {
        "p": { "minLength": 10 },
        "1": { "format": "host-name" }
    }
}

Now, if the instance to validate contains a property named p1:

  • it will of course have to validate against the schema defined by the corresponding entry in properties;
  • but p1 is also matched by both regexes p and 1 in patternProperties, so it will also have to validate against the two corresponding schemas.

That makes three schemas which a property p1 must validate against.
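The lookup rule above can be sketched as follows. This is a hypothetical illustration, not the library's code: schemas are opaque objects, and java.util.regex stands in for the ECMA 262 engine the project actually uses. Note the use of find() rather than matches(), since a regex may match anywhere in the property name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: which schemas must an object member validate against?
final class PropertySchemas {

    // Schemas are opaque Objects here; additionalProperties is null when absent.
    static List<Object> applicableSchemas(String name,
                                          Map<String, Object> properties,
                                          Map<String, Object> patternProperties,
                                          Object additionalProperties) {
        List<Object> result = new ArrayList<>();
        // 1. Exact match in properties.
        if (properties.containsKey(name)) {
            result.add(properties.get(name));
        }
        // 2. Every patternProperties regex found anywhere in the name (java.util.regex as a stand-in).
        for (Map.Entry<String, Object> entry : patternProperties.entrySet()) {
            if (Pattern.compile(entry.getKey()).matcher(name).find()) {
                result.add(entry.getValue());
            }
        }
        // 3. additionalProperties only applies when nothing above matched;
        //    a Boolean.FALSE value means the caller must reject the member.
        if (result.isEmpty() && additionalProperties != null) {
            result.add(additionalProperties);
        }
        return result;
    }
}
```

With the example schema above, p1 collects three schemas, while a property named x falls through to additionalProperties.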

divisibleBy, exclusiveM{in,ax}imum, m{in,ax}imum

Curiously, the draft doesn't say that, for instance, if exclusiveMinimum is present, then minimum MUST also be present. Neither does it say that the value of divisibleBy must not be 0. However, if you look at the draft v3 meta-schema, you see this:

{
    "divisibleBy": {
        "type": "number",
        "minimum": 0,
        "exclusiveMinimum": true,
        "default": 1
    },
    "dependencies": {
        "exclusiveMinimum": "minimum",
        "exclusiveMaximum": "maximum"
    },
    "etc": "etc"
}

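In other words, divisibleBy must be strictly greater than 0, and exclusiveMinimum (resp. exclusiveMaximum) requires minimum (resp. maximum) to be present. These constraints are therefore enforced at the syntax checking level.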
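These two constraints amount to a small syntax check. In this hypothetical sketch, a schema is a Map whose numeric values are assumed to have already been parsed as BigDecimal; the class name and error messages are illustrative only.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical syntax checker for the numeric constraints implied by the meta-schema.
final class NumericSyntaxCheck {

    static List<String> check(Map<String, Object> schema) {
        List<String> errors = new ArrayList<>();
        if (schema.containsKey("divisibleBy")) {
            Object value = schema.get("divisibleBy");
            // "minimum": 0 together with "exclusiveMinimum": true means strictly positive.
            if (!(value instanceof BigDecimal) || ((BigDecimal) value).signum() <= 0) {
                errors.add("divisibleBy must be a number strictly greater than 0");
            }
        }
        // "dependencies": each exclusive flag requires its matching bound.
        if (schema.containsKey("exclusiveMinimum") && !schema.containsKey("minimum")) {
            errors.add("exclusiveMinimum requires minimum");
        }
        if (schema.containsKey("exclusiveMaximum") && !schema.containsKey("maximum")) {
            errors.add("exclusiveMaximum requires maximum");
        }
        return errors;
    }
}
```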

Discussions about some fine points of the draft

Ref resolution failure is a fatal error, even in disallow

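This is an implementation choice. The draft does not say anything about what should happen when a JSON Reference cannot be resolved: in this implementation, any reference resolution failure is considered a fatal error, and validation stops immediately (and fails). In particular, an unresolvable reference inside disallow is not treated as a schema the instance fails to match, which would otherwise make the instance valid.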
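Why must the failure be fatal rather than an ordinary validation failure? Because disallow inverts results: if an unresolvable $ref simply reported "no match", disallow would turn that into "valid". The hypothetical sketch below (names are illustrative, not this library's API) avoids this by throwing an unchecked exception, which disallow does not catch:

```java
// Hypothetical sketch: an unresolvable $ref aborts validation instead of "not matching".
final class RefFailure {

    // Unchecked, so no validator along the way can mistake it for a plain mismatch.
    static final class RefResolutionException extends RuntimeException {
        RefResolutionException(String ref) {
            super("cannot resolve $ref: " + ref);
        }
    }

    // Minimal validator abstraction: true means "the instance matches".
    interface Schema {
        boolean validate(Object instance);
    }

    // Stands for a $ref whose target could not be fetched or resolved.
    static Schema unresolvableRef(String ref) {
        return instance -> {
            throw new RefResolutionException(ref);
        };
    }

    // disallow: the instance is valid only if it matches none of the schemas.
    // Exceptions thrown by validate() propagate, so resolution failures stay fatal.
    static boolean disallow(Schema[] schemas, Object instance) {
        for (Schema schema : schemas) {
            if (schema.validate(instance)) {
                return false;
            }
        }
        return true;
    }
}
```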

Numeric instance validation

This applies to integer and number JSON nodes, and therefore to the minimum, maximum and divisibleBy keywords, especially the latter.

The first thing to know is that the JSON specification itself does not impose any limit on the precision or scale of numeric instances, and neither does the JSON Schema draft (even though JavaScript limits itself to IEEE 754 floating point numbers; JSON is not JavaScript).

For this reason, the implementation uses Java's BigDecimal for numeric instance validation, and switches to long if and only if both the schema keyword value and the instance value fit into this type. Decimal numbers, however, would have to be rounded if stored as double, and rounding means rounding errors, which means inaccuracies, which wreak havoc on the divisibleBy check in particular. I don't like inaccuracy, so, for decimal numbers, BigDecimal it is, and it will likely remain so for the foreseeable future.
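A minimal sketch of this strategy for divisibleBy, assuming both values arrive as BigDecimal (the class and method names are hypothetical): use long arithmetic when both numbers are integers within the range of long, otherwise BigDecimal.remainder(), which is exact. A naive double version is included for contrast.

```java
import java.math.BigDecimal;

// Hypothetical divisibleBy check: exact for any precision or scale.
final class DivisibleBy {

    private static final BigDecimal LONG_MIN = BigDecimal.valueOf(Long.MIN_VALUE);
    private static final BigDecimal LONG_MAX = BigDecimal.valueOf(Long.MAX_VALUE);

    // The divisor is assumed to be strictly positive (enforced by the syntax check).
    static boolean isDivisible(BigDecimal instance, BigDecimal divisor) {
        if (fitsInLong(instance) && fitsInLong(divisor)) {
            // Fast path: both values are integers within the range of long.
            return instance.longValueExact() % divisor.longValueExact() == 0;
        }
        // General case: BigDecimal.remainder() involves no rounding.
        return instance.remainder(divisor).signum() == 0;
    }

    // True if the value has no fractional part and lies within [Long.MIN_VALUE, Long.MAX_VALUE].
    private static boolean fitsInLong(BigDecimal value) {
        if (value.signum() == 0) {
            return true;
        }
        boolean integral = value.stripTrailingZeros().scale() <= 0;
        return integral && value.compareTo(LONG_MIN) >= 0 && value.compareTo(LONG_MAX) <= 0;
    }

    // For contrast only: with doubles, 0.3 % 0.1 is not 0, so 0.3 is wrongly reported as not divisible.
    static boolean isDivisibleNaive(double instance, double divisor) {
        return instance % divisor == 0.0;
    }
}
```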

Regex support: ECMA 262, and the real definition of "matching"

The draft is quite clear that regexes should conform to ECMA 262. This rules out java.util.regex entirely (for instance, possessive quantifiers, like in a++, are legal in Java, but are not supported by ECMA 262). The only Java library (that I know of) which is able to process ECMA 262 regexes is Rhino, Mozilla's JavaScript engine. This project uses it for that very reason (and, again, I don't like inaccuracy).

Also, even though the draft only implies it, please note that a regex can match anywhere in the input. Remember this when writing your schemas: if you want your regex to match the whole input, you must anchor it. This applies to the pattern keyword, but also to keys in patternProperties. A JSON Schema implementation which doesn't act this way simply does not obey the draft!
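The difference between "matching anywhere" and "matching the whole input" is easy to demonstrate. The sketch below uses java.util.regex purely because it ships with the JDK; as explained above, it is not ECMA 262 compliant, and this project does not use it for validation. schemaMatches() is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical helper showing JSON Schema "match anywhere" semantics.
final class RegexMatching {

    // find() succeeds if the regex matches anywhere in the input.
    static boolean schemaMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(schemaMatches("p", "xpx"));   // true: unanchored, matches in the middle
        System.out.println(schemaMatches("^p$", "xpx")); // false: anchored at both ends
        System.out.println(schemaMatches("^p$", "p"));   // true
        // Compiles fine in Java, although possessive quantifiers are not part of ECMA 262.
        Pattern.compile("a++");
    }
}
```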

Hostname and email validation: not RFC conformant by default

The draft defines two format specifiers for these, host-name and email. The associated RFCs do NOT require hostnames and email addresses to have a domain part at all; however, this implementation requires them to have one by default.

You can choose to get back to strict RFC conformance by using the appropriate validation feature (aptly named STRICT_RFC_CONFORMANCE). See the Javadoc for more details.
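As an illustration of the difference, here is a simplified hostname check with a boolean flag standing in for STRICT_RFC_CONFORMANCE. The label rules follow the usual RFC 1034/1123 constraints (letters, digits and hyphens, at most 63 characters, no leading or trailing hyphen); isValidHostName() and its parameter are hypothetical, not how the feature is actually enabled.

```java
import java.util.regex.Pattern;

// Hypothetical, simplified hostname check; the boolean flag stands in for STRICT_RFC_CONFORMANCE.
final class HostNames {

    // One label: letters, digits and hyphens, 1 to 63 characters, no leading or trailing hyphen.
    private static final Pattern LABEL =
        Pattern.compile("[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?");

    static boolean isValidHostName(String host, boolean strictRfc) {
        // Overall length cap of 255 characters (a simplification).
        if (host.isEmpty() || host.length() > 255) {
            return false;
        }
        // The -1 limit keeps empty labels, so inputs such as "a..b" are rejected.
        String[] labels = host.split("\\.", -1);
        for (String label : labels) {
            if (!LABEL.matcher(label).matches()) {
                return false;
            }
        }
        // Default mode also requires a domain part, i.e. at least one dot.
        return strictRfc || labels.length >= 2;
    }
}
```

With this sketch, localhost is accepted only in strict mode, while example.com is accepted in both modes.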
