refactor(prompts): validate jsonschema using third-party library #5988
LLM API providers support relatively advanced features of JSON schema for tools and response formats (e.g., recursive schemas) that are complex to implement in Pydantic. So this PR simplifies things and introduces a third-party library.
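
As a rough illustration of the kind of recursive schema mentioned above, here is a minimal sketch using the `jsonschema` library with draft 7 (the draft this description settles on further down). The schema, the helper names `is_valid_schema` and `validation_errors`, and the sample payloads are hypothetical and are not taken from this PR's diff; the sketch only shows that a self-referencing schema can be checked against the draft 7 meta-schema and then used to validate nested instances.

```python
from jsonschema import Draft7Validator
from jsonschema.exceptions import SchemaError

# Hypothetical recursive schema: a tree node whose children are tree nodes.
# "$ref": "#" points back at the root of this same schema.
TREE_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "value": {"type": "string"},
        "children": {"type": "array", "items": {"$ref": "#"}},
    },
    "required": ["value"],
}


def is_valid_schema(schema: dict) -> bool:
    """Return True if `schema` is itself a valid draft 7 JSON Schema."""
    try:
        Draft7Validator.check_schema(schema)
    except SchemaError:
        return False
    return True


def validation_errors(schema: dict, instance: object) -> list:
    """Collect human-readable error messages for `instance` under `schema`."""
    validator = Draft7Validator(schema)
    return [error.message for error in validator.iter_errors(instance)]


if __name__ == "__main__":
    print(is_valid_schema(TREE_SCHEMA))
    print(is_valid_schema({"type": "not-a-type"}))
    nested = {"value": "root", "children": [{"value": 1}]}
    print(validation_errors(TREE_SCHEMA, nested))
```

Checking the schema itself with `check_schema` is separate from validating an instance against it; a tool or response-format definition supplied by a user would presumably need the former, while the latter is shown here only to confirm that the recursion resolves.
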

There are two main choices for third-party libraries in the Python ecosystem, `jsonschema` and `fastjsonschema`. As the name suggests, the latter is significantly faster (>10x in my benchmarking), but seems to be less actively maintained (no support for more recent formats and less overall activity in the repo). It also has a few quirks in implementation that deviate from the standard spec and make it more lax than other implementations in a way that could make it difficult to migrate to `jsonschema`. So I chose `jsonschema` to start.

I am relying on JSON Schema draft 7, a relatively old version of the spec that is also supported in `fastjsonschema`, to give us flexibility to switch (in theory). It also fully supports all the code snippets I've found in OpenAI's and Anthropic's documentation so far. The implementation is open to adding support for additional versions.

I've added a max version to the `jsonschema` version so that we are not on the bleeding edge in case there is a regression.

resolves #5987