Skip to content

Commit

Permalink
[Backport 1.7.latest] Fix ensuring we produce valid jsonschema artifa…
Browse files Browse the repository at this point in the history
…cts for manifest, catalog, sources, and run-results (#9229)

* Drop `all_refs=True` from jsonschema-ization build process

Passing `all_refs=True` makes it so that Everything is a ref, even
the top level schema. In jsonschema land, this essentially makes the
produced artifact not a full schema, but a fractal object to be included
in a schema. Thus when `$id` is passed in, jsonschema tools blow up
because `$id` is for identifying a schema, which we explicitly weren't
creating. The alternative was to drop the inclusion of `$id`. Howver, we're
intending to create a schema, and having an `$id` is recommended best
practice. Additionally since we were intending to create a schema,
not a fractal, it seemed best to create to full schema.

* Explicity produce jsonschemas using DRAFT_2020_12 dialect

Previously were were implicitly using the `DRAFT_2020_12` dialect through
mashumaro. It felt wise to begin explicitly specifying this. First, it
is closest in available mashumaro provided dialects to what we produced
pre 1.7. Secondly, if mashumaro changes its default for whatever reason
(say a new dialect is added, and mashumaro moves to that), we don't want
to automatically inherit that.

* Begin including schema dialect specification in produced jsonschema

In jsonschema's documentation they state
> It's not always easy to tell which draft a JSON Schema is using.
> You can use the $schema keyword to declare which version of the JSON Schema specification the schema is written to.
> It's generally good practice to include it, though it is not required.

and

> For brevity, the $schema keyword isn't included in most of the examples in this book, but it should always be used in the real world.

Basically, to know how to parse a schema, it's important to include what
schema dialect is being used for the schema specification. The change in
this commit ensures we include that information.

* Add change documentation for jsonschema schema production fix

* Regenerate dbt jsonschemas with fixed mashumaro jsonschema production process

Specifically we regenerated
* catalog v1
* manifest v11
* run-results v5
* sources v3
using the command `scripts/collect-artifact-schema.py --path schemas`
  • Loading branch information
QMalcolm authored Dec 7, 2023
1 parent fc2d16f commit 0ef8638
Show file tree
Hide file tree
Showing 6 changed files with 19,826 additions and 6,568 deletions.
7 changes: 7 additions & 0 deletions .changes/unreleased/Fixes-20231127-165244.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
kind: Fixes
body: Ensure we produce valid jsonschema schemas for manifest, catalog, run-results,
and sources
time: 2023-11-27T16:52:44.590313-08:00
custom:
Author: QMalcolm
Issue: "8991"
3 changes: 2 additions & 1 deletion core/dbt/contracts/util.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
ValidationError,
)
from mashumaro.jsonschema import build_json_schema
from mashumaro.jsonschema.dialects import DRAFT_2020_12
import functools


Expand Down Expand Up @@ -197,7 +198,7 @@ class VersionedSchema(dbtClassMixin):
@classmethod
@functools.lru_cache
def json_schema(cls) -> Dict[str, Any]:
json_schema_obj = build_json_schema(cls, all_refs=True)
json_schema_obj = build_json_schema(cls, dialect=DRAFT_2020_12, with_dialect_uri=True)
json_schema = json_schema_obj.to_dict()
json_schema["$id"] = str(cls.dbt_schema_version)
return json_schema
Expand Down
Loading

0 comments on commit 0ef8638

Please sign in to comment.