Refactor Bindgen to simpler form #18

bhelx · 2024-10-28T21:32:04Z

Refactoring the bindgen with the goal of having a simpler IR-like form for types
that makes recursively generating type signatures and code easier.

The end goal is to remove all the hairy type portions of the XTP Schema
document. e.g.:

* type
* format
* items
* $ref
* additionalProperties

and replace with a recursively defined type XtpNormalizedSchema (name
will likely change to XtpType). I've kept things backwards compatible by
only adding this type on the property xtpType. We should switch
bindgens to use this object then we can remove the other properties.

I also decided to change a little bit about the whole process. e.g.:

* moved validation to the parser level code
* remove circularReference detection and blocking
* added some more tests

In order to think more clearly about the whole process, I took a moment to
write up what I think the whole flow for schemas should be in terms of validating and compiling:

Step 0: JSON Schema

First we validate against JSON schema. This should ensure that we can parse the document into typescript types in the next step. It should not allow any extra values or any values outside of the enum ranges. We should be able to do a simple `const doc = rawDoc as V1Schema` in typescript and the doc object should be valid.

Step 1: Parse and Validate

Here we parse the json or yaml into a raw javascript object and cast it to the V1 or V0 schema type. This gives us a raw, but typed representation of the schema. Here we should do extra conditional validation steps that can’t be done (or are too complex to be done) in the JSON Schema. e.g. validating content-type / type pairs, etc.

Step 2: Normalize

Here we take the raw parsed types and “normalize” them into a simpler form. We will walk the document and replace all occurrences of $ref, items, additionalProperties, type, and format with a single recursive XtpType.
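
As a rough illustration of Step 2, the sketch below collapses `$ref`, `items`, `additionalProperties`, `type`, and `format` into one recursive type. The names `RawNode`, `SketchType`, and `normalize` are hypothetical and are not the types in this PR; treating `format` as a refinement of `type` is also an assumption.

```typescript
// Hypothetical raw node: only the type-related fields from the schema document.
interface RawNode {
  type?: string;
  format?: string;
  $ref?: string;
  items?: RawNode;
  additionalProperties?: RawNode;
}

// Hypothetical normalized form: a single recursive, tagged union.
type SketchType =
  | { kind: "scalar"; name: string }
  | { kind: "ref"; target: string }
  | { kind: "array"; element: SketchType }
  | { kind: "map"; value: SketchType };

function normalize(node: RawNode): SketchType {
  if (node.$ref) return { kind: "ref", target: node.$ref };
  if (node.items) return { kind: "array", element: normalize(node.items) };
  if (node.additionalProperties) {
    return { kind: "map", value: normalize(node.additionalProperties) };
  }
  // Assumption: a format, when present, refines the base type (string + date-time).
  return { kind: "scalar", name: node.format ?? node.type ?? "object" };
}

// An array whose elements are maps of date-time values.
const normalizedExample = normalize({
  type: "array",
  items: {
    type: "object",
    additionalProperties: { type: "string", format: "date-time" },
  },
});
```

Because every case returns the same union, a generator can recurse over `kind` without looking at the original schema fields again.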

bhelx · 2024-10-29T16:06:22Z

src/common.ts

-        super(message);
-        Object.setPrototypeOf(this, ValidationError.prototype);
-    }
+export class ValidationError {


We don't want to throw exceptions and stop the process; we want to collect all validation errors, giving the user the chance to address them all at once.
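
A minimal sketch of the collect-instead-of-throw idea, assuming an error is a plain value with a message and a path. `SketchValidationError`, `validateAll`, and both rules are made up for illustration and do not mirror the real checks.

```typescript
// Hypothetical stand-in for the PR's ValidationError: a plain value, never thrown.
class SketchValidationError {
  message: string;
  path: string;
  constructor(message: string, path: string) {
    this.message = message;
    this.path = path;
  }
}

// Made-up rules; each one records a problem and lets validation continue.
function validateAll(doc: { version?: string; exports?: unknown }): SketchValidationError[] {
  const errors: SketchValidationError[] = [];
  if (doc.version !== "v1") {
    errors.push(new SketchValidationError("unsupported version", "#/version"));
  }
  if (doc.exports === undefined) {
    errors.push(new SketchValidationError("missing exports", "#/exports"));
  }
  return errors; // the caller sees every problem from a single pass
}

// Both problems are reported together instead of stopping at the first one.
const collected = validateAll({ version: "v0" });
```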

bhelx · 2024-10-29T16:09:16Z

src/index.ts

-function isDateTime(p: Property | Parameter | null): boolean {
-  if (!p) return false;
-  return p.type === "string" && p.format === "date-time";
+export type XtpTyped = { xtpType: XtpNormalizedType };


this is an external convenience type for any node in the doc that should have an xtp type. think schema, property, parameter, items, additionalProperties, etc.

bhelx · 2024-10-29T16:10:31Z

src/index.ts

-  return p.type === "string" && p.format === "date-time";
+export type XtpTyped = { xtpType: XtpNormalizedType };
+
+function isDateTime(p: XtpTyped): boolean {


These are helpers to quickly check the type of a node without drilling into the xtpType.kind. Will work on any XtpTyped object.
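
For illustration, helpers of this shape might look like the sketch below. `SketchTyped` is a trimmed-down stand-in for `XtpTyped`, and the list of kinds (including `date-time` as a kind) is an assumption rather than something taken from the diff.

```typescript
// Assumption: a reduced stand-in for the normalized type, carrying only a `kind` tag.
type SketchKind = "string" | "boolean" | "date-time" | "map" | "array" | "object";
type SketchTyped = { xtpType: { kind: SketchKind } };

// Each helper accepts any typed node, so properties, parameters, and items all work.
function isDateTimeSketch(p: SketchTyped | null): boolean {
  return p?.xtpType?.kind === "date-time";
}

function isMapSketch(p: SketchTyped | null): boolean {
  return p?.xtpType?.kind === "map";
}

// Two unrelated node shapes that both carry an xtpType.
const createdAt = { name: "createdAt", xtpType: { kind: "date-time" as const } };
const headers = { description: "HTTP headers", xtpType: { kind: "map" as const } };

const checks = [
  isDateTimeSketch(createdAt),
  isMapSketch(createdAt),
  isMapSketch(headers),
  isDateTimeSketch(null),
];
```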

bhelx · 2024-10-29T16:10:50Z

src/parser.ts

+ */
+import { ValidationError } from "./common"
+
+export interface ParseResult {


now returning this result object which may have errors

bhelx · 2024-10-29T16:11:46Z

src/parser.ts

+  constructor(doc: V1Schema) {
+    this.doc = doc as any
+    this.errors = []
+    this.location = ['#']


instead of hand building the location we can just track it as we walk the nodes by pushing and popping from this array.

I like this a lot – it's an elegant abstraction!
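
A sketch of the push/pop idea, assuming the location segments are joined with `/` when an error is recorded. `LocationWalker` and its `bogus` rule are hypothetical and are not the PR's `V1Validator`.

```typescript
// Hypothetical walker that tracks its position in the document as a stack.
class LocationWalker {
  location: string[] = ["#"];
  errors: { message: string; path: string }[] = [];

  getLocation(): string {
    return this.location.join("/");
  }

  recordError(message: string): void {
    // The path comes from wherever the walk currently is.
    this.errors.push({ message, path: this.getLocation() });
  }

  walk(node: unknown): void {
    // Arrays and scalars are treated as terminal in this sketch.
    if (node === null || typeof node !== "object" || Array.isArray(node)) return;
    const obj = node as Record<string, unknown>;
    // Made-up rule so the walk has something to report.
    if (obj.type === "bogus") this.recordError("unknown type");
    for (const [key, child] of Object.entries(obj)) {
      this.location.push(key); // entering a child
      this.walk(child);
      this.location.pop(); // leaving it again
    }
  }
}

const walker = new LocationWalker();
walker.walk({ components: { schemas: { Fruit: { type: "bogus" } } } });
const firstPath = walker.errors[0].path;
```

No call site has to build a path string by hand; the stack is always in sync with the recursion depth.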

bhelx · 2024-10-29T16:16:05Z

src/parser.ts

+    if (prop.additionalProperties) {
+      this.validateTypedInterface(prop.additionalProperties)
+
+      // here we are adding some extra constraints on the value type


Just above this we validate that additionalProperties meets the XtpTyped rules, but here we add some extra constraints because we don't want to support recursive value types just yet. We have done the same for arrays. Note: refs are supported because it is easier to generate casting code for them.

A ha – so to confirm my understanding, is it the trickiness of writing recursive casting code that's steering us away from supporting recursive types?

at least in items and additionalProperties. yes. but when i get that working i think we can delete these extra validations.

I ended up backing this change out. Though if it seems too complex in other bindgens then maybe we put it back.

bhelx · 2024-10-29T16:28:24Z

src/parser.ts

 }

-export type XtpSchemaType = 'object' | 'enum' | 'map'
+// TODO this figure out how to split up type again?
+//export type XtpSchemaType = 'object' | 'enum' | 'map'


we don't really need this type anymore. we can now put any type into a schema. though maybe we should hold off on relaxing the constraints in the json schema until we can test. In theory we should be able to support enums and maps as properties too (inline definition) but we'll want to wait to do that and will need to add a requirement that they have a name.

bhelx · 2024-10-29T16:29:01Z

src/parser.ts

 export type XtpFormat =
  'int32' | 'int64' | 'float' | 'double' | 'date-time' | 'byte';

-export interface XtpItemType {


This is really just another XtpTyped. items and additionalProperties are no different (from a type perspective) than a property, parameter, etc

bhelx · 2024-10-29T16:29:55Z

src/types.ts

+ * We will normalize the raw XTP schema into these recursively defined types.
+ */
+
+export type XtpNormalizedKind =


I'll likely change the name of this. This is just so we don't conflict with XtpType.

bhelx · 2024-10-29T16:51:49Z

src/parser.ts

+class V1Validator {
+  errors: ValidationError[]
+  location: string[]
+  doc: any


treating the whole doc as any (temporarily) allows us to walk the whole document like a tree

we can cast some nodes back to known interfaces when we need to, like XtpTyped

bhelx · 2024-10-29T16:53:20Z

src/parser.ts

+    this.validateTypedInterface(node)
+
+    if (node && typeof node === 'object') {
+      // i don't think we need to validate array children


I believe arrays are effectively terminal in our document. this could change though.

bhelx · 2024-10-29T16:55:52Z

src/parser.ts

+
+  recordError(msg: string) {
+    this.errors.push(
+      new ValidationError(msg, this.getLocation())


calling recordError uses our current location

bhelx · 2024-10-29T16:56:42Z

src/parser.ts

+   * Validates that a node conforms to the rules of
+   * the XtpTyped interface. Validates what we can't
+   * catch in JSON Schema validation.


There may be some more rules in here i haven't encoded yet. Either way, this is where we add the custom validation on the raw type.

We may want to check for ref integrity here as well. Just make sure that all refs point to a valid target.

bhelx · 2024-10-29T16:57:34Z

src/parser.ts

+      // here we are adding some extra constraints on the value type
+      // we can relax these later when we can ensure we can cast these properly
+      this.location.push('items')
+      if (prop.items.items) {
+        this.recordError("Arrays are currently not supported as element types of arrays")
+      }
+      if (prop.items.additionalProperties) {
+        this.recordError("Maps are currently not supported as element types of arrays")
+      }
+      this.location.pop()


See comment below

bhelx · 2024-10-29T17:00:42Z

src/normalizer.ts

-  }
-}
-
-function detectCircularReference(schema: Schema, visited: Set<string> = new Set()): ValidationError | null {


I deleted this but have not fixed this yet

I don't think we need to block circular refs just get clever about how to process it
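
One possible way to process circular refs without blocking them is to register a schema in a cache before resolving its properties, so a cycle finds the half-built entry instead of recursing forever. This is only a sketch under that assumption; `RawSchemas`, `ResolvedSchema`, and `resolveAll` are hypothetical and are not what the normalizer does today.

```typescript
// Hypothetical input: schema name -> property name -> scalar type or ref string.
type RawSchemas = Record<string, Record<string, string>>;

interface ResolvedSchema {
  name: string;
  properties: Record<string, ResolvedSchema | string>;
}

function resolveAll(raw: RawSchemas): Map<string, ResolvedSchema> {
  const cache = new Map<string, ResolvedSchema>();
  const prefix = "#/components/schemas/";

  function resolve(name: string): ResolvedSchema {
    const hit = cache.get(name);
    if (hit) return hit; // includes schemas still being built, which ends the cycle
    const schema: ResolvedSchema = { name, properties: {} };
    cache.set(name, schema); // register before recursing
    for (const [prop, t] of Object.entries(raw[name] ?? {})) {
      schema.properties[prop] = t.startsWith(prefix)
        ? resolve(t.slice(prefix.length))
        : t;
    }
    return schema;
  }

  for (const name of Object.keys(raw)) resolve(name);
  return cache;
}

// A linked-list style schema that refers to itself.
const resolved = resolveAll({
  Node: { value: "string", next: "#/components/schemas/Node" },
});
const nodeSchema = resolved.get("Node")!;
const selfReferential = nodeSchema.properties.next === nodeSchema;
```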

chrisdickinson

I left a couple of comments about splitting out the concept of field descriptors from types & some fretting about MapType but overall this is a great improvement! Normalizing the schema down from the authored format feels like cutting with the grain of the wood.

chrisdickinson · 2024-10-29T17:02:54Z

src/index.ts

+  return p?.xtpType?.kind === "boolean"
+}
+function isMap(p: XtpTyped): boolean {
+  return p?.xtpType?.kind === "map"


If I had something like:

type: object
properties:
  foo:
    type: string
    nullable: false
  bar:
    type: number
additionalProperties:
  type: 'string'

Would that be a map or an object? (Does this change for patternProperties?)

(Edit: I get into semi-structured objects a little later in the review at MapType!)

I think this should be caught in the validation step as these two concepts should not be mixable

Yeah, currently we don't support that. additionalProperties and properties are mutually exclusive
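
A sketch of how that mutual exclusion could be enforced at validation time. `RawObjectSchema`, `checkObjectOrMap`, and the error wording are hypothetical; the real validator may place or phrase this differently.

```typescript
// Hypothetical raw schema shape with just the two fields in question.
interface RawObjectSchema {
  properties?: Record<string, unknown>;
  additionalProperties?: unknown;
}

function checkObjectOrMap(schema: RawObjectSchema): string[] {
  const errors: string[] = [];
  // A schema is either an object (known keys) or a map (dynamic keys), not both.
  if (schema.properties !== undefined && schema.additionalProperties !== undefined) {
    errors.push("properties and additionalProperties are mutually exclusive");
  }
  return errors;
}

// The mixed schema from the question above is rejected; a pure map passes.
const mixed = checkObjectOrMap({
  properties: { foo: { type: "string" }, bar: { type: "number" } },
  additionalProperties: { type: "string" },
});
const mapOnly = checkObjectOrMap({ additionalProperties: { type: "string" } });
```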

chrisdickinson · 2024-10-29T17:05:25Z

src/parser.ts

+      for (const key in node) {
+        if (Object.prototype.hasOwnProperty.call(node, key)) {


golf: for (const [key, child] of Object.entries(node)) { should work here

I can replace the hasOwnProperty call too? is Object.entries() shallow? Will take a look
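
On the two questions: `Object.entries()` returns only an object's own enumerable string-keyed properties, so the `hasOwnProperty` guard is not needed with it, and it is shallow in that it lists one level of keys without descending into nested values. A small sketch with made-up variable names:

```typescript
// A node with one own key and one key inherited through the prototype chain.
const proto = { inherited: true };
const entryNode: Record<string, unknown> = Object.create(proto);
entryNode.own = { nested: { deep: 1 } };

// for...in also visits inherited enumerable keys, hence the hasOwnProperty guard.
const forInKeys: string[] = [];
for (const key in entryNode) forInKeys.push(key);

// Object.entries skips inherited keys and does not descend into nested objects.
const entryKeys = Object.entries(entryNode).map(([key]) => key);
```

Here `forInKeys` contains both `own` and `inherited`, while `entryKeys` contains only `own`.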

chrisdickinson · 2024-10-29T17:11:33Z

src/parser.ts

+    if (prop.additionalProperties) {
+      this.validateTypedInterface(prop.additionalProperties)
+
+      // here we are adding some extra constraints on the value type


A ha – so to confirm my understanding, is it the trickiness of writing recursive casting code that's steering us away from supporting recursive types?

chrisdickinson · 2024-10-29T17:37:32Z

src/types.ts

+  }
+}
+
+export class MapType implements XtpNormalizedType {


MapType is raising some hairs – there's this concept of a semi-structured object in JSONSchema, where some properties are defined with specific types and other properties are allowed according to a pattern or a catch-all (additionalProperties or patternProperties). In Rust, at least, that's supported using #[serde(flatten)] – the additional properties can be flattened into a HashMap<String, T> on the parent object. Golang looks like it supports something similar with map[string]*json.RawMessage, though it requires some additional elbow grease.

I suppose my worry is that, while we can strictly disallow maps and structs now, if we codify MapType as distinct from an ObjectType it'll be hard to introduce that functionality later. (This kind of gets back to representing properties on an object as an array of FieldDescriptor)

Map type should be used for typing what are now untyped objects. The distinction b/w an object and a map is dynamic keys, whereas objects are for when you know the keys and types ahead of time.

A good example would be http headers i think. Today you can just say that it's an untyped object, but tomorrow you can say that it's a Map<String, String>.

Ah yeah – the trick is that JSONSchema lets you represent both typed and dynamic keys in a single object. HTTP headers could be a useful example of a semi-structured object, too – you might say something like:

ResponseHeaders:
  type: object
  properties:
    etag:
      type: string
  additionalProperties:
    type: string

and in rust you might produce:

#[derive(Deserialize)]
struct ResponseHeaders {
    etag: String,
    #[serde(flatten)]
    additional_properties: HashMap<String, String>
}

or in TS:

interface ResponseHeaders {
  etag: string
  [additionalProperties: string]: string
}

(This not to say we need to enable this now, but if we're looking to track OpenAPI support it'd be useful to leave the door open to adding this down the line. I worry that adding a distinct Map type makes it harder to add support later, vs. pulling that info into the FieldDescriptor.)

chrisdickinson · 2024-10-29T17:39:21Z

src/parser.ts

+  constructor(doc: V1Schema) {
+    this.doc = doc as any
+    this.errors = []
+    this.location = ['#']


I like this a lot – it's an elegant abstraction!

chrisdickinson · 2024-10-29T17:53:30Z

src/types.ts

+function cons(t: XtpNormalizedType, opts?: XtpTypeOpts): XtpNormalizedType {
+  // default them to false
+  t.nullable = (opts?.nullable === undefined) ? false : opts.nullable
+  t.required = (opts?.required === undefined) ? false : opts.required


(as you know!) I suspect this should probably get split into PropertyDescriptor and NormalizedType, where things like required, field-level description, name (or namePattern) live on the PropertyDescriptor and things like kind, nullable, type-level description and other type info can live on the NormalizedType. This gives you a distinction between "field-level" properties when doing code generation and "type-level" properties. On the codegen side you'd end up iterating over a list of PropertyDescriptor objects with type information inside.

(This gets at a pretty big impedance mismatch between JSONSchema and the sort of codegen XTP does – JSONSchema doesn't need to name types, but having a name for any sort of rich type is required in most languages we target. This might be a good place to generate an "inferred name" for any type that's otherwise unnamed by the schema as authored?)

A maybe-more complete sketch:

interface PropertyDescriptor {
  value: XtpNormalizedType,
  description?: string
  codeExamples: string[]
}

interface NamedPropertyDescriptor extends PropertyDescriptor {
  name: string
  required: boolean
}

interface PatternPropertyDescriptor extends PropertyDescriptor {
  pattern: RegExp
}

// pretty much the same as before, sans "required"
interface XtpTypeOpts {
  nullable?: boolean;
}

// ditto!
export interface XtpNormalizedType extends XtpTypeOpts {
  kind: XtpNormalizedKind;
}

// a little different: note the split between `name` and `inferredName` and the list of `properties`
class ObjectType implements XtpNormalizedType {
  kind: XtpNormalizedKind = 'object';

  // if the object appears in a "naming" position (e.g., listed in components/schemas)
  name?: string;

  // Type-level docs and code examples can go here.
  description: string;
  codeExamples: string;

  // if name is not available we infer a name by building up a path from the last named object
  inferredName: string;

  properties: PropertyDescriptor[]
}
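
One way the `inferredName` idea could work is to join the last named ancestor with the property path beneath it. The helper below is a hypothetical sketch; `inferName` and the PascalCase convention are assumptions, not part of the proposal.

```typescript
// Hypothetical helper: build a type name from the last named object plus the path below it.
function inferName(lastNamed: string, pathFromNamed: string[]): string {
  const pascal = (s: string) => s.charAt(0).toUpperCase() + s.slice(1);
  return [lastNamed, ...pathFromNamed].map(pascal).join("");
}

// An unnamed object found at ResponseHeaders -> cache -> policy.
const inferred = inferName("ResponseHeaders", ["cache", "policy"]);
```

A real implementation would also need to handle collisions between inferred and authored names, which this sketch ignores.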

mhmd-azeez

Great work, the only thing left I think is adding warnings to the result

bhelx · 2024-11-01T15:17:47Z

Going to merge and push since it should be backwards compatible

bhelx requested review from nilslice and mhmd-azeez as code owners October 28, 2024 21:32

bhelx marked this pull request as draft October 28, 2024 21:32

bhelx force-pushed the refactor-bindgen branch 3 times, most recently from d74b3a0 to 29c9f87 Compare October 28, 2024 21:43

bhelx added a commit to dylibso/xtp-typescript-bindgen that referenced this pull request Oct 28, 2024

Refactor Bindgen

68e89cf

dylibso/xtp-bindgen#18

bhelx mentioned this pull request Oct 28, 2024

Refactor Bindgen dylibso/xtp-typescript-bindgen#37

Merged

bhelx force-pushed the refactor-bindgen branch 6 times, most recently from 60deead to daf9952 Compare October 28, 2024 23:28

bhelx force-pushed the refactor-bindgen branch from daf9952 to cfc35d7 Compare October 28, 2024 23:31

bhelx added a commit to dylibso/xtp-typescript-bindgen that referenced this pull request Oct 29, 2024

Refactor Bindgen

c1e823f

dylibso/xtp-bindgen#18

bhelx commented Oct 29, 2024

bhelx force-pushed the refactor-bindgen branch from 541037a to 75e60d0 Compare October 29, 2024 16:07

bhelx commented Oct 29, 2024

bhelx force-pushed the refactor-bindgen branch from 75e60d0 to 15bec16 Compare October 29, 2024 16:12

bhelx commented Oct 29, 2024

bhelx added 2 commits October 29, 2024 11:16

fix validation, restrict map value types

000bc0d

restrict arrays as well

40082b9

bhelx commented Oct 29, 2024

chrisdickinson approved these changes Oct 29, 2024

bhelx added 2 commits October 29, 2024 16:37

remove restrictions on nested types

0e08154

more tests

63f8453

bhelx added a commit to dylibso/xtp-go-bindgen that referenced this pull request Oct 30, 2024

Refactor of bindgen code

b4c20da

See dylibso/xtp-bindgen#18

bhelx mentioned this pull request Oct 30, 2024

Refactor of bindgen code dylibso/xtp-go-bindgen#22

Merged

mhmd-azeez approved these changes Oct 30, 2024

mhmd-azeez mentioned this pull request Oct 31, 2024

feat: add support for property and parameter maps too #14

Closed

bhelx added 3 commits November 1, 2024 10:13

Remove required from type, prefer required on property

c065934

add warnings property as part of response

13879c4

bump to rc12

9a1e0ea

bhelx merged commit 70905c7 into main Nov 1, 2024
3 checks passed

bhelx deleted the refactor-bindgen branch November 1, 2024 15:17

bhelx mentioned this pull request Nov 1, 2024

Handle back-compat untyped object #19

Merged

bhelx added a commit to dylibso/xtp-typescript-bindgen that referenced this pull request Nov 1, 2024

Refactor Bindgen

ee1065c

dylibso/xtp-bindgen#18

bhelx added a commit to dylibso/xtp-typescript-bindgen that referenced this pull request Nov 1, 2024

Refactor Bindgen

3e85fcf

dylibso/xtp-bindgen#18

bhelx added a commit to dylibso/xtp-go-bindgen that referenced this pull request Nov 1, 2024

Refactor of bindgen code

ff796c4

See dylibso/xtp-bindgen#18

bhelx mentioned this pull request Nov 4, 2024

feat: make sure users can define exports/imports/schemas/properties named format/type #20

Merged

bhelx added a commit to dylibso/xtp-go-bindgen that referenced this pull request Nov 4, 2024

Refactor of bindgen code

a6c9a89

See dylibso/xtp-bindgen#18

bhelx added a commit to dylibso/xtp-go-bindgen that referenced this pull request Nov 4, 2024

Refactor of bindgen code

ffd5bb7

See dylibso/xtp-bindgen#18

bhelx mentioned this pull request Nov 4, 2024

Refactor for bindgen updates dylibso/xtp-rust-bindgen#15

Draft

bhelx added a commit to dylibso/xtp-rust-bindgen that referenced this pull request Nov 4, 2024

refactor to bindgen changes

eb8f8e9

* Refactor: see dylibso/xtp-bindgen#18 * Support nullable vs required

bhelx added a commit to dylibso/xtp-go-bindgen that referenced this pull request Nov 7, 2024

Refactor of bindgen code (#22)

134aa74

See dylibso/xtp-bindgen#18

bhelx added a commit to dylibso/xtp-typescript-bindgen that referenced this pull request Nov 11, 2024

Refactor Bindgen (#37)

0b347f2

* Refactor Bindgen dylibso/xtp-bindgen#18 * get XtpTyped from the project * bump to latest
