Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Bindgen to simpler form #18

Merged
merged 8 commits into from
Nov 1, 2024
Merged

Refactor Bindgen to simpler form #18

merged 8 commits into from
Nov 1, 2024

Conversation

bhelx
Copy link
Contributor

@bhelx bhelx commented Oct 28, 2024

Refactoring the bindgen with the goal of having a simpler IR-like form for types
that makes recursively generating type signatures and code easier.

The end goal is to remove all the hairy type portions of the XTP Schema
document. e.g.:

  • type
  • format
  • items
  • $ref
  • additionalProperties

and replace with a recursively defined type XtpNormalizedSchema (name
will likely change to XtpType). I've kept things backwards compatible by
only adding this type on the property xtpType. We should switch
bindgens to use this object then we can remove the other properties.

I also decided to change a little bit about the whole process. e.g.:

  • moved validation to the parser level code
  • remove circularReference detection and blocking
  • added some more tests

In order to think more clearly about the whole process, I took up a moment to
write up what I think the whole flow for schemas should be in terms of validating and compiling:

Step 0: JSON Schema

First we validate against JSON schema. This should ensure that we can parse the document into typescript types in the next step. It should not allow any extra values or any values outside of the enum ranges. We should be able to do a simple const doc = rawDoc as V1Schema in typescript and the doc object should be valid.

Step 1: Parse and Validate

Here we parse the json or yaml into a raw javascript object and cast it to the V1 or V0 schema type. This gives us a raw, but typed representation of the schema. Here we should do extra conditional validation steps that can’t be done (or are too complex to be done) in the JSON Schema. e.g. validating content-type / type pairs, etc.

Step 2: Normalize

Here we take the raw parsed types and “normalize” them into a simpler form. We will walk the document and replace all occurrences of $ref, items, additionalProperties, type, and format with a single recursive XtpType.

@bhelx bhelx marked this pull request as draft October 28, 2024 21:32
@bhelx bhelx force-pushed the refactor-bindgen branch 3 times, most recently from d74b3a0 to 29c9f87 Compare October 28, 2024 21:43
bhelx added a commit to dylibso/xtp-typescript-bindgen that referenced this pull request Oct 28, 2024
@bhelx bhelx force-pushed the refactor-bindgen branch 6 times, most recently from 60deead to daf9952 Compare October 28, 2024 23:28
Refactoring the bindgen with the goal of having a simpler IR-like form
that makes recursively generating types and code easier.

The end goal is to remove all the hairy type portions of the XTP Schema
document. e.g.:

* type
* format
* items
* $ref
* additionalProperties

and replace with a recursively defined type `XtpNormalizedSchema` (name
will likely change to XtpType). I've kept things backwards compatible by
only adding this type on the property `xtpType`. We should switch
bindgens to use this object then we can remove the other properties.

I also decided to change a little bit about the whole process. e.g.:

* moved validation to the parser level code
* remove circularReference detection and blocking
* added some more tests

In order to think more clearly about the whole process, I took up a moment to
write up what I think the whole flow for schemas should be in terms of validating and compiling:

First we validate against JSON schema. This should ensure that we can parse the document into typescript types in the next step. It should not allow any extra values or any values outside of the enum ranges. We should be able to do a simple `const doc = rawDoc as V1Schema` in typescript and the doc object should be valid.

Here we parse the json or yaml into a raw javascript object and cast it to the V1 or V0 schema type. This gives us a raw, but typed representation of the schema. Here we should do extra conditional validation steps that can’t be done (or are too complex to be done) in the JSON Schema. e.g. validating content-type / type pairs, etc.

Here we take the raw parsed types and “normalize” them into a simpler form. We will walk the document and replace all occurrences of $ref, items, additionalProperties, type, and format with a single recursive XtpType.
bhelx added a commit to dylibso/xtp-typescript-bindgen that referenced this pull request Oct 29, 2024
super(message);
Object.setPrototypeOf(this, ValidationError.prototype);
}
export class ValidationError {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to throw exceptions and stop the process, we want to collect all validation errors giving the user the chance to address them all in one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love this!

function isDateTime(p: Property | Parameter | null): boolean {
if (!p) return false;
return p.type === "string" && p.format === "date-time";
export type XtpTyped = { xtpType: XtpNormalizedType };
Copy link
Contributor Author

@bhelx bhelx Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an external convenience type for any node in the doc that should have an xtp type. think schema, property, parameter, items, additionalProperties, etc.

return p.type === "string" && p.format === "date-time";
export type XtpTyped = { xtpType: XtpNormalizedType };

function isDateTime(p: XtpTyped): boolean {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are helpers to quickly check the type of a node without drilling into the xtpType.kind. Will work on any XtpTyped object.

*/
import { ValidationError } from "./common"

export interface ParseResult {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now returning this result object which may have errors

constructor(doc: V1Schema) {
this.doc = doc as any
this.errors = []
this.location = ['#']
Copy link
Contributor Author

@bhelx bhelx Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of hand building the location we can just track it as we walk the nodes by pushing and popping from this array.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this a lot – it's an elegant abstraction!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beautiful!

src/parser.ts Outdated
if (prop.additionalProperties) {
this.validateTypedInterface(prop.additionalProperties)

// here we are adding some extra constraints on the value type
Copy link
Contributor Author

@bhelx bhelx Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just above this we validate that additionalProperties meets XtpTyped rules. but here we are adding some extra constraints because we don't want to support recursive value types here just yet. We have done this to arrays as well. note: Refs are supported because they are easier to generate casting code for them.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A ha – so to confirm my understanding, is it the trickiness of writing recursive casting code that's steering us away from supporting recursive types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least in items and additionalProperties. yes. but when i get that working i think we can delete these extra validations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up backing this change out. Though if it seems too complex in other bindgens then maybe we put it back.

}

export type XtpSchemaType = 'object' | 'enum' | 'map'
// TODO this figure out how to split up type again?
//export type XtpSchemaType = 'object' | 'enum' | 'map'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't really need this type anymore. we can now put any type into a schema. though maybe we should hold off on relaxing the constraints in the json schema until we can test. In theory we should be able to support enums and maps as properties too (inline definition) but we'll want to wait to do that and will need to add a requirement that they have name.

export type XtpFormat =
'int32' | 'int64' | 'float' | 'double' | 'date-time' | 'byte';

export interface XtpItemType {
Copy link
Contributor Author

@bhelx bhelx Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really just another XtpTyped. items and addtionalProperties are no different (from a type perspective) than a property, parameter, etc

* We will normalize the raw XTP schema into these recursively defined types.
*/

export type XtpNormalizedKind =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll likely change the name of this. This is just so we don't conflict with XtpType.

class V1Validator {
errors: ValidationError[]
location: string[]
doc: any
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

treating the whole doc as any (temporarily) allows us walk the whole document like a tree

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can cast some nodes back to known interfaces when we need to, like XtpTyped

this.validateTypedInterface(node)

if (node && typeof node === 'object') {
// i don't think we need to validate array children
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe arrays are effectively terminal in our document. this could change though.


recordError(msg: string) {
this.errors.push(
new ValidationError(msg, this.getLocation())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calling recordError uses our current location

Comment on lines +97 to +99
* Validates that a node conforms to the rules of
* the XtpTyped interface. Validates what we can't
* catch in JSON Schema validation.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be some more rules in here i haven't encoded yet. Either way, this is where we add the custom validation on the raw type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to check for ref integrity here as well. Just make sure that all refs point to a valid target.

src/parser.ts Outdated
Comment on lines 128 to 137
// here we are adding some extra constraints on the value type
// we can relax these later when we can ensure we can cast these properly
this.location.push('items')
if (prop.items.items) {
this.recordError("Arrays are currently not supported as element types of arrays")
}
if (prop.items.additionalProperties) {
this.recordError("Maps are currently not supported as element types of arrays")
}
this.location.pop()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment below

}
}

function detectCircularReference(schema: Schema, visited: Set<string> = new Set()): ValidationError | null {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted this but have not fixed this yet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to block circular refs just get clever about how to process it

Copy link

@chrisdickinson chrisdickinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of comments about splitting out the concept of field descriptors from types & some fretting about MapType but overall this is a great improvement! Normalizing the schema down from the authored format feels like cutting with the grain of the wood.

return p?.xtpType?.kind === "boolean"
}
function isMap(p: XtpTyped): boolean {
return p?.xtpType?.kind === "map"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I had something like:

type: object
properties:
  foo:
    type: string
    nullable: false
  bar:
    type: number
additionalProperties:
  type: 'string'

Would that be a map or an object? (Does this change for patternProperties?)

(Edit: I get into semi-structured objects a little later in the review at MapType!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be caught in the validation step as these two concepts should not be mixable

Copy link
Contributor

@mhmd-azeez mhmd-azeez Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, currently we don't support that. additionalProperties and properties are mutually exclusive

Comment on lines +77 to +78
for (const key in node) {
if (Object.prototype.hasOwnProperty.call(node, key)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

golf: for (const [key, child] of Object.entries(node)) { should work here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can replace the hasOwnProperty call too? is Object.entries() shallow? Will take a look

src/parser.ts Outdated
if (prop.additionalProperties) {
this.validateTypedInterface(prop.additionalProperties)

// here we are adding some extra constraints on the value type

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A ha – so to confirm my understanding, is it the trickiness of writing recursive casting code that's steering us away from supporting recursive types?

}
}

export class MapType implements XtpNormalizedType {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MapType is raising some hairs – there's this concept of a semi-structured object in JSONSchema, where some properties are defined with specific types and other properties are allowed according to a pattern or a catch-all (additionalProperties or patternProperties). In Rust, at least, that's supported using #[serde(flatten)] – the additional properties can be flattened into a HashMap<String, T> on the parent object. Golang looks like it supports something similar with map[string]*json.RawMessage, though it requires some additional elbow grease.

I suppose my worry is that, while we can strictly disallow maps and structs now, if we codify MapType as distinct from an ObjectType it'll be hard to introduce that functionality later. (This kind of gets back to representing properties on an object as an array of FieldDescriptor)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Map type should be used for typing what are now untyped objects. The distinction b/w an object and a map is dynamic keys, whereas object are for when you know the keys and types ahead of time.

A good example would be http headers i think. Today you can just say that it's an untyped object, but tomorrow you can say that it's a Map<String, String>.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah – the trick is that JSONSchema lets you represent both typed and dynamic keys in a single object. HTTP headers could be a useful example of a semi-structured object, too – you might say something like:

ResponseHeaders:
  type: object
  properties:
    etag:
      type: string
  additionalProperties:
    type: string

and in rust you might produce:

#[derive(Deserialize)]
struct ResponseHeaders {
  etag: String

  #[serde(flatten)]
  additional_properties: HashMap<String, String>
}

or in TS:

interface ResponseHeaders {
  etag: string
  [additionalProperties: string]: string
}

(This not to say we need to enable this now, but if we're looking to track OpenAPI support it'd be useful to leave the door open to adding this down the line. I worry that adding a distinct Map type makes it harder to add support later, vs. pulling that info into the FieldDescriptor.)

constructor(doc: V1Schema) {
this.doc = doc as any
this.errors = []
this.location = ['#']

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this a lot – it's an elegant abstraction!

src/types.ts Outdated
function cons(t: XtpNormalizedType, opts?: XtpTypeOpts): XtpNormalizedType {
// default them to false
t.nullable = (opts?.nullable === undefined) ? false : opts.nullable
t.required = (opts?.required === undefined) ? false : opts.required

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(as you know!) I suspect this should probably get split into PropertyDescriptor and NormalizedType, where things like required, field-level description, name (or namePattern) live on the PropertyDescriptor and things like kind, nullable, type-level description and other type info can live on the NormalizedType. This gives you a distinction between "field-level" properties when doing code generation and "type-level" properties. On the codegen side you'd end up iterating over a list of PropertyDescriptor objects with type information inside.

(This gets at a pretty big impedance mismatch between JSONSchema and the sort of codegen XTP does – JSONSchema doesn't need to name types, but having a name for any sort of rich type is required in most languages we target. This might be a good place to generate an "inferred name" for any type that's otherwise unnamed by the schema as authored?)

A maybe-more complete sketch:

interface PropertyDescriptor {
  value: XtpNormalizedType,
  description?: string
  codeExamples: string[]
}

interface NamedPropertyDescriptor extends PropertyDescriptor {
  name: string
  required: boolean
}

interface PatternPropertyDescriptor extends PropertyDescriptor {
  pattern: RegExp
}

// pretty much the same as before, sans "required"
interface XtpTypeOpts {
  nullable?: boolean;
}

// ditto!
export interface XtpNormalizedType extends XtpTypeOpts {
  kind: XtpNormalizedKind;
}

// a little different: note the split between `name` and `inferredName` and the list of `properties`
class ObjectType implements XtpNormalizedType {
  kind: XtpNormalizedKind = 'object';
  // if the object appears in a "naming" position (e.g., listed in components/schemas)
  name?: string;

  // Type-level docs and code examples can go here.
  description: string;
  codeExamples: string;

  // if name is not available we infer a name by building up a path from the last named object
  inferredName: string;
  properties: PropertyDescriptor[]
}

bhelx added a commit to dylibso/xtp-go-bindgen that referenced this pull request Oct 30, 2024
Copy link
Contributor

@mhmd-azeez mhmd-azeez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, the only thing left I think is adding warnings to the result

super(message);
Object.setPrototypeOf(this, ValidationError.prototype);
}
export class ValidationError {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love this!

constructor(doc: V1Schema) {
this.doc = doc as any
this.errors = []
this.location = ['#']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beautiful!

@bhelx
Copy link
Contributor Author

bhelx commented Nov 1, 2024

Going to merge and push since it should be backwards compatible

@bhelx bhelx merged commit 70905c7 into main Nov 1, 2024
3 checks passed
@bhelx bhelx deleted the refactor-bindgen branch November 1, 2024 15:17
bhelx added a commit to dylibso/xtp-typescript-bindgen that referenced this pull request Nov 1, 2024
bhelx added a commit to dylibso/xtp-typescript-bindgen that referenced this pull request Nov 1, 2024
bhelx added a commit to dylibso/xtp-go-bindgen that referenced this pull request Nov 1, 2024
bhelx added a commit to dylibso/xtp-go-bindgen that referenced this pull request Nov 4, 2024
bhelx added a commit to dylibso/xtp-go-bindgen that referenced this pull request Nov 4, 2024
bhelx added a commit to dylibso/xtp-rust-bindgen that referenced this pull request Nov 4, 2024
* Refactor: see dylibso/xtp-bindgen#18
* Support nullable vs required
bhelx added a commit to dylibso/xtp-go-bindgen that referenced this pull request Nov 7, 2024
bhelx added a commit to dylibso/xtp-typescript-bindgen that referenced this pull request Nov 11, 2024
* Refactor Bindgen

dylibso/xtp-bindgen#18

* get XtpTyped from the project

* bump to latest
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants