Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JSON output support to pg-schema-diff plan. #175

Merged
merged 3 commits into from
Sep 21, 2024

Conversation

tonycosentini
Copy link
Contributor

Description

This adds support for different output formats when running the plan subcommand. Currently it only supports pretty printing (which is what already exists today) or JSON.

I rarely/never write golang so this PR was more or less a first attempt to get a discussion going, happy to refactor as needed.

Motivation

I'm working on a tool written in Python that is using the pg-schema-diff to generate schema plans. It would be great to have a structured output option so I can just parse the output in Python.

Alternatively I can build some kind of bindings for this library, that would require waaay more overhead than just being able to parse the JSON output.

I think this would make the tool much more versatile.

Testing

Just manual testing. I'm happy to add test coverage, but I didn't see any other CLI tests (besides some options parsing tests) so I wasn't really sure where to start.

@CLAassistant
Copy link

CLAassistant commented Sep 18, 2024

CLA assistant check
All committers have signed the CLA.

Copy link
Collaborator

@bplunkett-stripe bplunkett-stripe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really solid! A couple tiny nits. Thanks for implementing this :)

*e = outputFormat(v)
return nil
default:
return errors.New(`must be one of "pretty" or "json"`)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: include the unknown value the user used. Also let's group with the other plan flag structs

Copy link
Contributor Author

@tonycosentini tonycosentini Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like Cobra prepends this. I added the value the user used and it came out like this:

Error: invalid argument "test" for "--output-format" flag: must be one of "pretty" or "json", value passed in was "test"

I'm going to leave it out, but if you think it's useful because the error might be used outside of Cobra let me know and I can add it back in.

Here's how it looks by default:

Error: invalid argument "test" for "--output-format" flag: must be one of "pretty" or "json"

@@ -139,6 +172,8 @@ func createPlanFlags(cmd *cobra.Command) *planFlags {
),
)

cmd.Flags().Var(&flags.outputFormat, "output-format", "Change the output format for what is printed. Defaults to pretty-printed human-readable output. (options: pretty, json)")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: set default value to pretty and make a required param

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a default value now, but it has to be on the struct itself, when using Var there is no way to set a default.

I also didn't mark it as required because when it's marked as required via MarkFlagRequired the CLI will always require something to be passed in, even though the struct instance already has it set.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense!

fmt.Printf("\n%s\n", header("Generated plan"))
fmt.Println(planToPrettyS(plan))

if planFlags.outputFormat == "" || planFlags.outputFormat == outputFormatPretty {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not support an empty string for output format (once the param is default and required). Let's also error out if the enum value is not known, i.e., if neither if condition matches, then return an error.

Alternatively (up to you!), you could add to the output format type an additional property called print:

{
    name string
    print func(plan)
}
....
...
planFlags.outputFormat.print(plan)

Then we wouldn't need any sort of conditional switchhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworked this as suggested - now each output format has a function to convert the plan to an output string.

Thanks for the suggestion, it's definitely much cleaner.

@tonycosentini
Copy link
Contributor Author

@bplunkett-stripe I think I addressed everything - apologies if something is off, just let me know and I can sort it out.

Also thanks for jumping on this, much appreciated.

@@ -78,6 +77,11 @@ type (
targetDatabaseDSN string
}

outputFormat struct {
identifier string
convertToOutputString func(diff.Plan) string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@@ -139,6 +172,8 @@ func createPlanFlags(cmd *cobra.Command) *planFlags {
),
)

cmd.Flags().Var(&flags.outputFormat, "output-format", "Change the output format for what is printed. Defaults to pretty-printed human-readable output. (options: pretty, json)")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense!

@bplunkett-stripe
Copy link
Collaborator

@tonycosentini before merging, could you paste some example outputs of a "pretty" plan and a "json" plan. Just want to make sure those look okay! (Currently CLI tests are just by-hand right now, so the current practice to just to include some example runs in your PR)!

@tonycosentini
Copy link
Contributor Author

tonycosentini commented Sep 20, 2024

@bplunkett-stripe here are some examples

Normal output without specifying any format (defaults to pretty):

➜  pg-schema-diff git:(json_output) pg-schema-diff plan --dsn postgresql://postgres@localhost/empty --schema-dir ../test-schema.sql 
################################ Generated plan ################################
1. CREATE SCHEMA "sample_code_review_schema";
	-- Statement Timeout: 3s

2. CREATE TABLE "sample_code_review_schema"."test" (
	"id" bigint NOT NULL,
	"created_at" timestamp without time zone,
	"updated_at" timestamp without time zone,
	"product" character varying(255) COLLATE "pg_catalog"."default",
	"cost" numeric,
	"user_name" character varying(100) COLLATE "pg_catalog"."default",
	"email" character varying(255) COLLATE "pg_catalog"."default"
);
	-- Statement Timeout: 3s

3. CREATE UNIQUE INDEX CONCURRENTLY test_pkey ON sample_code_review_schema.test USING btree (id);
	-- Statement Timeout: 20m0s
	-- Lock Timeout: 3s

4. ALTER TABLE "sample_code_review_schema"."test" ADD CONSTRAINT "test_pkey" PRIMARY KEY USING INDEX "test_pkey";
	-- Statement Timeout: 3s

5. CREATE INDEX CONCURRENTLY test_product_idx ON sample_code_review_schema.test USING btree (product) WHERE (cost > (100)::numeric);
	-- Statement Timeout: 20m0s
	-- Lock Timeout: 3s

6. DROP TABLE "public"."test";
	-- Statement Timeout: 20m0s
	-- Lock Timeout: 3s
	-- Hazard DELETES_DATA: Deletes all rows in the table (and the table itself)

Passing pretty as output format:

➜  pg-schema-diff git:(json_output) pg-schema-diff plan --dsn postgresql://postgres@localhost/empty --schema-dir ../test-schema.sql --output-format pretty
################################ Generated plan ################################
1. CREATE SCHEMA "sample_code_review_schema";
	-- Statement Timeout: 3s

2. CREATE TABLE "sample_code_review_schema"."test" (
	"id" bigint NOT NULL,
	"created_at" timestamp without time zone,
	"updated_at" timestamp without time zone,
	"product" character varying(255) COLLATE "pg_catalog"."default",
	"cost" numeric,
	"user_name" character varying(100) COLLATE "pg_catalog"."default",
	"email" character varying(255) COLLATE "pg_catalog"."default"
);
	-- Statement Timeout: 3s

3. CREATE UNIQUE INDEX CONCURRENTLY test_pkey ON sample_code_review_schema.test USING btree (id);
	-- Statement Timeout: 20m0s
	-- Lock Timeout: 3s

4. ALTER TABLE "sample_code_review_schema"."test" ADD CONSTRAINT "test_pkey" PRIMARY KEY USING INDEX "test_pkey";
	-- Statement Timeout: 3s

5. CREATE INDEX CONCURRENTLY test_product_idx ON sample_code_review_schema.test USING btree (product) WHERE (cost > (100)::numeric);
	-- Statement Timeout: 20m0s
	-- Lock Timeout: 3s

6. DROP TABLE "public"."test";
	-- Statement Timeout: 20m0s
	-- Lock Timeout: 3s
	-- Hazard DELETES_DATA: Deletes all rows in the table (and the table itself)

JSON output:

➜  pg-schema-diff git:(json_output) pg-schema-diff plan --dsn postgresql://postgres@localhost/empty --schema-dir ../test-schema.sql --output-format json
{
  "statements": [
    {
      "ddl": "CREATE SCHEMA \"sample_code_review_schema\"",
      "timeout_ms": 3000,
      "lock_timeout_ms": 3000,
      "hazards": null
    },
    {
      "ddl": "CREATE TABLE \"sample_code_review_schema\".\"test\" (\n\t\"id\" bigint NOT NULL,\n\t\"created_at\" timestamp without time zone,\n\t\"updated_at\" timestamp without time zone,\n\t\"product\" character varying(255) COLLATE \"pg_catalog\".\"default\",\n\t\"cost\" numeric,\n\t\"user_name\" character varying(100) COLLATE \"pg_catalog\".\"default\",\n\t\"email\" character varying(255) COLLATE \"pg_catalog\".\"default\"\n)",
      "timeout_ms": 3000,
      "lock_timeout_ms": 3000,
      "hazards": null
    },
    {
      "ddl": "CREATE UNIQUE INDEX CONCURRENTLY test_pkey ON sample_code_review_schema.test USING btree (id)",
      "timeout_ms": 1200000,
      "lock_timeout_ms": 3000,
      "hazards": null
    },
    {
      "ddl": "ALTER TABLE \"sample_code_review_schema\".\"test\" ADD CONSTRAINT \"test_pkey\" PRIMARY KEY USING INDEX \"test_pkey\"",
      "timeout_ms": 3000,
      "lock_timeout_ms": 3000,
      "hazards": null
    },
    {
      "ddl": "CREATE INDEX CONCURRENTLY test_product_idx ON sample_code_review_schema.test USING btree (product) WHERE (cost \u003e (100)::numeric)",
      "timeout_ms": 1200000,
      "lock_timeout_ms": 3000,
      "hazards": null
    },
    {
      "ddl": "DROP TABLE \"public\".\"test\"",
      "timeout_ms": 1200000,
      "lock_timeout_ms": 3000,
      "hazards": [
        {
          "type": "DELETES_DATA",
          "message": "Deletes all rows in the table (and the table itself)"
        }
      ]
    }
  ],
  "current_schema_hash": "620ef80a9065c439"
}

Invalid output format:

➜  pg-schema-diff git:(json_output) pg-schema-diff plan --dsn postgresql://postgres@localhost/empty --schema-dir ../test-schema.sql --output-format invalid
Error: invalid argument "invalid" for "--output-format" flag: must be one of "pretty" or "json"
Usage:
  pg-schema-diff plan [flags]

Aliases:
  plan, diff

Flags:
      --data-pack-new-tables                     If set, will data pack new tables in the plan to minimize table size (re-arranges columns). (default true)
      --disable-plan-validation                  If set, will disable plan validation. Plan validation runs the migration against a temporarydatabase with an identical schema to the original, asserting that the generated plan actually migrates the schema to the desired target.
      --dsn string                               Connection string for the database (DB password can be specified through PGPASSWORD environment variable)
      --exclude-schema stringArray               Exclude the specified schema in the plan
  -h, --help                                     help for plan
      --include-schema stringArray               Include the specified schema in the plan
  -s, --insert-statement stringArray             'index=<index> statement="<statement>" timeout=<duration> lock_timeout=<duration>' values. Will insert the statement at the index in the generated plan. This follows normal insert semantics. Example: -s 'index=1 statement="SELECT pg_sleep(5)" timeout=5s lock_timeout=1s'
  -l, --lock-timeout-modifier stringArray        list of 'pattern="<regex>" timeout=<duration>', where if a statement matches the regex, the statement will have the target lock timeout. If multiple regexes match, the latest regex will take priority. Example: -t 'pattern="CREATE TABLE" timeout=5m'
      --output-format outputFormat               Change the output format for what is printed. Defaults to pretty-printed human-readable output. (options: pretty, json) (default pretty)
      --schema-dir stringArray                   Directory of .SQL files to use as the schema source (can be multiple). Use to generate a diff between the target database and the schema in this directory.
      --schema-source-dsn string                 DSN for the database to use as the schema source. Use to generate a diff between the target database and the schema in this database.
  -t, --statement-timeout-modifier stringArray   list of 'pattern="<regex>" timeout=<duration>', where if a statement matches the regex, the statement will have the target statement timeout. If multiple regexes match, the latest regex will take priority. Example: -t 'pattern="CREATE TABLE" timeout=5m'

I kind of think the casing on the JSON keys is a bit strange, I think ideally it would be snake case or camel case, I rarely (never?) see JSON with capitalized key names. However it seems like this isn't super easy to tweak with the built-in JSON library, I'm curious if you think it's worth reformatting it. edit: I just pushed another commit that cleans up the formatting a bit - the keys are now all lowercase + the times are converted to milliseconds, before it was nanoseconds which was a bit confusing. The example has been updated.

@bplunkett-stripe bplunkett-stripe merged commit f036cb2 into stripe:main Sep 21, 2024
7 checks passed
@bplunkett-stripe
Copy link
Collaborator

Sorry I hit the merge button for you! You don't have write permissions on the repo, so you can't hit the merge button unfortunately. The commit is authored by you though! 🎉

This is a very nice improvement to the repo! Thanks for making it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants