From 52dacf9888a66b2009e652ed7954ec0381496518 Mon Sep 17 00:00:00 2001
From: Matthew McKnight <91097623+McKnight-42@users.noreply.github.com>
Date: Mon, 22 Aug 2022 10:23:12 -0500
Subject: [PATCH] [BACKPORT] 418 to 1.0.latest (#426)

* init pr for backport of 418 to 1.0.latest

* add changelog entry

* add ref to 0.21 in changelog history

* expand out ref to past 1.0.0

* remove excess spacing
---
 .changes/0.0.0.md                             |   6 +
 .changes/1.0.0.md                             |  20 ++
 .changes/1.0.1.md                             |  11 ++
 .changes/README.md                            |   3 +
 .changes/header.tpl.md                        |   6 +
 .changes/unreleased/.gitkeep                  |   0
 .../unreleased/Features-20220810-133356.yaml  |   7 +
 .changie.yaml                                 |  62 ++++++
 .github/pull_request_template.md              |   2 +-
 .github/workflows/bot-changelog.yml           |  61 ++++++
 .github/workflows/changelog-existence.yml     |  41 ++++
 CHANGELOG.md                                  | 183 ++----------------
 CONTRIBUTING.md                               | 111 +++++++++++
 13 files changed, 347 insertions(+), 166 deletions(-)
 create mode 100644 .changes/0.0.0.md
 create mode 100644 .changes/1.0.0.md
 create mode 100644 .changes/1.0.1.md
 create mode 100644 .changes/README.md
 create mode 100644 .changes/header.tpl.md
 create mode 100644 .changes/unreleased/.gitkeep
 create mode 100644 .changes/unreleased/Features-20220810-133356.yaml
 create mode 100644 .changie.yaml
 create mode 100644 .github/workflows/bot-changelog.yml
 create mode 100644 .github/workflows/changelog-existence.yml
 create mode 100644 CONTRIBUTING.md

diff --git a/.changes/0.0.0.md b/.changes/0.0.0.md
new file mode 100644
index 000000000..7e4e1bbb6
--- /dev/null
+++ b/.changes/0.0.0.md
@@ -0,0 +1,6 @@
+## Previous Releases
+For information on releases of dbt-spark prior to 1.0.0, please see:
+- [0.21](https://github.com/dbt-labs/dbt-spark/blob/0.21.latest/CHANGELOG.md)
+- [0.20](https://github.com/dbt-labs/dbt-spark/blob/0.20.latest/CHANGELOG.md)
+- [0.19 and earlier](https://github.com/dbt-labs/dbt-spark/blob/0.19.latest/CHANGELOG.md)
+
diff --git a/.changes/1.0.0.md b/.changes/1.0.0.md
new file mode 100644
index 000000000..1a48789e8
--- /dev/null
+++ b/.changes/1.0.0.md
@@ -0,0 +1,20 @@
+## dbt-spark 1.0.0 - December 3, 2021
+
+### Features
+- Add support for Apache Hudi (hudi file format) which supports incremental merge strategies ([#187](https://github.com/dbt-labs/dbt-spark/issues/187), [#210](https://github.com/dbt-labs/dbt-spark/pull/210))
+
+### Fixes
+- Incremental materialization corrected to respect `full_refresh` config, by using `should_full_refresh()` macro ([#260](https://github.com/dbt-labs/dbt-spark/issues/260), [#262](https://github.com/dbt-labs/dbt-spark/pull/262/))
+
+### Under the hood
+- Refactor seed macros: remove duplicated code from dbt-core, and provide clearer logging of SQL parameters that differ by connection method ([#249](https://github.com/dbt-labs/dbt-spark/issues/249), [#250](https://github.com/dbt-labs/dbt-snowflake/pull/250))
+- Replace `sample_profiles.yml` with `profile_template.yml`, for use with new `dbt init` ([#247](https://github.com/dbt-labs/dbt-spark/pull/247))
+- Remove official support for python 3.6, which is reaching end of life on December 23, 2021 ([dbt-core#4134](https://github.com/dbt-labs/dbt-core/issues/4134), [#253](https://github.com/dbt-labs/dbt-snowflake/pull/253))
+- Add support for structured logging ([#251](https://github.com/dbt-labs/dbt-spark/pull/251))
+
+### Contributors
+- [@grindheim](https://github.com/grindheim) ([#262](https://github.com/dbt-labs/dbt-spark/pull/262/))
+- [@vingov](https://github.com/vingov) ([#210](https://github.com/dbt-labs/dbt-spark/pull/210))
+
+
+
diff --git a/.changes/1.0.1.md b/.changes/1.0.1.md
new file mode 100644
index 000000000..a76ca4b3c
--- /dev/null
+++ b/.changes/1.0.1.md
@@ -0,0 +1,11 @@
+## dbt-spark 1.0.1 - April 19, 2022
+
+### Fixes
+- Closes the connection properly ([#280](https://github.com/dbt-labs/dbt-spark/issues/280), [#285](https://github.com/dbt-labs/dbt-spark/pull/285))
+- Make internal macros use macro dispatch to be overridable in child adapters ([#319](https://github.com/dbt-labs/dbt-spark/issues/319), [#320](https://github.com/dbt-labs/dbt-spark/pull/320))
+
+### Under the hood
+- Configure insert_overwrite models to use parquet ([#301](https://github.com/dbt-labs/dbt-spark/issues/301))
+
+### Contributors
+- [@ueshin](https://github.com/ueshin) ([#285](https://github.com/dbt-labs/dbt-spark/pull/285), [#320](https://github.com/dbt-labs/dbt-spark/pull/320))
\ No newline at end of file
diff --git a/.changes/README.md b/.changes/README.md
new file mode 100644
index 000000000..dc6106dfe
--- /dev/null
+++ b/.changes/README.md
@@ -0,0 +1,3 @@
+# CHANGELOG
+
+For information on how this changelog is generated and maintained, see the [README](https://github.com/dbt-labs/dbt-spark/blob/main/.changes/README.md) on the `main` branch of `dbt-spark`.
diff --git a/.changes/header.tpl.md b/.changes/header.tpl.md
new file mode 100644
index 000000000..251ea5d51
--- /dev/null
+++ b/.changes/header.tpl.md
@@ -0,0 +1,6 @@
+# dbt-spark Changelog
+
+- This file provides a full account of all changes to `dbt-spark`.
+- Changes are listed under the (pre)release in which they first appear. Subsequent releases include changes from previous releases.
+- "Breaking changes" listed under a version may require action from end users or external maintainers when upgrading to that version.
+- Do not edit this file directly. This file is auto-generated using [changie](https://github.com/miniscruff/changie). For details on how to document a change, see [the contributing guide](https://github.com/dbt-labs/dbt-spark/blob/main/CONTRIBUTING.md#adding-changelog-entry)
diff --git a/.changes/unreleased/.gitkeep b/.changes/unreleased/.gitkeep
new file mode 100644
index 000000000..e69de29bb
diff --git a/.changes/unreleased/Features-20220810-133356.yaml b/.changes/unreleased/Features-20220810-133356.yaml
new file mode 100644
index 000000000..e9468a15f
--- /dev/null
+++ b/.changes/unreleased/Features-20220810-133356.yaml
@@ -0,0 +1,7 @@
+kind: Features
+body: backport changie to 1.0.latest
+time: 2022-08-10T13:33:56.992461-05:00
+custom:
+  Author: mcknight-42
+  Issue: "417"
+  PR: "426"
diff --git a/.changie.yaml b/.changie.yaml
new file mode 100644
index 000000000..f5800f324
--- /dev/null
+++ b/.changie.yaml
@@ -0,0 +1,62 @@
+changesDir: .changes
+unreleasedDir: unreleased
+headerPath: header.tpl.md
+versionHeaderPath: ""
+changelogPath: CHANGELOG.md
+versionExt: md
+versionFormat: '## dbt-spark {{.Version}} - {{.Time.Format "January 02, 2006"}}'
+kindFormat: '### {{.Kind}}'
+changeFormat: '- {{.Body}} ([#{{.Custom.Issue}}](https://github.com/dbt-labs/dbt-spark/issues/{{.Custom.Issue}}), [#{{.Custom.PR}}](https://github.com/dbt-labs/dbt-spark/pull/{{.Custom.PR}}))'
+kinds:
+- label: Breaking Changes
+- label: Features
+- label: Fixes
+- label: Under the Hood
+- label: Dependencies
+  changeFormat: '- {{.Body}} ({{if ne .Custom.Issue ""}}[#{{.Custom.Issue}}](https://github.com/dbt-labs/dbt-spark/issues/{{.Custom.Issue}}), {{end}}[#{{.Custom.PR}}](https://github.com/dbt-labs/dbt-spark/pull/{{.Custom.PR}}))'
+- label: Security
+  changeFormat: '- {{.Body}} ({{if ne .Custom.Issue ""}}[#{{.Custom.Issue}}](https://github.com/dbt-labs/dbt-spark/issues/{{.Custom.Issue}}), {{end}}[#{{.Custom.PR}}](https://github.com/dbt-labs/dbt-spark/pull/{{.Custom.PR}}))'
+custom:
+- key: Author
+  label: GitHub Username(s) (separated by a single space if multiple)
+  type: string
+  minLength: 3
+- key: Issue
+  label: GitHub Issue Number
+  type: int
+  minLength: 4
+- key: PR
+  label: GitHub Pull Request Number
+  type: int
+  minLength: 4
+footerFormat: |
+  {{- $contributorDict := dict }}
+  {{- /* any names added to this list should be all lowercase for later matching purposes */}}
+  {{- $core_team := list "emmyoop" "nathaniel-may" "gshank" "leahwicz" "chenyulinx" "stu-k" "iknox-fa" "versusfacit" "mcknight-42" "jtcohen6" "dependabot[bot]" "snyk-bot" }}
+  {{- range $change := .Changes }}
+    {{- $authorList := splitList " " $change.Custom.Author }}
+    {{- /* loop through all authors for a PR */}}
+    {{- range $author := $authorList }}
+      {{- $authorLower := lower $author }}
+      {{- /* we only want to include non-core team contributors */}}
+      {{- if not (has $authorLower $core_team)}}
+        {{- $pr := $change.Custom.PR }}
+        {{- /* check if this contributor has other PRs associated with them already */}}
+        {{- if hasKey $contributorDict $author }}
+          {{- $prList := get $contributorDict $author }}
+          {{- $prList = append $prList $pr }}
+          {{- $contributorDict := set $contributorDict $author $prList }}
+        {{- else }}
+          {{- $prList := list $change.Custom.PR }}
+          {{- $contributorDict := set $contributorDict $author $prList }}
+        {{- end }}
+      {{- end}}
+    {{- end}}
+  {{- end }}
+  {{- /* no indentation here for formatting so the final markdown doesn't have unneeded indentations */}}
+  {{- if $contributorDict}}
+  ### Contributors
+  {{- range $k,$v := $contributorDict }}
+  - [@{{$k}}](https://github.com/{{$k}}) ({{ range $index, $element := $v }}{{if $index}}, {{end}}[#{{$element}}](https://github.com/dbt-labs/dbt-spark/pull/{{$element}}){{end}})
+  {{- end }}
+  {{- end }}
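(Editor's aside, not part of the patch: to make the `footerFormat` template above concrete — given unreleased changes authored by a hypothetical non-core contributor `@someuser` across PRs 500 and 501, the template collects both PRs under one bullet and renders a footer like the Contributors sections seen in the changelogs above:)

```markdown
### Contributors
- [@someuser](https://github.com/someuser) ([#500](https://github.com/dbt-labs/dbt-spark/pull/500), [#501](https://github.com/dbt-labs/dbt-spark/pull/501))
```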
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
index 60e12779b..c4a5c53b4 100644
--- a/.github/pull_request_template.md
+++ b/.github/pull_request_template.md
@@ -18,4 +18,4 @@ resolves #
 - [ ] I have signed the [CLA](https://docs.getdbt.com/docs/contributor-license-agreements)
 - [ ] I have run this code in development and it appears to resolve the stated issue
 - [ ] This PR includes tests, or tests are not required/relevant for this PR
-- [ ] I have updated the `CHANGELOG.md` and added information about my change to the "dbt-spark next" section.
\ No newline at end of file
+- [ ] I have run `changie new` to [create a changelog entry](https://github.com/dbt-labs/dbt-spark/blob/main/CONTRIBUTING.md#Adding-CHANGELOG-Entry)
diff --git a/.github/workflows/bot-changelog.yml b/.github/workflows/bot-changelog.yml
new file mode 100644
index 000000000..d8056efe4
--- /dev/null
+++ b/.github/workflows/bot-changelog.yml
@@ -0,0 +1,61 @@
+# **what?**
+# When bots create a PR, this action will add a corresponding changie yaml file to that
+# PR when a specific label is added.
+#
+# The file is created off a template:
+#
+# kind: <changie kind, per the matrix below>
+# body: <PR title>
+# time: <current timestamp>
+# custom:
+#   Author: <PR user login (generally the bot)>
+#   Issue: 4904
+#   PR: <PR number>
+#
+# **why?**
+# Automate changelog generation for more visibility with automated bot PRs.
+#
+# **when?**
+# Once a PR is created, a label should be added to the PR before or after creation. You can also
+# manually trigger this by adding the appropriate label at any time.
+#
+# **how to add another bot?**
+# Add the label and changie kind to the include matrix. That's it!
+#

name: Bot Changelog

on:
  pull_request:
    # catch when the PR is opened with the label or when the label is added
    types: [opened, labeled]

permissions:
  contents: write
  pull-requests: read

jobs:
  generate_changelog:
    strategy:
      matrix:
        include:
        - label: "dependencies"
          changie_kind: "Dependency"
        - label: "snyk"
          changie_kind: "Security"
    runs-on: ubuntu-latest

    steps:

    - name: Create and commit changelog on bot PR
      if: "contains(github.event.pull_request.labels.*.name, '${{ matrix.label }}')"
      id: bot_changelog
      uses: emmyoop/changie_bot@v1.0
      with:
        GITHUB_TOKEN: ${{ secrets.FISHTOWN_BOT_PAT }}
        commit_author_name: "Github Build Bot"
        commit_author_email: ""
        commit_message: "Add automated changelog yaml from template for bot PR"
        changie_kind: ${{ matrix.changie_kind }}
        label: ${{ matrix.label }}
        custom_changelog_string: "custom:\n  Author: ${{ github.event.pull_request.user.login }}\n  Issue: 417\n  PR: ${{ github.event.pull_request.number }}\n"
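(Editor's aside, not part of the patch: for a Dependabot PR labeled `dependencies`, the workflow above would fill the template roughly as follows. The body text, timestamp, and PR number here are hypothetical; `Issue: 417` is hard-coded in `custom_changelog_string`, and the kind comes from the matrix:)

```yaml
kind: Dependency
body: Bump pyodbc from 4.0.32 to 4.0.34
time: 2022-08-22T10:23:12.000000-05:00
custom:
  Author: dependabot[bot]
  Issue: 417
  PR: 427
```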
It will +# run when they are opened, reopened, when any label is added or removed +# and when new code is pushed to the branch. The action will then get +# skipped if the 'Skip Changelog' label is present is any of the labels. + +name: Check Changelog Entry + +on: + pull_request: + types: [opened, reopened, labeled, unlabeled, synchronize] + workflow_dispatch: + +defaults: + run: + shell: bash + +permissions: + contents: read + pull-requests: write + + +jobs: + changelog: + uses: dbt-labs/actions/.github/workflows/changelog-existence.yml@main + with: + changelog_comment: 'Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the [dbt-spark contributing guide](https://github.com/dbt-labs/dbt-spark/blob/main/CONTRIBUTING.MD).' + skip_label: 'Skip Changelog' + secrets: inherit # this is only acceptable because we own the action we're calling diff --git a/CHANGELOG.md b/CHANGELOG.md index 702e7561a..147293f9e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,6 +1,11 @@ -## dbt-spark 1.0.1 (April 19, 2022) +# dbt-spark Changelog -## dbt-spark 1.0.1rc1 (April 6, 2022) +- This file provides a full account of all changes to `dbt-spark`. +- Changes are listed under the (pre)release in which they first appear. Subsequent releases include changes from previous releases. +- "Breaking changes" listed under a version may require action from end users or external maintainers when upgrading to that version. +- Do not edit this file directly. This file is auto-generated using [changie](https://github.com/miniscruff/changie). For details on how to document a change, see [the contributing guide](https://github.com/dbt-labs/dbt-spark/blob/main/CONTRIBUTING.md#adding-changelog-entry) + +## dbt-spark 1.0.1 - April 19, 2022 ### Fixes - Closes the connection properly ([#280](https://github.com/dbt-labs/dbt-spark/issues/280), [#285](https://github.com/dbt-labs/dbt-spark/pull/285)) @@ -12,179 +17,27 @@ ### Contributors - [@ueshin](https://github.com/ueshin) ([#285](https://github.com/dbt-labs/dbt-spark/pull/285), [#320](https://github.com/dbt-labs/dbt-spark/pull/320)) -## dbt-spark 1.0.0 (December 3, 2021) - -### Fixes -- Incremental materialization corrected to respect `full_refresh` config, by using `should_full_refresh()` macro ([#260](https://github.com/dbt-labs/dbt-spark/issues/260), [#262](https://github.com/dbt-labs/dbt-spark/pull/262/)) - -### Contributors -- [@grindheim](https://github.com/grindheim) ([#262](https://github.com/dbt-labs/dbt-spark/pull/262/)) - -## dbt-spark 1.0.0rc2 (November 24, 2021) +## dbt-spark 1.0.0 - December 3, 2021 ### Features - Add support for Apache Hudi (hudi file format) which supports incremental merge strategies ([#187](https://github.com/dbt-labs/dbt-spark/issues/187), [#210](https://github.com/dbt-labs/dbt-spark/pull/210)) +### Fixes +- Incremental materialization corrected to respect `full_refresh` config, by using `should_full_refresh()` macro ([#260](https://github.com/dbt-labs/dbt-spark/issues/260), [#262](https://github.com/dbt-labs/dbt-spark/pull/262/)) + ### Under the hood - Refactor seed macros: remove duplicated code from dbt-core, and provide clearer logging of SQL parameters that differ by connection method ([#249](https://github.com/dbt-labs/dbt-spark/issues/249), [#250](https://github.com/dbt-labs/dbt-snowflake/pull/250)) - Replace `sample_profiles.yml` with `profile_template.yml`, for use with new `dbt init` ([#247](https://github.com/dbt-labs/dbt-spark/pull/247)) - -### 
-
-### Contributors
-- [@vingov](https://github.com/vingov) ([#210](https://github.com/dbt-labs/dbt-spark/pull/210))
-
-## dbt-spark 1.0.0rc1 (November 10, 2021)
-
-### Under the hood
 - Remove official support for python 3.6, which is reaching end of life on December 23, 2021 ([dbt-core#4134](https://github.com/dbt-labs/dbt-core/issues/4134), [#253](https://github.com/dbt-labs/dbt-snowflake/pull/253))
 - Add support for structured logging ([#251](https://github.com/dbt-labs/dbt-spark/pull/251))
 
-## dbt-spark 0.21.1 (Release TBD)
-
-## dbt-spark 0.21.1rc1 (November 3, 2021)
-
-### Fixes
-- Fix `--store-failures` for tests, by suppressing irrelevant error in `comment_clause()` macro ([#232](https://github.com/dbt-labs/dbt-spark/issues/232), [#233](https://github.com/dbt-labs/dbt-spark/pull/233))
-- Add support for `on_schema_change` config in incremental models: `ignore`, `fail`, `append_new_columns`. For `sync_all_columns`, removing columns is not supported by Apache Spark or Delta Lake ([#198](https://github.com/dbt-labs/dbt-spark/issues/198), [#226](https://github.com/dbt-labs/dbt-spark/issues/226), [#229](https://github.com/dbt-labs/dbt-spark/pull/229))
-- Add `persist_docs` call to incremental model ([#224](https://github.com/dbt-labs/dbt-spark/issues/224), [#234](https://github.com/dbt-labs/dbt-spark/pull/234))
-
-### Contributors
-- [@binhnefits](https://github.com/binhnefits) ([#234](https://github.com/dbt-labs/dbt-spark/pull/234))
-
-## dbt-spark 0.21.0 (October 4, 2021)
-
-### Fixes
-- Enhanced get_columns_in_relation method to handle a bug in open source deltalake which doesnt return schema details in `show table extended in databasename like '*'` query output. This impacts dbt snapshots if file format is open source deltalake ([#207](https://github.com/dbt-labs/dbt-spark/pull/207))
-- Parse properly columns when there are struct fields to avoid considering inner fields: Issue ([#202](https://github.com/dbt-labs/dbt-spark/issues/202))
-
-### Under the hood
-- Add `unique_field` to better understand adapter adoption in anonymous usage tracking ([#211](https://github.com/dbt-labs/dbt-spark/pull/211))
-
-### Contributors
-- [@harryharanb](https://github.com/harryharanb) ([#207](https://github.com/dbt-labs/dbt-spark/pull/207))
-- [@SCouto](https://github.com/Scouto) ([#204](https://github.com/dbt-labs/dbt-spark/pull/204))
-
-## dbt-spark 0.21.0b2 (August 20, 2021)
-
-### Fixes
-- Add pyodbc import error message to dbt.exceptions.RuntimeException to get more detailed information when running `dbt debug` ([#192](https://github.com/dbt-labs/dbt-spark/pull/192))
-- Add support for ODBC Server Side Parameters, allowing options that need to be set with the `SET` statement to be used ([#201](https://github.com/dbt-labs/dbt-spark/pull/201))
-- Add `retry_all` configuration setting to retry all connection issues, not just when the `_is_retryable_error` function determines ([#194](https://github.com/dbt-labs/dbt-spark/pull/194))
-
-### Contributors
-- [@JCZuurmond](https://github.com/JCZuurmond) ([#192](https://github.com/fishtown-analytics/dbt-spark/pull/192))
-- [@jethron](https://github.com/jethron) ([#201](https://github.com/fishtown-analytics/dbt-spark/pull/201))
-- [@gregingenii](https://github.com/gregingenii) ([#194](https://github.com/dbt-labs/dbt-spark/pull/194))
-
-## dbt-spark 0.21.0b1 (August 3, 2021)
-
-## dbt-spark 0.20.1 (August 2, 2021)
-
-## dbt-spark 0.20.1rc1 (August 2, 2021)
-
-### Fixes
-- Fix `get_columns_in_relation` when called on models created in the same run ([#196](https://github.com/dbt-labs/dbt-spark/pull/196), [#197](https://github.com/dbt-labs/dbt-spark/pull/197))
-
-### Contributors
-- [@ali-tny](https://github.com/ali-tny) ([#197](https://github.com/fishtown-analytics/dbt-spark/pull/197))
-
-
-## dbt-spark 0.20.0 (July 12, 2021)
-
-## dbt-spark 0.20.0rc2 (July 7, 2021)
-
-### Features
-
-- Add support for `merge_update_columns` config in `merge`-strategy incremental models ([#183](https://github.com/fishtown-analytics/dbt-spark/pull/183), [#184](https://github.com/fishtown-analytics/dbt-spark/pull/184))
-
-### Fixes
-
-- Fix column-level `persist_docs` on Delta tables, add tests ([#180](https://github.com/fishtown-analytics/dbt-spark/pull/180))
-
-## dbt-spark 0.20.0rc1 (June 8, 2021)
-
-### Features
-
-- Allow user to specify `use_ssl` ([#169](https://github.com/fishtown-analytics/dbt-spark/pull/169))
-- Allow setting table `OPTIONS` using `config` ([#171](https://github.com/fishtown-analytics/dbt-spark/pull/171))
-- Add support for column-level `persist_docs` on Delta tables ([#84](https://github.com/fishtown-analytics/dbt-spark/pull/84), [#170](https://github.com/fishtown-analytics/dbt-spark/pull/170))
-
-### Fixes
-- Cast `table_owner` to string to avoid errors generating docs ([#158](https://github.com/fishtown-analytics/dbt-spark/pull/158), [#159](https://github.com/fishtown-analytics/dbt-spark/pull/159))
-- Explicitly cast column types when inserting seeds ([#139](https://github.com/fishtown-analytics/dbt-spark/pull/139), [#166](https://github.com/fishtown-analytics/dbt-spark/pull/166))
-
-### Under the hood
-- Parse information returned by `list_relations_without_caching` macro to speed up catalog generation ([#93](https://github.com/fishtown-analytics/dbt-spark/issues/93), [#160](https://github.com/fishtown-analytics/dbt-spark/pull/160))
-- More flexible host passing, https:// can be omitted ([#153](https://github.com/fishtown-analytics/dbt-spark/issues/153))
-
-### Contributors
-- [@friendofasquid](https://github.com/friendofasquid) ([#159](https://github.com/fishtown-analytics/dbt-spark/pull/159))
-- [@franloza](https://github.com/franloza) ([#160](https://github.com/fishtown-analytics/dbt-spark/pull/160))
-- [@Fokko](https://github.com/Fokko) ([#165](https://github.com/fishtown-analytics/dbt-spark/pull/165))
-- [@rahulgoyal2987](https://github.com/rahulgoyal2987) ([#169](https://github.com/fishtown-analytics/dbt-spark/pull/169))
-- [@JCZuurmond](https://github.com/JCZuurmond) ([#171](https://github.com/fishtown-analytics/dbt-spark/pull/171))
-- [@cristianoperez](https://github.com/cristianoperez) ([#170](https://github.com/fishtown-analytics/dbt-spark/pull/170))
-
-
-## dbt-spark 0.19.1 (April 2, 2021)
-
-## dbt-spark 0.19.1b2 (February 26, 2021)
-
-### Under the hood
-- Update serialization calls to use new API in dbt-core `0.19.1b2` ([#150](https://github.com/fishtown-analytics/dbt-spark/pull/150))
-
-## dbt-spark 0.19.0.1 (February 26, 2021)
-
-### Fixes
-- Fix package distribution to include incremental model materializations ([#151](https://github.com/fishtown-analytics/dbt-spark/pull/151), [#152](https://github.com/fishtown-analytics/dbt-spark/issues/152))
-
-## dbt-spark 0.19.0 (February 21, 2021)
-
-### Breaking changes
-- Incremental models have `incremental_strategy: append` by default. This strategy adds new records without updating or overwriting existing records. For that, use `merge` or `insert_overwrite` instead, depending on the file format, connection method, and attributes of your underlying data. dbt will try to raise a helpful error if you configure a strategy that is not supported for a given file format or connection. ([#140](https://github.com/fishtown-analytics/dbt-spark/pull/140), [#141](https://github.com/fishtown-analytics/dbt-spark/pull/141))
-
-### Fixes
-- Capture hard-deleted records in snapshot merge, when `invalidate_hard_deletes` config is set ([#109](https://github.com/fishtown-analytics/dbt-spark/pull/143), [#126](https://github.com/fishtown-analytics/dbt-spark/pull/144))
-
-## dbt-spark 0.19.0rc1 (January 8, 2021)
-
-### Breaking changes
-- Users of the `http` and `thrift` connection methods need to install extra requirements: `pip install dbt-spark[PyHive]` ([#109](https://github.com/fishtown-analytics/dbt-spark/pull/109), [#126](https://github.com/fishtown-analytics/dbt-spark/pull/126))
-
-### Under the hood
-- Enable `CREATE OR REPLACE` support when using Delta. Instead of dropping and recreating the table, it will keep the existing table, and add a new version as supported by Delta. This will ensure that the table stays available when running the pipeline, and you can track the history.
-- Add changelog, issue templates ([#119](https://github.com/fishtown-analytics/dbt-spark/pull/119), [#120](https://github.com/fishtown-analytics/dbt-spark/pull/120))
-
-### Fixes
-- Handle case of 0 retries better for HTTP Spark Connections ([#132](https://github.com/fishtown-analytics/dbt-spark/pull/132))
-
 ### Contributors
-- [@danielvdende](https://github.com/danielvdende) ([#132](https://github.com/fishtown-analytics/dbt-spark/pull/132))
-- [@Fokko](https://github.com/Fokko) ([#125](https://github.com/fishtown-analytics/dbt-spark/pull/125))
-
-## dbt-spark 0.18.1.1 (November 13, 2020)
-
-### Fixes
-- Fix `extras_require` typo to enable `pip install dbt-spark[ODBC]` (([#121](https://github.com/fishtown-analytics/dbt-spark/pull/121)), ([#122](https://github.com/fishtown-analytics/dbt-spark/pull/122)))
-
-## dbt-spark 0.18.1 (November 6, 2020)
-
-### Features
-- Allows users to specify `auth` and `kerberos_service_name` ([#107](https://github.com/fishtown-analytics/dbt-spark/pull/107))
-- Add support for ODBC driver connections to Databricks clusters and endpoints ([#116](https://github.com/fishtown-analytics/dbt-spark/pull/116))
-
-### Under the hood
-- Updated README links ([#115](https://github.com/fishtown-analytics/dbt-spark/pull/115))
-- Support complete atomic overwrite of non-partitioned incremental models ([#117](https://github.com/fishtown-analytics/dbt-spark/pull/117))
-- Update to support dbt-core 0.18.1 ([#110](https://github.com/fishtown-analytics/dbt-spark/pull/110), [#118](https://github.com/fishtown-analytics/dbt-spark/pull/118))
-
-### Contributors
-- [@danielhstahl](https://github.com/danielhstahl) ([#107](https://github.com/fishtown-analytics/dbt-spark/pull/107))
-- [@collinprather](https://github.com/collinprather) ([#115](https://github.com/fishtown-analytics/dbt-spark/pull/115))
-- [@charlottevdscheun](https://github.com/charlottevdscheun) ([#117](https://github.com/fishtown-analytics/dbt-spark/pull/117))
-- [@Fokko](https://github.com/Fokko) ([#117](https://github.com/fishtown-analytics/dbt-spark/pull/117))
+- [@grindheim](https://github.com/grindheim) ([#262](https://github.com/dbt-labs/dbt-spark/pull/262/))
+- [@vingov](https://github.com/vingov) ([#210](https://github.com/dbt-labs/dbt-spark/pull/210))
-
-## dbt-spark 0.18.0 (September 18, 2020)
+
+## Previous Releases
+For information on releases of dbt-spark prior to 1.0.0, please see:
+- [0.21](https://github.com/dbt-labs/dbt-spark/blob/0.21.latest/CHANGELOG.md)
+- [0.20](https://github.com/dbt-labs/dbt-spark/blob/0.20.latest/CHANGELOG.md)
+- [0.19 and earlier](https://github.com/dbt-labs/dbt-spark/blob/0.19.latest/CHANGELOG.md)
-
-### Under the hood
-- Make a number of changes to support dbt-adapter-tests ([#103](https://github.com/fishtown-analytics/dbt-spark/pull/103))
-- Update to support dbt-core 0.18.0. Run CI tests against local Spark, Databricks ([#105](https://github.com/fishtown-analytics/dbt-spark/pull/105))
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 000000000..1d6e76d31
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,111 @@
+# Contributing to `dbt-spark`
+
+1. [About this document](#about-this-document)
+2. [Getting the code](#getting-the-code)
+3. [Running `dbt-spark` in development](#running-dbt-spark-in-development)
+4. [Testing](#testing)
+5. [Updating Docs](#updating-docs)
+6. [Submitting a Pull Request](#submitting-a-pull-request)
+
+## About this document
+This document is a guide intended for folks interested in contributing to `dbt-spark`. Below, we document the process by which members of the community should create issues and submit pull requests (PRs) in this repository. It is not intended as a guide for using `dbt-spark`, and it assumes a certain level of familiarity with Python concepts such as virtualenvs, `pip`, Python modules, and so on. This guide assumes you are using macOS or Linux and are comfortable with the command line.
+
+For those wishing to contribute, we highly suggest reading dbt-core's [contribution guide](https://github.com/dbt-labs/dbt-core/blob/HEAD/CONTRIBUTING.md) if you haven't already. Almost all of the information there is applicable to contributing here, too!
+
+### Signing the CLA
+
+Please note that all contributors to `dbt-spark` must sign the [Contributor License Agreement](https://docs.getdbt.com/docs/contributor-license-agreements) to have their Pull Request merged into the `dbt-spark` codebase. If you are unable to sign the CLA, then the `dbt-spark` maintainers will unfortunately be unable to merge your Pull Request. You are, however, welcome to open issues and comment on existing ones.
+
+## Getting the code
+
+You will need `git` in order to download and modify the `dbt-spark` source code. You can find directions [here](https://github.com/git-guides/install-git) on how to install `git`.
+
+### External contributors
+
+If you are not a member of the `dbt-labs` GitHub organization, you can contribute to `dbt-spark` by forking the `dbt-spark` repository. For a detailed overview on forking, check out the [GitHub docs on forking](https://help.github.com/en/articles/fork-a-repo). In short, you will need to follow these steps, sketched below:
+
+1. fork the `dbt-spark` repository
+2. clone your fork locally
+3. check out a new branch for your proposed changes
+4. push changes to your fork
+5. open a pull request against `dbt-labs/dbt-spark` from your forked repository
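(Editor's aside, not part of the patch: a minimal sketch of those five steps as git commands. The GitHub username `your-username` and the branch name are hypothetical placeholders:)

```sh
# 1. fork dbt-labs/dbt-spark in the GitHub UI, then:
# 2. clone your fork locally
git clone https://github.com/your-username/dbt-spark.git
cd dbt-spark
# 3. check out a new branch for your proposed changes
git checkout -b fix/my-proposed-change
# 4. push changes to your fork
git push origin fix/my-proposed-change
# 5. open a pull request against dbt-labs/dbt-spark from your fork in the GitHub UI
```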
+### dbt Labs contributors
+
+If you are a member of the `dbt Labs` GitHub organization, you will have push access to the `dbt-spark` repo. Rather than forking `dbt-spark` to make your changes, just clone the repository, check out a new branch, and push directly to that branch.
+
+## Running `dbt-spark` in development
+
+### Installation
+
+First make sure that you set up your `virtualenv` as described in [Setting up an environment](https://github.com/dbt-labs/dbt-core/blob/HEAD/CONTRIBUTING.md#setting-up-an-environment). Ensure you have the latest version of pip installed with `pip install --upgrade pip`. Next, install `dbt-spark` along with its development dependencies:
+
+```sh
+pip install -e . -r dev-requirements.txt
+```
+
+When `dbt-spark` is installed this way, any changes you make to the `dbt-spark` source code will be reflected immediately in your next `dbt-spark` run.
+
+To confirm you have the correct version of `dbt-core` installed, run `dbt --version` and `which dbt`.
+
+## Testing
+
+### Initial Setup
+
+`dbt-spark` uses test credentials specified in a `test.env` file in the root of the repository. This `test.env` file is git-ignored, but please be _extra_ careful to never check in credentials or other sensitive information when developing. To create your `test.env` file, copy the provided example file, then supply your relevant credentials.
+
+```
+cp test.env.example test.env
+$EDITOR test.env
+```
+
+### Test commands
+There are a few methods for running tests locally.
+
+#### `tox`
+`tox` takes care of managing Python virtualenvs and installing dependencies in order to run tests. You can also run tests in parallel; for example, you can run unit tests for Python 3.7, Python 3.8, Python 3.9, and `flake8` checks in parallel with `tox -p`. Also, you can run unit tests for specific Python versions with `tox -e py37`. The configuration of these tests is located in `tox.ini`.
+
+#### `pytest`
+Finally, you can also run a specific test or group of tests using `pytest` directly. With a Python virtualenv active and dev dependencies installed you can do things like:
+
+```sh
+# run specific spark integration tests
+python -m pytest -m profile_spark tests/integration/get_columns_in_relation
+# run specific functional tests
+python -m pytest --profile databricks_sql_endpoint tests/functional/adapter/test_basic.py
+# run all unit tests in a file
+python -m pytest tests/unit/test_adapter.py
+# run a specific unit test
+python -m pytest tests/unit/test_adapter.py::TestSparkAdapter::test_profile_with_database
+```
+
+## Updating Docs
+
+Many changes will require an update to the `dbt-spark` docs. Here are some useful resources:
+
+- Docs are [here](https://docs.getdbt.com/).
+- The docs repo for making changes is located [here](https://github.com/dbt-labs/docs.getdbt.com).
+- The changes made are likely to impact one or both of [Spark Profile](https://docs.getdbt.com/reference/warehouse-profiles/spark-profile) and [Spark Configs](https://docs.getdbt.com/reference/resource-configs/spark-configs).
+- We ask every community member who makes a user-facing change to open an issue or PR regarding doc changes.
+
+## Adding CHANGELOG Entry
+
+We use [changie](https://changie.dev) to generate `CHANGELOG` entries. **Note:** Do not edit the `CHANGELOG.md` directly. Your modifications will be lost.
+
+Follow the steps to [install `changie`](https://changie.dev/guide/installation/) for your system.
+
+Once changie is installed and your PR is created, simply run `changie new` and changie will walk you through the process of creating a changelog entry. Commit the file that's created and your changelog entry is complete!
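(Editor's aside, not part of the patch: `changie new` prompts for the custom fields defined in `.changie.yaml` — Author, Issue, PR — and writes a timestamped YAML file under `.changes/unreleased/`. A minimal sketch of the result, modeled on the `Features-20220810-133356.yaml` entry added by this patch; the body, issue, and PR numbers here are hypothetical:)

```yaml
kind: Fixes
body: handle null columns in get_columns_in_relation
time: 2022-08-22T09:00:00.000000-05:00
custom:
  Author: your-username
  Issue: "123"
  PR: "124"
```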
+You don't need to worry about which `dbt-spark` version your change will go into. Just create the changelog entry with `changie`, and open your PR against the `main` branch. All merged changes will be included in the next minor version of `dbt-spark`. The Core maintainers _may_ choose to "backport" specific changes in order to patch older minor versions. In that case, a maintainer will take care of that backport after merging your PR, before releasing the new version of `dbt-spark`.
+
+## Submitting a Pull Request
+
+dbt Labs provides a CI environment to test changes to the `dbt-spark` adapter, and periodic checks against the development version of `dbt-core` through GitHub Actions.
+
+A `dbt-spark` maintainer will review your PR. They may suggest code revisions for style or clarity, or request that you add unit or integration test(s). These are good things! We believe that, with a little bit of help, anyone can contribute high-quality code.
+
+Once all requests have been addressed, the `dbt-spark` maintainer can trigger CI testing.
+
+Once all tests are passing and your PR has been approved, a `dbt-spark` maintainer will merge your changes into the active development branch. And that's it! Happy developing :tada: