Create extra job for extracting and validating all needed info from PR #296

Open
mgoerens opened this issue Nov 10, 2023 · 2 comments · May be fixed by #392

@mgoerens
Contributor

In several places in the build.yaml workflow, we extract and validate information from the PR and its files. Typically, we need the category, organization, chart_name, and chart_version at different times.

There are several occurrences of such redundant calls (e.g. to get_modified_files) at different stages of the workflow.

We should create a separate job that takes care of:

  1. Extracting all information that is needed by later steps from the PR content.
  2. Validating that information (e.g. valid SemVer version, match between directory structure and info in report.yaml, etc.)
  3. Presenting that information to the other steps
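
Steps 1 and 2 could be sketched roughly as follows. This is only an illustration, not the project's code: the path layout is inferred from the sample path `charts/partners/hashicorp/vault/1.13.0/report.yaml`, the names `ChartInfo` and `parse_chart_path` are hypothetical, and the version pattern is adapted from the regular expression suggested on semver.org.

```python
import re
from dataclasses import dataclass

# Assumed layout, inferred from the sample path in this issue:
# charts/<category>/<organization>/<chart_name>/<version>/<file>
CHART_PATH = re.compile(
    r"^charts/(?P<category>[^/]+)/(?P<organization>[^/]+)/"
    r"(?P<chart_name>[^/]+)/(?P<version>[^/]+)/.+$"
)

# Adapted from the pattern suggested on semver.org for SemVer 2.0.0.
SEMVER = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)


@dataclass(frozen=True)
class ChartInfo:
    """Hypothetical container for the values extracted from a path."""
    category: str
    organization: str
    chart_name: str
    version: str


def parse_chart_path(path):
    """Extract chart info from one modified file path and validate the version."""
    match = CHART_PATH.match(path)
    if match is None:
        raise ValueError(f"not a chart file path: {path!r}")
    info = ChartInfo(**match.groupdict())
    if SEMVER.match(info.version) is None:
        raise ValueError(f"not a valid SemVer version: {info.version!r}")
    return info


if __name__ == "__main__":
    print(parse_chart_path("charts/partners/hashicorp/vault/1.13.0/report.yaml"))
```

Checking that the directory structure matches the content of report.yaml would additionally require parsing that file, which is left out of this sketch.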

For 3., instead of GitHub outputs, we could populate a file and pass it to the other jobs as a workflow artifact (see https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts#passing-data-between-jobs-in-a-workflow)

Such a file could look like this, for example:

pre-check:
    content:
        modified_files:
            - charts/partners/hashicorp/vault/1.13.0/report.yaml
            - ...
        category: partners
        organization: hashicorp # vendor
        version: 1.13.0
        chart_name: vault
        source:
            provided: False
        report:
            provided: True
            signed: False
        owners_file_modified: False
        web_catalog_only: True
    pr_infos:
        api_url:
        PR_number: 30145
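
A minimal sketch of how a job might populate and re-read a file with this layout, using only a subset of the keys above. The function names are hypothetical, and the rule used to set `report.provided` (a report.yaml is among the modified files) is an assumption made for illustration. JSON is used here only because it ships with the Python standard library; writing YAML as in the sample would need a third-party library.

```python
import json
from pathlib import Path


def build_pre_check(modified_files, category, organization,
                    chart_name, version, pr_number):
    """Assemble a dict that mirrors part of the sample layout above."""
    return {
        "pre-check": {
            "content": {
                "modified_files": list(modified_files),
                "category": category,
                "organization": organization,
                "version": version,
                "chart_name": chart_name,
                "report": {
                    # Assumption: a report counts as provided when a
                    # report.yaml is among the modified files.
                    "provided": any(
                        f.endswith("/report.yaml") for f in modified_files
                    ),
                },
            },
            "pr_infos": {"PR_number": pr_number},
        }
    }


def write_pre_check(path, data):
    """Dump the structure to a file that can be uploaded as an artifact."""
    Path(path).write_text(json.dumps(data, indent=4), encoding="utf-8")


def read_pre_check(path):
    """Load the structure again in a later job."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

The producing job would call `write_pre_check` before uploading the file, and each consuming job would call `read_pre_check` after downloading it.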
@komish
Contributor

komish commented Nov 14, 2023

@mgoerens 100% agree. Looking at the certified chart dependency project, I may need to implement a small portion of this to accomplish a few things required in that pipeline. I may continue to use GITHUB_OUTPUT for those cases, but we can transition that to a format like the one above down the line.

@mgoerens
Contributor Author

See #314 (comment) for future work item

mgoerens self-assigned this Feb 17, 2024
mgoerens added a commit to mgoerens/development that referenced this issue Mar 7, 2024
This is the first step towards solving openshift-helm-charts#296. The code added by this
commit is currently *not* put in use in the certification pipeline and
is a partial duplicate of the "checkprcontent" package.

The Submission data structure can collect and run early validation
checks on all information concerning a user's submission (i.e. a GitHub
Pull Request). In its current state, given an api_url (link to a given
PR), it can collect the list of modified files, and extract the
category, organization, chart's name and version, as well as running
basic checks, such as SemVer compatibility of the provided version.

Eventually, this data structure is meant to encapsulate more
information concerning the user's submission, and be dumped to a file to
be passed around the certification pipeline jobs. This early (and
incomplete) version is intended to trigger discussion around the design
choices and general direction taken to solve openshift-helm-charts#296.

Signed-off-by: Matthias Goerens <mgoerens@redhat.com>
mgoerens added a commit to mgoerens/development that referenced this issue Mar 15, 2024
This is the first step towards solving openshift-helm-charts#296. The code added by this
commit is currently *not* put in use in the certification pipeline and
is a partial duplicate of the "checkprcontent" package.

The Submission data structure can collect and run early validation
checks on all information concerning a user's submission (i.e. a GitHub
Pull Request). In its current state, given an api_url (link to a given
PR), it can collect the list of modified files, and extract the
category, organization, chart's name and version, as well as running
basic checks, such as SemVer compatibility of the provided version.

This commit also implements a custom JSON Serializer / Deserializer. The
Submission object can therefore easily be dumped to / read from a file.
Such a file can be a GitHub workflow artifact, populated in the
"pre-check" job and read in subsequent jobs.

Eventually, this data structure is meant to encapsulate more information
concerning the user's submission. This early (and incomplete) version is
intended to trigger discussion around the design choices and general
direction taken to solve openshift-helm-charts#296.

Signed-off-by: Matthias Goerens <mgoerens@redhat.com>
mgoerens added a commit to mgoerens/development that referenced this issue Mar 20, 2024
mgoerens added a commit to mgoerens/development that referenced this issue Mar 21, 2024
mgoerens added a commit to mgoerens/development that referenced this issue Mar 22, 2024
This is the first step towards solving openshift-helm-charts#296. The code added by this
commit is currently *not* put in use in the certification pipeline and
is a partial duplicate of the "checkprcontent" package.

The Submission data structure can collect and run early validation
checks on all information concerning a user's submission (i.e. a GitHub
Pull Request). In its current state, given an api_url (link to a given
PR), it can collect the list of modified files, and extract the
category, organization, chart's name and version, as well as running
basic checks, such as SemVer compatibility of the provided version.

This commit also implements a custom JSON Serializer / Deserializer. The
Submission object can therefore easily be dumped to / read from a file.
Such a file can later be a GitHub workflow artifact, populated in a
future "pre-check" job and read in subsequent jobs.

Signed-off-by: Matthias Goerens <mgoerens@redhat.com>
mgoerens added a commit that referenced this issue Apr 10, 2024
This is the first step towards solving #296. The code added by this
commit is currently *not* put in use in the certification pipeline and
is a partial duplicate of the "checkprcontent" package.

The Submission data structure can collect and run early validation
checks on all information concerning a user's submission (i.e. a GitHub
Pull Request). In its current state, given an api_url (link to a given
PR), it can collect the list of modified files, and extract the
category, organization, chart's name and version, as well as running
basic checks, such as SemVer compatibility of the provided version.

This commit also implements a custom JSON Serializer / Deserializer. The
Submission object can therefore easily be dumped to / read from a file.
Such a file can later be a GitHub workflow artifact, populated in a
future "pre-check" job and read in subsequent jobs.

Signed-off-by: Matthias Goerens <mgoerens@redhat.com>
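
For illustration only, a custom JSON serializer / deserializer of the kind described in this commit message might look like the sketch below. It is not the code from the commit: the `Submission` and `Chart` fields, the `__type__` tag, and the helper names are assumptions. It relies on `json.JSONEncoder.default` for encoding and on the `object_hook` parameter of `json.loads` for decoding.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class Chart:
    # Hypothetical fields, based on the values named in this issue.
    category: str = ""
    organization: str = ""
    name: str = ""
    version: str = ""


@dataclass
class Submission:
    # Hypothetical fields; the real class may differ.
    api_url: str
    modified_files: list = field(default_factory=list)
    chart: Chart = field(default_factory=Chart)


_TYPES = {"Submission": Submission, "Chart": Chart}


class SubmissionEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, (Submission, Chart)):
            # Shallow conversion: nested objects are passed back to
            # default() by the encoder, so each level gets its own tag.
            payload = {f.name: getattr(o, f.name) for f in fields(o)}
            payload["__type__"] = type(o).__name__
            return payload
        return super().default(o)


def submission_object_hook(d):
    """Rebuild tagged dicts into objects; inner objects are decoded first."""
    cls = _TYPES.get(d.pop("__type__", None))
    return cls(**d) if cls is not None else d


if __name__ == "__main__":
    s = Submission(
        api_url="https://api.github.com/repos/example-org/example-repo/pulls/1",  # made-up URL
        modified_files=["charts/partners/hashicorp/vault/1.13.0/report.yaml"],
        chart=Chart("partners", "hashicorp", "vault", "1.13.0"),
    )
    text = json.dumps(s, cls=SubmissionEncoder, indent=2)
    print(json.loads(text, object_hook=submission_object_hook) == s)
```

Tagging each object with its class name keeps the decoder generic, at the cost of one extra key per object in the file.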
mgoerens added a commit to mgoerens/development that referenced this issue Oct 1, 2024
This PR refactors the certification pipeline by splitting the
chart-verifier job into three jobs, running sequentially:
- The new 'validate-submission' is responsible for extracting and
  validating all information related to a PR.
- The existing 'chart-verifier' job is now only responsible for running
  chart-verifier against the chart source if it is provided, or for
  verifying the submitted report.yaml otherwise.
- The new 'manage-gh-pr' job adds a comment on the PR and merges it if
  the pipeline was successful.

This PR also introduces a new mechanism for passing information between
jobs. In addition to the existing GitHub outputs, a submission.json
artifact containing a JSON representation of the Submission object is
created in 'validate-submission'. Subsequent steps and jobs can now read
from this file.

The main goal of this PR is to better separate concerns and avoid
mixed-up logic, thus improving the readability and clarity of our
pipeline as well as its future maintainability and onboarding.

This PR also makes it possible to reduce code duplication and
redundant checks. In the interest of keeping this PR to a minimum, this
has not been fully done here. One example: get-verify-params.py
currently queries the GitHub API to get the chart's information,
although it is already available in the submission.json artifact. This
rather trivial change is left for a future PR.

Note that the 'submission' Python module essentially replaces the
old 'checkpr' module. The only function still used by other modules,
'get_file_match_compiled_patterns', is moved to the 'reporegex'
module.

Closes openshift-helm-charts#296

Signed-off-by: Matthias Goerens <mgoerens@redhat.com>
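
As a rough sketch of the follow-up mentioned above, a later step could read the chart's information from the downloaded submission.json artifact instead of querying the GitHub API. The key names (`chart`, `category`, `organization`, `name`, `version`) and the function `load_chart_params` are assumptions for illustration; the actual layout of submission.json may differ.

```python
import json
from pathlib import Path

# Assumed keys under a top-level "chart" object; purely illustrative.
REQUIRED_KEYS = ("category", "organization", "name", "version")


def load_chart_params(artifact_path="submission.json"):
    """Return the chart parameters stored in the artifact file."""
    data = json.loads(Path(artifact_path).read_text(encoding="utf-8"))
    chart = data.get("chart", {})
    missing = [key for key in REQUIRED_KEYS if key not in chart]
    if missing:
        raise ValueError(f"submission.json is missing chart fields: {missing}")
    return {key: chart[key] for key in REQUIRED_KEYS}
```

Failing early on missing fields keeps a malformed artifact from silently propagating empty values to later jobs.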
mgoerens linked a pull request Oct 1, 2024 that will close this issue