Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize externalReferences URLs and exclude invalid ones + Strict JSON Schema validation #1130

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

marob
Copy link
Contributor

@marob marob commented May 30, 2024

The goal of this PR is to fix issue #1107

It should also replace #1128

marob and others added 2 commits May 30, 2024 17:14
Signed-off-by: Maxime Robert <maxime.robert@smile.fr>
Signed-off-by: Tim Messing <141575989+timmyteo@users.noreply.github.com>
Signed-off-by: Maxime Robert <maxime.robert@smile.fr>
@prabhu
Copy link
Collaborator

prabhu commented May 30, 2024

Looks very neat! Thank you so much!

@timmyteo @setchy Could you test this locally with some repos and share your thoughts?

* @returns {Boolean} Flag indicating whether the supplied URL is valid or not
*
*/
export function isValidIriReference(url) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if this can be replaced with a simple url parse and protocol checks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timmyteo was doing a try/catch on new URL(), but it was allowing urls that are rejected by DependencyTrack. According to my tests, this is the best option to be compatible with DT (I also tried to validate using https://github.com/luzlab/ajv-formats-draft2019/blob/master/formats/iri-reference.js, but it was too permissive too).
Anyway, it needs more testing.

@setchy
Copy link
Member

setchy commented May 30, 2024

Great work @marob

Tested this branch against some of my repos. BOM looks good. Uploaded to DTrack 4.11.1 with BOM Validation enabled without issues

One observation: we may need to dedupe externalReferences after sanitization

for example

"externalReferences": [
    {
        "type": "vcs",
        "url": "https://github.com/follow-redirects/follow-redirects"
    },
    {
        "type": "vcs",
        "url": "git@github.com:follow-redirects/follow-redirects.git"
    }
],

became

"externalReferences": [
    {
        "type": "vcs",
        "url": "https://github.com/follow-redirects/follow-redirects.git"
    },
    {
        "type": "vcs",
        "url": "https://github.com/follow-redirects/follow-redirects.git"
    }
],

@prabhu
Copy link
Collaborator

prabhu commented May 31, 2024

Any ideas about the test failures? We may have to run it locally to troubleshoot, since the validation errors are not shown.

@marob
Copy link
Contributor Author

marob commented May 31, 2024

Any ideas about the test failures? We may have to run it locally to troubleshoot, since the validation errors are not shown.

I have no idea what those tests are doing. On all the ones that fail, only this one is showing some details on the reason why: https://github.com/CycloneDX/cdxgen/actions/runs/9306982345/job/25617330208?pr=1130
It looks like it's linked to strict JSON Schema validation on iri-reference.

@prabhu
Copy link
Collaborator

prabhu commented May 31, 2024

'scm:svn:https://svn.codehaus.org/jettison/tags/jettison-1.3.7' - could try my idea of using the url parse instead of regex? Or check if the value starts with http?

@prabhu
Copy link
Collaborator

prabhu commented Jun 1, 2024

Can I come up with another branch that doesn't use additional dependencies? If we have more test cases, that will help.

@@ -71,6 +71,7 @@
"@npmcli/arborist": "7.5.2",
"ajv": "^8.14.0",
"ajv-formats": "^3.0.1",
"ajv-formats-draft2019": "^1.6.1",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am having better success with validate-iri package.

import { validateIri, IriValidationStrategy } from 'validate-iri'
let yourIri = 'git@gitlab.com:behat-chrome/chrome-mink-driver.git'
console.log(validateIri(yourIri, IriValidationStrategy.Strict));

yourIri = '${repository.url}';
console.log(validateIri(yourIri, IriValidationStrategy.Strict));

yourIri = 'git+https://github.com/Alex-D/check-disk-space.git';
console.log(validateIri(yourIri, IriValidationStrategy.Strict));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking for some validation library but didn't find this one 👍

@@ -90,7 +91,8 @@
"tar": "^6.2.1",
"uuid": "^9.0.1",
"xml-js": "^1.6.11",
"yargs": "^17.7.2"
"yargs": "^17.7.2",
"hosted-git-info": "^7.0.2"
Copy link
Collaborator

@prabhu prabhu Jun 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not introduce this package and change the url format in this PR. We can drop any non-compliant urls, maybe with a log in DEBUG_MODE.

@@ -35,7 +36,8 @@ export const validateBom = (bomJson) => {
);
const ajv = new Ajv({
schemas: [schema, defsSchema, spdxSchema],
strict: false,
strict: true,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep strict: false, since there could be many other data-quality issues.

@@ -45,6 +47,7 @@ export const validateBom = (bomJson) => {
},
});
addFormats(ajv);
addFormatsDraft2019(ajv);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not required when we use validate-iri.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required when strict is true

@prabhu
Copy link
Collaborator

prabhu commented Jun 3, 2024

@marob Do you have time to look at the pending comments?

@marob
Copy link
Contributor Author

marob commented Jun 3, 2024

@marob Do you have time to look at the pending comments?

@prabhu Probably not this week.

@prabhu
Copy link
Collaborator

prabhu commented Jun 3, 2024

Blocked by: #1134

@prabhu
Copy link
Collaborator

prabhu commented Jun 6, 2024

We now have an implementation with validation alone that is merged in master. It currently logs the problematic urls and the then filters them. Let's work on the sanitize feature in a separate branch.

@prabhu
Copy link
Collaborator

prabhu commented Jun 11, 2024

@marob can you create a new branch to implement the sanitize feature for urls?

@marob
Copy link
Contributor Author

marob commented Sep 2, 2024

@prabhu Should we create a branch to sanitize externalReference URLs? Do we need that feature?

@prabhu
Copy link
Collaborator

prabhu commented Sep 2, 2024

@marob definitely feel free to work on a new branch, although this is a lower priority atm due to lack of any requests from the community.

@setchy setchy removed their request for review September 4, 2024 09:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants