Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#1511 Incomplete SBOM generated from Docker image #1512

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

robross0606
Copy link

@robross0606 robross0606 commented Dec 23, 2024

@robross0606 robross0606 requested a review from prabhu as a code owner December 23, 2024 16:44
@robross0606 robross0606 force-pushed the fix/remove-strict-tar-extraction branch 2 times, most recently from 609a835 to 0eac060 Compare December 23, 2024 17:06
@@ -804,36 +807,17 @@ export const extractTar = async (fullImageName, dir) => {
"Please run cdxgen from a powershell terminal with admin privileges to create symlinks.",
);
console.log(err);
} else if (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why we need to remove these blocks?

Copy link
Author

@robross0606 robross0606 Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they seemed to be mostly pointless. The logic of the old if/then/else doesn't wastes performance for several reasons:

  • Once we turn off strict mode, several of these will not be thrown as errors as per documentation. This includes anything that starts with TAR_ENTRY_.
  • The current if/then/else re-tests the same err.code multiple times. It tests TAR_BAD_ARCHIVE, TAR_ENTRY_INFO and TAR_ENTRY_INVALID multiple times for no reason. Flipping the order of the if/then/else has the same end result with better performance.
  • Using array.includes is non-performant and gains little in terms of end result.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it helps, here is a summary of how all those tests should shake out now:

Code Old New
TAR_BAD_ARCHIVE Prints "Archive ${fullImageName} is empty. Skipping." and ends extraction. Prints "Archive ${fullImageName} is empty. Skipping." and ends extraction.
TAR_ENTRY_INFO Black holed without any message and ends extraction. Prints warning only if DEBUG_MODE is true and continues extraction.
TAR_ENTRY_INVALID Black holed without any message and ends extraction. Prints warning and continues extraction.
TAR_ENTRY_ERROR Prints err and ends extraction. Prints warning and continues extraction.
TAR_ENTRY_UNSUPPORTED Prints err and ends extraction. Prints warning and continues extraction.
TAR_ABORT Prints err and ends extraction. Prints "Error while extracting image..." and ends extraction.
EACCES Prints err and ends extraction. Prints err and ends extraction.
EPERM Prints "Please run cdxgen from a powershell terminal..." and ends extraction. Prints "Please run cdxgen from a powershell terminal..." and ends extraction.
Anything unexpected Prints "Error while extracting image..." and ends extraction. Prints "Error while extracting image..." and ends extraction.

// ignore
if (code !== "TAR_ENTRY_INFO" || DEBUG_MODE) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will print a lot of messages in debug mode including the thing that says removing /?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't say "a lot" but, yes, it will. IMO, that is desirable for DEBUG_MODE. The tar command in linux does the same thing.

Copy link
Author

@robross0606 robross0606 Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with your approach is that it relies on the error message matching your expectations. That creates a regression problem you're not prepared to address without adding more detailed unit testing. Your solution also didn't look at the code which is far more likely to remain unchanged with subsequent releases.

Copy link
Collaborator

@prabhu prabhu Dec 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just make this if (DEBUG_MODE) ? We are getting more output like this

Copy link
Author

@robross0606 robross0606 Dec 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're suggesting that issues of types like TAR_ENTRY_INVALID should be simply ignored without any printed message unless the user enables DEBUG_MODE. Skipping TAR_ENTRY_INFO is fine. Skipping the other ones is NOT fine. I can tell you that, if you did this, we would opt to not use cdxgen at all and go with syft or something else. Simply black-holing warnings that are causing entries to not be expanded is a BIG DEAL. That means that cdxgen is not scanning whatever wasn't expanded, and the user has no idea this happened. This means that, technically, the current version of cdxgen already potentially has a security issue if people are relying on this SBOM for vulnerability assessment.

Based on your other responses on this MR, I'm not sure you're realizing the potential scope of the problem or that you've yet conceded the current version of cdxgen is scarily broken with respect to scanning docker images. The user currently has no idea that things were not expanded and there is no determinate pattern to what can be skipped. The only reason we know about it now is because the expansion happened to not expand the package.json file. But what if it failed to expand other files (OS, components, etc.) that the scanner then skips? The user gets no errors, no warning messages and, most importantly, inaccurate SBOM output.

If you are getting all those issues, then the TAR may have problems. What I would suggest is that we include the code in that warning message so we can at least tell what type of warning it is. Then we can circle back on next steps.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing strict: false is actually a far bigger problem than making some warnings only show up in DEBUG_MODE. I have to balance the amount of messages cdxgen throws up in default mode, since our users would happily create an issue for each and every message, without the slightest hesitation.

It is work-in-progress, alright. There is only so much a small unfunded team could do. You can always feel free to go with another tool or rollup the sleeves and help us with dev and testing.

@robross0606 robross0606 force-pushed the fix/remove-strict-tar-extraction branch from 0eac060 to 821cc52 Compare December 23, 2024 21:00
@prabhu
Copy link
Collaborator

prabhu commented Dec 23, 2024

Thank you so much. Do you have any ideas for adding any new test cases or this is good to go?

@prabhu
Copy link
Collaborator

prabhu commented Dec 24, 2024

Let me do some more manual tests as well. Meanwhile, can you make strict: false in the other places as well as shown in my PR? Also run pnpm run lint.

@@ -754,11 +754,14 @@ export const extractTar = async (fullImageName, dir) => {
preserveOwner: false,
noMtime: true,
noChmod: true,
strict: true,
strict: false,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make strict: false in other places as well.

@prabhu
Copy link
Collaborator

prabhu commented Dec 24, 2024

I also have another question. Other than the image you shared, are there any other images that are not working without strict: false. If the sample size happens to be just one or a small value, perhaps we make it an optional flag instead?

I am unsure of the security and performance implications of this change.

@prabhu
Copy link
Collaborator

prabhu commented Dec 24, 2024

@robross0606 Can you test the new PR #1513 ?

@robross0606
Copy link
Author

robross0606 commented Dec 24, 2024

I also have another question. Other than the image you shared, are there any other images that are not working without strict: false. If the sample size happens to be just one or a small value, perhaps we make it an optional flag instead?

I am unsure of the security and performance implications of this change.

The sample size is not small at all. In fact, it appears to be happening on almost every expanded TAR from a docker image. Literally any TAR_ENTRY_INFO warning would currently be turned into an error and kick out the expansion at random times with no logged reason. Any time TAR has to remove the forward slash from an entry, it would kick out with a TAR_ENTRY_INFO. That would be just about every docker-based TAR. In my spot checking, there were random things skipped from almost every TAR file. We just got lucky here that the skipped file was the package.json which caused the scanner to fail. Most times it skips some other files which then quietly puts the veracity of the SBOM into question.

…warnings do not prematurely end expansion.

Signed-off-by: Robert Ross <robross0606@gmail.com>
@robross0606 robross0606 force-pushed the fix/remove-strict-tar-extraction branch from 821cc52 to c971cbd Compare December 24, 2024 15:31
@prabhu
Copy link
Collaborator

prabhu commented Dec 24, 2024

Ok, please test my new PR. I have made it to work in strict mode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants