Include list of installed Debian packages/versions in .sandstorm #249

Open
ocdtrekkie opened this issue Jan 28, 2020 · 13 comments

Comments

@ocdtrekkie
Collaborator

22:31 Maybe we could add a feature that dumps a list of debian packages that are installed into the spk, so you can inspect it after the fact to figure out what changed.

For transparency and so that we can diagnose issues while more aggressively allowing updates inside vagrant-spk VMs, we should dump the list of packages and their versions out to a file in .sandstorm that ends up committed to GitHub.

An analogue to this is the "stack" file, which notes what stack was used to build a package, despite being largely vestigial after the stack is set up, but useful for reference.

@ocdtrekkie ocdtrekkie added this to the vagrant-spk 1.1 milestone Jan 28, 2020
@ocdtrekkie
Collaborator Author

We think there's probably a good command that will produce this output, but don't know offhand what it is.

@paulproteus
Collaborator

You might find these useful:

dpkg -l -- lists all packages, including versions iirc

dpkg -S {filename} -- asks dpkg what package provided some file that is installed (dpkg -L {package} goes the other way and lists the files a package installed)

Happy to say more if needed. Cheers!

@ocdtrekkie
Collaborator Author

Thanks Asheesh!

I will do some testing. Assuming the output is what we need, I'll need to decide when to generate this file. If I can generate it when vagrant-spk dev exits, so that it updates at the same time as sandstorm-files.list, there should be a strong relationship between the list of files that are included and the list and versions of Debian packages that were installed when those files were selected.

@zenhack
Collaborator

zenhack commented Jan 28, 2020 via email

@ocdtrekkie
Collaborator Author

@zenhack That is probably a trivial adjustment, though it would be a departure from the general behavior of only including what the SPK needs to work in the package itself.

I am also not sure of the best way to do this.

  • Adding a metadata/manifest entry similar to changelog/description/etc. would make it part of Sandstorm's overall definition, not vagrant-spk's, which I don't think is a good idea since it may not be relevant or desired in other packages.
  • We could add it to the alwaysInclude in the pkgdef, but vagrant-spk generally doesn't retroactively tamper with that, and I wonder if having vagrant-spk include a nonfunctional file in that by default on vagrant-spk init is weird. But without this, presumably spk will ignore the file and not package it.

As of right now, all public Sandstorm packages are open source, and generally we do verify that the repository has been updated for the latest release during the app review process. While I imagine security folks may wish to unpack an SPK to analyze or study it, and verify that it matches what is in the source repo, I am unsure how likely people are to look at unpacking an SPK for the package info versus looking at the list in the source repo.

I am curious how you'd see this file in the SPK being used. Are there programmatic uses for it that you'd imagine?

@zenhack
Collaborator

zenhack commented Jan 28, 2020

Yeah, I'm thinking in terms of automatic scanning tools. We could do some of that against the repos too, to some extent. But this is still somewhat in the brainstorming phase. I think we should also put the list in the source repo, so maybe start with that and go from there.

@ocdtrekkie
Collaborator Author

@zenhack Do we want to trust that the list is accurate for the purposes of security scanning? If, say, we updated it when spk dev closes, as described above, it could be user-modified prior to spk pack. So presumably the right way to do it for that purpose would be to generate it as part of spk pack instead. (Though I'd still want to do it on dev in case someone didn't commit to GitHub after running pack.)

Would it be possible to determine from the files themselves what version they are? The package list of what was in the VM might contain packages not actually included in the published SPK, leading to false positives. Presumably we just want to know if there are old/insecure binaries that actually make it into sandstorm-files.list (+ the alwaysInclude folders).

And yeah, adding it to .sandstorm so it ends up in GitHub is trivial and costs us nothing, and if we also determine we want it included in the SPK, presumably we are going to still be pulling it from/storing it in .sandstorm.

@zenhack
Collaborator

zenhack commented Jan 28, 2020

We certainly can't trust that it's accurate in the face of a developer actively messing with it. We could query the package manager to work out which package each file we actually include belongs to. But we don't necessarily want to exclude a package just because its files didn't land in the spk; that could happen because an executable is statically linked (not standard for distro packages, but for some languages, e.g. Go, the tooling kind of imposes it), or because the package otherwise had some influence on something that is included. Hopefully there won't be a huge number of things that get pulled in as a dependency but aren't actually relevant to the final package at all.

@ocdtrekkie
Collaborator Author

Perhaps the flow there would be to use this file as a quick filter for vulnerable packages, but then follow it up by directly evaluating what is in sandstorm-files.list on whether or not the vulnerable package is included.
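
A minimal sketch of that two-step flow, under some loud assumptions: the package list is in the plain `name version` format, the set of vulnerable versions comes from some external source not discussed in this thread, and a name-to-files mapping is available (for example, gathered with `dpkg -L {package}`). All function and parameter names here are hypothetical, and exact string matching on versions is a simplification of real Debian version comparison.

```python
from typing import Dict, List, Set


def read_pkglist(text: str) -> Dict[str, str]:
    """Parse 'name version' lines into a {name: version} dict."""
    pkgs: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            pkgs[parts[0]] = parts[1]
    return pkgs


def quick_filter(pkgs: Dict[str, str], vulnerable: Dict[str, Set[str]]) -> List[str]:
    """Step 1: names of packages whose recorded version is known to be bad."""
    return sorted(n for n, v in pkgs.items() if v in vulnerable.get(n, set()))


def confirm_in_spk(
    hits: List[str], package_files: Dict[str, List[str]], files_list_text: str
) -> Dict[str, List[str]]:
    """Step 2: for each hit, the files it owns that sandstorm-files.list includes."""
    # Assumption: sandstorm-files.list holds one root-relative path per line.
    included = {
        "/" + line.strip().lstrip("/")
        for line in files_list_text.splitlines()
        if line.strip()
    }
    return {n: sorted(included.intersection(package_files.get(n, []))) for n in hits}
```

A hit with an empty file list in step 2 would still deserve a manual look, for the static-linking reasons raised earlier in the thread.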

Right now, this solution is vagrant-spk specific, but we may want to brainstorm how an SPK vulnerability scanner could ideally work regardless of dev tool. For docker-spk, I imagine a similar solution could be implemented, but I would be extremely wary of including this list for an spk-built package, as we cannot assume the developer isn't using their personal machine for dev, in which case the package list might leak a lot of information about their machine.

@ocdtrekkie
Collaborator Author

dpkg -l output on a nearly fresh PineBook Pro is 224 kB. And has descriptions for each and every package. So we should drop a few columns of information from it to try to get it down to size.

@zenhack
Collaborator

zenhack commented Feb 3, 2020 via email

@ocdtrekkie
Collaborator Author

ocdtrekkie commented Feb 4, 2020

The command I have now is dpkg -l | tail -n +6 | awk '{print $2, $3}' > pkglist and that gives me a nice clean output like:

apt 1.4.9
apt-transport-https 1.4.9
apt-utils 1.4.9
etc.

I still haven't tested it inside the Vagrant box yet. (Such are the downsides of playing with an ARM laptop.) But my 224 kB file went down to 43 kB solely by omitting information I don't want. :)
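
Since vagrant-spk itself is written in Python, the same reduction could be done there instead of in a shell pipeline. The sketch below is one possible shape, not what vagrant-spk does: the function names and the `.sandstorm/pkglist` path are assumptions. It differs from the pipeline in one respect: rather than skipping a fixed number of header lines, it keeps only rows whose status column is `ii` (fully installed), so removed-but-not-purged packages are left out. Running `dpkg-query -W -f='${Package} ${Version}\n'` is another way to get the two columns directly.

```python
import subprocess
from typing import List


def parse_dpkg_list(dpkg_output: str) -> List[str]:
    """Reduce `dpkg -l` output to 'name version' lines for installed packages."""
    lines: List[str] = []
    for row in dpkg_output.splitlines():
        fields = row.split()
        # Header rows start with "Desired=", "|", "||/" or "+++-"; package
        # rows start with a status code, and "ii" means fully installed.
        if len(fields) >= 3 and fields[0] == "ii":
            lines.append(f"{fields[1]} {fields[2]}")
    return lines


def write_pkglist(path: str = ".sandstorm/pkglist") -> None:
    """Run dpkg -l and write the reduced list to a (hypothetical) file path."""
    output = subprocess.run(
        ["dpkg", "-l"], check=True, capture_output=True, text=True
    ).stdout
    with open(path, "w") as f:
        f.write("\n".join(parse_dpkg_list(output)) + "\n")
```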

@ocdtrekkie
Collaborator Author

Since this doesn't have much impact on the package we're producing, and it just needs to write the command's output to the right folder, I feel like this is probably a relatively simple project for someone familiar with Python who tests out the vagrant-spk packaging flow.
