-
Notifications
You must be signed in to change notification settings - Fork 468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: added debian parser #3543
base: main
Are you sure you want to change the base?
Conversation
Only review and Merging left @terriko @anthonyharrison |
@crazytrain328 Can you please add some tests and include some sample data to demonstrate the parser working. |
@anthonyharrison Could you tell me how to do that? I have worked on adding fuzz testing to parsers before. Do i do the same here? |
Signed-off-by: Joydeep Tripathy <bntripathy123@gmail.com>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3543 +/- ##
==========================================
+ Coverage 75.41% 80.35% +4.93%
==========================================
Files 808 823 +15
Lines 11983 12799 +816
Branches 1598 1999 +401
==========================================
+ Hits 9037 10284 +1247
+ Misses 2593 2089 -504
- Partials 353 426 +73
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking promising!
We do probably need a test. I'd suggest you craft a slightly empty .deb file that has the appropriate product data but no actual product files. Put it in the test/assets
directory (alongside the other test.deb and .cab and .apk files -- you might be able to re-use the existing test.deb but I don't know if the data you need was stripped or not). Then build a test that calls the functions you wrote to parse the test file and verify that it's finding the correct data.
You'll also want to add documentation explaining how it works and what parts of the .deb control data it's using to do its thing. e.g. If I type cve-bin-tool mydebfile.deb
what's going to happen?
Hello @terriko |
I am supposed to get a list of the Deb_products to add in the script only after i run a test with DebParser using a test.deb file. I chose to use the test.deb file in the test/assets section. import sys
import os
sys.path.append('/home/joydeep/dev/cve-bin-tool')
from cve_bin_tool.parsers.deb import DebParser
from cve_bin_tool.cvedb import CVEDB
from cve_bin_tool.log import LOGGER
cve_db= CVEDB()
logger= LOGGER
file_path = os.path.join(os.getcwd(), 'test.deb')
deb_parser= DebParser(cve_db=cve_db,logger=logger)
deb_parser.run_checker(file_path) I have brought the test.deb file into the same directory as my testing file. I think there is something wrong in the way im calling Logger. Can you help me @terriko @anthonyharrison ? |
@crazytrain328 The test is OK in your local environment to prove the functionality.. However what we need is a test within the cve-bin-tool test environment where we can add it to the test suite. The language parsers all have test files in the test/language_data directory. Can you add your test.deb file in this directory and then update the test_language_scanner file to add your test code. I suggest you add a new test test_debian_package which calls the scanner and then asserts that the results are as expected. Can you confirm that the parser is doing more than is already covered in the extractor module and tested in the test_extractor file which explicitly has a a test for files with a .deb extension. |
@anthonyharrison |
I tried to change a few things but my local test still gives no output. My code for testing: import sys
import os
import logging # Import the logging module
sys.path.append('/home/joydeep/dev/cve-bin-tool')
from cve_bin_tool.parsers.deb import DebParser
from cve_bin_tool.cvedb import CVEDB
from cve_bin_tool.log import LOGGER
LOGGER.setLevel(logging.DEBUG)
cve_db = CVEDB()
logger = LOGGER
file_path = os.path.join(os.getcwd(), 'test.deb')
deb_parser = DebParser(cve_db=cve_db, logger=logger)
deb_parser.run_checker(file_path)
Modified run_checker() function: def run_checker(self, filename):
"""Process .deb control file with file existence check"""
self.logger.debug(f"Scanning .deb control file: {filename}")
# Check if the file exists
if not os.path.exists(filename):
self.logger.error(f"File not found: {filename}")
return # Exit the method if file doesn't exist
try:
with open(filename) as file:
control_data = file.read()
product, version = self.parse_control_file(control_data)
if product and version:
product_info = self.find_vendor(product, version)
if product_info:
yield from product_info
else:
self.logger.debug(f"No product/version found in {filename}")
except Exception as e:
self.logger.error(f"Error processing file {filename}: {e}")
self.logger.debug(f"Done scanning file: {filename}") Im stuck! Please help @terriko @anthonyharrison. |
@crazytrain328 Can you provide the test.deb file that you are using? Tried to run the parser in my environment. The run_checker routine wan't being called. However if I call a different routine in the class it does get called so I suspected there was something wrong with the way run-checker is defined/being called. I created the following routine def do_it(self, filename):
print ("DO IT")
try:
print ("Read file")
with open(filename) as file:
control_data = file.read()
print ("File read")
except:
print ("We have a problem")
print ("DONE IT") And called this instead of run_checker. This resulted in the exception being called when reading a .deb file (I used the test.deb file in the test/assets directory). Renaming this as run_checker, does result in the run_checker being called. So I think you need to work through the run_checker routine line by line to validate the operation. Using print statement rather than logging may also help. |
@anthonyharrison |
Have you updated the parse.py file? This calls the appropriate parser when
it finds a particular file e.g. requirements.txt will invoke the python
parser.
…On Fri, 8 Dec 2023, 08:59 Joydeep Tripathy, ***@***.***> wrote:
@anthonyharrison <https://github.com/anthonyharrison>
I am also using the test.deb in the test/assets directory.
But I will go through the run_checker() definition once again.
—
Reply to this email directly, view it on GitHub
<#3543 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACAID23RI566OV5RRK3ETEDYILJFNAVCNFSM6AAAAAA74R5N46VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNBWG44TKNZQGA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I was able to run the run_checker() function. I created my own .deb file and am trying to parse its control file with my DebParser. import os
import sys
import logging
sys.path.append('/home/joydeep/dev/cve-bin-tool')
from cve_bin_tool.parsers.deb import DebParser
from cve_bin_tool.cvedb import CVEDB
from cve_bin_tool.log import LOGGER
# Set logger to DEBUG level
LOGGER.setLevel(logging.DEBUG)
# Verify the test file path
file_path = '/home/joydeep/mypackage/DEBIAN/control'
if not os.path.exists(file_path):
raise FileNotFoundError(f"The file {file_path} does not exist.")
# Instantiate the database and the parser
cve_db = CVEDB()
deb_parser = DebParser(cve_db=cve_db, logger=LOGGER)
# Run the parser
try:
for info in deb_parser.run_checker(file_path):
print(info)
except Exception as e:
LOGGER.error(f"Exception occurred: {e}") Is there something wrong with the way Im using CVEDB? |
If I had to guess, the database problem is that your database hasn't been created or updated in a while. So you're making a new CVEDB() but you're not populating it. To update your local db and make sure it's functioning: (The file doesn't matter, I just chose one from our test suite so you can see if the database is working in other code.) And in your code you'd probably want to call |
How do I get the DEBIAN_PRODUCTS which i have to add in the |
Usually you'd make this manually (i.e. cut and paste the data that you used when you created the file). For example, if you look at https://github.com/intel/cve-bin-tool/blob/main/test/language_data/requirements.txt and then at the PYTHON_PRODUCTS array in https://github.com/intel/cve-bin-tool/blob/main/test/test_language_scanner.py you'll see that the test is just a subset of what could have been detected from the file. In your case, since a debian package often contains only one product, you may have an array that's just the one thing you put into the metadata of the file, so you could probably write something like def test_python_package(self, filename: str) -> None:
assert scanner.scan_file(filename) == "debian_package" Although you'll have to account for it returning an array rather than a single string or whatever it actually does (sorry, I've got to run to a meeting so I don't have time to double-check the api myself, but you can probably figure it out from the other tests!) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting closer, but it looks like the test doesn't yet match what it's actually returning:
=========================== short test summary info ===========================
FAILED test/test_language_scanner.py::TestLanguageScanner::test_language_package[D:\\a\\cve-bin-tool\\cve-bin-tool\\test\\language_data\\test_control.deb-products9] - AssertionError: assert 'aptitude' in ['dpkg', 'apt', 'lintian']
FAILED test/test_language_scanner.py::TestLanguageScanner::test_debian_control[D:\\a\\cve-bin-tool\\cve-bin-tool\\test\\language_data\\test_control.deb] - AssertionError: assert ['dpkg', 'apt', 'lintian'] == ['dpkg', 'apt...debuild', ...]
At index 2 diff: 'lintian' != 'aptitude'
Right contains 7 more items, first extra item: 'debmake'
Full diff:
[
'dpkg',
'apt',
- 'aptitude',
- 'debmake',
'lintian',
- 'debuild',
- 'dh_make',
- 'debconf',
- 'alien',
- 'gdebi',
]
=========== 2 failed, 1869 passed, 32 skipped in 1232.78s (0:20:32) ===========
Note that there's failures only on windows due to an unrelated issue with the linux tests that I'm working on in #3631 -- if you run the linux tests on a local linux machine they'll probably fail too.
Oh, and if you want to run just your new test to see how it works on your system, you can use the -k option:
should probably get you just the new piece you added so you don't have to wait for a whole file worth of tests (or the whole test suite!) to complete. |
All the products that my test_control.deb has I have listed in the DEBIAN_PRODUCTS list ..I also modified the debparser code to be able to extract the products and their versions more efficiently, but still it does not give me the desired output. One thing I read about debian control files is that while the actual package has a .deb extension, the control file inside the package (which basically contains metadata about the debian package) is actually a text file (without extension). Should I write my tests to process a control.txt file? |
Typically, you'd want to have the test process a .deb and find and parse the control.txt, so... both? |
So I've been trying to find ways on how to unpack Debian packages using python and so far haven't had any luck in that . |
Hello @terriko , @anthonyharrison @b31ngd3v @Rexbeast2 |
Changes
Signed-off-by: Joydeep Tripathy <bntripathy123@gmail.com>
Only review and merge left. Eagerly Waiting for review :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it's taken so long to get back to this one. There's two discussions happening behind the scenes that affect this PR:
- The tarfile issue. We need to make sure that this code also extracts only the files needed and that it can't clobber any existing files. I think the best solution for you is to extract only the one specific file you need and no other files.
- Checking in binary files to our git repo. We've been doing a lot of this in the past and it's getting flagged by OpenSSF. I think the right choice in this case is have the test actually construct the .deb file that we'll use for testing.
I also think we need to do some thinking to make sure that we aren't duplicating effort or over-riding existing code that previously unpacked debian files and looks for strings. I'm honestly not sure what the right choice is to streamline that, but maybe it won't be so bad if this particular code is only extracting the control data so we're at least not unpacking two full archives into /tmp/
|
||
# Change the working directory to the temp_dir for extraction | ||
os.chdir(temp_dir) | ||
await aio_run_command(["ar", "x", filename]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We should check to make sure
ar
exists before we run it. - Is there any chance that ar could escape the temp directory? I know we're having this problem with tar but I don't know if ar has the same problem.
- If we're only using the control data to identify the .deb file, let's change the code to only extract that part and absolutely no other files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ar
, as used in the aio_run_command() function, helps to extract the contents of the given debian package into a temporary directory, inside which we keep extracting till we reach the control file, then the contents of the file are written onto control_data
variable and then the temporary directory is closed, deleting all the extracted contents.
I think my code already does what you want it to.
As for why im using -x
to extract all the files inside the package - It is because the control files are sometimes present in different directories than one might expect it to . It might be present directly inside the debian package or maybe inside another tar file in the package. That is why all the files are being extracted here.
@@ -236,13 +238,15 @@ def test_language_package_none_found(self, filename: str) -> None: | |||
(str(TEST_FILE_PATH / "Package.resolved"), SWIFT_PRODUCTS), | |||
(str(TEST_FILE_PATH / "composer.lock"), PHP_PRODUCTS), | |||
(str(TEST_FILE_PATH / "cpanfile"), PERL_PRODUCTS), | |||
(str(TEST_FILE_PATH / "test.deb"), DEBIAN_PRODUCTS), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have test_language_scanner create the .deb file rather than checking it in to the test directory?
We've typically just put small test files in the directory and let it be, but it's hurting our OpenSSF score when we provide things like .deb packages that are basically installable (weirdly, it doesn't flag on the thousand .tar.gz files... yet).
I think .deb files use mostly tools we already have installed so it should be possible to write python or a makefile to generate the file here and add some code to skip the test if the file can't be built. Sorry that you get stuck as a guinea pig here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Umm, I was working on this and was able to write something like this..
import os
import subprocess
def create_debian_package(directory, package_name, version, architecture, description, maintainer):
# Create the necessary directory structure
debian_dir = os.path.join(directory, 'DEBIAN')
os.makedirs(debian_dir, exist_ok=True)
# Create the control file
control_content = f"""Package: {package_name}
Version: {version}
Architecture: {architecture}
Maintainer: {maintainer}
Description: {description}
"""
with open(os.path.join(debian_dir, 'control'), 'w') as control_file:
control_file.write(control_content)
# Build the package
subprocess.run(['dpkg-deb', '--build', directory, f'{package_name}_{version}_{architecture}.deb'])
if __name__ == '__main__':
create_debian_package(
directory='mypackage',
package_name='mypackage',
version='1.0',
architecture='all',
description='Example package',
maintainer='Joydeep <mail@joydeep.com>'
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I add this file in the current PR or as a different PR?
This one has been going on for way too long 🥲🥲
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test.deb can of course be created for testing. But, would'nt that be like extra memory space?
Even if I create the debian package as a temporary file everytime, it would still be extra work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a heads up: this has a bunch of merge conflicts now and will need some work.
I'll get back to solving this as soon as we have the PURL generation for language parsers and their tests figured out. |
Marking this as blocked so I don't look at it again until after 3.3 is out. |
Sure! I did mention working on this issue as part of stretch goals in my GSOC project. Btw, When will the 3.3 version be coming out? @terriko |
Hello @terriko , Since the release is out I was thinking we can finally work on finishing this one. |
I've barely started with 3.3.1 planning so I expect this will get merged long before there, but it's going to be at least a few weeks. I'm severely backlogged on non-cve-bin-tool stuff at the moment and have to put my focus elsewhere. |
Absolutely no problem at all! |
As for testing files: for now, let's got with just including the .deb in git. I'm going to have to deal with the OpenSSF's insistence on there not being binary files eventually but I think at the moment it's more important to me that we have a functional test if it's not super easy to just have a makefile for it or something. |
So would the current structure work? |
closes #2917