
Add support for fetching meta data from deps.dev #1457

Open
wants to merge 9 commits into base branch main

n1ckl0sk0rtge

Description

This PR adds basic support for fetching metadata from deps.dev for a given PURL. With this change, only the related source code repositories are fetched and stored.

  • I have read and understand the contributing guidelines
  • This PR fixes a defect, and I have provided tests to verify that the fix is effective
  • This PR implements an enhancement, and I have provided tests to verify that it works as intended
  • This PR introduces changes to the database model, and I have updated the migration changelog accordingly
  • This PR introduces new or alters existing behavior, and I have updated the documentation accordingly

san-zrl and others added 4 commits August 19, 2024 10:18
Signed-off-by: san-zrl <san@zurich.ibm.com>
Signed-off-by: san-zrl <san@zurich.ibm.com>

codacy-production bot commented Aug 20, 2024

Coverage summary from Codacy


Coverage variation: +18.16% (target: -1.00%)
Diff coverage: 87.50% (target: 70.00%)

Coverage variation details
  • Common ancestor commit (1578370): 201 coverable lines, 132 covered lines, 65.67% coverage
  • Head commit (59c4b35): 6923 (+6722) coverable lines, 5804 (+5672) covered lines, 83.84% (+18.16%) coverage

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
  • Pull request (#1457): 48 coverable lines, 42 covered lines, 87.50% diff coverage

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%
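As a quick check of the two formulas above against the numbers in this report, the following Java sketch recomputes both percentages. The class and method names are illustrative; only the line counts come from the report.

```java
import java.util.Locale;

// Recomputes Codacy's coverage variation and diff coverage from raw line counts.
final class CoverageMath {

    // coverage = covered lines / coverable lines * 100
    static double coverage(int covered, int coverable) {
        return covered * 100.0 / coverable;
    }

    // coverage variation = coverage(head commit) - coverage(common ancestor commit)
    static double variation(int headCovered, int headCoverable,
                            int baseCovered, int baseCoverable) {
        return coverage(headCovered, headCoverable) - coverage(baseCovered, baseCoverable);
    }

    // Two decimals, locale-independent, as in the report.
    static String pct(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    public static void main(String[] args) {
        // Head 59c4b35: 5804 of 6923 lines; common ancestor 1578370: 132 of 201 lines.
        System.out.println("Coverage variation: +" + pct(variation(5804, 6923, 132, 201)) + "%");
        // Pull request #1457: 42 of 48 added or modified lines are covered.
        System.out.println("Diff coverage: " + pct(coverage(42, 48)) + "%");
    }
}
```

Running it prints a coverage variation of +18.16% and a diff coverage of 87.50%, matching the summary.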


Codacy stopped sending the deprecated coverage status on June 5th, 2024.

@nscuro
Member

nscuro commented Aug 30, 2024

Thanks @n1ckl0sk0rtge! I've not forgotten about this PR, I'll try to get it reviewed this weekend! Apologies for the delay.

Comment on lines +59 to +64
    /**
     * {@inheritDoc}
     */
    public RepositoryType supportedRepositoryType() {
        return null; // Supported values for type are cargo, golang, maven, npm, nuget and pypi.
    }
Member


The analyzer would still need to be registered with RepositoryAnalyzerFactory, based on the PURL types it supports:

    private static final Map<String, Supplier<IMetaAnalyzer>> ANALYZER_SUPPLIERS = Map.of(
            PackageURL.StandardTypes.COMPOSER, ComposerMetaAnalyzer::new,
            PackageURL.StandardTypes.GEM, GemMetaAnalyzer::new,
            PackageURL.StandardTypes.GOLANG, GoModulesMetaAnalyzer::new,
            PackageURL.StandardTypes.HEX, HexMetaAnalyzer::new,
            PackageURL.StandardTypes.MAVEN, MavenMetaAnalyzer::new,
            PackageURL.StandardTypes.NPM, NpmMetaAnalyzer::new,
            PackageURL.StandardTypes.NUGET, NugetMetaAnalyzer::new,
            PackageURL.StandardTypes.PYPI, PypiMetaAnalyzer::new,
            PackageURL.StandardTypes.CARGO, CargoMetaAnalyzer::new,
            "cpan", CpanMetaAnalyzer::new
    );

The current model assumes at most one analyzer per PURL type though, so we'd need to adjust this to support multiple. Question then is, should one take priority over the other? Do we execute all applicable analyzers, and if so, how do we merge all results back into one?
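One way to answer these questions, sketched in Java under stated assumptions: the map holds a List<Supplier<...>> per PURL type, every applicable analyzer runs, analyzers are ordered by a priority value (lower runs first), and results are merged field by field so that higher-priority values win. SketchAnalyzer, MetaResult and MultiAnalyzerRegistry are hypothetical simplifications, not the project's IMetaAnalyzer API; records require Java 16+.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical registry allowing several analyzers per PURL type (sketch, not project API).
final class MultiAnalyzerRegistry {

    private final Map<String, List<Supplier<SketchAnalyzer>>> suppliers;

    MultiAnalyzerRegistry(Map<String, List<Supplier<SketchAnalyzer>>> suppliers) {
        this.suppliers = suppliers;
    }

    // Runs every analyzer registered for the PURL type, in priority order, and merges
    // the results field by field so that values from higher-priority analyzers win.
    Optional<MetaResult> analyze(String purlType, String purl) {
        List<SketchAnalyzer> analyzers = new ArrayList<>();
        for (Supplier<SketchAnalyzer> supplier : suppliers.getOrDefault(purlType, List.of())) {
            analyzers.add(supplier.get());
        }
        analyzers.sort(Comparator.comparingInt(SketchAnalyzer::priority));

        MetaResult merged = null;
        for (SketchAnalyzer analyzer : analyzers) {
            MetaResult result = analyzer.analyze(purl);
            if (result != null) {
                merged = (merged == null) ? result : merged.fillFrom(result);
            }
        }
        return Optional.ofNullable(merged);
    }

    // Helper for demos and tests: an analyzer that always returns the same result.
    static SketchAnalyzer fixed(int priority, MetaResult result) {
        return new SketchAnalyzer() {
            @Override public int priority() { return priority; }
            @Override public MetaResult analyze(String purl) { return result; }
        };
    }

    public static void main(String[] args) {
        List<Supplier<SketchAnalyzer>> maven = List.of(
                () -> fixed(1, new MetaResult("1.9.0", "https://github.com/example/lib")), // e.g. deps.dev
                () -> fixed(0, new MetaResult("2.0.0", null)));                             // e.g. Maven repository
        MultiAnalyzerRegistry registry = new MultiAnalyzerRegistry(Map.of("maven", maven));
        System.out.println(registry.analyze("maven", "pkg:maven/com.example/lib@1.0.0"));
    }
}

// Lower priority value = consulted first.
interface SketchAnalyzer {
    int priority();
    MetaResult analyze(String purl);
}

record MetaResult(String latestVersion, String sourceRepository) {
    // Keeps this result's non-null values and fills the gaps from the other result.
    MetaResult fillFrom(MetaResult other) {
        return new MetaResult(
                latestVersion != null ? latestVersion : other.latestVersion(),
                sourceRepository != null ? sourceRepository : other.sourceRepository());
    }
}
```

With this policy a Maven analyzer keeps ownership of the latest version while deps.dev only fills in the source repository it is missing. The trade-off is that every analyzer for a type is called on every lookup, even when the first one already answered everything.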

Member


I'm wondering whether we should switch to deps.dev entirely for all public components. The internal status is already provided to the repository meta analyzer, as per the Protobuf definition:

// Whether the component is internal to the organization.
// Internal components will only be looked up in internal repositories.
optional bool internal = 2;

In that case, we would not only source the repository from deps.dev, but also:

  • Latest version
  • Publish timestamp of latest version
  • Publish timestamp of current version
  • Hashes of current version (not sure if deps.dev provides that?)

I think that would be simpler than trying to support multiple analyzers per PURL.

Does that sound reasonable? Granted, that change would be a bit larger.
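A minimal Java sketch of the routing this would imply, assuming the `internal` flag from the Protobuf message is available as a plain boolean. The `route` method, the `Source` enum and the fallback to the existing analyzers for PURL types deps.dev does not cover are hypothetical; the list of deps.dev types is taken from the comment in `supportedRepositoryType()` in the reviewed code.

```java
import java.util.Set;

// Hypothetical routing sketch: decides where a component's metadata is looked up.
final class AnalyzerRouting {

    enum Source { INTERNAL_REPOSITORIES, DEPS_DEV, EXISTING_ANALYZER }

    // Types listed as supported in the PR's supportedRepositoryType() comment.
    static final Set<String> DEPS_DEV_TYPES =
            Set.of("cargo", "golang", "maven", "npm", "nuget", "pypi");

    static Source route(String purlType, boolean internal) {
        if (internal) {
            // "Internal components will only be looked up in internal repositories."
            return Source.INTERNAL_REPOSITORIES;
        }
        // Public components of a type deps.dev knows go there; other types
        // (composer, gem, hex, cpan, ...) keep using their existing analyzer.
        return DEPS_DEV_TYPES.contains(purlType) ? Source.DEPS_DEV : Source.EXISTING_ANALYZER;
    }

    public static void main(String[] args) {
        System.out.println(route("maven", false));    // DEPS_DEV
        System.out.println(route("maven", true));     // INTERNAL_REPOSITORIES
        System.out.println(route("composer", false)); // EXISTING_ANALYZER
    }
}
```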

Author


@nscuro, it took me a while to review this. Yes, that's a good point. So do you think the map could hold a List<Supplier<IMetaAnalyzer>>, and that we add some priority mechanism to IMetaAnalyzer? Alternatively, we could make DepsDevMetaAnalyzer a base class for all the other meta analyzers. That would give us a basic implementation for all PURL types in DepsDevMetaAnalyzer, which the type-specific analyzers could specialise and override. Then, using the same ANALYZER_SUPPLIERS map, we could select and migrate each PURL type step by step.
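A rough Java sketch of the base-class idea, using the template-method pattern: the base class fixes the overall flow and provides a deps.dev-backed default, and a type-specific subclass overrides only the steps its own registry handles better. All names here (DepsDevBaseAnalyzer, MavenAnalyzerSketch, DepsDevVersion, Meta) are hypothetical, and the deps.dev call is replaced by an injected function so the sketch runs offline.

```java
import java.util.function.Function;

// Hypothetical sketch of "DepsDevMetaAnalyzer as a base class"; not the project's API.
class DepsDevBaseAnalyzer {

    // Stand-in for the deps.dev PURL lookup (an HTTP call in a real implementation).
    private final Function<String, DepsDevVersion> depsDevLookup;

    DepsDevBaseAnalyzer(Function<String, DepsDevVersion> depsDevLookup) {
        this.depsDevLookup = depsDevLookup;
    }

    // Template method: the flow is fixed, the individual steps can be overridden.
    final Meta analyze(String purl) {
        return new Meta(latestVersion(purl), sourceRepository(purl));
    }

    // Basic implementation shared by all PURL types: source repository from deps.dev.
    protected String sourceRepository(String purl) {
        DepsDevVersion version = depsDevLookup.apply(purl);
        return version == null ? null : version.sourceRepo();
    }

    // Not covered by the base class in this sketch.
    protected String latestVersion(String purl) {
        return null;
    }

    public static void main(String[] args) {
        Function<String, DepsDevVersion> fakeDepsDev =
                purl -> new DepsDevVersion("1.0.0", "https://github.com/example/lib");
        System.out.println(new DepsDevBaseAnalyzer(fakeDepsDev).analyze("pkg:npm/left-pad@1.3.0"));
        System.out.println(new MavenAnalyzerSketch(fakeDepsDev, purl -> "2.0.0")
                .analyze("pkg:maven/com.example/lib@1.0.0"));
    }
}

// A specialised analyzer inherits the deps.dev lookup and overrides only what its
// native registry provides (here: the latest version).
class MavenAnalyzerSketch extends DepsDevBaseAnalyzer {

    private final Function<String, String> latestFromRegistry;

    MavenAnalyzerSketch(Function<String, DepsDevVersion> depsDevLookup,
                        Function<String, String> latestFromRegistry) {
        super(depsDevLookup);
        this.latestFromRegistry = latestFromRegistry;
    }

    @Override
    protected String latestVersion(String purl) {
        return latestFromRegistry.apply(purl);
    }
}

record DepsDevVersion(String version, String sourceRepo) {}

record Meta(String latestVersion, String sourceRepository) {}
```

Because the supplier map still selects one class per PURL type, each type can be switched to a deps.dev-based subclass independently, which allows the step-by-step migration proposed here.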

Member


What we could also do is having DepsDevMetaAnalyzer as a base class for all the other MetaAnalyzers. This would result in having a basic implementation for all PURL types in DepsDevMetaAnalyzer, which can be specialised and overridden. Then, using the same ANALYZER_SUPPLIERS map, we can select and migrate step by step for each PURL type.

Yes, I really like this idea!


In that case, we would not only source the repository from deps.dev, but also:

  • Latest version
  • Publish timestamp of latest version
  • Publish timestamp of current version
  • Hashes of current version (not sure if deps.dev provides that?)

deps.dev accepts PURLs with and without a version, but returns different content:
If the version is not part of the PURL, we get a JSON object containing information about all available versions. If the version is part of the PURL, we get information about that specific version, enriched with additional data such as SOURCE_REPO. From this we could extract:

  • Given version
  • Publish timestamp of this version
  • Source repository of this version (if provided)

In DT, our PURLs include versions. Finding out the latest version of a package would be complicated given the way the deps.dev purlLookup works: we would have to query deps.dev twice, once without the version and a second time with it. Probably not a good idea...
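To make the cost of the two-request approach concrete, here is a Java sketch that derives both lookup URLs from a versioned PURL. It assumes the deps.dev v3alpha PurlLookup endpoint at https://api.deps.dev/v3alpha/purl/{purl}, with the PURL percent-encoded as a single path segment; please verify this against the current deps.dev API documentation. The version detection is a naive string check in hypothetical helpers, not the packageurl-java parser.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: building the versioned and versionless deps.dev PURL lookup URLs.
final class DepsDevPurlLookup {

    // Assumed deps.dev v3alpha PurlLookup endpoint; verify against the API docs.
    static final String BASE_URL = "https://api.deps.dev/v3alpha/purl/";

    // Drops qualifiers ("?...") and subpath ("#..."), keeping scheme, type, namespace, name and version.
    static String stripQualifiersAndSubpath(String purl) {
        int end = purl.length();
        int question = purl.indexOf('?');
        if (question >= 0) end = Math.min(end, question);
        int hash = purl.indexOf('#');
        if (hash >= 0) end = Math.min(end, hash);
        return purl.substring(0, end);
    }

    // Naive check: a version is present if the last path segment contains '@'.
    static boolean hasVersion(String purl) {
        String core = stripQualifiersAndSubpath(purl);
        return core.substring(core.lastIndexOf('/') + 1).contains("@");
    }

    // Removes the "@version" suffix (and any qualifiers or subpath) from the PURL.
    static String withoutVersion(String purl) {
        String core = stripQualifiersAndSubpath(purl);
        return hasVersion(core) ? core.substring(0, core.lastIndexOf('@')) : core;
    }

    // The whole PURL is one path segment, so ':', '/' and '@' are percent-encoded.
    static String lookupUrl(String purl) {
        return BASE_URL + URLEncoder.encode(purl, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String purl = "pkg:maven/org.apache.commons/commons-lang3@3.14.0";
        // Request 1: versioned lookup, details of 3.14.0 (e.g. SOURCE_REPO).
        System.out.println(lookupUrl(purl));
        // Request 2: versionless lookup, all versions (to find the latest).
        System.out.println(lookupUrl(withoutVersion(purl)));
    }
}
```

Every component would therefore cost two HTTP round trips instead of one, which is the concern raised here.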

3 participants