Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RFC 2119/8174 note and convert terms to uppercase as needed #373

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

johnmhoran
Copy link
Member

As part of the core spec updating process, I've created a new branch and PR to address the proposed adoption and implementation of RFC 2119, as clarified by RFC 8174.

Signed-off-by: John M. Horan <johnmhoran@gmail.com>
@johnmhoran johnmhoran added the Ecma specification Work on the core specification label Dec 24, 2024
Comment on lines 44 to +45
- **namespace**: some name prefix such as a Maven groupid, a Docker image owner,
a GitHub user or organization. Optional and type-specific.
- **name**: the name of the package. Required.
- **version**: the version of the package. Optional.
a GitHub user or organization. OPTIONAL and type-specific.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is accurate. The namespace is part of the name, but for some reason documented separately from the name field, and it is always required, but sometimes must be empty. For example:

  • For pkg:nuget, the namespace MUST be empty
  • For pkg:maven, the namespace MUST NOT be empty
  • For pkg:npm, the namespace MUST be the part of the package name before the slash (aka the package scope), and failure to put the correct value in the namespace field changes the package referenced by the PURL
    So I guess this is accurate in terms of what PURL spec authors can write in the package types file, but incorrect for normal readers of this documentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matt-phylum your comment make sense, but they apply to package types. Here @johnmhoran is not trying to change the spec, but just adopt the well defined words of the referenced RFCs.

The namespace is part of the name, but for some reason documented separately from the name field, and it is always required, but sometimes must be empty.

We should consider to merge namespace and name in one thing alright, but that's not the goal here. This will come later. And the namespace was never specified so far as being something possibly empty (with one exception which is a bug for the "mlflow" type https://github.com/package-url/purl-spec/blame/1951d217bde29590a73f075db4ab71cc00011459/PURL-TYPES.rst#L419 and I submitted a separate PR to remove this "empty" namespace #374 )

For pkg:nuget, the namespace MUST be empty

I'd rather say that there is no namespace at all for nuget.

There is no namespace per se even if the common convention is to use dot-separated package names where the first segment is namespace-like.

An empty namespace could mean something like this with really an "empty" namespace pkg:nuget//System.IO.FileSystem@4.3.0 which does not make sense

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PURL treats empty and missing namespaces the same. You can say that either all PURLs have a namespace, which may be empty, or you can say that the namespace cannot be empty and some PURLs do not have a namespace. It's not valid to encode an empty namespace as pkg:nuget//System.IO.FileSystem@4.3.0 according to the "How to build" section.

It's not just a problem with the types spec. This file says here that the namespace is optional, but in other places it says "if the namespace is not empty."

But anyway I'm trying to say that here the namespace is documented as being optional, but in the current spec it's either required to exist and be non-empty (Maven), required to not exist or be empty (NuGet), required to match a property of the package which may not exist or may be empty for certain packages (NPM). I feel like saying here that it's OPTIONAL without any qualifiers implies that for Maven you could leave it out or for NuGet you could add one.

I guess it would make more sense if PURL was more like URL where there are URL parsers that don't understand eg ftp://, but with the way the PURL spec is currently written, the line between supporting PURL and supporting a particular package type is blurred by requiring PURL parsing and building implementations to understand type specific rules for validating and normalizing the PURL.

the "Character encoding" section.

The rules for each component are:

- **scheme**:

- The ``scheme`` is a constant with the value "pkg"
- Since a ``purl`` never contains a URL Authority, its ``scheme`` must not be
suffixed with double slash as in 'pkg://' and should use instead
- Since a ``purl`` never contains a URL Authority, its ``scheme`` MUST NOT be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This in not accurate. A canonical PURL does not use pkg://, but PURLs MAY use pkg:, pkg:/, pkg://. See the explanation a few lines down which explicitly allows pkg://. This section should probably be rewritten because it starts by saying you MUST NOT do something ever and then later clarifies that the previous blanket prohibition only applies in one particular case and that in all other cases PURLs SHOULD NOT do it.

This is a concern I have in general with this kind of update, especially around the package types. The text has been written with a lot of unqualified "musts" and "must nots" that imply many things to be forbidden which are allowed by other parts of the spec and by every known implementation of the spec, which has been a point of contention with fixing pkg:golang. The current golang text says "must be lowercase", which is wrong because Go is and always has been case sensitive, and because the spec just says "must" it seems like people think generating a PURL with uppercase characters is always forbidden, not just for comparison or canonicalization purposes. Similarly, the current nuget text says "must not be lowercased", as if it's of particular importance that the name not be lowercased, since normally you wouldn't think to anyway, but this is also incorrect because the name must be lowercased during comparison.

There wasn't that much care taken to get all the "musts" and "must nots" etc correct when this text was written, so just upgrading all of them to "MUSTs" and "MUST NOTs" is making the language more strict and potentially confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matt-phylum You are making good points! I think we can make the spec crisper by separating:

  • A strict core spec, one that defines what a purl is and for instance here stating that a purl must start with pkg:, without further details.
  • And other (new) sections like an FAQ and recommendations to parser implementers that could explain here that a parser should be flexible and recover from some mistakes including handling pkg://

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matt-phylum The approach @pombredanne suggests is the approach we're taking in the first of our component-focused PRs, PR 361, which is focused on the 'scheme' component and includes inter alia the initial draft of an FAQ (faq.rst).

@@ -151,23 +157,23 @@ The rules for each component are:
and '-' (period, plus, and dash)
- The ``type`` cannot start with a number
- The ``type`` cannot contain spaces
- The ``type`` must NOT be percent-encoded
- The ``type`` MUST NOT be percent-encoded
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section should probably be rewritten because it mixes requirements for PURL spec authors and PURL implementations.

This point in particular is problematic. The previous points establish that spec authors must not introduce a type which would require percent encoding. However, if an implementation receives a type containing characters that require percent encoding, either they MUST be percent-encoded (in direct contradiction to this point) or the implementation MUST return an error (during formatting and parsing). Simply implementing "MUST NOT be percent-encoded" is incorrect.

If a PURL is created with package type "a/b":

  • althonos/packageurl.rs, package-url/packageurl-dotnet, package-url/packageurl-go, package-url/packageurl-java, package-url/packageurl-js, phylum-dev/purl, sonatype/package-url-java: an error is returned
  • anchore/packageurl-go, giterlizzi/perl-URI-PackageURL, maennchen/purl, package-url/packageurl-php, package-url/packageurl-python, package-url/packageurl-ruby, package-url/packageurl-swift: an incorrect PURL is created with "b" prefixed to the namespace (pkg:a/b/c)

If a PURL pkg:a%2Fb/c is parsed:

  • althonos/packageurl.rs, giterlizzi/perl-URI-PackageURL, maennchen/purl, package-url/packageurl-dotnet, package-url/packageurl-go, package-url/packageurl-java, package-url/packageurl-js, phylum-dev/purl, sonatype/package-url-java: an error is returned
  • anchore/packageurl-go, package-url/packageurl-php, package-url-packageurl-ruby, package-url/packageurl-swift: a PURL with type "a%2fb" is parsed
  • package-url/packageurl-python: a PURL with type "a/b" is parsed


- A ``version`` is a plain and opaque string. Some package ``types`` use versioning
conventions such as SemVer for NPMs or NEVRA conventions for RPMS. A ``type``
may define a procedure to compare and sort versions, but there is no
MAY define a procedure to compare and sort versions, but there is no
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing this one to "MAY" doesn't make sense to me because it's permitting PURL spec authors to write something meaningless in another part of the spec. This whole sentence doesn't really make sense to me either way, but it makes slightly more sense with "may." AFAIK, no type does define a procedure for comparing and sorting versions, and I don't think there's any plan or reason for them to do so. Most types use versions that have at least partial ordering, but some types use versions that are completely unordered (or at least you cannot determine the order without additional information).

'_' (period, dash and underscore)
- A ``key`` cannot start with a number
- A ``key`` must NOT be percent-encoded
- A ``key`` MUST NOT be percent-encoded
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the problem with types, either the key MUST be percent-encoded or implementations MUST return an error. "MUST NOT be percent-encoded" is incorrect behavior for implementations.

When a purl with qualifier "c&d" is formatted:

  • althonos/packageurl.rs, package-url/packageurl-dotnet, package-url/packageurl-go, package-url/packageurl-java, package-url/packageurl-js, package-url/packageurl-python, phylum-dev/purl, sonatype/package-url-java: an error is returned
  • anchore/packageurl-go, giterlizzi/perl-URI-PackageURL, package-url/packageurl-php, package-url/packageurl-ruby, package-url/packageurl-swift: an incorrect PURL with two qualifiers is produced (pkg:a/b?c&d=e)
  • maennchen/purl: a correct, but forbidden, PURL is produced (pkg:a/b?c%26d=e)

When receiving a purl pkg:a/b?c%26d=e:

  • althonos/packageurl.rs, anchore/packageurl-go, giterlizzi/perl-URI-PackageURL, maennchen/purl, package-url/packageurl-go, package-url/packageurl-java, package-url/packageurl-js, package-url/packageurl-python, phylum-dev/purl, sonatype/package-url-java: an error is returned
  • package-url/packageurl-dotnet, package-url/packageurl-php, package-url/packageurl-ruby: an incorrect qualifier "c%26d" is parsed
  • package-url/packageurl-swift: the correct, but forbidden, qualifier "c&d" is parsed

Interestingly, there is no overlap between implementations that allow and correctly handle the forbidden characters during formatting and implementations that allow and correctly handle the forbidden characters during parsing.

There's another case for parsing pkg:a/b?%63=%64:

  • althonos/packageurl.rs, giterlizzi/perl-URI-PackageURL, maennchen/purl, package-url/packageurl-dotnet, package-url/packageurl-java, package-url/packageurl-python, phylum-dev/purl, sonatype/package-url-java: an error is returned
  • anchore/packageurl-go, package-url/packageurl-go, package-url/packageurl-js, package-url/packageurl-swift: c=d
  • package-url/packageurl-dotnet, package-url/packageurl-php, package-url/packageurl-ruby: %63=d

This is probably another nonconformance caused by PURL parsers that are based on URL parsers. The URL parser unescapes the %63 into a valid character before the PURL parser has a chance to reject it.

- the '?' ``qualifiers`` separator must be encoded as ``%3F`` elsewhere
- the '=' ``qualifiers`` key/value separator must NOT be encoded
- the '#' ``subpath`` separator must be encoded as ``%23`` elsewhere
- the '@' ``version`` separator MUST be encoded as ``%40`` elsewhere
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these are inaccurate when they say that encoding is required. It MUST be encoded in canonical PURLs, but in general it MUST be encoded if it is the right-most @ and isn't the version separator and SHOULD be encoded otherwise. Parser implementations are required to handle the case where the PURL contains unencoded @ characters as described in the parsing algorithm section.

@@ -273,7 +279,7 @@ How to build ``purl`` string from its components
Building a ``purl`` ASCII string works from left to right, from ``type`` to
``subpath``.

Note: some extra type-specific normalizations are required.
Note: some extra type-specific normalizations are REQUIRED.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an impossible requirement if the addition of new types is considered to be not a breaking change. More practically, normalization is REQUIRED during canonicalization and comparison, but otherwise implementations should do their best.

@@ -427,11 +433,11 @@ Known ``qualifiers`` key/value pairs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Note: Do not abuse ``qualifiers``: it can be tempting to use many qualifier
keys but their usage should be limited to the bare minimum for proper package
keys but their usage SHOULD be limited to the bare minimum for proper package
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an instruction to PURL spec authors. PURL implementations shouldn't be removing qualifiers to make the PURL more compact.

@@ -482,22 +488,22 @@ pairs some of which may not be normalized:
- **qualifiers**: the ``qualifiers`` corresponding to this ``purl`` as an object of
{key: value} qualifier pairs.
- **subpath**: the ``subpath`` corresponding to this ``purl``.
- **is_invalid**: a boolean flag set to true if the test should report an
- **is_invalid**: a boolean flag set to true if the test SHOULD report an
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect. The test MUST report an error if the error is related to general PURL problems or if the error is related to a package type problem for a package type that is supported by the implementation being tested.

Comment on lines 497 to +506
- parsing the test canonical ``purl`` then re-building a ``purl`` from these parsed
components should return the test canonical ``purl``
components SHOULD return the test canonical ``purl``

- parsing the test ``purl`` should return the components parsed from the test
- parsing the test ``purl`` SHOULD return the components parsed from the test
canonical ``purl``

- parsing the test ``purl`` then re-building a ``purl`` from these parsed components
should return the test canonical ``purl``
SHOULD return the test canonical ``purl``

- building a ``purl`` from the test components should return the test canonical ``purl``
- building a ``purl`` from the test components SHOULD return the test canonical ``purl``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these are qualified MUSTs. The implementation MUST return the value specified by the test if the implementation supports the package type used by that test. The test suite could use some work to separate the behaviors that any PURL implementation must satisfy from the behaviors that PURL implementations supporting certain package types must satisfy.

pombredanne added a commit that referenced this pull request Dec 24, 2024
mlflow used a reference to an "empty" namespace and was at
odds to use this definition with other types.
This clarifies that mlflow does not use a namespace and remove the reference to "empty" namespace.

Reference: #373
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ecma specification Work on the core specification
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants