Add RFC 2119/8174 note and convert terms to uppercase as needed #373

Open
wants to merge 1 commit into
base: master
124 changes: 65 additions & 59 deletions PURL-SPECIFICATION.rst
@@ -4,6 +4,12 @@ Package URL specification v1.0.X
The Package URL core specification defines a versioned and formalized format,
syntax, and rules used to represent and validate ``purl``.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
`RFC 2119 <https://datatracker.ietf.org/doc/html/rfc2119>`_ and
clarified in `RFC 8174 <https://datatracker.ietf.org/doc/html/rfc8174>`_.

A ``purl`` or package URL is an attempt to standardize existing approaches to
reliably identify and locate software packages.

@@ -32,24 +38,24 @@ The definition for each components is:

- **scheme**: this is the URL scheme with the constant value of "pkg". One of
the primary reason for this single scheme is to facilitate the future official
registration of the "pkg" scheme for package URLs. Required.
registration of the "pkg" scheme for package URLs. REQUIRED.
- **type**: the package "type" or package "protocol" such as maven, npm, nuget,
gem, pypi, etc. Required.
gem, pypi, etc. REQUIRED.
- **namespace**: some name prefix such as a Maven groupid, a Docker image owner,
a GitHub user or organization. Optional and type-specific.
- **name**: the name of the package. Required.
- **version**: the version of the package. Optional.
a GitHub user or organization. OPTIONAL and type-specific.
Comment on lines 44 to +45
Contributor

I don't think this is accurate. The namespace is part of the name, but for some reason documented separately from the name field, and it is always required, but sometimes must be empty. For example:

  • For pkg:nuget, the namespace MUST be empty
  • For pkg:maven, the namespace MUST NOT be empty
  • For pkg:npm, the namespace MUST be the part of the package name before the slash (aka the package scope), and failure to put the correct value in the namespace field changes the package referenced by the PURL

So I guess this is accurate in terms of what PURL spec authors can write in the package types file, but incorrect for normal readers of this documentation.

Member

@matt-phylum your comments make sense, but they apply to package types. Here @johnmhoran is not trying to change the spec, just to adopt the well-defined key words of the referenced RFCs.

> The namespace is part of the name, but for some reason documented separately from the name field, and it is always required, but sometimes must be empty.

We should indeed consider merging namespace and name into one thing, but that's not the goal here. This will come later. And the namespace was never specified so far as being something possibly empty (with one exception, which is a bug for the "mlflow" type https://github.com/package-url/purl-spec/blame/1951d217bde29590a73f075db4ab71cc00011459/PURL-TYPES.rst#L419; I submitted a separate PR, #374, to remove this "empty" namespace).

> For pkg:nuget, the namespace MUST be empty

I'd rather say that there is no namespace at all for nuget.

There is no namespace per se, even if the common convention is to use dot-separated package names where the first segment is namespace-like.

A truly "empty" namespace would mean something like pkg:nuget//System.IO.FileSystem@4.3.0, which does not make sense.

Contributor

PURL treats empty and missing namespaces the same. You can say that either all PURLs have a namespace, which may be empty, or you can say that the namespace cannot be empty and some PURLs do not have a namespace. It's not valid to encode an empty namespace as pkg:nuget//System.IO.FileSystem@4.3.0 according to the "How to build" section.

It's not just a problem with the types spec. This file says here that the namespace is optional, but in other places it says "if the namespace is not empty."

But anyway I'm trying to say that here the namespace is documented as being optional, but in the current spec it's either required to exist and be non-empty (Maven), required to not exist or be empty (NuGet), required to match a property of the package which may not exist or may be empty for certain packages (NPM). I feel like saying here that it's OPTIONAL without any qualifiers implies that for Maven you could leave it out or for NuGet you could add one.

I guess it would make more sense if PURL was more like URL, where there are URL parsers that don't understand e.g. ftp://, but with the way the PURL spec is currently written, the line between supporting PURL and supporting a particular package type is blurred by requiring PURL parsing and building implementations to understand type-specific rules for validating and normalizing the PURL.
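
To make the "empty versus missing namespace" point in this thread concrete, here is a minimal Python sketch. It is not taken from any PURL implementation: the helper name `normalize_namespace` is hypothetical, and it only applies the generic rules quoted in the diff (leading and trailing slashes are not significant, segments are never empty). Type-specific rules such as those for Maven, NuGet or npm are deliberately out of scope.

```python
from typing import Optional


def normalize_namespace(namespace: Optional[str]) -> Optional[str]:
    """Hypothetical helper: treat a missing and an empty namespace the same."""
    if namespace is None:
        return None
    # Leading/trailing slashes are not significant and segments must not be
    # empty, so empty segments are dropped (an assumption of this sketch).
    segments = [segment for segment in namespace.split("/") if segment]
    return "/".join(segments) or None
```

Under these assumptions `None`, `""` and `"/"` all normalize to `None`, which is why a builder following the same rules would never emit `pkg:nuget//System.IO.FileSystem@4.3.0`.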

- **name**: the name of the package. REQUIRED.
- **version**: the version of the package. OPTIONAL.
- **qualifiers**: extra qualifying data for a package such as an OS,
architecture, a distro, etc. Optional and type-specific.
architecture, a distro, etc. OPTIONAL and type-specific.
- **subpath**: extra subpath within a package, relative to the package root.
Optional.
OPTIONAL.


Components are designed such that they form a hierarchy from the most significant
on the left to the least significant components on the right.


A ``purl`` must NOT contain a URL Authority i.e. there is no support for
A ``purl`` MUST NOT contain a URL Authority i.e. there is no support for
``username``, ``password``, ``host`` and ``port`` components. A ``namespace`` segment may
sometimes look like a ``host`` but its interpretation is specific to a ``type``.

@@ -98,14 +104,14 @@ A ``purl`` is a URL
- Special URL schemes as defined in https://url.spec.whatwg.org/ such as
``file://``, ``https://``, ``http://`` and ``ftp://`` are NOT valid ``purl`` types.
They are valid URL or URI schemes but they are not ``purl``.
They may be used to reference URLs in separate attributes outside of a ``purl``
They MAY be used to reference URLs in separate attributes outside of a ``purl``
or in a ``purl`` qualifier.

- Version control system (VCS) URLs such ``git://``, ``svn://``, ``hg://`` or as
defined in Python pip or SPDX download locations are NOT valid ``purl`` types.
They are valid URL or URI schemes but they are not ``purl``.
They are a closely related, compact and uniform way to reference VCS URLs.
They may be used as references in separate attributes outside of a ``purl`` or
They MAY be used as references in separate attributes outside of a ``purl`` or
in a ``purl`` qualifier.


@@ -115,27 +121,27 @@ Rules for each ``purl`` component
A ``purl`` string is an ASCII URL string composed of seven components.

Some components are allowed to use other characters beyond ASCII: these
components must then be UTF-8-encoded strings and percent-encoded as defined in
components MUST then be UTF-8-encoded strings and percent-encoded as defined in
the "Character encoding" section.

The rules for each component are:

- **scheme**:

- The ``scheme`` is a constant with the value "pkg"
- Since a ``purl`` never contains a URL Authority, its ``scheme`` must not be
suffixed with double slash as in 'pkg://' and should use instead
- Since a ``purl`` never contains a URL Authority, its ``scheme`` MUST NOT be
Contributor

This is not accurate. A canonical PURL does not use pkg://, but PURLs MAY use pkg:, pkg:/, pkg://. See the explanation a few lines down which explicitly allows pkg://. This section should probably be rewritten because it starts by saying you MUST NOT do something ever and then later clarifies that the previous blanket prohibition only applies in one particular case and that in all other cases PURLs SHOULD NOT do it.

This is a concern I have in general with this kind of update, especially around the package types. The text has been written with a lot of unqualified "musts" and "must nots" that imply many things to be forbidden which are allowed by other parts of the spec and by every known implementation of the spec, which has been a point of contention with fixing pkg:golang. The current golang text says "must be lowercase", which is wrong because Go is and always has been case sensitive, and because the spec just says "must" it seems like people think generating a PURL with uppercase characters is always forbidden, not just for comparison or canonicalization purposes. Similarly, the current nuget text says "must not be lowercased", as if it's of particular importance that the name not be lowercased, since normally you wouldn't think to anyway, but this is also incorrect because the name must be lowercased during comparison.

There wasn't that much care taken to get all the "musts" and "must nots" etc correct when this text was written, so just upgrading all of them to "MUSTs" and "MUST NOTs" is making the language more strict and potentially confusing.

Member

@matt-phylum You are making good points! I think we can make the spec crisper by separating:

  • A strict core spec, one that defines what a purl is and for instance here stating that a purl must start with pkg:, without further details.
  • And other (new) sections like an FAQ and recommendations to parser implementers that could explain here that a parser should be flexible and recover from some mistakes including handling pkg://

Member Author

@matt-phylum The approach @pombredanne suggests is the approach we're taking in the first of our component-focused PRs, PR 361, which is focused on the 'scheme' component and includes inter alia the initial draft of an FAQ (faq.rst).
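
The split discussed in this thread (a strict canonical form, plus parsers that tolerate `pkg://`) can be sketched in a few lines of Python. This is an illustration only, not code from any listed implementation: `strip_scheme` and `add_scheme` are hypothetical names, and lowercasing the scheme before comparing it is an assumption of the sketch.

```python
def strip_scheme(purl: str) -> str:
    """Hypothetical parser step: return the text after the 'pkg' scheme."""
    scheme, separator, remainder = purl.partition(":")
    if not separator or scheme.lower() != "pkg":
        raise ValueError(f"not a purl, expected the 'pkg' scheme: {purl!r}")
    # Parsers accept 'pkg:/' and 'pkg://' and ignore the extra slashes.
    return remainder.lstrip("/")


def add_scheme(remainder: str) -> str:
    """Hypothetical builder step: always emit the canonical 'pkg:' form."""
    return "pkg:" + remainder.lstrip("/")
```

With these helpers, `pkg:a/b` and `pkg://a/b` parse to the same remainder, while building never produces the double slash.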

suffixed with double slash as in 'pkg://' and SHOULD use instead
'pkg:'. Otherwise this would be an invalid URI per rfc3986 at
https://tools.ietf.org/html/rfc3986#section-3.3::

If a URI does not contain an authority component, then the path
cannot begin with two slash characters ("//").

It is therefore incorrect to use such '://' scheme suffix as the URL would
no longer be valid otherwise. In its canonical form, a ``purl`` must
no longer be valid otherwise. In its canonical form, a ``purl`` MUST
NOT use such '://' ``scheme`` suffix but only ':' as a ``scheme`` suffix.
- ``purl`` parsers must accept URLs such as 'pkg://' and must ignore the '//'.
- ``purl`` builders must not create invalid URLs with such double slash '//'.
- ``purl`` parsers MUST accept URLs such as 'pkg://' and MUST ignore the '//'.
- ``purl`` builders MUST NOT create invalid URLs with such double slash '//'.
- The ``scheme`` is followed by a ':' separator
- For example these two purls are strictly equivalent and the first is in
canonical form. The second ``purl`` with a '//' is an acceptable ``purl`` but is
@@ -151,23 +157,23 @@ The rules for each component are:
and '-' (period, plus, and dash)
- The ``type`` cannot start with a number
- The ``type`` cannot contain spaces
- The ``type`` must NOT be percent-encoded
- The ``type`` MUST NOT be percent-encoded
Contributor

This section should probably be rewritten because it mixes requirements for PURL spec authors and PURL implementations.

This point in particular is problematic. The previous points establish that spec authors must not introduce a type which would require percent encoding. However, if an implementation receives a type containing characters that require percent encoding, either they MUST be percent-encoded (in direct contradiction to this point) or the implementation MUST return an error (during formatting and parsing). Simply implementing "MUST NOT be percent-encoded" is incorrect.

If a PURL is created with package type "a/b":

  • althonos/packageurl.rs, package-url/packageurl-dotnet, package-url/packageurl-go, package-url/packageurl-java, package-url/packageurl-js, phylum-dev/purl, sonatype/package-url-java: an error is returned
  • anchore/packageurl-go, giterlizzi/perl-URI-PackageURL, maennchen/purl, package-url/packageurl-php, package-url/packageurl-python, package-url/packageurl-ruby, package-url/packageurl-swift: an incorrect PURL is created with "b" prefixed to the namespace (pkg:a/b/c)

If a PURL pkg:a%2Fb/c is parsed:

  • althonos/packageurl.rs, giterlizzi/perl-URI-PackageURL, maennchen/purl, package-url/packageurl-dotnet, package-url/packageurl-go, package-url/packageurl-java, package-url/packageurl-js, phylum-dev/purl, sonatype/package-url-java: an error is returned
  • anchore/packageurl-go, package-url/packageurl-php, package-url/packageurl-ruby, package-url/packageurl-swift: a PURL with type "a%2fb" is parsed
  • package-url/packageurl-python: a PURL with type "a/b" is parsed
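
One way to read the comment above is that an implementation should validate the ``type`` and fail, rather than encode or split it. The Python sketch below illustrates that reading under stated assumptions: `validate_type` is a hypothetical name, and the allowed character set is the one quoted in the diff (ASCII letters and numbers, '.', '+' and '-', not starting with a number).

```python
import re

# ASCII letters, digits, '.', '+' and '-'; must not start with a digit.
_TYPE_RE = re.compile(r"[A-Za-z.+-][A-Za-z0-9.+-]*")


def validate_type(purl_type: str) -> str:
    """Hypothetical helper: return the canonical lowercase type or raise."""
    if not _TYPE_RE.fullmatch(purl_type):
        # '/', '%', spaces and anything else outside the allowed set end up
        # here, so "a/b" and "a%2Fb" are rejected instead of being encoded
        # or silently split into type and namespace.
        raise ValueError(f"invalid purl type: {purl_type!r}")
    return purl_type.lower()
```

The same check can run in both the builder and the parser, which corresponds to the "an error is returned" group in the lists above.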

- The ``type`` is case insensitive. The canonical form is lowercase


- **namespace**:

- The optional ``namespace`` contains zero or more segments, separated by slash
- The OPTIONAL ``namespace`` contains zero or more segments, separated by slash
'/'
- Leading and trailing slashes '/' are not significant and should be stripped
- Leading and trailing slashes '/' are not significant and SHOULD be stripped
in the canonical form. They are not part of the ``namespace``
- Each ``namespace`` segment must be a percent-encoded string
- Each ``namespace`` segment MUST be a percent-encoded string
- When percent-decoded, a segment:

- must not contain a '/'
- must not be empty
- MUST NOT contain a '/'
- MUST NOT be empty

- A URL host or Authority must NOT be used as a ``namespace``. Use instead a
- A URL host or Authority MUST NOT be used as a ``namespace``. Use instead a
``repository_url`` qualifier. Note however that for some types, the
``namespace`` may look like a host.

@@ -176,18 +182,18 @@ The rules for each component are:

- The ``name`` is prefixed by a '/' separator when the ``namespace`` is not empty
- This '/' is not part of the ``name``
- A ``name`` must be a percent-encoded string
- A ``name`` MUST be a percent-encoded string


- **version**:

- The ``version`` is prefixed by a '@' separator when not empty
- This '@' is not part of the ``version``
- A ``version`` must be a percent-encoded string
- A ``version`` MUST be a percent-encoded string

- A ``version`` is a plain and opaque string. Some package ``types`` use versioning
conventions such as SemVer for NPMs or NEVRA conventions for RPMS. A ``type``
may define a procedure to compare and sort versions, but there is no
MAY define a procedure to compare and sort versions, but there is no
Contributor

Changing this one to "MAY" doesn't make sense to me because it's permitting PURL spec authors to write something meaningless in another part of the spec. This whole sentence doesn't really make sense to me either way, but it makes slightly more sense with "may." AFAIK, no type does define a procedure for comparing and sorting versions, and I don't think there's any plan or reason for them to do so. Most types use versions that have at least partial ordering, but some types use versions that are completely unordered (or at least you cannot determine the order without additional information).

reliable and uniform way to do such comparison consistently.


@@ -199,18 +205,18 @@ The rules for each component are:
separated by a '&' ampersand. A ``key`` and ``value`` are separated by the equal
'=' character
- These '&' are not part of the ``key=value`` pairs.
- ``key`` must be unique within the keys of the ``qualifiers`` string
- ``key`` MUST be unique within the keys of the ``qualifiers`` string
- ``value`` cannot be an empty string: a ``key=value`` pair with an empty ``value``
is the same as no key/value at all for this key
- For each pair of ``key`` = ``value``:

- The ``key`` must be composed only of ASCII letters and numbers, '.', '-' and
- The ``key`` MUST be composed only of ASCII letters and numbers, '.', '-' and
'_' (period, dash and underscore)
- A ``key`` cannot start with a number
- A ``key`` must NOT be percent-encoded
- A ``key`` MUST NOT be percent-encoded
Contributor

Similar to the problem with types, either the key MUST be percent-encoded or implementations MUST return an error. "MUST NOT be percent-encoded" is incorrect behavior for implementations.

When a purl with qualifier "c&d" is formatted:

  • althonos/packageurl.rs, package-url/packageurl-dotnet, package-url/packageurl-go, package-url/packageurl-java, package-url/packageurl-js, package-url/packageurl-python, phylum-dev/purl, sonatype/package-url-java: an error is returned
  • anchore/packageurl-go, giterlizzi/perl-URI-PackageURL, package-url/packageurl-php, package-url/packageurl-ruby, package-url/packageurl-swift: an incorrect PURL with two qualifiers is produced (pkg:a/b?c&d=e)
  • maennchen/purl: a correct, but forbidden, PURL is produced (pkg:a/b?c%26d=e)

When receiving a purl pkg:a/b?c%26d=e:

  • althonos/packageurl.rs, anchore/packageurl-go, giterlizzi/perl-URI-PackageURL, maennchen/purl, package-url/packageurl-go, package-url/packageurl-java, package-url/packageurl-js, package-url/packageurl-python, phylum-dev/purl, sonatype/package-url-java: an error is returned
  • package-url/packageurl-dotnet, package-url/packageurl-php, package-url/packageurl-ruby: an incorrect qualifier "c%26d" is parsed
  • package-url/packageurl-swift: the correct, but forbidden, qualifier "c&d" is parsed

Interestingly, there is no overlap between implementations that allow and correctly handle the forbidden characters during formatting and implementations that allow and correctly handle the forbidden characters during parsing.

There's another case for parsing pkg:a/b?%63=%64:

  • althonos/packageurl.rs, giterlizzi/perl-URI-PackageURL, maennchen/purl, package-url/packageurl-dotnet, package-url/packageurl-java, package-url/packageurl-python, phylum-dev/purl, sonatype/package-url-java: an error is returned
  • anchore/packageurl-go, package-url/packageurl-go, package-url/packageurl-js, package-url/packageurl-swift: c=d
  • package-url/packageurl-dotnet, package-url/packageurl-php, package-url/packageurl-ruby: %63=d

This is probably another nonconformance caused by PURL parsers that are based on URL parsers. The URL parser unescapes the %63 into a valid character before the PURL parser has a chance to reject it.
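
The ordering problem described above (a URL parser decoding `%63` before the PURL layer can reject it) goes away if keys are validated on the raw string. The following Python sketch shows that ordering. It is an assumption-laden illustration, not the behavior of any listed library: `parse_qualifiers` is a hypothetical name and its input is the raw text after the '?' separator.

```python
import re
from urllib.parse import unquote

# ASCII letters, digits, '.', '-' and '_'; must not start with a digit.
_KEY_RE = re.compile(r"[A-Za-z._-][A-Za-z0-9._-]*")


def parse_qualifiers(raw: str) -> dict:
    """Hypothetical helper: parse the raw text after '?' into a dict."""
    qualifiers = {}
    for pair in raw.split("&"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # Validate the key BEFORE any percent-decoding, so that '%63' and
        # 'c%26d' are rejected instead of being decoded into 'c' or 'c&d'.
        if not _KEY_RE.fullmatch(key):
            raise ValueError(f"invalid qualifier key: {key!r}")
        key = key.lower()  # the canonical form of a key is lowercase
        if key in qualifiers:
            raise ValueError(f"duplicate qualifier key: {key!r}")
        value = unquote(value)
        if value:  # an empty value is the same as no pair at all
            qualifiers[key] = value
    return qualifiers
```

Only the value is percent-decoded. Both `pkg:a/b?c%26d=e` and `pkg:a/b?%63=%64` would therefore produce an error with this approach.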

- A ``key`` is case insensitive. The canonical form is lowercase
- A ``key`` cannot contain spaces
- A ``value`` must be a percent-encoded string
- A ``value`` MUST be a percent-encoded string
- The '=' separator is neither part of the ``key`` nor of the ``value``


@@ -219,50 +225,50 @@ The rules for each component are:
- The ``subpath`` string is prefixed by a '#' separator when not empty
- This '#' is not part of the ``subpath``
- The ``subpath`` contains zero or more segments, separated by slash '/'
- Leading and trailing slashes '/' are not significant and should be stripped
- Leading and trailing slashes '/' are not significant and SHOULD be stripped
in the canonical form
- Each ``subpath`` segment must be a percent-encoded string
- Each ``subpath`` segment MUST be a percent-encoded string
- When percent-decoded, a segment:

- must not contain a '/'
- must not be any of '..' or '.'
- must not be empty
- MUST NOT contain a '/'
- MUST NOT be any of '..' or '.'
- MUST NOT be empty

- The ``subpath`` must be interpreted as relative to the root of the package
- The ``subpath`` MUST be interpreted as relative to the root of the package
Contributor

The subpath is relative to the root of the package. Saying that it must be interpreted as what it is doesn't make sense.

However, "root of the package" is not well defined. What is the root of a pkg:pypi package? Is it the root of the archive? Is it the root of the folder within the archive (an sdist always contains a single folder with an arbitrary name at the root which is discarded during installation)? Is it the root of the Python module (inside the sdist's first folder, there is normally exactly one folder, with a name similar to that of the package, which is installed into the lib directory)? For a Python wheel, the top-level folder from the sdist is omitted. For NPM, the archive also contains a single top-level folder (normally named "package"), and then inside that folder are the files that an NPM developer would consider to be in the root.
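
Whatever "root of the package" ends up meaning for a given type, the segment rules quoted in the diff already keep a ``subpath`` syntactically relative. A minimal Python sketch of those rules follows. `parse_subpath` is a hypothetical name, and silently dropping empty, '.' and '..' segments (instead of raising) is a lenient choice made for this sketch.

```python
from urllib.parse import unquote


def parse_subpath(raw: str) -> str:
    """Hypothetical helper: normalize the raw text after '#'."""
    segments = []
    # Leading and trailing slashes are not significant.
    for segment in raw.strip("/").split("/"):
        decoded = unquote(segment)
        # Lenient assumption: drop empty, '.' and '..' segments silently.
        if decoded in ("", ".", ".."):
            continue
        if "/" in decoded:
            raise ValueError(f"subpath segment contains '/': {segment!r}")
        segments.append(decoded)
    return "/".join(segments)
```

The result can never climb above its starting directory, but which directory that is remains the open question raised in the comment.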



Character encoding
~~~~~~~~~~~~~~~~~~

For clarity and simplicity a ``purl`` is always an ASCII string. To ensure that
there is no ambiguity when parsing a ``purl``, separator characters and non-ASCII
characters must be UTF-encoded and then percent-encoded as defined at::
characters MUST be UTF-encoded and then percent-encoded as defined at::

https://en.wikipedia.org/wiki/Percent-encoding

Use these rules for percent-encoding and decoding ``purl`` components:

- the ``type`` must NOT be encoded and must NOT contain separators
- the ``type`` MUST NOT be encoded and MUST NOT contain separators

- the '#', '?', '@' and ':' characters must NOT be encoded when used as
separators. They may need to be encoded elsewhere
- the '#', '?', '@' and ':' characters MUST NOT be encoded when used as
separators. They MAY need to be encoded elsewhere
Contributor

I don't think this usage of "MAY" makes sense. The spec is not permitting the implementation to just arbitrarily percent-encode these characters at random. Technically that is permitted, but only in non-canonical PURLs, and only in parts of the PURL which do not forbid percent encoding.


- the ':' ``scheme`` and ``type`` separator does not need to and must NOT be encoded.
- the ':' ``scheme`` and ``type`` separator does not need to and MUST NOT be encoded.
It is unambiguous unencoded everywhere

- the '/' used as ``type``/``namespace``/``name`` and ``subpath`` segments separator
does not need to and must NOT be percent-encoded. It is unambiguous unencoded
does not need to and MUST NOT be percent-encoded. It is unambiguous unencoded
everywhere

- the '@' ``version`` separator must be encoded as ``%40`` elsewhere
- the '?' ``qualifiers`` separator must be encoded as ``%3F`` elsewhere
- the '=' ``qualifiers`` key/value separator must NOT be encoded
- the '#' ``subpath`` separator must be encoded as ``%23`` elsewhere
- the '@' ``version`` separator MUST be encoded as ``%40`` elsewhere
Contributor

Most of these are inaccurate when they say that encoding is required. It MUST be encoded in canonical PURLs, but in general it MUST be encoded if it is the right-most @ and isn't the version separator and SHOULD be encoded otherwise. Parser implementations are required to handle the case where the PURL contains unencoded @ characters as described in the parsing algorithm section.
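
The "right-most @" rule in this comment can be illustrated with a short Python sketch. It is a simplification under stated assumptions: `split_version` is a hypothetical name, and its input is assumed to have the scheme, qualifiers and subpath already removed.

```python
from typing import Optional, Tuple
from urllib.parse import unquote


def split_version(remainder: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split 'type/namespace/name@version' once, from the right."""
    path, separator, version = remainder.rpartition("@")
    if not separator:
        return remainder, None  # no '@' at all, so no version
    return path, unquote(version) or None
```

Because the split happens once from the right, `npm/@angular/core@16.0.0` still yields the version `16.0.0` even though the first '@' is unencoded. Without a version, the same unencoded '@' becomes the right-most one and is misread as the separator, which is the case where encoding it as `%40` is genuinely required.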

- the '?' ``qualifiers`` separator MUST be encoded as ``%3F`` elsewhere
- the '=' ``qualifiers`` key/value separator MUST NOT be encoded
- the '#' ``subpath`` separator MUST be encoded as ``%23`` elsewhere

- All non-ASCII characters must be encoded as UTF-8 and then percent-encoded
- All non-ASCII characters MUST be encoded as UTF-8 and then percent-encoded

It is OK to percent-encode ``purl`` components otherwise except for the ``type``.
Parsers and builders must always percent-decode and percent-encode ``purl``
Parsers and builders MUST always percent-decode and percent-encode ``purl``
components and component segments as explained in the "How to parse" and "How to
build" sections.

@@ -273,7 +279,7 @@ How to build ``purl`` string from its components
Building a ``purl`` ASCII string works from left to right, from ``type`` to
``subpath``.

Note: some extra type-specific normalizations are required.
Note: some extra type-specific normalizations are REQUIRED.
Contributor

This is an impossible requirement if the addition of new types is considered to be not a breaking change. More practically, normalization is REQUIRED during canonicalization and comparison, but otherwise implementations should do their best.

See the "Known types section" for details.

To build a ``purl`` string from its components:
@@ -348,7 +354,7 @@ How to parse a ``purl`` string in its components
Parsing a ``purl`` ASCII string into its components works from right to left,
from ``subpath`` to ``type``.

Note: some extra type-specific normalizations are required.
Note: some extra type-specific normalizations are REQUIRED.
See the "Known types section" for details.

To parse a ``purl`` string in its components:
@@ -427,11 +433,11 @@ Known ``qualifiers`` key/value pairs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Note: Do not abuse ``qualifiers``: it can be tempting to use many qualifier
keys but their usage should be limited to the bare minimum for proper package
keys but their usage SHOULD be limited to the bare minimum for proper package
Contributor

This is an instruction to PURL spec authors. PURL implementations shouldn't be removing qualifiers to make the PURL more compact.

identification to ensure that a ``purl`` stays compact and readable in most cases.

Additional, separate external attributes stored outside of a ``purl`` are the
preferred mechanism to convey extra long and optional information such as a
preferred mechanism to convey extra long and OPTIONAL information such as a
download URL, VCS URL or checksums in an API, database or web form.


@@ -440,15 +446,15 @@ all package types:

- ``repository_url`` is an extra URL for an alternative, non-default package
repository or registry. When a package does not come from the default public
package repository for its ``type`` a ``purl`` may be qualified with this extra
package repository for its ``type`` a ``purl`` MAY be qualified with this extra
URL. The default repository or registry of a ``type`` is documented in the
"Known ``purl`` types" section.

- ``download_url`` is an extra URL for a direct package web download URL to
optionally qualify a ``purl``.

- ``vcs_url`` is an extra URL for a package version control system URL to
optionally qualify a ``purl``. The syntax for this URL should be as defined in
optionally qualify a ``purl``. The syntax for this URL SHOULD be as defined in
Python pip or the SPDX specification. See
https://github.com/spdx/spdx-spec/blob/cfa1b9d08903/chapters/3-package-information.md#37-package-download-location

@@ -482,22 +488,22 @@ pairs some of which may not be normalized:
- **qualifiers**: the ``qualifiers`` corresponding to this ``purl`` as an object of
{key: value} qualifier pairs.
- **subpath**: the ``subpath`` corresponding to this ``purl``.
- **is_invalid**: a boolean flag set to true if the test should report an
- **is_invalid**: a boolean flag set to true if the test SHOULD report an
Contributor

This is incorrect. The test MUST report an error if the error is related to general PURL problems or if the error is related to a package type problem for a package type that is supported by the implementation being tested.

error

To test ``purl`` parsing and building, a tool can use this test suite and for
every listed test object, run these tests:

- parsing the test canonical ``purl`` then re-building a ``purl`` from these parsed
components should return the test canonical ``purl``
components SHOULD return the test canonical ``purl``

- parsing the test ``purl`` should return the components parsed from the test
- parsing the test ``purl`` SHOULD return the components parsed from the test
canonical ``purl``

- parsing the test ``purl`` then re-building a ``purl`` from these parsed components
should return the test canonical ``purl``
SHOULD return the test canonical ``purl``

- building a ``purl`` from the test components should return the test canonical ``purl``
- building a ``purl`` from the test components SHOULD return the test canonical ``purl``
Comment on lines 497 to +506
Contributor

All of these are qualified MUSTs. The implementation MUST return the value specified by the test if the implementation supports the package type used by that test. The test suite could use some work to separate the behaviors that any PURL implementation must satisfy from the behaviors that PURL implementations supporting certain package types must satisfy.



License