Pathor is a PHP library for normalizing, analyzing, and comparing URLs. It is built on top of the League\Uri library and offers an easy-to-use API for common URL-related operations.
Install the library via Composer:
composer require pathor/url
- Normalize URLs by standardizing components (scheme, host, path, query, etc.).
- Generate a consistent fingerprint (hash) for URLs.
- Compare multiple URLs to check if they are equivalent.
- Parse URLs into their individual components.
- Assemble URLs from their components.
- Customize normalization with handlers and configurations.
Here is a quick example of how to use the Pathor library:
<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

use Pathor\Url;

$pathor = new Url();
$url = 'https://www.example.com/path///../a/b/../c//ё//hello world/?ref=google&b=2&a=1&&=&&foo[1]=222&foo[0]=111#hello world';

// Normalize URL
$normalizedUrl = $pathor->normalize($url);
echo $normalizedUrl, PHP_EOL; // https://www.example.com/path/a/c/%D1%91/hello%20world?a=1&b=2&foo%5B%5D=111&foo%5B%5D=222#hello%20world

// Generate fingerprint (SHA-256 by default)
$fingerprint = $pathor->fingerprint($url);
echo $fingerprint, PHP_EOL; // b18e86f5d2da88269fd0895af1178d8305ae78fe3fa3e61195af6b50a60f333d

// Compare URLs (true when all of them normalize to the same URL)
$isEqual = $pathor->equals(
    'https://www.example.com/path/a/c/%D1%91/hello%20world?a=1&b=2&foo%5B%5D=111&foo%5B%5D=222#hello%20world',
    'https://www.example.com/path///../a/b/../c//ё//hello world/?ref=google&b=2&a=1&&=&&foo[1]=222&foo[0]=111#hello world',
    'https://www.example.com/path//a/b/../c//ё//hello world/?ref=google&b=2&a=1&&=&&&foo[]=111&foo[]=222#hello world',
);
var_dump($isEqual); // bool(true)

// Get URL details
$details = $pathor->details($url);
print_r($details); // An array with the fingerprint, the original and normalized URLs, and the parsed components
Examples can be found here.
The Url class can be customized with configuration options that adjust its normalization behavior:
- fingerprint: The hashing algorithm used for URL fingerprints (default: sha256).
- query: Query string handling.
  - withoutDuplicates: Remove duplicate query parameters.
  - withoutEmptyPairs: Remove empty query parameters.
  - withoutNumericIndices: Replace numeric array indices such as foo[0] with foo[].
  - withSortedParams: Sort query parameters alphabetically.
  - withoutTrackingParams: Remove known tracking parameters (e.g., utm_source).
  - trackingParamsList: The parameter names treated as tracking parameters.
- path: Path normalization.
  - withoutDotSegments: Remove . and .. segments from the path.
  - withoutEmptySegments: Remove empty segments from the path.
  - withoutTrailingSlash: Remove trailing slashes.
use Pathor\Url;

$config = [
    'fingerprint' => 'sha256', // see hash_algos(): https://www.php.net/manual/en/function.hash-algos.php
    'query' => [
        'withoutDuplicates' => true,
        'withoutEmptyPairs' => true,
        'withoutNumericIndices' => true,
        'withSortedParams' => true,
        'withoutTrackingParams' => true,
        // Default tracking-parameter list, assumed to be a public class constant on Url
        'trackingParamsList' => Url::QUERY_TRACKING_PARAMS,
    ],
    'path' => [
        'withoutDotSegments' => true,
        'withoutEmptySegments' => true,
        'withoutTrailingSlash' => true,
    ],
];

$pathor = new Url($config);
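To make the query options above more concrete, here is a minimal plain-PHP sketch that runs without Pathor. It is not the library's implementation: the helper normalizeQuery() is hypothetical, its default tracking list (which includes ref only because the quick example drops ref=google) is an assumption, and so is what counts as an "empty" or "duplicate" pair.

```php
<?php
declare(strict_types=1);

// Sketch only: approximates the query options with plain PHP. The exact rules
// (what counts as "empty" or "duplicate", the tracking list) are assumptions.
function normalizeQuery(string $query, array $trackingParams = ['utm_source', 'utm_medium', 'ref']): string
{
    $pairs = [];
    foreach (explode('&', $query) as $chunk) {
        [$key, $value] = array_pad(explode('=', $chunk, 2), 2, '');
        $key = rawurldecode($key);
        $value = rawurldecode($value);

        if ($key === '') {
            continue; // withoutEmptyPairs (assumed): drop "", "=" and "=x"
        }
        if (in_array($key, $trackingParams, true)) {
            continue; // withoutTrackingParams: drop listed keys
        }
        $pairs[] = [$key, $value];
    }

    // withSortedParams: natural sort by key, so "foo[2]" comes before "foo[10]".
    // usort() is stable since PHP 8.0, so pairs with equal keys keep their order.
    usort($pairs, fn (array $a, array $b): int => strnatcmp($a[0], $b[0]));

    $seen = [];
    $parts = [];
    foreach ($pairs as [$key, $value]) {
        // withoutDuplicates (assumed): keep only the first identical key/value pair.
        $id = $key . "\0" . $value;
        if (isset($seen[$id])) {
            continue;
        }
        $seen[$id] = true;

        // withoutNumericIndices (assumed): "foo[0]" becomes "foo[]".
        $key = (string) preg_replace('/\[\d+\]$/', '[]', $key);
        $parts[] = rawurlencode($key) . '=' . rawurlencode($value);
    }

    return implode('&', $parts);
}

echo normalizeQuery('ref=google&b=2&a=1&&=&&foo[1]=222&foo[0]=111'), PHP_EOL;
// a=1&b=2&foo%5B%5D=111&foo%5B%5D=222
```

In this sketch, sorting before stripping the indices is what turns foo[1]=222&foo[0]=111 into foo[]=111&foo[]=222.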
Custom handlers let you define your own rules for individual URL components. Each handler is a function that receives the normalized value and the original value of its component (in that order) and returns the value to use.
Example:
use Pathor\Url;

// Each handler receives (normalized value, original value) and returns the value to use.
$handlers = [
    'scheme'   => fn(?string $normalized, ?string $original): ?string => $normalized,
    'user'     => fn(?string $normalized, ?string $original): ?string => $normalized,
    'password' => fn(?string $normalized, ?string $original): ?string => $normalized,
    // Use the original host, upper-cased, instead of the normalized one (null-safe)
    'host'     => fn(?string $normalized, ?string $original): ?string => $original === null ? null : strtoupper($original),
    'port'     => fn(?int $normalized, ?int $original): ?int => $normalized,
    'path'     => fn(?string $normalized, ?string $original): ?string => $normalized,
    'query'    => fn(?string $normalized, ?string $original): ?string => $normalized,
    'fragment' => fn(?string $normalized, ?string $original): ?string => $normalized,
];

$pathor = new Url(handlers: $handlers); // named arguments require PHP 8.0+
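The handler contract can be illustrated without Pathor. In the hypothetical sketch below, applyHandlers() takes arrays of normalized and original components and replaces each component that has a handler with that handler's return value. How Pathor wires handlers internally is not shown here and may differ.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: calls handler(normalized, original) for every component
// that has a handler and uses the return value in place of the normalized one.
function applyHandlers(array $normalized, array $original, array $handlers): array
{
    $result = $normalized;
    foreach ($handlers as $component => $handler) {
        $result[$component] = $handler($normalized[$component] ?? null, $original[$component] ?? null);
    }

    return $result;
}

$original   = ['scheme' => 'HTTPS', 'host' => 'Example.COM', 'path' => '/a/./b/'];
$normalized = ['scheme' => 'https', 'host' => 'example.com', 'path' => '/a/b'];

$handlers = [
    // Keep the normalized scheme unchanged.
    'scheme' => fn (?string $normalized, ?string $original): ?string => $normalized,
    // Ignore normalization and upper-case the original host instead.
    'host' => fn (?string $normalized, ?string $original): ?string => $original === null ? null : strtoupper($original),
];

// host becomes "EXAMPLE.COM"; scheme and path keep their normalized values
print_r(applyHandlers($normalized, $original, $handlers));
```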
Normalizes a given URL by standardizing its components. By default, this includes:
- Lowercasing the scheme and host.
- Removing duplicate query parameters.
- Removing empty query parameters.
- Sorting query parameters alphabetically.
- Removing known tracking parameters (e.g., utm_source).
- Removing . and .. segments from the path.
- Removing empty segments from the path.
- Removing trailing slashes.
- Other steps visible in the examples, such as dropping default ports (:443 for https), converting internationalized domain names to Punycode, and percent-encoding special characters.
Example:
$normalized = $pathor->normalize('HTTP://Example.COM/../a/B/./');
echo $normalized; // Outputs: http://example.com/a/B
$normalized = $pathor->normalize('https://сайт.рф');
echo $normalized; // Outputs: https://xn--80aswg.xn--p1ai
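The path rules in the list above (dot segments, empty segments, trailing slash) can be approximated with a segment stack. The hypothetical normalizePath() below is only a sketch that assumes an absolute path; it matches the path portions of the examples in this README before percent-encoding, but it is not Pathor's code and leaves out encoding and IDN handling.

```php
<?php
declare(strict_types=1);

// Sketch of withoutDotSegments, withoutEmptySegments and withoutTrailingSlash
// for an absolute path. Dot segments are resolved first (empty segments still
// count, much like RFC 3986's remove_dot_segments), then empty segments are dropped.
function normalizePath(string $path): string
{
    $stack = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '.') {
            continue;          // "." refers to the current segment
        }
        if ($segment === '..') {
            array_pop($stack); // ".." removes the previous segment, if any
            continue;
        }
        $stack[] = $segment;   // empty segments are kept for now
    }

    // Drop empty segments; this also removes any trailing slash.
    $segments = array_filter($stack, fn (string $s): bool => $s !== '');

    return '/' . implode('/', $segments);
}

echo normalizePath('/path///../a/b/../c//ё//hello world/'), PHP_EOL; // /path/a/c/ё/hello world
echo normalizePath('/../a/B/./'), PHP_EOL;                            // /a/B
```

Resolving .. before dropping empty segments matters: dropping them first would turn /path///../a into /a instead of /path/a.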
Generates a hash of the normalized URL. The hashing algorithm is set with the fingerprint option (default: sha256).
Example:
$fingerprint = $pathor->fingerprint('https://example.com/path?param=value');
echo $fingerprint; // Outputs a hex-encoded hash (64 characters with the default sha256)
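A fingerprint of this kind boils down to PHP's hash() over the normalized URL. The sketch below assumes Pathor hashes the normalized URL string; fingerprint() here is a hypothetical standalone helper that takes an already-normalized URL and validates the algorithm name against hash_algos().

```php
<?php
declare(strict_types=1);

// Hypothetical helper: hex digest of an already-normalized URL string.
function fingerprint(string $normalizedUrl, string $algo = 'sha256'): string
{
    if (!in_array($algo, hash_algos(), true)) {
        throw new InvalidArgumentException("Unsupported hash algorithm: {$algo}");
    }

    return hash($algo, $normalizedUrl); // lowercase hex digest
}

echo fingerprint('https://example.com/path?param=value'), PHP_EOL;        // 64 hex characters
echo fingerprint('https://example.com/path?param=value', 'md5'), PHP_EOL; // 32 hex characters
```

Because the hash is taken after normalization, URLs that normalize to the same string share a fingerprint, which makes the value usable as a deduplication key.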
Compares two or more URLs and returns true if they are all equivalent after normalization. Throws an exception if fewer than two URLs are provided.
Example:
$areEqual = $pathor->equals(
'https://example.com/?utm_source=google',
'https://example.com:443?ref=site&=',
'https://example.com:443/',
'https://example.com:443/?#',
'https://example.com:443'
);
var_dump($areEqual); // Outputs: bool(true)
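The comparison described above can be sketched as "normalize everything, then check that all results are identical". In this hypothetical allEquivalent() the normalizer is passed in as a callable, so the demo uses a deliberately crude stand-in rather than Pathor's normalize().

```php
<?php
declare(strict_types=1);

// Hypothetical helper: true when every URL normalizes to the same string.
// Throws when fewer than two URLs are given.
function allEquivalent(callable $normalize, string ...$urls): bool
{
    if (count($urls) < 2) {
        throw new InvalidArgumentException('At least two URLs are required.');
    }

    $first = $normalize($urls[0]);
    foreach (array_slice($urls, 1) as $url) {
        if ($normalize($url) !== $first) {
            return false; // stop at the first mismatch
        }
    }

    return true;
}

// Toy normalizer for the demo: lowercases the URL and strips trailing slashes.
$toy = fn (string $url): string => rtrim(strtolower($url), '/');

var_dump(allEquivalent($toy, 'HTTPS://Example.com/', 'https://example.com')); // bool(true)
```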
Breaks a URL into its components, returning an associative array.
Example:
$components = $pathor->parse('https://user:pass@example.com:8080/path?query=value#fragment');
dd($components);
// ^ array:8 [
// "scheme" => "https"
// "host" => "example.com"
// "user" => "user"
// "password" => "pass"
// "port" => 8080
// "path" => "/path"
// "query" => "query=value"
// "fragment" => "fragment"
// ]
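For comparison, PHP's built-in parse_url() returns similar data, but it names the password key pass and omits components that are absent. The hypothetical parseComponents() below maps its result onto the eight keys shown above, filling missing ones with null; it does no normalization.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: parse_url() result reshaped into eight fixed keys.
function parseComponents(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false) {
        throw new InvalidArgumentException("Malformed URL: {$url}");
    }

    return [
        'scheme'   => $parts['scheme'] ?? null,
        'host'     => $parts['host'] ?? null,
        'user'     => $parts['user'] ?? null,
        'password' => $parts['pass'] ?? null, // parse_url() calls it "pass"
        'port'     => $parts['port'] ?? null, // int when present
        'path'     => $parts['path'] ?? null,
        'query'    => $parts['query'] ?? null,
        'fragment' => $parts['fragment'] ?? null,
    ];
}

print_r(parseComponents('https://user:pass@example.com:8080/path?query=value#fragment'));
```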
Assembles a URL from its components. Accepts an associative array with keys like scheme, host, path, etc.
Example:
$url = $pathor->build([
'scheme' => 'https',
'host' => 'example.com',
'path' => 'new-path',
'query' => ['param' => 'value'], // or a query string such as 'param=value' (e.g. from http_build_query())
'fragment' => 'section'
]);
echo $url; // Outputs: https://example.com/new-path?param=value#section
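A hypothetical buildUrl() sketch of the reverse operation, using only PHP built-ins: an array query is encoded with http_build_query() (RFC 3986 encoding, so spaces become %20), a string query is used as-is, and a relative path gets a leading slash when a host is present, as in the new-path example. Pathor's own build() may handle edge cases differently.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: assembles a URL string from component keys.
function buildUrl(array $c): string
{
    $url = '';
    if (isset($c['scheme'])) {
        $url .= $c['scheme'] . ':';
    }
    if (isset($c['host'])) {
        $url .= '//';
        if (isset($c['user'])) {
            $url .= $c['user'] . (isset($c['password']) ? ':' . $c['password'] : '') . '@';
        }
        $url .= $c['host'] . (isset($c['port']) ? ':' . $c['port'] : '');
    }

    // A path following an authority must start with "/".
    $path = $c['path'] ?? '';
    if ($path !== '' && isset($c['host']) && $path[0] !== '/') {
        $path = '/' . $path;
    }
    $url .= $path;

    if (isset($c['query'])) {
        $query = is_array($c['query'])
            ? http_build_query($c['query'], '', '&', PHP_QUERY_RFC3986)
            : $c['query'];
        if ($query !== '') {
            $url .= '?' . $query;
        }
    }
    if (isset($c['fragment'])) {
        $url .= '#' . $c['fragment'];
    }

    return $url;
}

echo buildUrl([
    'scheme' => 'https',
    'host' => 'example.com',
    'path' => 'new-path',
    'query' => ['param' => 'value'],
    'fragment' => 'section',
]), PHP_EOL; // https://example.com/new-path?param=value#section
```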
Returns a detailed breakdown of a URL: its fingerprint, the original URL, the normalized URL, and the normalized URL's parsed components.
Example:
$details = $pathor->details('https://www.example.com:443/path///../a/b/../c//ё//hello world/?ref=google&b=2&a=1&&=&&foo[1]=222&foo[0]=111#hello world');
dd($details);
// ^ array:4 [
// "fingerprint" => "4c64095f06900806842e22f93ee151ab"
// "original_url" => "https://www.example.com:443/path///../a/b/../c//ё//hello world/?ref=google&b=2&a=1&&=&&foo[1]=222&foo[0]=111#hello world"
// "normalized_url" => "https://www.example.com/path/a/c/%D1%91/hello%20world?a=1&b=2&foo%5B%5D=111&foo%5B%5D=222#hello%20world"
// "parsed_url" => array:8 [
// "scheme" => "https"
// "host" => "www.example.com"
// "user" => null
// "password" => null
// "port" => null
// "path" => "/path/a/c/%D1%91/hello%20world"
// "query" => "a=1&b=2&foo%5B%5D=111&foo%5B%5D=222"
// "fragment" => "hello%20world"
// ]
// ]
Contributions are welcome! Please submit pull requests or open issues.
This library is licensed under the MIT License. See the LICENSE file for details.