
Creating Rules

Kain edited this page Mar 2, 2024 · 6 revisions

This page details the functionality of rules with examples to help you understand how they work.
Each rule must contain at least a unique name and a match regex. All other options are optional.
An explanation of how versioning works can be found here: https://github.com/DrKain/tidy-url/wiki/Versions

One rule can have multiple handlers. An example rule is as follows:

{
    name: 'google.com',
    match: /www\.google\..*/i,
    rules: [
        'sourceid', 'client', 'aqs', 'sxsrf', 'uact', 'ved', 'iflsig', 'source',
        'ei', 'oq', 'gs_lcp', 'sclient', 'bih', 'biw', 'sa', 'dpr', 'rlz',
        'gs_lp', 'sca_esv', 'si', 'gs_l'
    ],
    amp: {
        regex: /www\.google\.(?:.*)\/amp\/s\/(.*)/gim
    },
    redirect: 'url'
}

Important!
You do NOT need to set every one of these options. If a rule doesn't use an option, leave it out. The only required values are name and match.
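
As a rough illustration of the "only name and match are required" point, the check below accepts a rule with just those two fields. This is a sketch, not part of tidy-url: isValidRule and the example.com rule are hypothetical names made up for this example.

```javascript
// Hypothetical check (not part of tidy-url): a rule needs a non-empty
// name and a RegExp match. Every other option may be left out.
function isValidRule(rule) {
    return typeof rule.name === 'string'
        && rule.name.length > 0
        && rule.match instanceof RegExp;
}

// Assumed example rules for illustration only.
const minimalRule = { name: 'example.com', match: /^example\.com$/i };
const brokenRule = { name: 'example.com' };

console.log(isValidRule(minimalRule)); // true
console.log(isValidRule(brokenRule)); // false
```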

name

Type: string
The name of the website. This should be unique and should not include http/s. Unless essential, the path name should not be included either. If the rule modifies a subdomain, the subdomain should be included in the name.

match

Type: RegExp
Regex used to match the rule. This should be as strict as possible to avoid accidental matches.
By default this will only test against the URL host.

match_href

Type: boolean
If true, the match regex will test against the full URL instead of being limited to the host. Avoid this where possible, as it can cause conflicts with other domains. The example below shows how to match a certain path on the website without breaking any other links that might be valid.

const rule = { match_href: true, redirect: 'q', match: /website\.com\/tracker/gim };

const example1 = 'website.com/tracker?q=google.com';
// Result: google.com
const example2 = 'website.com/search?q=google.com';
// Result: website.com/search?q=google.com
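
The difference between host-only matching and match_href can be sketched as follows. This is a minimal illustration under assumptions, not tidy-url's actual implementation: ruleMatches is a hypothetical helper, and the inputs are written as absolute URLs because the standard URL constructor requires a protocol.

```javascript
// Hypothetical helper: decides whether a rule applies to a URL.
// Illustrative only; tidy-url's internal logic may differ.
function ruleMatches(rule, input) {
    const url = new URL(input);
    // Test the full URL only when match_href is true, otherwise just the host.
    const target = rule.match_href === true ? url.href : url.host;
    // Rebuild the RegExp without the g flag so lastIndex state
    // cannot leak between repeated test() calls.
    const flags = rule.match.flags.replace('g', '');
    return new RegExp(rule.match.source, flags).test(target);
}

const hostRule = { name: 'website.com', match: /website\.com/i };
const hrefRule = {
    name: 'website.com/tracker',
    match: /website\.com\/tracker/gim,
    match_href: true
};

console.log(ruleMatches(hostRule, 'https://website.com/search?q=google.com')); // true
console.log(ruleMatches(hrefRule, 'https://website.com/tracker?q=google.com')); // true
console.log(ruleMatches(hrefRule, 'https://website.com/search?q=google.com')); // false
```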

rules

Type: Array
An array of parameter names. Any parameter found in this array (case-insensitive) is removed from the URL when cleaning, regardless of its value. A bad parameter is one that is used for tracking or analytics and/or has no effect on the page regardless of its value.
Do not add a parameter if the page changes based on its value, or if the website automatically re-adds it, as this can cause a loop.
Any new rules must be tested as much as possible! If a website breaks as a result of a removed parameter, please open an issue on GitHub so it can be fixed ASAP.

Example: website.com?foo=bar&fizz=buzz
Rule:    ['foo']
Result:  website.com?fizz=buzz
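
A minimal sketch of this parameter removal, using the standard URL and URLSearchParams APIs. stripParams is a hypothetical helper written for this page, not the library's own code, and the inputs are given as absolute URLs so that the URL constructor accepts them.

```javascript
// Hypothetical helper: removes every query parameter listed in `rules`.
// Names are compared case-insensitively.
function stripParams(input, rules) {
    const url = new URL(input);
    const blocked = new Set(rules.map((name) => name.toLowerCase()));
    // Copy the keys first so deleting does not disturb iteration.
    for (const key of [...url.searchParams.keys()]) {
        if (blocked.has(key.toLowerCase())) {
            url.searchParams.delete(key);
        }
    }
    return url.href;
}

console.log(stripParams('https://website.com/?foo=bar&fizz=buzz', ['foo']));
// https://website.com/?fizz=buzz
console.log(stripParams('https://website.com/?FOO=bar&fizz=buzz', ['foo']));
// https://website.com/?fizz=buzz
```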

replace

Type: Array
An array of strings or RegExp. Each match is removed from the URL. This is for special cases where part of the URL itself needs to be modified and should be avoided whenever possible, because the risk of breaking valid URLs is high.

Example:  website.com/some-link
Rule:     ['some']
Result:   website.com/-link
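
The example above can be sketched with String.prototype.replace. applyReplace is a hypothetical helper and the second input is an assumed URL; the library may apply these patterns differently.

```javascript
// Hypothetical helper: removes each string or RegExp match from the URL.
function applyReplace(input, patterns) {
    let result = input;
    for (const pattern of patterns) {
        // A string pattern removes only its first occurrence.
        // A RegExp follows its own flags (add g to remove every match).
        result = result.replace(pattern, '');
    }
    return result;
}

console.log(applyReplace('https://website.com/some-link', ['some']));
// https://website.com/-link
console.log(applyReplace('https://website.com/a/ref-123/page', [/\/ref-\d+/]));
// https://website.com/a/page
```

The first-occurrence behaviour of string patterns is one reason broad patterns are risky here: a short string can also appear in the host or in an unrelated path segment.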

redirect

Type: string
The name of a query parameter whose value is the URL to redirect to. For the most part this is used to skip pages that track external links or display a warning to let users know they are leaving the website.
If the value is not a valid URL then the rule will fail.

const input = 'tracker.com/outbound-link?url=google.com';
const rule = { redirect: 'url' };
// Result: google.com
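
A rough sketch of the redirect step, including the validity check. followRedirect is a hypothetical helper. Unlike the shorthand in the example above, this sketch validates with the standard URL constructor, so it assumes the target is an absolute URL with a protocol; tidy-url's own validation may be more lenient.

```javascript
// Hypothetical helper: returns the redirect target, or null when the rule fails.
function followRedirect(input, rule) {
    const url = new URL(input);
    // searchParams.get() already percent-decodes the value.
    const target = url.searchParams.get(rule.redirect);
    if (target === null) return null; // parameter not present
    try {
        // new URL() throws on invalid input, which models "the rule will fail".
        return new URL(target).href;
    } catch {
        return null;
    }
}

const redirectRule = { redirect: 'url' };

console.log(followRedirect('https://tracker.com/outbound-link?url=https%3A%2F%2Fgoogle.com%2F', redirectRule));
// https://google.com/
console.log(followRedirect('https://tracker.com/outbound-link?url=not%20a%20url', redirectRule));
// null
```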

amp

Type: object

There are two methods available for AMP links.

The first is a simple regex. The result of the first capturing group is treated as a URL (and verified). When matched, https:// is added to the start of the link and amp/ is removed from the end.

const input = "https://www.google.com/amp/s/website.com%2Fa-normal-page%3Famp/";
const rule = { amp: { regex: /www\.google\.com\/amp\/s\/(.*)/gim } };
// Result: https://website.com/a-normal-page/
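
The regex method can be approximated as below. This is a sketch under assumptions rather than the library's code: unwrapAmp is a hypothetical helper, the input is a simplified AMP link chosen for illustration, and the exact decoding and trimming steps in tidy-url may differ.

```javascript
// Hypothetical helper: extracts the real URL from an AMP link, or returns null.
function unwrapAmp(input, rule) {
    // match() only returns capture groups for a non-global RegExp,
    // so rebuild the pattern without the g flag.
    const source = rule.amp.regex;
    const pattern = new RegExp(source.source, source.flags.replace('g', ''));
    const found = input.match(pattern);
    if (!found || !found[1]) return null;
    try {
        let target = decodeURIComponent(found[1]); // throws on malformed escapes
        // Drop a trailing "amp/" (4 characters), then add the protocol.
        if (target.endsWith('amp/')) target = target.slice(0, -4);
        return new URL('https://' + target).href; // verify the result is a valid URL
    } catch {
        return null;
    }
}

const ampRule = { amp: { regex: /www\.google\.com\/amp\/s\/(.*)/gim } };

console.log(unwrapAmp('https://www.google.com/amp/s/website.com/a-normal-page/amp/', ampRule));
// https://website.com/a-normal-page/
console.log(unwrapAmp('https://website.com/a-normal-page/', ampRule));
// null
```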

The second replaces plain text or a regex match in the URL. The match is removed unless with supplies replacement text, as in this example:

const input = 'https://amp.website.com/a-normal-page/';
const rule = {
    amp: {
        replace: {
            text: 'amp.website.com',
            with: 'website.com'
        }
    }
};
// Result: https://website.com/a-normal-page/
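
The replace method amounts to a single string substitution. In this sketch, replaceAmpText is a hypothetical helper that takes the replace options object directly, and treating a missing with as an empty string is an assumption based on the description above.

```javascript
// Hypothetical helper: applies the text/with pair to an AMP link.
// A missing `with` is treated as an empty string, so the match is removed.
function replaceAmpText(input, options) {
    const { text, with: replacement = '' } = options;
    return input.replace(text, replacement);
}

console.log(replaceAmpText('https://amp.website.com/a-normal-page/', {
    text: 'amp.website.com',
    with: 'website.com'
}));
// https://website.com/a-normal-page/
console.log(replaceAmpText('https://amp.website.com/a-normal-page/', { text: 'amp.' }));
// https://website.com/a-normal-page/
```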

The handling of AMP links is controlled by config.allowAMP. Setting it to true allows AMP links and leaves them untouched; setting it to false enables the handling described above.
See the AMP Links wiki page for more information.

decode

Type: object
Used to decode a base64 parameter then redirect based on the result.
This rule is an object that expects param, the name of the parameter holding the base64 value. lookFor is optional: when the decoded result is a JSON object, it names the key that contains the URL. If the decoded result is not a JSON object then lookFor will be ignored. The resulting URL must be valid for this rule to pass.

Example 1: This rule does not include lookFor and the decoded string is a URL.

const example = 'tracker.com?target=aHR0cHM6Ly9naXRodWIuY29tL0RyS2Fpbi90aWR5LXVybA==';
const rule = { decode: { param: 'target' } };
// Result: https://github.com/DrKain/tidy-url

Example 2:
This rule includes lookFor and the decoded string is a JSON object containing the desired URL.

const example = 'tracker.com?target=eyJ1cmwiOiJodHRwczovL2dpdGh1Yi5jb20vRHJLYWluL3RpZHktdXJsLyIsInRyYWNraW5nX2lkIjoxMjM0fQ==';
const rule = { decode: { param: 'target', lookFor: 'url' } };
// Decoded: {"url":"https://github.com/DrKain/tidy-url/","tracking_id":1234}
// Result: https://github.com/DrKain/tidy-url/
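
Both cases can be sketched with the global atob function and JSON.parse. decodeTarget is a hypothetical helper, not the library's implementation, and the inputs are written as absolute URLs so the standard URL constructor accepts them.

```javascript
// Hypothetical helper: decodes a base64 parameter and returns the target URL, or null.
function decodeTarget(input, rule) {
    const encoded = new URL(input).searchParams.get(rule.decode.param);
    if (encoded === null) return null; // parameter not present
    try {
        let candidate = atob(encoded); // throws on invalid base64
        if (rule.decode.lookFor) {
            try {
                const parsed = JSON.parse(candidate);
                if (parsed !== null && typeof parsed === 'object') {
                    candidate = String(parsed[rule.decode.lookFor]);
                }
            } catch {
                // Not JSON: ignore lookFor and treat the decoded text as the URL.
            }
        }
        return new URL(candidate).href; // throws when the result is not a valid URL
    } catch {
        return null;
    }
}

const plain = 'https://tracker.com/?target=aHR0cHM6Ly9naXRodWIuY29tL0RyS2Fpbi90aWR5LXVybA==';
const wrapped = 'https://tracker.com/?target=eyJ1cmwiOiJodHRwczovL2dpdGh1Yi5jb20vRHJLYWluL3RpZHktdXJsLyIsInRyYWNraW5nX2lkIjoxMjM0fQ==';

console.log(decodeTarget(plain, { decode: { param: 'target' } }));
// https://github.com/DrKain/tidy-url
console.log(decodeTarget(wrapped, { decode: { param: 'target', lookFor: 'url' } }));
// https://github.com/DrKain/tidy-url/
```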

Example 3: Custom Handlers
This rule uses custom handlers. You can read more about them here

{
    name: 'click.redditmail.com',
    match: /click\.redditmail\.com/i,
    decode: { handler: 'click.redditmail.com' }
}

rev

Type: boolean
Removes the trailing equals sign from parameters that have an empty value. The URL searchParams toString() function adds an equals sign for empty values, and in some rare cases this can break functionality on websites. See issue #90 for one of them.

rev=false : website.com/page?foo=bar&fizz=
rev=true  : website.com/page?foo=bar&fizz
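
One way to get the rev=true result is a single regular expression over the final URL string. This is a sketch under assumptions: stripEmptyEquals is a hypothetical helper, and the library may implement this step differently.

```javascript
// Hypothetical helper: drops the "=" that follows a parameter with an empty value.
function stripEmptyEquals(input) {
    // Match "?name=" or "&name=" only when the "=" is directly followed by
    // "&", "#", or the end of the string, and keep everything except the "=".
    return input.replace(/([?&][^=&#]+)=(?=&|#|$)/g, '$1');
}

console.log(stripEmptyEquals('https://website.com/page?foo=bar&fizz='));
// https://website.com/page?foo=bar&fizz
console.log(stripEmptyEquals('https://website.com/page?a=&b=1'));
// https://website.com/page?a&b=1
```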