- Welcome!
- Contributing Rulesets
- General Info
- New Rulesets
- Minimum Requirements for a Ruleset PR
- Testing
- Ruleset Style Guide
- Motivation
- Indentation & Misc Stylistic Conventions
- Wildcards in Targets
- Complicated Regex in Rules
- Enumerating Subdomains
- Target Ordering
- Rule Ordering
- Non-working hosts
- Ruleset Names
- Cross-referencing Rulesets
- Regex Conventions
- Snapping Redirects
- Example: Ruleset before style guidelines are applied
- Example: Ruleset after style guidelines are applied, with test URLs
- Removal of Rules
- Contributing Code
- Contributing Documentation
- Contributing Translations
Welcome, and thank you for your interest in contributing to HTTPS Everywhere! HTTPS Everywhere depends on the open source community for its continued success, so any contribution is appreciated.
One of the things that makes it easy to contribute to HTTPS Everywhere is that you don't have to be a coder to contribute. That's because HTTPS Everywhere's most important component is the list of rules that tell it when it can request a website over HTTPS. These rules are just XML files that contain regular expressions, so if you can write XML and simple regexes, you can help us add rules and increase HTTPS Everywhere's coverage. No coding skills necessary!
If you want to have the greatest impact, however, you can help be a ruleset maintainer. Ruleset maintainers are trusted volunteers who examine rulesets contributed by others and work with them to ensure that these rulesets work properly and are styled correctly before they're merged in. While we currently have a couple of extremely dedicated and extremely proficient ruleset maintainers, the backlog of sites to add to HTTPS Everywhere just keeps growing, and they need help! If you would like to volunteer to become one, the best thing to do is to build trust in your work by monitoring the repository, contributing pull requests, and commenting on issues that interest you. Then you can contact us at https-everywhere-rules-owner [at] eff <dot> org expressing your interest in helping out.
If you get stuck we have two publicly-archived mailing lists: the https-everywhere list (https://lists.eff.org/mailman/listinfo/https-everywhere) is for discussing the project as a whole, and the https-everywhere-rulesets list (https://lists.eff.org/mailman/listinfo/https-everywhere-rules) is for discussing the rulesets
and their contents, including patches and git pull requests.
You can also find more information on about HTTPS Everywhere on our FAQ page.
Thanks again, and we look forward to your contributions!
There are several main areas of development on HTTPS Everywhere: the rulesets, the core codebase, utilities, and tests.
The rulesets can be found in the rules
top-level path and include all the rules for redirecting individual sites to HTTPS. These are written in XML. If you want to get started contributing to HTTPS Everywhere, we recommend starting here.
The core codebase consists of the code that performs the redirects, the UI, logging code, and ruleset loading. This encompasses all code delivered with the extension itself that is not a ruleset. It is written in JavaScript, using the WebExtensions
API (located in chromium
).
The utilities (utils
top-level path) include scripts that build the extension, sanitize and perform normalization on rulesets, simplify rules, and help label GitHub issues. Historically, these utilities have been written in Python. Many of the newer utilities are written in JavaScript, and are meant to be run in node. Some of the wrappers for these utilities are in shell scripts.
Tests are performed in headless browsers and located in the test
top-level path. These are written in Python, and some of the wrappers for these tests are in shell scripts.
To submit changes, open a pull request from our GitHub repository.
HTTPS Everywhere is maintained by a limited set of staff and volunteers. Please be mindful that we may take a while before we're able to review your contributions.
Thanks for your interest in contributing to the HTTPS Everywhere rulesets
! There's just a few things you should know before jumping in. First some terminology, which will help you understand how exactly rulesets
are structured and what each one contains:
ruleset
: a scope in whichrules
,targets
, andtests
are contained.rulesets
are usually named after the entity which controls the group oftargets
contained in it. There is oneruleset
per XML file within thesrc/chrome/content/rules
directory.target
: a Fully Qualified Domain Name which may include a wildcard specified by*.
on the left side, whichrules
are applied to. There may be manytargets
within any givenruleset
.rule
: a specific regular expression rewrite that is applied for all matchingtargets
within the sameruleset
. There may be manyrules
within any givenruleset
.test
: a URL for which a request is made to ensure that the rewrite is working properly. There may be manytests
within any givenruleset
.
<!--
An example ruleset. Note that this example doesn't necessarily
satisfy the style criteria described below - we just have it
here to show you what the components of a ruleset looks like.
-->
<ruleset name="eff.org">
<target host="*.eff.org" />
<rule from="^http:" to="https:" />
<test url="http://www.eff.org/https-everywhere/" />
</ruleset>
HTTPS Everywhere includes tens of thousands of rulesets
. Any one of these sites can change their HTTPS configuration at any time, so keeping HTTPS Everywhere usable is a task that requires constant maintenance. At the same time, HTTPS deployment on the web is becoming more and more widespread, thanks to projects like Let's Encrypt. This is a very good thing, as it means the web is becoming a safer place! However, with each new ruleset
that HTTPS Everywhere includes comes with an increase in both download size upon install and memory usage at runtime. Rather than adding new rulesets
, we encourage potential contributors to look for broken rulesets
and try to fix them first.
Some rulesets
have the attribute platform="mixedcontent"
. These rulesets
cause problems in browsers that enable active mixed-content (loading insecure resources in a secure page) blocking. When browsers started enforcing active mixed-content blocking, some HTTPS sites started to break. That's why we introduced this tag - it disables those rulesets
for browsers blocking active mixed content. It is likely that many of these sites have fixed this historical problem, so we particularly encourage ruleset
contributors to fix these rulesets
first:
git grep -i mixedcontent src/chrome/content/rules
If you want to create new rulesets
to submit to us, we expect them to be in the src/chrome/content/rules
directory. That directory also contains a useful script, make-trivial-rule
, to create a simple ruleset
for a specified domain. There is also a script in test/validations/special/run.py
, to check all the pending rulesets
for several common errors and oversights. For example, if you wanted to make a ruleset
for the example.com
domain, you could run:
cd src/chrome/content/rules
bash ./make-trivial-rule example.com
This would create Example.com.xml
, which you could then take a look at and edit based on your knowledge of any specific URLs at example.com
that do or don't work in HTTPS. Please have a look at our Ruleset Style Guide below, where you can find useful tips about finding more subdomains. Our goal is to have as many subdomains covered as we can find.
There are several volunteers to HTTPS Everywhere who have graciously dedicated their time to look at the ruleset
contributions and work with contributors to ensure quality of the pull requests before merging. It is typical for there to be several back-and-forth communications with these ruleset
maintainers before a PR is in a good shape to merge. Please be patient and respectful, the maintainers are donating their time for no benefit other than the satisfaction of making the web more secure. They are under no obligation to merge your request, and may reject it if it is impossible to ensure quality. You can identify these volunteers by looking for the "Collaborator" identifier in their comments on HTTPS Everywhere issues and pull requests.
In the back-and-forth process of getting the ruleset
in good shape, there may be many commits made. It is this project's convention to squash-and-merge these commits into a single commit before merging into the project. If your commits are cryptographically signed, we may ask you to squash the commits yourself in order to preserve this signature. Otherwise, we may squash them ourselves before merging.
We prefer small, granular changes to the rulesets. Not only are these easier to test and review, this results in cleaner commits.
A general workflow for testing sites that provide both HTTP and HTTPS follows. Open a version of the browser of your choice without HTTPS Everywhere loaded to the HTTP endpoint, alongside the browser with the latest code and rulesets for HTTPS Everywhere loaded to the HTTPS endpoint (as described in README.md.) Click around and compare the look and functionality of both sites.
If something fails to load or looks strange, you may be able to debug the problem by opening the network tab of your browser debugging tool. Modify the ruleset
until you get it in a good state - you'll have to re-run the HTTPS Everywhere-equipped browser upon each change.
Rules should be written in a way that is consistent, easy for humans to read and debug, reduces the chance of errors, and makes testing easy.
To that end here are some style guidelines for writing or modifying rulesets. They are intended to help and simplify in places where choices are ambiguous, but like all guidelines they can be broken if the circumstances require it.
Use tabs for indentation. For tests
and exclusions
, place them under the target
that they refer to, indented one additional layer. See below for an example.
We provide an .editorconfig
file in the top-level path, which you can configure your editor of choice to use. This will enforce proper indentation.
Use double quotes ("
, not '
).
Avoid using the left-wildcard (<target host="*.example.com" />
) unless you intend to rewrite all or nearly all subdomains. If it can be demonstrated that there is comprehensive HTTPS coverage for subdomains, left-wildcards may be appropriate. Many rules today specify a left-wildcard target, but the rewrite rules only rewrite an explicit list of hostnames.
Instead, prefer listing explicit target hosts and a single rewrite from "^http:"
to "^https:"
. This saves you time as a ruleset author because each explicit target host automatically creates an implicit test URL, reducing the need to add your own test URLs. These also make it easier for someone reading the ruleset to figure out which subdomains are covered.
If you know all subdomains of a given domain support HTTPS, go ahead and use a left-wildcard, along with a plain rewrite from "^http:"
to "^https:"
. Make sure to add a bunch of test URLs for the more important subdomains.
Right-wildcards (<target host="account.google.*" />
) are highly discouraged. Only use them in edge-cases where other solutions are unruly.
Example:
- Complicated rulesets like
Google.tld_Subdomains.xml
Where they must be used, please add a comment to the ruleset
explaining why.
Avoid regexes with long strings of subdomains, e.g. <rule from="^http://(foo|bar|baz|bananas).example.com" />
. These are hard to read and maintain, and are usually better expressed with a longer list of target hosts, plus a plain rewrite from "^http:"
to "^https:"
.
In general, avoid using open-ended regex in rules. In certain cases, open-ended regex may be the most elegant solution. But carefully consider if there are other options.
Examples:
- Rulesets with a lot of domains that we can catch with a simple regex that would be tedious and error-prone to list individually, like
360.cn.xml
- CDNs with an arbitrarily large number of subdomains, like EFForg#7484 (comment) .
If you're not sure what subdomains might exist, you can install the Sublist3r
tool:
git clone https://github.com/aboul3la/Sublist3r.git
cd Sublist3r
sudo pip install -r requirements.txt # or use virtualenv...
Then you can to enumerate the list of subdomains:
python sublist3r.py -d example.com -e Baidu,Yahoo,Google,Bing,Ask,Netcraft,Virustotal,SSL
Alternatively, you can iteratively use Google queries and enumerate the list of results like such:
- site:*.eff.org
- site:*.eff.org -site:www.eff.org
- site:*.eff.org -site:www.eff.org -site:ssd.eff.org
... and so on.
In all cases where there is a list of domains, sort them in alphabetical order starting from the top level domain at the right reading left, moving ^ and www to the top of their group. For example:
example.com
www.example.com
a.example.com
www.a.example.com
b.a.example.com
b.example.com
example.net
www.example.net
a.example.net
If there are a handful of tricky subdomains, but most subdomains can handle the plain rewrite from "^http:"
to "^https:"
, specify the rules for the tricky subdomains first, and then then plain rule last. Earlier rules will take precedence, and processing stops at the first matching rule. There may be a tiny performance hit for processing exception cases earlier in the ruleset and the common case last, but in most cases the performance issue is trumped by readability.
It is useful to list hosts that do not work in the comments of a ruleset
. This is a stylistic preference but is not strictly required.
For easy reading, please avoid using UTF characters unless in the rare instances that they are part of the hostname itself.
Example:
<!--
Invalid certificate:
8marta.glavbukh.ru
forum2.glavbukh.ru (incomplete certificate chain)
Redirect to HTTP:
8marta2013.glavbukh.ru
den.glavbukh.ru
Refused:
e.glavbukh.ru
www.e.glavbukh.ru
Time out:
psd.glavbukh.ru
str.glavbukh.ru
-->
In most cases, the absence of a 2XX
or 3XX
endpoint indicates that a host should not be included in the set of targets
and is non-working, except when it is clear that the site functions as intended in the absence of such an endpoint.
For simple sites, the ruleset
name
attribute can be either a site description or the domain itself. For example, the SeattleAquarium.org.xml ruleset could have a name
of Seattle Aquarium
, SeattleAquarium.org
, or seattleaquarium.org
.
If a ruleset
covers multiple domains, then the ruleset
name
should reflect the broader organization, project, or concept for what a ruleset is trying to accomplish.
Examples:
Google.xml
is just namedGoogle
, andBitly.xml
is namedbit.ly
, butBitly_branded_short_domains.xml
is namedBitly vanity domains
Filenames should vaguely resemble the name
so that someone looking for the file based on the name
can find it easily. Filenames that start with a capital letter are preferred. Prefer dashes over underscores in filenames. Dashes are easier to type.
This sort of comment: For other Migros coverage, see Migros.xml.
is definitely appropriate, in both directions.
When matching an arbitrary DNS label (a single component of a hostname), prefer ([\w-]+)
for a single label (i.e. www), or ([\w.-]+)
for multiple labels (i.e. www.beta). Avoid more visually complicated options like ([^/:@\.]+\.)?
.
For securecookie
tags, if you know that all cookies on the included targets can be secured (which in particular means that the cookies are not used by any of its non-securable subdomains), use the trivial
<securecookie host=".+" name=".+" />
where we prefer .+
over .*
and .
. They are functionally equivalent, but it's nice to be consistent.
Avoid the negative lookahead operator ?!
. This is almost always better expressed using positive rule tags and negative exclusion tags. Some rulesets have exclusion tags that contain negative lookahead operators, which is very confusing.
Prefer capturing groups (www\.)?
over non-capturing (?:www\.)?
. The non-capturing form adds extra line noise that makes rules harder to read. Generally you can achieve the same effect by choosing a correspondingly higher index for your replacement group to account for the groups you don't care about.
Avoid snapping redirects. For instance, if https://foo.fm serves HTTPS correctly, but redirects to https://foo.com, it's tempting to rewrite foo.fm to foo.com, to save users the latency of the redirect. However, such rulesets are less obviously correct and require more scrutiny. And the redirect can go out of date and cause problems. HTTPS Everywhere rulesets should change requests the minimum amount necessary to ensure a secure connection.
<ruleset name="WHATWG.org">
<target host='whatwg.org' />
<target host="*.whatwg.org" />
<rule from="^http://((?:developers|html-differences|images|resources|\w+\.spec|wiki|www)\.)?whatwg\.org/"
to="https://$1whatwg.org/" />
</ruleset>
<ruleset name="WHATWG.org">
<target host="whatwg.org" />
<target host="www.whatwg.org" />
<target host="developers.whatwg.org" />
<target host="html-differences.whatwg.org" />
<target host="images.whatwg.org" />
<target host="resources.whatwg.org" />
<target host="*.spec.whatwg.org" />
<test url="http://html.spec.whatwg.org/" />
<test url="http://fetch.spec.whatwg.org/" />
<test url="http://xhr.spec.whatwg.org/" />
<test url="http://dom.spec.whatwg.org/" />
<target host="wiki.whatwg.org" />
<rule from="^http:"
to="https:" />
</ruleset>
It should be considered a sufficient condition for removal if a contributor can demonstrate that the TLS configuration for either a specific target
or a ruleset altogether is unstable and/or breaking, or will be unstable and/or breaking in the near future. It is, of course, preferable that the ruleset
be fixed rather than removed.
In utils
we have a tool called hsts-prune
which removes targets
from rulesets if they are already contained in the HSTS preload list for browsers that we support. To be explicit, the script is an implementation of the following policy:
Let
included domain
denote either atarget
, or a parent of atarget
. Letsupported browsers
include the ESR, Dev, and Stable releases of Firefox, and the Stable release of Chromium. Ifincluded domain
is a parent of thetarget
, theincluded domain
must be present in the HSTS preload list for allsupported browsers
with the relevant flag which denotes inclusion of subdomains set totrue
. Ifincluded domain
is thetarget
itself, it must be included the HSTS preload list for allsupported browsers
. Additionally, if the http endpoint of thetarget
exists, it must issue a 3XX redirect to the https endpoint for that target. Additionally, the https endpoint for thetarget
must deliver aStrict-Transport-Security
header with the following directives present:
max-age
>= 10886400includeSubDomains
preload
If all the above conditions are met, a contributor may remove the
target
from the HTTPS Everywhere rulesets. If all targets are removed for a ruleset, the contributor is advised to remove the ruleset file itself. The rulesetrule
andtest
tags may need to be modified in order to pass the ruleset coverage test.
Every new pull request automatically has the hsts-prune
utility applied to it as part of the continual integration process. If a new PR introduces a target
which is preloaded, it will fail the CI test suite. See:
.travis.yml
test/run_travis.sh
In addition to ruleset
contributions, we also encourage code contributions to HTTPS Everywhere. There are a few considerations to keep in mind when contributing code.
Officially supported browsers:
- Firefox Stable
- Firefox ESR
- Chromium Stable
We also informally support the Opera browser, but do not have tooling around testing Opera. Firefox ESR is supported because this is what the Tor Browser, which includes HTTPS Everywhere, is built upon. For the test commands, refer to README.md.
The current extension maintainer is @Hainish. You can tag him for PRs which involve the core codebase.
Several of our utilities and our full test suite is written in Python. Eventually we would like the whole codebase to be standardized as JavaScript. If you are so inclined, it would be helpful to rewrite the tooling and tests into JavaScript while maintaining the functionality.
Standalone documentation should be written in Markdown that follows the Google style guide. If you are updating existing documentation that does not follow the Google style guide, then you should follow the style of the file you are updating.
Sometimes a contributor will delete their GitHub account after submitting a pull request, resulting in the pull request being associated with the Ghost user (@ghost). These @ghost pull requests can cause problems for HTTPS Everywhere maintainers, leaving questions unanswered and closing off the possibility of receiving maintainer feedback to solicit clarification or request changes.
We ask that if you want to delete your GitHub account, you either close your HTTPS Everywhere pull requests before you delete your account, or wait to delete your account until we merge your pull requests. Otherwise, maintainers are free to close @ghost pull requests without any comment.
HTTPS Everywhere translations are handled through Transifex. The easiest way to help with translations is to create a Transifex account if you don't already have one. Then log into your account and click "Explore", then search for "Tor Project", and click on The Tor Project. Then choose the language you plan to translate into, click on the name of that language, and then click "Join team" and "Go" to accept joining the translation team for your language.
Then, in the Tor Project resources list, find and click the link for the file
HTTPS Everywhere - https-everywhere.dtd
and choose "Translate now" to enter the translation interface.