A Machine-readable Server Identity and Purpose Descriptor, and mechanisms for delivering a low-entropy signal indicating user consent.
Mike O'Neill, February 2019
Contributors:
- Mike O'Neill <michael.oneill@baycloud.com>
Web pages often contain many, sometimes hundreds, of "third-party" components that initiate transactions with servers other than those managed by the top-level website. These "third-party" servers can access storage in the user's device or browser, collect personal data, and link it to data from other sources. The user is usually completely unaware of this.
Unfortunately, there is no recognised standard way for web servers to declare this information, i.e. to deliver information that allows users to identify the entities involved, what their purposes for data collection are (if any), who they share data with, how long they keep it, and so on.
There is increasing legal pressure around the world for websites to at least declare their use of data collection procedures, explain how they intend to use the data, or what their legal basis is. In many jurisdictions users must be given the opportunity to give or withdraw their agreement to storage access or personal data collection, and offered the right to have any previously collected data deleted.
In addition, user agents have implemented procedures that in some circumstances block particular third-party elements or restrict their ability to access cookies. Some of these sub-resources may be managed by the same entity managing the top-level site, or have previously been given explicit consent by the user. A machine-readable mechanism to record and communicate this could be useful.
The following is a possible JSON encoding that can deliver the required machine-readable information, so that a user agent can make it accessible to the user in a standardised and easily digestible way, and act on user-specified preferences.
The information would be obtained by sending a secure HTTP GET to the resource /.well-known/privacy-declaration relative to any origin.
For example, the data declaration for the domain www.bigco.com would be at https://www.bigco.com/.well-known/privacy-declaration and would return a JSON document with the Content-Type "application/privacy-declaration+json".
Alternatively, the objects defined here could be incorporated in an Origin Policy manifest (see "Origin Policy") to minimise the number of round-trips required when accessing a resource.
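As a rough illustration of the well-known location described above, the following sketch builds the declaration URL for a host and checks a response's Content-Type. The path and media type come from the text; the function names `declarationUrl` and `isDeclarationContentType` are hypothetical and not part of any proposal.

```javascript
// Sketch only: path and media type are taken from the text above;
// the function names are illustrative assumptions.
const WELL_KNOWN_PATH = "/.well-known/privacy-declaration";
const MEDIA_TYPE = "application/privacy-declaration+json";

// Build the declaration URL for a host name or origin, insisting on HTTPS.
function declarationUrl(hostOrOrigin) {
  const origin = hostOrOrigin.includes("://") ? hostOrOrigin : `https://${hostOrOrigin}`;
  const url = new URL(WELL_KNOWN_PATH, origin);
  if (url.protocol !== "https:") {
    throw new Error("privacy declarations must be fetched over HTTPS");
  }
  return url.href;
}

// True when a Content-Type header value names the expected media type,
// ignoring case and any parameters such as "; charset=utf-8".
function isDeclarationContentType(headerValue) {
  if (typeof headerValue !== "string") return false;
  return headerValue.split(";")[0].trim().toLowerCase() === MEDIA_TYPE;
}
```

A caller would typically reject any response whose Content-Type fails this check before attempting to parse the body as JSON.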
User agents or scripts could automatically parse the information and expose it as a JavaScript object at a standard location, e.g. navigator.privacyDeclaration, which could then be used to display human-readable information to users. First-party sites could ensure this was always available by using an open source JavaScript library. To support this, the privacy-declaration resource should support CORS (Cross-Origin Resource Sharing), so that it can be accessed via a cross-origin fetch or XHR.
JavaScript can examine the JSON returned for the first party, then use the otherParties and sameParties arrays to fetch the corresponding privacy-declaration JSON resources from those domains (made possible because the third-party resources are CORS-enabled). The sameParties set of domains could identify sub-resources which can be trusted as "first-party" because they are managed by the entity that manages the top-level site. User agents can check that each origin in a set is referenced by the other origins in their own privacy-declaration resources, i.e. that they all contain exactly the same "sameParties" set. It may be possible for top-level or parent documents to host external privacy-declarations as bundles of "Signed HTTP Exchanges", which would avoid user agents having to make extra round-trips to get them. See @mikewest's proposal for this in "First-Party Sets".
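The mutual-reference check on "sameParties" could be sketched as follows. This is one possible reading, not a defined algorithm: it assumes a domain's effective set is its own name plus its "sameParties" list, and that the declarations have already been fetched into a plain object keyed by lower-case domain name. The names `effectiveSet` and `sameSetEverywhere` are hypothetical.

```javascript
// Assumption: a domain's effective set is itself plus its "sameParties" list.
function effectiveSet(domain, declaration) {
  const names = [domain, ...(declaration.sameParties || [])].map((d) => d.toLowerCase());
  return [...new Set(names)].sort();
}

// declarations: object mapping lower-case domain names to parsed declarations.
// Returns true only if every member of the top-level set publishes the same set.
function sameSetEverywhere(topLevelDomain, declarations) {
  const top = declarations[topLevelDomain];
  if (!top) return false;
  const expected = effectiveSet(topLevelDomain, top);
  return expected.every((member) => {
    const declaration = declarations[member];
    if (!declaration) return false; // a member without a declaration cannot confirm the set
    const actual = effectiveSet(member, declaration);
    return actual.length === expected.length && actual.every((d, i) => d === expected[i]);
  });
}
```

Under this reading, a domain that is claimed by the top-level site but does not list the top-level site in return causes the whole set to be rejected.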
Other methods are possible to ensure that domains are related; for example, there could be a link to information in a TLS certificate or in a domain name registrar's whois entry. There is also ongoing discussion about using DNS records to express relatedness between domain names.
The privacy-declaration resource could be dynamically generated so that some properties could reflect different user agent states derived from the incoming HTTP Request. For example, the server would examine incoming cookies or other headers in order to calculate the correct value of the "consented" property, or the length of time before consent expires.
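A server generating the resource dynamically might start from a static template and fill in the per-request state. The sketch below assumes the server has already worked out, from cookies or other headers, which positions in the "purposes" array the user has agreed to; how that set is derived is left open here, and the function name `withConsentState` is hypothetical.

```javascript
// Return a copy of a declaration template with "consented" set per request.
// consentedIndexes: Set of positions in the "purposes" array the user agreed to
// (an assumption of this sketch; the text does not fix how consent is keyed).
function withConsentState(template, consentedIndexes) {
  return {
    ...template,
    purposes: (template.purposes || []).map((purpose, index) => ({
      ...purpose,
      consented: consentedIndexes.has(index),
    })),
  };
}
```

The template itself is never modified, so the same object can be reused across requests.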
There should be some standardisation of a low-entropy client-originated signal. This could be an existing request header in widespread use such as DNT, a new request header designed to be a better fit with European ePrivacy and data protection law, or a specific cookie name such as the IAB EU's euconsent cookie. Another avenue to explore is extending the cookie "prefix" options described in "Cookies: HTTP State Management Mechanism draft-ietf-httpbis-rfc6265bis-02".
For example, here is one way to encode a consent indication cookie:
Set-Cookie: __Consent-eu=1,5,6; SameSite=Strict; Expires=Wed, 06 Nov 2019 08:49:37 GMT
The cookie has the SameSite attribute set to Strict, so it is restricted to the top-level site, i.e. it can only signal site-specific consent.
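A server receiving such a cookie could extract the listed values along the following lines. This is a sketch under assumptions: the cookie name `__Consent-eu` is taken from the example above, the comma-separated numbers are treated as opaque purpose identifiers (their meaning is not defined here), and the function name `parseConsentCookie` is hypothetical.

```javascript
const CONSENT_COOKIE = "__Consent-eu"; // name taken from the example above

// Parse a Cookie request header and return the Set of numeric values
// carried by the consent cookie. Non-numeric items are ignored.
function parseConsentCookie(cookieHeader) {
  const consented = new Set();
  for (const pair of (cookieHeader || "").split(";")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue;
    if (pair.slice(0, eq).trim() !== CONSENT_COOKIE) continue;
    for (const item of pair.slice(eq + 1).trim().split(",")) {
      const text = item.trim();
      if (/^\d+$/.test(text)) consented.add(Number(text));
    }
  }
  return consented;
}
```

An absent or empty cookie yields an empty set, which a server would naturally treat as "no consent registered".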
Using a prefix could allow user agents to recognise low-entropy "consent indication" cookies and give them "special treatment". For example, user agents could restrict the scope of such cookies to the context of a top-level origin, so that all, or only specified, embedded origins on a particular site could receive "site-specific" consent indications.
The site-specific delivery of consent cookies is impossible without explicit browser or browser extension support, so another method should be standardised so that servers can deliver the functionality themselves. The IAB EU's TCF proposes a templating system to deliver consent information within the request URL, i.e. as an appended query parameter. This has been forced on them by the increasing restrictions placed by some browsers, e.g. Safari and Firefox, on the use of third-party cookies, but a low-entropy version of this approach would allow for site-specific consent within the websites that support the functionality.
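A low-entropy variant of the query-parameter approach might look like the sketch below, in which the first-party page appends a short list of consented purpose identifiers to each third-party request URL. The parameter name `consent` and the function name `withConsentParam` are assumptions made for illustration only; no such parameter is defined by this document or by the TCF.

```javascript
// Append (or replace) a short consent indication on a sub-resource URL.
// purposeIds: iterable of small integers; duplicates are dropped and the
// list is sorted so the same consent state always yields the same value.
function withConsentParam(resourceUrl, purposeIds, paramName = "consent") {
  const url = new URL(resourceUrl);
  const value = [...new Set(purposeIds)].sort((a, b) => a - b).join(",");
  url.searchParams.set(paramName, value);
  return url.href;
}
```

Normalising the value keeps the signal low-entropy: two users with the same consent state produce identical URLs.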
Property | Type | Description |
---|---|---|
name | String | Recognisable & unique entity name e.g. "Google Inc." |
policy | String(Uri) | Human readable HTML page explaining the entity’s privacy policy |
storagePolicy | String(Uri) | Human readable HTML page explaining the terminal storage policy |
about | String(Uri) | Human readable HTML page describing the entity |
deleteData | String(Uri) | An HTTP POST to this URI will cause all user agent data for this origin to be deleted, e.g. a Clear-Site-Data header could be returned |
mayCollect | Boolean | false declares that no data is collected, true that it may be collected |
mayShare | Boolean | false declares that no data will be shared with other entities |
mayCombine | Boolean | false declares that data is not combined or linked with data from other sources |
purposes | Array of PurposeType Objects | Lists all the purposes for which data is collected |
storage | Array of StorageType Objects | Lists the terminal storage items that may be utilised |
otherParties | Array of Strings | Lists the third-party domains of embedded resources that may appear on this page |
sameParties | Array of Strings | Lists the first-party domains of embedded resources, i.e. those managed by the same entity, that may appear on this page |
The user can give their agreement for zero or more purposes. The PurposeType Object for a particular purpose includes a Boolean "consented" property, which can be dynamically derived from the incoming HTTP request headers (e.g. cookies).
The storage objects are linked to the specific purposes which they are designed to implement. This gives user agents a fine-grained ability to restrict storage use to the purposes a user has agreed to.
A browser, browser extension or script executing in the top-level browsing context can use the otherParties and sameParties arrays to fetch the Descriptors for those domains (by fetching the resource at https://{domain name}/.well-known/privacy-declaration).
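The lookup of Descriptors for embedded parties could be sketched as below. The function names are hypothetical, and `fetchImpl` is an injectable stand-in for the standard `fetch` function so the sketch can be exercised without network access; failed or non-OK responses are recorded as `null` rather than treated as fatal, which is a design choice of this sketch.

```javascript
// Collect the declaration URLs for every domain named in a Descriptor,
// dropping duplicates and normalising case.
function partyDeclarationUrls(declaration) {
  const domains = [...(declaration.sameParties || []), ...(declaration.otherParties || [])];
  const unique = [...new Set(domains.map((d) => d.trim().toLowerCase()))];
  return unique.map((d) => `https://${d}/.well-known/privacy-declaration`);
}

// Fetch each Descriptor with a CORS request; returns an object mapping
// each URL to the parsed JSON, or to null when the fetch failed.
async function fetchPartyDeclarations(declaration, fetchImpl = fetch) {
  const results = {};
  await Promise.all(partyDeclarationUrls(declaration).map(async (url) => {
    try {
      const response = await fetchImpl(url, { mode: "cors" });
      results[url] = response.ok ? await response.json() : null;
    } catch {
      results[url] = null; // network or parse failure
    }
  }));
  return results;
}
```

A user interface could then list the domains whose entry is `null` as parties that have not published a declaration.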
Property | Type | Description |
---|---|---|
type | String | Storage Type, one of "cookie", "local" (localStorage), "indexed" (indexedDB), "cache" (ETag) |
name | String | Cookie name prefix, localStorage item name, or indexedDB table |
purposeList | Array of Integers | List of zero-based index values of entries in the "purposes" array, e.g. [0,1] indicates that this storage item supports the first and second purposes |
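To illustrate how the link between storage items and purposes could be used, the sketch below selects the storage items a user agent might permit. It assumes a strict policy in which every purpose listed for an item must have been consented to; a user agent could equally choose a looser rule. The function name `permittedStorage` is hypothetical.

```javascript
// Return the storage entries whose linked purposes have all been consented to.
// Entries with an empty or missing purposeList are never permitted here.
function permittedStorage(declaration) {
  const purposes = declaration.purposes || [];
  return (declaration.storage || []).filter((item) => {
    const list = item.purposeList || [];
    return list.length > 0 &&
      list.every((index) => purposes[index] !== undefined && purposes[index].consented === true);
  });
}
```

With this rule, a cookie that serves both analytics and advertising is blocked unless the user has agreed to both purposes.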
Property | Type | Description |
---|---|---|
name | String | Short identifying label for this purpose |
description | String | Human-readable text clearly describing this purpose in the appropriate language |
maxRetainedFor | Integer | Number of seconds data is retained after collection |
expiresIn | Integer | Number of seconds remaining before collected data is deleted |
consented | Boolean | Dynamic indication of registered user agreement for this purpose |
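Because the tables above give a JSON type for each property, a consumer may want to reject documents that, for example, encode Booleans or Integers as strings. The following is a minimal sketch of such a check covering only the Boolean, Integer and purposeList properties; the function name `typeErrors` and the wording of the messages are assumptions of this sketch.

```javascript
// Return a list of human-readable type problems found in a parsed declaration.
function typeErrors(declaration) {
  const errors = [];
  const check = (ok, message) => { if (!ok) errors.push(message); };

  for (const key of ["mayCollect", "mayShare", "mayCombine"]) {
    if (key in declaration) check(typeof declaration[key] === "boolean", `${key} should be a Boolean`);
  }

  const purposes = Array.isArray(declaration.purposes) ? declaration.purposes : [];
  purposes.forEach((purpose, i) => {
    for (const key of ["maxRetainedFor", "expiresIn"]) {
      if (key in purpose) check(Number.isInteger(purpose[key]), `purposes[${i}].${key} should be an Integer`);
    }
    if ("consented" in purpose) {
      check(typeof purpose.consented === "boolean", `purposes[${i}].consented should be a Boolean`);
    }
  });

  const storage = Array.isArray(declaration.storage) ? declaration.storage : [];
  storage.forEach((item, i) => {
    const list = Array.isArray(item.purposeList) ? item.purposeList : [];
    list.forEach((index) => {
      check(Number.isInteger(index) && index >= 0 && index < purposes.length,
        `storage[${i}].purposeList entry ${JSON.stringify(index)} is not a valid index into purposes`);
    });
  });
  return errors;
}
```

An empty result means the checked properties have the declared types; it does not prove the document is otherwise complete.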
An example of the encoding.
{
  "name": "BigCo Inc",
  "policy": "https://www.bigco.com/privacy.html",
  "storagePolicy": "https://www.bigco.com/cookie.html",
  "about": "https://www.bigco.com/about",
  "mayCollect": true,
  "mayShare": true,
  "mayCombine": false,
  "purposes": [
    {
      "name": "behavioural advertising",
      "description": "compiling history of web sites visited",
      "maxRetainedFor": 1000000,
      "expiresIn": 45667,
      "consented": false
    },
    {
      "name": "website analytics",
      "description": "web audience measurement",
      "maxRetainedFor": 10000,
      "expiresIn": 3456,
      "consented": false
    },
    {
      "name": "authentication",
      "description": "logging in",
      "maxRetainedFor": 1000000,
      "expiresIn": 67854,
      "consented": false
    }
  ],
  "storage": [
    {
      "type": "cookie",
      "name": "_ga",
      "purposeList": [0, 1]
    },
    {
      "type": "cookie",
      "name": "user",
      "purposeList": [2]
    },
    {
      "type": "local",
      "name": "dataname",
      "purposeList": [0]
    }
  ],
  "otherParties": [
    "www.google.com",
    "www.google-analytics.com",
    "adnxs.com"
  ],
  "sameParties": [
    "ourcdn.com"
  ]
}
- Mike West has proposed a way for origins to assert that they belong to a set managed by the same top-level or "first party" resource: "First-Party Sets".
- The Tracking Protection Working Group's "Tracking Preference Expression (DNT)" defined a server transparency declaration at /.well-known/dnt/. This was designed to allow the entity managing any server (first-party or subresource) to declare various properties to aid transparency.
- John Wilander has proposed amendments to the Same Origin Policy so that sets of domains could be trusted as if they were first-party: "Single Trust and Same-Origin Policy v2".
- There is ongoing discussion within the IETF about recognising domain name "relatedness" in DNS records: "Related Domains By DNS".
- Cookie Name Prefixes are discussed in the under-development replacement for RFC 6265: "Cookies: HTTP State Management Mechanism draft-ietf-httpbis-rfc6265bis-02".
- The IAB EU's "IAB Europe Transparency & Consent Framework" defines an externally hosted JSON resource that identifies advertising technology vendors and a set of defined purposes.
- There is ongoing work defining an origin-wide Origin Policy manifest file at a well-known location: "Origin Policy".