Rewrite the ancillary data section. #359

Closed
3 commits

jyasskin (Collaborator) commented Sep 26, 2023

Hopefully this satisfies both the WebPerf and Privacy goals...

Sorry to Amy for probably undoing their readability fixes. 🫣

Fixes #220.


npdoty (Collaborator) commented Sep 27, 2023

#220 is an issue about revising the examples in this section, and #221 is a PR that would address that issue.

Is this re-write addressing some other issue? Or a series of issues, given how completely it changes the existing text?

jyasskin (Collaborator, Author) commented

Unfortunately, #221 created a conflict with the WebPerf WG. The old text appeared to say that if an API was primarily good for ancillary uses, then it was producing ancillary data, which meant most of the WebPerf APIs produced ancillary data. The example didn't help, since it didn't mention any concrete data. In the cases where WebPerf were just summarizing data produced by other web APIs, they didn't think it was reasonable to "aggressively minimize" the data without being able to use research or heuristics to decide some of it was ok.

@yoavweiss pointed out that it works better to focus on the source of the data, and less on how it's processed or summarized. He pointed out particular reporting APIs that expose new ancillary data, and others that merely summarize non-ancillary data that's available from other sources. That let me improve the example to name concrete telemetry APIs that do or don't return ancillary data. It also let me lay out a decision tree for API designers and reviewers to follow in order to figure out how to treat their new API. Listing the concrete options for API designers did take more space than generally describing what users might want, but I think it's more useful too.

npdoty (Collaborator) commented Sep 28, 2023

Could we open an issue that describes the concerns that WebPerf would have, with either the existing text or with the text in #221?

My understanding is that performance analysis was widely agreed to be an ancillary use of data. And that https://www.w3.org/TR/privacy-principles/#information was specifically written, directly after conversations with WebPerf folks, to address the distinctions between data available from other/existing APIs.

jyasskin (Collaborator, Author) commented

I've asked @yoavweiss to organize an issue or a comment on the existing issues from WebPerf's perspective, and he thinks he'll get to that tomorrow.

Performance analysis is an ancillary use, but that doesn't make all of its inputs into ancillary data.

The Ancillary uses section seems to be trying to add restrictions on top of https://www.w3.org/TR/privacy-principles/#information. If it weren't adding restrictions, we could delete the whole section. But I think a few additional restrictions are useful, and that the principles I've added in this PR will lead to concrete privacy improvements in some of WebPerf's APIs.

Your original text had to be somewhat vague about this because we hadn't yet figured out how the WebPerf folks were thinking about their designs. Now that we have the distinction that some of their APIs are streamlining existing uses of APIs, while others are providing new capabilities, we can make the text more concrete. We should be willing to make more than local tweaks in that direction.

yoavweiss (Collaborator) commented

Thanks @jyasskin!! (and apologies for the delay)

From the WebPerfWG's perspective, the previous principles were vague in ways that made their application open to interpretation, which I felt could lead to future disagreements.

IMO, the issue stems from the definition of ancillary data based on its "primary" use, regardless of the potential reasons why the data is non-ancillarily exposed. Jeffrey's modifications on that front are very much welcome, as they create a clear delineating line between data that is already exposed for functional reasons, and data that is only exposed for ancillary uses.

That kind of principle can enable us at the WebPerfWG to clearly mark the APIs we're working on and their different attributes as "ancillary" vs. "non-ancillary", and think of ways in which we can work on migrating the ancillary data to safer options (e.g. aggregated exposure) without harming the use cases it tackles.

jyasskin (Collaborator, Author) commented Oct 4, 2023

Major discussion in https://github.com/w3ctag/privacy-principles/blob/main/meetings/2023-10-04-minutes.md, which I'll try to apply to a new PR.

jyasskin closed this Oct 4, 2023

pes10k (Collaborator) commented Oct 4, 2023

@jyasskin adding my three comments here, though happy to move them to the follow-up PR too if they're applicable

  • one principle calls out "without the identifiable person's permission". I wasn't sure what work "identifiable" is doing here, and if it's meant to distinguish different categories of "personal data"
  • CSP reports are listed as a type of report that summarizes existing data, but this isn't 100% correct. CSP reports include some information not currently easily accessible by the page (e.g., the violating script's text)
  • I wasn't sure which principle the "this principle" text on line 1178 is referring to

Successfully merging this pull request may close these issues.

Remove or revise examples of signals browsers should use to prevent "privacy labor" in ancillary data