Clearly disclose in the docs telemetry collection being enabled by default #1884

Closed
AlexTMjugador opened this issue Aug 16, 2024 · 5 comments

@AlexTMjugador
Contributor

Recently, one of my CI workflows that uses cargo-binstall (and, by extension, quickinstall) began showing HTTP error warnings due to requests to https://warehouse-clerk-tmp.vercel.app returning a 402 status code. Curious about the cause of these requests, I investigated the topic a bit and came across issues like #1822. To my surprise, I couldn't find any clear documentation in cargo-binstall or cargo-quickinstall indicating that this silent telemetry collection was occurring in the first place.

From an ethical standpoint, I think it's only fair to disclose that cargo-binstall collects telemetry by default. As noted in the linked issue, these additional HTTP requests can raise concerns, and privacy-conscious users may have valid reasons to opt out of such data collection.

Legally, and with the caveat that I'm not qualified to give legal advice, the line between personal and non-personal information can be very thin, yet that distinction can have significant implications, particularly for whether regulations like the EU GDPR apply to this telemetry collection. If, for instance, the GDPR applies because the collection endpoint logs IP addresses or other device data that could potentially identify an individual, then the telemetry would be subject to strict disclosure, consent, and data handling requirements, which may not currently be met.

On the practical side, this telemetry collection has a history of being unreliable at times (cf. cargo-bins/cargo-quickinstall#164). Therefore, even if slightly more users explicitly disable telemetry as a consequence of such a notice, I think those opt-outs are unlikely to reduce the usefulness of the collected data for analysis and decision-making in any statistically significant way.

In my view, Visual Studio Code could serve as a model for how to handle this, as it does a good job of disclosing how and why telemetry is collected on a dedicated documentation page. Alternatively, adding a dedicated section about telemetry to the project's README could also help bring such telemetry collection the attention it deserves. I believe that merely documenting telemetry collection through the --disable-telemetry CLI option is insufficient, as users interacting with cargo-binstall via e.g. taiki-e/install-action may never even discover any cargo-binstall CLI options.

@NobodyXu
Member

Thank you!

I agree that we absolutely would want such documentation for cargo-binstall.

Both in the README, and probably in the --help.

@AlexTMjugador
Contributor Author

AlexTMjugador commented Aug 16, 2024

It's great to hear that!

I wouldn't mind helping out by submitting a PR that describes this telemetry collection in the documentation, but I don't have a clear picture of how exactly it works or is meant to work, so I don't think my PR would be very useful yet. Please feel free to go over more details in this issue, in a draft, or in a final documentation change, whichever is most comfortable for you 😄

For what it's worth, here is a rough but hopefully helpful list of the points I'd expect such a documentation note to cover:

  • What data is collected (in my view, this should not only include obvious data sent by cargo-quickinstall on the HTTP request body, but also IP addresses or any other data that other services or parties may collect).
  • Why it is collected and for what purposes.
  • Where the telemetry is sent to.
  • Who will use the telemetry data.
  • For how long the data will be stored.
  • Whether the data may be transferred to other parties in the future.
  • How to opt-out.

@NobodyXu
Member

I would welcome contributions/PR!

What data is collected (in my view, this should not only include obvious data sent by cargo-quickinstall on the HTTP request body, but also IP addresses or any other data that other services or parties may collect).

As far as I know, only the name of the crate to be installed, its version, and the targets available on the local machine are collected.

We then use this data to decide which versions to build.
For each target that cargo-quickinstall supports, we maintain a list of popular crates and select from it for building.
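
To make the selection step concrete, here is a minimal sketch of how per-target popularity could be derived from such reports. It is not the quickinstall implementation: the `(crate, version, target)` tuple shape, the `popular_crates` function, and the `top_n` cutoff are all hypothetical names chosen for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical report shape: (crate, version, target).
# This is an illustrative assumption, not the real payload format.
Report = tuple[str, str, str]


def popular_crates(reports: list[Report], top_n: int = 2) -> dict[str, list[tuple[str, str]]]:
    """Return the top_n most requested (crate, version) pairs for each target."""
    per_target: dict[str, Counter] = defaultdict(Counter)
    for crate, version, target in reports:
        # Tally one request for this crate version on this target.
        per_target[target][(crate, version)] += 1
    return {
        target: [pair for pair, _count in counts.most_common(top_n)]
        for target, counts in per_target.items()
    }
```

Grouping by target first reflects the statement that a separate list of popular crates is kept for each supported target.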

Where the telemetry is sent to.

https://warehouse-clerk-tmp.vercel.app/api/crate. It is not very actively maintained and has just been kept in a working state.

There was a plan to rewrite it, but we haven't had much time to work on it.

cc @alsuren, who implemented the current statistics collection and so probably knows more about this than I do.

Who will use the telemetry data.

The URL for accessing it is public (https://warehouse-clerk-tmp.vercel.app/api/stats), but it seems to be down for now.

For how long the data will be stored.

I don't know; you'd have to ask @alsuren. The effect of this data (which binaries get built), however, will be visible in the GitHub Releases.

If access to the data is public, then I would say it is essentially saved permanently.

Whether the data may be transferred to other parties in the future.

We don't have plans to do that, but if the data is publicly available, then others can already access it.

How to opt-out.

  • Pass --disable-telemetry to disable telemetry.
  • Pass --disable-strategies quick-install to stop installing from quick-install, which also disables its telemetry.
  • Use --strategies with a list that excludes quick-install, which has the same effect.
  • Crate maintainers can add quick-install to disabled-strategies in their Cargo.toml to disable it (unless the user overrides it via --strategies).
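
For CI scripts that invoke cargo-binstall directly, the command-line opt-outs could be applied with a small wrapper like the sketch below. The `binstall_command` helper and its parameters are hypothetical, and `ripgrep` is just an arbitrary example crate. The strategy flag is assumed here to be spelled `--disable-strategies`; confirm the exact spelling against `cargo binstall --help` before relying on it.

```python
import shlex


def binstall_command(crate: str, disable_telemetry: bool = True,
                     skip_quickinstall: bool = False) -> list[str]:
    """Build an argv list for `cargo binstall` with telemetry opt-outs applied."""
    argv = ["cargo", "binstall"]
    if disable_telemetry:
        argv.append("--disable-telemetry")
    if skip_quickinstall:
        # Assumed flag spelling; check `cargo binstall --help`.
        argv += ["--disable-strategies", "quick-install"]
    argv.append(crate)
    return argv


# Prints: cargo binstall --disable-telemetry ripgrep
print(shlex.join(binstall_command("ripgrep")))
```

The returned list can be passed as-is to `subprocess.run(argv, check=True)`, which avoids shell quoting issues.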

@AlexTMjugador
Contributor Author

Awesome, thanks a bunch for the detailed answers! I've gone ahead and opened a PR to add the discussed disclaimers to the appropriate sections.

Depending on alsuren's answers to some points, we might or might not want to tweak some of the wording in the disclaimers. Either way, I believe some disclosure is better than none 👍

github-merge-queue bot pushed a commit that referenced this issue Aug 20, 2024
…DME (#1890)

* docs: mention `quickinstall` telemetry collection in `--help` and README

These changes describe the usage statistics collected when the
`quickinstall` strategy is used by default, according to the discussion
and details brought forward on
#1884. Both the
project README and the CLI long help contain clear disclosures of such
statistics collection now.

Signed-off-by: Alejandro González <me@alegon.dev>

* docs: add some more data collection details

Signed-off-by: Alejandro González <me@alegon.dev>

---------

Signed-off-by: Alejandro González <me@alegon.dev>
@AlexTMjugador
Contributor Author

I'm closing this issue as I think the now merged PR is good enough to resolve it. Thanks to everyone involved! 🎉
