Use Argon2 for encrypted vaults #3502

Merged: 14 commits, Jul 4, 2023

Conversation

Contributor

@jagodarybacka jagodarybacka commented Jun 26, 2023

Resolves #3470

What

Let's use Argon2 instead of PBKDF2 🔑

What was already done:

  • added required packages
  • configured webpack to work with WebAssembly, following the example from the docs
  • added the necessary config to content_security_policy to allow WebAssembly; without 'wasm-eval' we are not able to use the Argon implementation in the extension
  • added migration from old vaults to new vaults
  • added error handling during migration, allowing users to continue with the old implementation if needed
  • added a one-time analytics event after successful migration
  • updated Jest to allow unit tests to work with WebAssembly, updated TypeScript as this was necessary to make the new version of Jest work correctly, and fixed a bunch of TypeScript issues; most fixes are copied from #3415 (Versions that go bump in the night: Bump versions across a few different core dependencies)
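
As a rough illustration of the two configuration changes listed above, the fragments below show what they might look like. This is a sketch under assumptions: it assumes webpack 5 (where `experiments.asyncWebAssembly` exists) and a Manifest V2 style policy string. Only `'wasm-eval'` comes from the description; the object names and the rest of the policy string are illustrative.

```typescript
// Assumed webpack 5 fragment: lets bundled code import .wasm modules asynchronously.
const webpackFragment = {
  experiments: { asyncWebAssembly: true },
}

// Assumed Manifest V2 fragment. Only 'wasm-eval' comes from the PR description;
// the remaining directives are typical extension defaults, shown for context.
const manifestFragment = {
  content_security_policy: "script-src 'self' 'wasm-eval'; object-src 'self'",
}
```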

Testing

  • test migration from main: install the extension on main, add some HD wallets, check out this branch, reload and unlock the wallet. Make sure the background console shows no error about a failed migration, check that the analytics event has been emitted, then lock and unlock more than once
  • install extension on this branch, test locking and unlocking the wallet
  • try to make webassembly part break (throw Error here) and make sure user is able to continue with old vaults

Latest build: extension-builds-3502 (as of Sun, 02 Jul 2023 21:28:25 GMT).

- add required packages
- configure webpack to work with webAssembly
- add necessary config to `content_security_policy` to
allow webAssembly
@jagodarybacka jagodarybacka self-assigned this Jun 26, 2023
Let's set length to 32 bytes to match length expected by AES-GCM
Catch errors if vaults migration to Argon2 fails and allow to continue
with old vaults encrypted with PBKDF2.
Log analytics event when vaults are successfully migrated.
@jagodarybacka
Contributor Author

To fix the tests we need to update Jest to the newest version, as support for WebAssembly was apparently added some time ago. Unfortunately, there are a bunch of errors after upgrading the Jest version 🙈

@Shadowfiend
Contributor

I went through a lot of the jest upgrade pain in #3415 . 8ea287b is a (not so clean) commit with those changes. There's Ledger stuff in there as well, but might be straightforward to pick it apart.

@@ -265,21 +273,29 @@ export default class InternalSignerService extends BaseService<Events> {
return true
}

const { vaults, version } = await migrateVaultsToArgon(password)
Contributor

Might be nitpicking here but I think we could improve our approach if we kept the concern for migrating vaults separate from retrieving them. So, rather than returning vaults here, I propose we just return a boolean indicating success or failure and continue using getEncryptedVaults.

Contributor

This should be true internally in the encryption layer, but for the caller it's the opposite--encryption layer should be completely in charge of making sure that the data is in the latest format, just like the db objects migrate before returning data.

Vault version is basically a constant across a given run of the extension. All retrievals of encrypted data should automatically reencrypt to the latest version. All encryptions should use the latest version. That fact should be transparent to almost everyone. Migrations should not be even a little bit optional (as skipping a function call might make them). This is how we ensure calling code doesn't make weird decisions about not wanting to upgrade, or weird mistakes about sometimes not doing so, etc: we give no one a choice or a hook.

To put this in terms of separation of concerns: the encryption layer is concerned with encrypting and decrypting data, and ensuring to the extent possible that all encrypted data is in the latest format.

Contributor

@Shadowfiend Shadowfiend Jun 29, 2023

Thought about this some more. The gap between this and how the internal signer was already implemented is not huge but it's gnarly. I don't think we should deal with it right now, unfortunately.

The way this would manifest is that we would have getEncryptedVaults transparently handle vault migration, and writeLatestEncryptedVault would be redone as a writeLatestVault and would transparently handle encryption and fallbacks.

Again, I think we can push this to later.

Contributor

@Shadowfiend Shadowfiend left a comment

Left a few comments. Pondering how we can push more of the migration out of the service core and into the encryption layer... Right now the two are very entwined (really an old decision, nothing to do with the Argon2 stuff other than that it adds complexity) and it's leading to a gnarly unlock method that's starting to coordinate too much.

Not sure if it's worth refactoring further right now or not, will try to give more thoughts later today.

@@ -121,6 +126,7 @@ interface Events extends ServiceLifecycleEvents {
// TODO message was signed
signedTx: SignedTransaction
signedData: string
migratedToArgon2: never
Contributor

Argon2 is an internal storage detail--it should not escape the service IMO. If we ever want to do a migration that isn't transparent to the user (eg because we want to encourage users to upgrade), we can revisit, but right now we should treat it as an implementation detail.

@@ -1151,6 +1151,12 @@ export default class Main extends BaseService<never> {
}
})

this.internalSignerService.emitter.on("migratedToArgon2", async () => {
this.analyticsService.sendOneTimeAnalyticsEvent(
Contributor

This is one of the few cases where we should just call this directly on the service IMO. Piping analytics through events makes sense when we're observing an existing action, but less so when we're wanting to track an action that is internal to a service.

@@ -149,6 +155,8 @@ const isKeyring = (
export default class InternalSignerService extends BaseService<Events> {
#cachedKey: SaltedKey | null = null

#cachedVaultVersion: VaultVersion = VaultVersion.PBKDF2
Contributor

This should never change, right? All vaults should be argon2 by the time the internal signer service sees them.

Contributor Author

well initially for our existing users they are VaultVersion.PBKDF2, once they are migrated successfully to Argon then yeah, this should never change again but if the migration fails then I want to be able to use old vaults so these users are not stuck with locked keys - that's why I wanted to keep that info here

Contributor

@Shadowfiend Shadowfiend Jun 29, 2023

Ah, I see the thinking here. Here's how I think we handle that possibility:

  • Every time we unlock, we should try to migrate. The migration should check that once an Argon2 vault is encrypted, it can also be decrypted and produce the same data as the PBKDF2 vault had, before writing the migration to localStorage.
  • If the migration fails, we should log an analytics event. The unlocking process should continue normally (which will use the vault's current version, PBKDF2 if there was an error with the Argon2 encryption).
  • The migration process should tell us if it succeeded, i.e. we should be able to know “the internal signer data is all on the latest version of encryption”—this is just a boolean.
  • When we save the vault, we should use the strongest encryption we have (i.e., Argon2). If that fails, we should log an analytics event. At this point, if all vaults are on the latest version of encryption, we should fail the same way we would have failed if PBKDF2 blew up. If all vaults are not on the latest version, we should fall back to PBKDF2. In practice, this means we never want to end up with a newer vault on an older version of encryption than we have used until now—we should treat that as a full system failure.
  • After a set amount of time (maybe 6 months?) without errors in analytics, we should consider dropping the PBKDF2 fallbacks for encryption (but never for the migration for decryption).

This is more complex, but it never winds back security, and it doesn't leak encryption implementation details (vault version) outside of the encryption code.

Thoughts?
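
A minimal sketch of the migrate-on-unlock flow proposed above, assuming hypothetical names (`Vault`, `Codec`, `migrateOnUnlock`) and injected encrypt/decrypt/persist functions rather than the extension's real encryption code. It shows the round-trip check before anything is persisted, the boolean result, and the fallback to the existing vault on failure.

```typescript
// Hypothetical types; names are illustrative, not the extension's actual API.
type Version = "PBKDF2" | "ARGON2"
type Vault = { version: Version; cipherText: string }

// Encrypt/decrypt are injected so the sketch stays independent of any KDF library.
type Codec = {
  encrypt(plain: string, password: string, version: Version): Promise<string>
  decrypt(vault: Vault, password: string): Promise<string>
}

const LATEST: Version = "ARGON2"

// Attempts migration on unlock. Returns the vault to keep using and whether it
// is on the latest version. On any failure the original vault is returned
// untouched, so unlocking can continue with the old encryption.
async function migrateOnUnlock(
  vault: Vault,
  password: string,
  codec: Codec,
  persist: (vault: Vault) => Promise<void>,
  reportFailure: (message: string) => void
): Promise<{ vault: Vault; onLatest: boolean }> {
  if (vault.version === LATEST) return { vault, onLatest: true }
  try {
    const plain = await codec.decrypt(vault, password)
    const candidate: Vault = {
      version: LATEST,
      cipherText: await codec.encrypt(plain, password, LATEST),
    }
    // Round-trip check before anything is written to storage.
    const roundTrip = await codec.decrypt(candidate, password)
    if (roundTrip !== plain) {
      throw new Error("Migrated vault does not match original")
    }
    await persist(candidate)
    return { vault: candidate, onLatest: true }
  } catch (error) {
    reportFailure(error instanceof Error ? error.message : String(error))
    return { vault, onLatest: false }
  }
}
```

Note that in this sketch a wrong password also ends up in `reportFailure`, since the initial decrypt throws inside the same `try` block.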

Contributor Author

Sounds good, let me try to implement it and see how it will go.

Contributor Author

Implemented:

  • on every unlock we try to migrate
  • test if the vault encrypted with Argon produces the same data as before the migration
  • log analytics event on migration fail
  • migration process should return a boolean if it was successful

What is not clear to me:

When we save the vault, we should use the strongest encryption we have (i.e., Argon2)....

So this point - if I understand correctly what you want to achieve here - is something I would rather avoid putting into code, because it should never be the case where we are writing new vault encrypted with Argon while old vaults are on the PBKDF2.
That is because to save new vault we have to have the service unlocked and we are trying to migrate during unlocking. If the migration fails then most likely writing a new vault with Argon will fail as well. I would probably go with saving the current encryption algorithm in the service as this is implemented right now and then we can use it to add new vaults without adding more complexity 🤔

Contributor

A couple of notes on the reasoning, but ultimately I think we're good here ->

it should never be the case where we are writing new vault encrypted with Argon while old vaults are on the PBKDF2.

I agree this isn't ideal, but it is strictly better than having all vaults on PBKDF2. The goal of having version alongside the vault is that we can have mixed versions if absolutely necessary.

If the migration fails then most likely writing a new vault with Argon will fail as well.

I don't know why argon2 would fail, so there's nothing to indicate to me whether such a failure would be transient or not. It could fail during migration due to issues loading wasm, but it could also run into transient resource limitations--again, it's unclear to me.

All that said: you are right! The piece I forgot about here is that we (correctly) don't keep the password around after the initial unlock. This means if the initial unlock fails to use argon2 to derive the key, and we want to try to encrypt with an argon2-derived key later… We would have to keep the password around/cached instead of just the key. A definite no! So let's put this one to rest.
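
To make the constraint concrete, here is a small sketch assuming hypothetical names (`KeyCache`, `Derive`) and a synchronous, injected key-derivation function purely for brevity. It only illustrates why a key for a different KDF cannot be produced after unlock when the password is not retained.

```typescript
// Hypothetical names throughout; the KDF is injected and synchronous only to
// keep the sketch short.
type Kdf = "PBKDF2" | "ARGON2"
type SaltedKey = { kdf: Kdf; salt: string; key: string }
type Derive = (password: string, kdf: Kdf) => SaltedKey

class KeyCache {
  private cachedKey: SaltedKey | null = null

  // The password is used once to derive a key and is never stored.
  unlock(password: string, kdf: Kdf, derive: Derive): void {
    this.cachedKey = derive(password, kdf)
  }

  // Later writes can only use the key that was derived at unlock time.
  keyFor(kdf: Kdf): SaltedKey {
    if (this.cachedKey === null) throw new Error("Locked")
    if (this.cachedKey.kdf !== kdf) {
      throw new Error(
        `No ${kdf} key cached; deriving one would require the password again`
      )
    }
    return this.cachedKey
  }
}
```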

- allowed Jest to fetch WebAssembly files
- moved `crypto.subtle` mock to global setup
- for Jest to work with WebAssembly we need to update to next major version
- to support dependencies for new Jest version we need to bump TypeScript as well
- let's fix problems found by new Typescript version
Allow destructuring objects to remove unwanted fields from the objects.
This is a pattern we use often across the codebase.
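
For illustration, the destructuring pattern this commit refers to looks roughly like the sketch below. The type and function names are hypothetical. The actual lint change is not shown in this conversation; one common way to permit the pattern is the `ignoreRestSiblings` option of ESLint's `no-unused-vars` rule, which is an assumption about this commit rather than something stated here.

```typescript
// Hypothetical record type used only for illustration.
type StoredSigner = { address: string; name: string; privateKey: string }

// Pull out the unwanted field and keep the rest. `privateKey` is intentionally
// unused, which is what an unused-variable lint rule would otherwise flag.
function withoutPrivateKey(
  signer: StoredSigner
): Omit<StoredSigner, "privateKey"> {
  const { privateKey, ...rest } = signer
  return rest
}
```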
@jagodarybacka jagodarybacka marked this pull request as ready for review June 28, 2023 15:02
@jagodarybacka jagodarybacka requested review from a team as code owners June 28, 2023 15:02
.eslintrc.js (outdated review thread, resolved)
@@ -10,6 +11,11 @@ export type EncryptedVault = {
cipherText: string
}

export enum VaultVersion {
Contributor

Re: my note about not leaking vault version information: that basically means we should try to be able to not export this.

Contributor Author

I see but if we need to know and save (in the service) on which version we are then we need this exported, right?

Contributor

Ultimately discussed in #3502 (comment), since what I was pushing for is precisely not saving in the service which version we're on—but the need for that has been clarified.

- return `success` boolean
- make sure decrypted vaults match
- send event on migration fail
Contributor

@Shadowfiend Shadowfiend left a comment

Handful of final tidbits here:

  • Let's rename to migrateVaultsToLatestVersion.
  • Let's return the error message from the exception if there is a failure.
  • Let's bubble the error message to the migration failure analytics event.
  • Let's update the analytics event to VAULT_MIGRATION alongside the above.
  • MIGRATION_FAILED shouldn't be a one-time event, we should see it every time a migration fails. It'll help us better understand whether we're seeing transient or permanent failures.
  • Last but not least: I think the success flag should actually be a migrated flag, which should be true if the vaults (a) needed migration and (b) succeeded at migrating. I think the error message should be the true indicator that there was an error. So we would send the migration event if (migrated), and send the failure event if (error !== undefined). Thoughts?
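
A small sketch of the `migrated` / `error` split suggested in the last point, assuming hypothetical shapes (`MigrationResult`, `AnalyticsCall`) and an assumed failure event name. `VAULT_MIGRATION` is taken from the list above; treating it as one-time follows the original one-time success event.

```typescript
// Hypothetical shapes; event names other than VAULT_MIGRATION are assumptions.
type MigrationResult = { migrated: boolean; errorMessage?: string }
type AnalyticsCall = {
  event: string
  oneTime: boolean
  payload: Record<string, string>
}

function analyticsFor(result: MigrationResult, version: string): AnalyticsCall[] {
  const calls: AnalyticsCall[] = []
  // Sent only when vaults needed migration and the migration succeeded.
  if (result.migrated) {
    calls.push({ event: "VAULT_MIGRATION", oneTime: true, payload: { version } })
  }
  // Sent on every failure so transient and permanent failures can be told apart.
  if (result.errorMessage !== undefined) {
    calls.push({
      event: "VAULT_MIGRATION_FAILED",
      oneTime: false,
      payload: { error: result.errorMessage },
    })
  }
  return calls
}
```

With this shape, already-migrated vaults produce no events at all: `migrated` is false and `errorMessage` is undefined.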

@Shadowfiend
Contributor

Pushed the above changes to accelerate a little; we can always walk the commit back if necessary.

The main thing I noticed while adding a test for the error messages is that (and this was true before my changes) if we track migration failures, we track migration failures due to incorrect passwords as well as ones that might be due to an internal technical issue. There are a few ways we could mitigate this, but my instinctive reaction is that we roll it out as-is and see how it goes.

Vault migration is no longer tracked as Argon2 specifically, but
generically for all migrations. Already-migrated vaults are not tracked,
and the migration function return value reflects that no migration was
performed. Additionally, error messages are bubbled out of the migration
function and reported up to the caller.

The main outcome here is that PostHog migration events include the
migrated-to version, and PostHog migration failure events include the
error message. This will leave us open to future migrations, and will
let us know if there are certain failures that are happening broadly
that we may be able to do something about.

Notably, wrong passwords will be tracked as migration errors if a wrong
password is typed with an older vault version in the mix. Mitigating
this may or may not be a good idea.
@jagodarybacka
Contributor Author

if we track migration failures, we track migration failures due to incorrect passwords

ugh 😞 so what exactly do we want to measure with these events? I think info on how many users are migrated is the most important one anyway so we are good here

Contributor

@Shadowfiend Shadowfiend left a comment

All righty, good to go here. Since I pushed some code, going to let @jagodarybacka do a final sanity check and then merge.

@Shadowfiend
Contributor

if we track migration failures, we track migration failures due to incorrect passwords

ugh 😞 so what exactly do we want to measure with these events? I think info on how many users are migrated is the most important one anyway so we are good here

Secondarily, I think it'll be useful to potentially know what we need to look at if they aren't being migrated, or at least how many failures in migrations we're seeing. I'm comfortable shipping without being sure we can get that info though.

Contributor Author

@jagodarybacka jagodarybacka left a comment

QA went fine, let's 🚢

@jagodarybacka jagodarybacka merged commit 4fc5d6a into keyring-with-pk Jul 4, 2023
@jagodarybacka jagodarybacka deleted the migrate-to-argon branch July 4, 2023 08:15
@jagodarybacka jagodarybacka mentioned this pull request Jul 4, 2023
@kkosiorowska kkosiorowska mentioned this pull request Jul 13, 2023
kkosiorowska pushed a commit that referenced this pull request Jul 14, 2023
## What's Changed
* Add private key onboarding flow by @jagodarybacka in
#3119
* Private key JSON import by @jagodarybacka in
#3177
* Allow export of private keys and mnemonics by @jagodarybacka in
#3248
* Export private key form by @jagodarybacka in
#3255
* Unlock screen for the account backup by @kkosiorowska in
#3257
* Show mnemonic menu by @jagodarybacka in
#3259
* Fix background blur issue by @jagodarybacka in
#3265
* Account backup UI fixes by @jagodarybacka in
#3270
* Fix unhiding removed accounts by @jagodarybacka in
#3282
* New error for incorrectly decrypted JSON file by @jagodarybacka in
#3293
* Export private keys from HD wallet addresses by @jagodarybacka in
#3253
* Refactor keyring redux slice to remove `importing` field by
@jagodarybacka in #3309
* 📚 Accounts backup by @kkosiorowska in
#3252
* Catch Enter keypress on Unlock screen by @jagodarybacka in
#3355
* Rename `keyring` to `internal signer` and other improvements by
@jagodarybacka in #3331
* 🗝 QA - Accounts backup and private key import by @jagodarybacka in
#3266
* Remove private key signers if they are replaced by accounts from HD
wallet by @jagodarybacka in
#3377
* RFB 4: One-Off Keyring Design by @Shadowfiend in
#3372
* Copy to clipboard warning by @kkosiorowska in
#3488
* Allow setting custom auto-lock timer by @hyphenized in
#3477
* Use Argon2 for encrypted vaults by @jagodarybacka in
#3502
* 👑 Private keys import and accounts backup by @jagodarybacka in
#3089
* Untrusted assets should not block the addition of custom tokens by
@kkosiorowska in #3491
* Flip updated dApp connections flag by @Shadowfiend in
#3492
* v0.41.0 by @Shadowfiend in
#3531
* Switch to a given network if adding a network that is already added.
by @0xDaedalus in #3154
* Remove waiting for Loading Doggo component in E2E tests by
@jagodarybacka in #3541
* Squeeze content to better fit on Swaps page by @jagodarybacka in
#3542
* Refactor of terms for verified/unverified assets by @kkosiorowska in
#3528
* Fix ChainList styling by @fulldecent in
#3547
* Update release checklist by @jagodarybacka in
#3548
* Fix custom asset price fetching by @hyphenized in
#3508
* Sticky Defaults: Make Taho-as-default replace MetaMask in almost all
cases by @Shadowfiend in
#3546

## New Contributors
* @fulldecent made their first contribution in
#3547

**Full Changelog**:
v0.41.0...v0.42.0

Latest build:
[extension-builds-3549](https://github.com/tahowallet/extension/suites/14268975651/artifacts/801826435)
(as of Thu, 13 Jul 2023 09:51:56 GMT).