Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a stable consent hash #1137

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

Conversation

Badlapje
Copy link

@Badlapje Badlapje commented May 19, 2021

Prior to this commit, there was no stable hashing algorithm used for consent attributes. Casing, order of attributes, ... could all change the hash used to store consent attributes. As such, it couldn't be relied upon.

This commit adds a stable hashing algorithm. It stores all new consent attributes with the new hash, but when retrieving does so checking first the old hash and then the new one.

Edit 4-5-2022 MKodde
After @Badlapje s work was finished on the project, I inherited his work and have finished the work remaining on this ticket.

  1. The consent stabilization feature was implemented in the consent workflows of EB
  2. A database migration was added to allow storing both the unstable and the stable hash.
  3. Tests where added and updated

The QA tests fail on a security check on the JS dependencies. This will be fixed on another PR.

Pivotal ticket: https://www.pivotaltracker.com/story/show/176513931

@Badlapje Badlapje requested a review from MKodde May 19, 2021 15:51
@Badlapje Badlapje changed the title Add a consentHashService Add a stable consent hash May 19, 2021
Copy link
Member

@MKodde MKodde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A first review looks very promising! See some improvement suggestions below.

@Badlapje Badlapje force-pushed the feature/stabilize-consent-hash branch 6 times, most recently from 4e3a85e to dff4b53 Compare May 20, 2021 16:46
@Badlapje Badlapje requested a review from MKodde June 1, 2021 17:07
@Badlapje Badlapje force-pushed the feature/stabilize-consent-hash branch from dff4b53 to 595c767 Compare June 2, 2021 12:38
@MKodde MKodde force-pushed the feature/stabilize-consent-hash branch 8 times, most recently from c716bf8 to a5dd163 Compare May 4, 2022 10:09
@MKodde MKodde requested review from thijskh and VadimSchmitz May 4, 2022 10:28
return !$this->_consentEnabled ||
$this->_hasStoredConsent($serviceProvider, ConsentType::TYPE_EXPLICIT);
$consent = $this->_hasStoredConsent($serviceProvider, ConsentType::TYPE_EXPLICIT);
return !$this->_consentEnabled || $consent->given();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means now queries (and hash calculations) will be done even if the consent feature toggle is off. This also goes for the other public methods that use this construct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A very good point! Will change this to only search for previous consent when consent is enabled.

public function storeConsentHash(array $parameters): bool
{
$query = "INSERT INTO consent (hashed_user_id, service_id, attribute_stable, consent_type, consent_date, deleted_at)
VALUES (?, ?, ?, ?, NOW(), '0000-00-00 00:00:00')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's stored as being deleted? Because a non-null value is being set for deleted_at?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That has to do with the following.

The deleted_at column became part of the primary key, because we can now have multiple rows for a given user for a given service. Where one of those rows is an active consent row, and the rest are soft deleted versions of previously given consent.

Having a dattime as part of a key requires it to be not nullable. But we do not want the column to be empty. Thats why a null-datetime is set for those values. DBAL will translate this to 'null'. But for the DBMS is satisfied. At least that's my understanding.

$consentType,
];

$hasUnstableConsentHash = $this->_hashService->retrieveConsentHash($parameters);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering about the approach. This currently queries first for the new hash. If not found then for the old hash. And if also not found will present consent screen.

Why is this done over the following approach

  • oldHash = _getAttributesHash()
  • newHash = _getStableAttributesHash()
  • SELECT * FROM consent where attributes_stable = :newHash OR attributes = :oldHash
    so you always know in one query whether consent was given?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a good point! Having this in one query is much better. Will update this.

// this up() migration is auto-generated, please modify it to your needs
$this->abortIf($this->connection->getDatabasePlatform()->getName() !== 'mysql', 'Migration can only be executed safely on \'mysql\'.');

$this->addSql('ALTER TABLE consent ADD attribute_stable VARCHAR(80) NOT NULL, CHANGE attribute attribute VARCHAR(80) DEFAULT NULL');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not yet decided on the pros and cons of adding a new hash field next to the old field.

I believe the problem is solvable with just one database field. Not sure yet if that is preferable though.

Copy link
Member

@MKodde MKodde May 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open for other suggestions! Having it in one single column will work too in my opinion. The 'OR' based query solution would work just as well on a single column. But would make it harder to decide when we can stop supporting the old style attribute hash method.

Koen Cornelis and others added 4 commits May 5, 2022 13:24
Prior to this commit, there was no service specifically for hashing
attribute arrays for consent.

This commit adds such a service with both the old hashing algorithm and
a stable hashing algorithm.

Pivotal ticket: https://www.pivotaltracker.com/story/show/176513931
The consent queries added to the ConsentRepository.php file (deleted)
should have been added to the existing DbalConsentRepository. While at
it, naming conventions have been applied to the query names. And the
service config was updated.
Extracting this interface from the existing service is allowing us to
more easily test the service. As mocking a final class is not possible.
MKodde added 8 commits May 5, 2022 13:24
This might prove usefull down the road
This represents the consent type at hand (implicit or explicit consent).
A third option is that no consent has been given.
The requirements are stated in the story:

https://www.pivotaltracker.com/story/show/176513931 and specifically
these points:

A challenge is that we do not want to invalidate all given consents with the
current algorithm. So we implement it as follows:

*  We do not touch the existing consent hashing method at all
*  We create a new hashing method that is more stable.
*  We cover this new method with an abundance of unit tests to verify the
   stability given all sorts of inputs.
*  We change the consent query from (pseudocode):
     SELECT *
     FROM consent
     WHERE user = me
     AND consenthash = hashfromoldmethod
     OR consenthash = hashfromnewmethod
*  Newly given consent will be stored with the new hash.
*  When old consent matched, still generate new consent hash (without showing
   consent screen
The NameID objects where causing issues when creating a stabilized
consent hash. The objects can not be serialized/unserialized. By
normalizing them before starting to perform other normalization tasks on
the attribute array, this issue can be prevented.
Only get the stored consent when consent is enabled. The other methods
need no adjusting as the short-circuiting by the first condition
prevents those expressions from being executed.

Consider:

`(true || $this->expensiveMethodCall())`

The expensiveMethodCall is never called as the first condition already
decided the rest of the condition.

https://www.php.net/manual/en/language.operators.logical.php
Previously when testing if previous consent was given was based on
testing for old-style attribute hash calculation. And if that one did
not exist, the new (stable) attribute hash was tested.

Having that in two queries made little sense as it can easily be
performed in one action.

The interface of the repo and service changed slightly, as now a value
object is returned instead of a boolean value. The value object reflects
the version (type of attribute hash) of consent that was given. That can
either be: unstable, stable or not given at all.
@MKodde MKodde force-pushed the feature/stabilize-consent-hash branch from 26e8ac1 to 145ebd9 Compare May 5, 2022 11:24
@MKodde MKodde added 6.8 and removed 6.7 labels May 5, 2022
@thijskh thijskh removed the 6.8 label Oct 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants