Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for previous migrations that didn't strip whitespace for all the relevant records #2374

Merged
merged 3 commits into from
Sep 25, 2023

Conversation

floehopper
Copy link
Contributor

This includes new migrations to strip leading/trailing whitespace from:

  • User#name
  • BatchInvitationUser#name
  • Organisation#name

In #2373 I noticed that the query we'd used to do this in #2291 & #2351 didn't escape the regular expression properly. ActiveRecord seems to have generated the following SQL WHERE expression:

WHERE (name REGEXP('^ +') OR name REGEXP(' +$')

I think this will only have found names with leading/trailing space characters and not leading/trailing newlines/carriage-returns/tabs. Thus the previous migration didn't do anything bad, it just may have missed some records.

In these new versions of the migration I've constructed the query slightly differently. Firstly I've double-escaped the whitespace character in the regular expression and secondly I've used an Array Condition vs a Pure String Condition to ensure argument safety. The new query generates the following SQL WHERE expression:

WHERE (name REGEXP '^\\s+' OR name REGEXP '\\s+$')

This has the REGEXP string argument escaped correctly.

As per the migrations in #2373 I've use ActiveRecord::Persistence#update_attribute in order to skip model validation, in case some of the existing records are invalid for other reasons.

As before I did consider writing the migration as a SQL UPDATE, but generating a TRIM expression that is exactly equivalent to String#strip isn't as trivial as it sounds!

@mike29736 mike29736 self-assigned this Sep 25, 2023
Copy link
Contributor

@mike29736 mike29736 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😳 I'm impressed that you spotted this... even just looking into it to verify the changes, there don't seem to be many clues!

FWIW, it looks like we could also use character classes to achieve the same thing (e.g. "^[[:space:]]+", found here: https://dev.mysql.com/doc/refman/8.0/en/regexp.html#regexp-syntax)

@floehopper
Copy link
Contributor Author

😳 I'm impressed that you spotted this... even just looking into it to verify the changes, there don't seem to be many clues!

Thanks - I spotted it because I was getting some confusing results when doing counts before & after the migration. It seems as if some of the BatchUserInvitation records had trailing newlines in their email field - perhaps we didn't have anything like that in the previous fields...?

FWIW, it looks like we could also use character classes to achieve the same thing (e.g. "^[[:space:]]+", found here: https://dev.mysql.com/doc/refman/8.0/en/regexp.html#regexp-syntax)

Yes, I did try that, but I was so confused by the results I was getting in the DB console at that point, I decided to avoid MySQL altogether! In hindsight, I think it probably would've worked!

@floehopper
Copy link
Contributor Author

Rebasing on main and resolving conflict before force-pushing in preparation for merging.

@floehopper floehopper force-pushed the fix-for-migrations-that-stripped-whitespace branch from 022166b to 5e2cc0d Compare September 25, 2023 14:52
In #2373 I noticed that the query we'd used to do this in #2291 didn't
escape the regular expression properly. ActiveRecord seems to have
generated the following SQL WHERE expression:

    WHERE (name REGEXP('^ +') OR name REGEXP(' +$')

I _think_ this will only have found names with leading/trailing
space characters and not leading/trailing
newlines/carriage-returns/tabs. Thus the previous migration didn't do
anything bad, it just missed _some_ records.

In this new version of the migration I've constructed the query slightly
differently. Firstly I've double-escaped the whitespace character in the
regular expression and secondly I've used an Array Condition vs a Pure
String Condition [1] to ensure argument safety. The new query generates
the following SQL WHERE expression:

    WHERE (name REGEXP '^\\s+' OR name REGEXP '\\s+$')

This has the REGEXP string argument escaped correctly.

As per the migrations in #2373 I've use
ActiveRecord::Persistence#update_attribute in order to skip model
validation, in case some of the existing User records are invalid for
other reasons.

As before I did consider writing the migration as a SQL UPDATE, but
generating a TRIM expression that is exactly equivalent to String#strip
isn't as trivial as it sounds!

[1]: https://guides.rubyonrails.org/active_record_querying.html#conditions
In #2373 I noticed that the query we'd used to do this in #2291 didn't
escape the regular expression properly. ActiveRecord seems to have
generated the following SQL WHERE expression:

    WHERE (name REGEXP('^ +') OR name REGEXP(' +$')

I _think_ this will only have found names with leading/trailing
space characters and not leading/trailing
newlines/carriage-returns/tabs. Thus the previous migration didn't do
anything bad, it just may have missed _some_ records.

In this new version of the migration I've constructed the query slightly
differently. Firstly I've double-escaped the whitespace character in the
regular expression and secondly I've used an Array Condition vs a Pure
String Condition [1] to ensure argument safety. The new query generates
the following SQL WHERE expression:

    WHERE (name REGEXP '^\\s+' OR name REGEXP '\\s+$')

This has the REGEXP string argument escaped correctly.

As per the migrations in #2373 I've use
ActiveRecord::Persistence#update_attribute in order to skip model
validation, in case some of the existing BatchInvitationUser records are
invalid for other reasons.

As before I did consider writing the migration as a SQL UPDATE, but
generating a TRIM expression that is exactly equivalent to String#strip
isn't as trivial as it sounds!

[1]: https://guides.rubyonrails.org/active_record_querying.html#conditions
In #2373 I noticed that the query we'd used to do this in #2351 didn't
escape the regular expression properly. ActiveRecord seems to have
generated the following SQL WHERE expression:

    WHERE (name REGEXP('^ +') OR name REGEXP(' +$')

I _think_ this will only have found names with leading/trailing
space characters and not leading/trailing
newlines/carriage-returns/tabs. Thus the previous migration didn't do
anything bad, it just may have missed _some_ records.

In this new version of the migration I've constructed the query slightly
differently. Firstly I've double-escaped the whitespace character in the
regular expression and secondly I've used an Array Condition vs a Pure
String Condition [1] to ensure argument safety. The new query generates
the following SQL WHERE expression:

    WHERE (name REGEXP '^\\s+' OR name REGEXP '\\s+$')

This has the REGEXP string argument escaped correctly.

As per the migrations in #2373 I've use
ActiveRecord::Persistence#update_attribute in order to skip model
validation, in case some of the existing Organisation records are
invalid for other reasons.

As before I did consider writing the migration as a SQL UPDATE, but
generating a TRIM expression that is exactly equivalent to String#strip
isn't as trivial as it sounds!

[1]: https://guides.rubyonrails.org/active_record_querying.html#conditions
@floehopper floehopper force-pushed the fix-for-migrations-that-stripped-whitespace branch from 5e2cc0d to 4bdebf6 Compare September 25, 2023 16:51
@floehopper floehopper merged commit 9da737c into main Sep 25, 2023
6 checks passed
@floehopper floehopper deleted the fix-for-migrations-that-stripped-whitespace branch September 25, 2023 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants