A library for Rails developers who are not willing to leave data migrations to chance.
- Rationale
- Install
- Usage
- Testing
- Suggested Workflow
- Known Issues and Limitations
- Trivia
- Acknowledgments
- License
The main purpose of Rails' migration feature is to issue commands that modify the schema using a consistent process. Migrations can also be used to add or modify data. This is useful in an existing database that can't be destroyed and recreated, such as a production database.
–Migrations and Seed Data, Rails Guide for Active Record Migrations
The motivation behind Rails' migration mechanism is schema modification. Using it for data changes in the database comes second. Yet, adding or modifying data via regular migrations can be problematic.
One issue is that application deployment now depends on the data migration to be completed. This may not be a problem with small databases but large databases with millions of records will respond with hanging or failed migrations.
Another issue is that data migration files tend to stay in db/migrate
for posterity.
As a result, they will run whenever a developer sets up their local development environment.
This is unnecessary for a pristine database. Especially with scripts to
seed the correct data.
The purpose of monarch_migrate
is to solve the above issues and to:
- Provide a uniform process for modifying data in the database.
- Separate data migrations from schema migrations.
- Allow automated testing of data migrations.
It assumes that:
- You run data migrations only on production and rely on seed scripts for local development.
- You run data migrations manually.
- You want to test data migrations in a thorough and automated manner.
Add the gem to your Gemfile:
gem "monarch_migrate"
Run the bundle command to install it.
After you install the gem, you need to run the generator:
rails generate monarch_migrate:install
The install generator creates a schema migration file that adds a data_migration_records
table. It is where the gem keeps track of data migrations we have ran.
Run the schema migration to create the table:
rails db:migrate
Data migrations have a similar structure to regular migrations in Rails.
Files follow the same naming pattern but reside in db/data_migrate
.
rails generate data_migration backfill_users_name
Edit the newly created file to define the data migration:
# db/data_migrate/20220605083010_backfill_users_name.rb
ActiveRecord::Base.connection.execute(<<-SQL)
UPDATE users SET name = concat(first_name, ' ', last_name) WHERE name IS NULL;
SQL
In contrast to regular migrations, there is no need to inherit any classes. It is plain ruby code where you can refer to any object you need. If the database adapter supports transactions, each data migration will automatically be wrapped in a separate transaction.
To run pending data migrations:
rails data:migrate
To run a specific version:
rails data:migrate VERSION=20220605083010
You can see the status of data migrations with:
rails data:migrate:status
Rollback functionality is not provided by design. Create another data migration instead.
After the data manipulation is complete, you may want to trigger a long running task. Yet, there are some pitfalls to be aware of.
To avoid them, use an after-commit callback:
# db/data_migrate/20220605083010_backfill_users_name.rb
ActiveRecord::Base.connection.execute(<<-SQL)
UPDATE users SET name = concat(first_name, ' ', last_name) WHERE name IS NULL;
SQL
after_commit do
SearchIndex::RebuildJob.perform_later
end
Testing data migrations can be the difference between rerunning
the migration and having to recover from a data loss. This is
why monarch_migrate
includes test helpers for both RSpec and TestUnit.
For rspec
, add the following line to your spec/rails_helper.rb
:
require "monarch_migrate/rspec"
Then:
# spec/data_migrations/20220605083010_backfill_users_name_spec.rb
describe "20220605083010_backfill_users_name", type: :data_migration do
subject { run_data_migration }
it "assigns user's name" do
user = users(:without_name_migrated)
# or if you're using FactoryBot:
# user = create(:user, first_name: "Guybrush", last_name: "Threepwood", name: nil)
expect { subject }.to change { user.reload.name }.to("Guybrush Threepwood")
end
it "does not assign name to already migrated users" do
user = users(:with_name_migrated)
# or if you're using FactoryBot:
# user = create(:user, first_name: "", last_name: "", name: "Guybrush Threepwood")
expect { subject }.not_to change { user.reload.name }
end
context "when the user has no last name" do
it "does not leave a trailing space" do
user = users(:without_name_migrated)
# or if you're using FactoryBot:
# user = create(:user, first_name: "Guybrush", last_name: nil, name: nil)
expect { subject }.to change { user.reload.name }.to("Guybrush")
end
end
# And so on ...
end
For test_unit
, add this line to your test/test_helper.rb
:
require "monarch_migrate/test_unit"
Then:
# test/data_migrations/20220605083010_backfill_users_name_test.rb
class BackfillUsersNameTest < MonarchMigrate::TestCase
def test_assigns_users_name
user = users(:without_name_migrated)
# or if you're using FactoryBot:
# user = create(:user, first_name: "Guybrush", last_name: "Threepwood", name: nil)
run_data_migration
assert_equal "Guybrush Threepwood", user.reload.name
end
def test_does_not_assign_name_to_alredy_migrated_users
user = users(:with_name_migrated)
# or if you're using FactoryBot:
# user = create(:user, first_name: "", last_name: "", name: "Guybrush Threepwood")
run_data_migration
assert_equal "Guybrush Threepwood", user.reload.name
end
def test_does_not_leave_trailing_space_when_user_has_no_last_name
user = users(:without_name_migrated)
# or if you're using FactoryBot:
# user = create(:user, first_name: "Guybrush", last_name: nil, name: nil)
run_data_migration
assert_equal "Guybrush", user.reload.name
end
# And so on ...
end
In certain cases, it makes sense to also test with manually running the data migration against a production clone of the database.
Data migrations become obsolete, once the data manipulation successfully completes. The same applies for the corresponding tests.
The suggested development workflow is:
- Implement the data migration. Use TDD, if appropriate.
- Commit, push and wait for CI to pass.
- Request review from peers.
- Once approved, remove the test files in a consecutive commit and push again.
- Merge into trunk.
This will keep the test files in repo's history for posterity.
The following issues and limitations are not necessary inherent to monarch_migrate
.
Some are innate to migrations in general.
The Active Record way claims that intelligence belongs in your models, not in the database.
–Active Record and Referential Integrity, Rails Guide for Active Record Migrations
Typically, data migrations relate closely to business models. In an ideal Rails world, data manipulations would depend on model logic to enforce validation, conform to business rules, etc. Hence, it is very tempting to use ActiveRecord models in migrations.
Here a regular Rails migration:
# db/migrate/20220605083010_backfill_users_name.rb
class BackfillUsersName < ActiveRecord::Migration[7.0]
def up
User.all.each do |user|
user.name = "#{user.first_name} #{user.last_name}"
user.save
end
end
def down
# ...
end
end
The code above is problematic because:
- It iterates through every user.
- It invokes validations and callbacks, which may have unintended consequences.
- It does not check if the user has already been migrated.
- It will fail when a future developer runs the migration during local development setup after
first_name
andlast_name
columns are gone.
To avoid issues 1-3, we can rewrite the migration to:
# db/migrate/20220605083010_backfill_users_name.rb
class BackfillUsersName < ActiveRecord::Migration[7.0]
def up
User.where(name: nil).find_each do |user|
user.update_column(:name, "#{user.first_name} #{user.last_name}")
end
end
def down
# ...
end
end
Unfortunately, with regular Rails migrations we will still face issue 4.
To avoid it, we need to separate data from schema migrations and not run data migrations locally. With seed scripts, there is no need to run them anyway.
Keep the above in mind when referencing ActiveRecord models in data migrations. Ideally, limit their use and do as much processing as possible in the database.
As mentioned, each data migration runs in a separate transaction. A long-running task within a migration keeps the transaction open for the duration of the task. As a result, the migration may hang or fail.
You could run such tasks asynchronously (i.e. in a background job) but the task may start before the transaction commits. This is a known issue. In addition, a database under load can suffer from longer commit times.
One of the most impressive migrations on Earth is the multi-generational round trip of the monarch butterfly.
Each year, millions of monarch butterflies leave their northern ranges and fly south, where they gather in huge roosts to survive the winter. When spring arrives, the monarchs start their return journey north. The population cycles through three to five generations to reach their destination. In the end, a new generation of butterflies complete the journey their great-great-great-grandparents started.
It is a mystery to scientists how the new generations know where to go, but they appear to navigate using a combination of the Earth's magnetic field and the position of the sun.
Genetically speaking, this is an incredible data migration!
Articles
- Data Migrations in Rails
- Decoupling database migrations from server startup: why and how
- Zero downtime migrations: 500 million rows
- Three Useful Data Migration Patterns for Rails
- Ruby on Rails Model Patterns and Anti-patterns
- Rails Migrations with Zero Downtime
Alternative gems
- https://github.com/OffgridElectric/rails-data-migrations
- https://github.com/ilyakatz/data-migrate
- https://github.com/jasonfb/nonschema_migrations
The gem is available as open source under the terms of the MIT License.