ActiveID: binary UUIDs for ActiveRecord


A modern, performant and database-agnostic solution for storing UUIDs in ActiveRecord 5.0+, without any obligatory monkey patches.

Rationale

If you search for “uuid” in RubyGems, you’ll get 142 results (as of January 2019)… Most are outdated, all are far from our needs. Yes, we think we need another one.

ℹ️
ActiveID has evolved from the popular, but no longer maintained, ActiveUUID gem and its forks. Starting in 2018, the gem was entirely rewritten to support newer Rails releases, and it was finally detached as a fork in 2020 to avoid confusing users. We thank Nate for bringing ActiveUUID to life!

Storing UUIDs as binaries in MySQL and SQLite3

UUIDs are 16 bytes long, but their human-readable string representation takes 36 characters. Consequently, storing UUIDs in a human-readable format is space-inefficient. Worse, whereas table row size is seldom a big concern, the size of table indices is significant: the larger the part of an index that fits in RAM, the faster that index works. And UUID columns are commonly indexed.
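To make the size difference concrete, here is a minimal sketch in plain Ruby that converts between the two representations. The helper names `uuid_to_binary` and `binary_to_uuid` are hypothetical and are not part of ActiveID; the gem performs the equivalent conversion for you through its attribute types.

```ruby
require "securerandom"

# Hypothetical helper: turns a dashed UUID string into its 16-byte binary form.
def uuid_to_binary(uuid_string)
  [uuid_string.delete("-")].pack("H*")
end

# Hypothetical helper: turns 16 raw bytes back into the dashed, lowercase form.
def binary_to_uuid(bytes)
  bytes.unpack1("H*").sub(/\A(.{8})(.{4})(.{4})(.{4})(.{12})\z/, '\1-\2-\3-\4-\5')
end

uuid = SecureRandom.uuid          # random version 4 UUID as a string
binary = uuid_to_binary(uuid)

puts uuid.bytesize                    # 36
puts binary.bytesize                  # 16
puts binary_to_uuid(binary) == uuid   # true
```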

ℹ️

Another performance boost for version 1 UUIDs can be achieved by rearranging their bits. This is not implemented yet; see issue #43.

This gem makes it easy to store UUIDs efficiently in databases that do not provide a dedicated UUID data type (e.g. MySQL, MariaDB, SQLite3).

Database-agnosticism

This gem provides a uniform API for storing UUIDs in a database, be it MariaDB, MySQL, SQLite3 (in binary or string format), or PostgreSQL (native data type). This is especially important when the gem is used as a dependency of another gem.

Monkey patching is optional

No core feature relies on monkey patching Rails. Monkey patches can interfere with other gems and lead to issues. Nevertheless, some convenience features (currently, migration methods only) are provided via monkey patching. Enabling them is entirely optional, and their absence can be worked around easily.

Strings are not perfect for UUIDs

Although UUIDs are commonly represented as strings, it is beneficial to introduce a dedicated class for the following reasons:

  • not every sequence of 16 bytes (or 32 hexadecimal digits) makes a valid UUID

  • UUIDs are not opaque: they have an inner structure that can be accessed (reading the timestamp from time-based UUIDs is especially useful)

  • using the string equality operator for UUID comparison may give wrong results (uppercase or lowercase strings, with dashes or without)

The case is somewhat similar to that of URIs, which can also be represented as plain strings, but for which a dedicated URI class is quite convenient.

ActiveID uses the UUID class from the UUIDTools gem to represent UUIDs.
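The comparison pitfall mentioned in the list above can be sketched in plain Ruby. The `normalize_uuid` helper is hypothetical and only illustrates the problem; ActiveID avoids it by working with UUIDTools objects instead of raw strings.

```ruby
# Hypothetical helper, for illustration only: reduces a UUID string to
# 32 lowercase hexadecimal digits, or raises if that is not possible.
def normalize_uuid(string)
  hex = string.delete("-").downcase
  raise ArgumentError, "not a UUID: #{string.inspect}" unless hex.match?(/\A\h{32}\z/)
  hex
end

a = "A6908E1E-5493-4C55-A11D-CD8445654DE6"
b = "a6908e1e54934c55a11dcd8445654de6"

puts a == b                                  # false, although both denote the same UUID
puts normalize_uuid(a) == normalize_uuid(b)  # true
```

A dedicated UUID class hides this normalization step, so that equality behaves as expected regardless of how a value was written.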

Usage

💡

You may want to explore the examples directory, in which typical use cases are covered in a bit more detail.

Installation

Depending on whether you want to apply monkey patches or not, require either activeid/all (with monkey patches) or activeid (without them).

For example, if you are using Gemfile:

gem "activeid" # without monkey patches
# or
gem "activeid", require: "activeid/all" # with monkey patches

Depending on your needs, you can also pick monkey patches selectively — just take a look at the contents of lib/activeid/all.rb. However, currently it is not very useful, as there is very little to choose from.

Adding UUIDs to models

ActiveID relies on ActiveRecord’s attributes API. Two attribute types are defined: StringUUID and BinaryUUID.

StringUUID serializes UUIDs as 36-character strings. It is compatible with textual SQL types, e.g. VARCHAR(36), and, more importantly, with the PostgreSQL-specific UUID type.

BinaryUUID serializes UUIDs as 16-byte binaries, which can be stored in binary columns, e.g. BLOB(16) in SQLite3 or VARBINARY(16) in MySQL. However, it is not compatible with PostgreSQL at all due to syntax differences. See the "Choosing between string and binary serialization" section for a brief explanation of the pros and cons of both approaches.

Whichever attribute type you prefer to use, the ActiveID::Model module must be included in the model.

For example, the following model stores two UUID attributes, id and author_id, as binaries.

class Work < ActiveRecord::Base
  include ActiveID::Model
  attribute :id, ActiveID::Type::BinaryUUID.new
  attribute :author_id, ActiveID::Type::BinaryUUID.new
  belongs_to :author
end

Database migrations

A convenience #uuid method is added via monkey patching to Active Record’s Table and TableDefinition classes.

  • In MySQL adapter, it stands for a VARBINARY(16) column.

  • In SQLite3 adapter, it stands for a BLOB(16) column.

  • In PostgreSQL adapter, it is shadowed by a stock Rails method ::ActiveRecord::ConnectionAdapters::PostgreSQL::ColumnMethods#uuid, which stands for a UUID column.

If you want to use a UUID column as your primary key, pass the :id => false option to the create_table method and :primary_key => true to the column definition.

For example:

class CreateWorks < ActiveRecord::Migration[5.0]
  def change
    create_table :works, id: false, force: true do |t|
      t.uuid :id, primary_key: true
      t.uuid :author_id, index: true
      t.string :title
      t.timestamps
    end
  end
end

Alternatively, if monkey patches are disabled, the #uuid method can be substituted with #binary in MySQL and SQLite3 adapters. The following snippet is equivalent to the one above in these two adapters. Please note the :limit => 16 option.

class CreateWorks < ActiveRecord::Migration[5.0]
  def change
    create_table :works, id: false, force: true do |t|
      t.binary :id, limit: 16, primary_key: true
      t.binary :author_id, limit: 16, index: true
      t.string :title
      t.timestamps
    end
  end
end

Registering UUID types in Active Record’s type registry

For convenience, ActiveID types can be added to Active Record’s type registry. You can then reference them in your models with a symbol. See Rails API docs for detailed information.

For example, the following registers ActiveID::Type::BinaryUUID under the :uuid symbol for all adapters except PostgreSQL, in which this symbol is already taken:

ActiveRecord::Type.register(
  :uuid,
  ActiveID::Type::BinaryUUID,
)

With the above in place, only the symbol needs to be specified in the attribute declaration, as in the following example:

class Author < ActiveRecord::Base
  include ActiveID::Model
  attribute :id, :uuid
end

It is also possible to override :uuid in the PostgreSQL adapter:

ActiveRecord::Type.register(
  :uuid,
  ActiveID::Type::StringUUID,
  adapter: :postgresql,
  override: true,
)
🔥

Overriding standard attribute types may cause other gems to behave abnormally.

Using UUIDs as primary keys

When a model’s primary key is a UUID, ActiveID automatically generates its value as a version 1, 4, or 5 UUID:

  • Version 1 UUIDs store the timestamp of their creation, and are monotonically increasing in time. This is very advantageous in some use cases.

  • Version 4 UUIDs are pseudo-randomly generated.

  • Version 5 UUIDs are generated deterministically via SHA-1 hashing from values of specified attributes, and UUID namespace. They are well-suited for natural keys.

UUIDs of all versions can be explicitly assigned to attributes.

Random primary keys (version 4 UUIDs)

If a model’s primary key is a UUID, a version 4 UUID is generated by default. For example:

class Author < ActiveRecord::Base
  include ActiveID::Model
  attribute :id, ActiveID::Type::StringUUID.new
end

Time-based primary keys (version 1 UUIDs)

They are enabled for a model’s primary key with the #uuid_generator method. For example:

class Author < ActiveRecord::Base
  include ActiveID::Model
  attribute :id, ActiveID::Type::StringUUID.new
  uuid_generator :time
end
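Because version 1 UUIDs embed their creation time, that time can be read back from the identifier. The following is a minimal sketch in plain Ruby, based on the field layout defined in RFC 4122. The `uuid_v1_time` helper is hypothetical and not part of ActiveID, and the sample UUID is hand-built so that it encodes the Unix epoch.

```ruby
# Hypothetical helper: extracts the creation time from a version 1 UUID string.
def uuid_v1_time(uuid_string)
  # Number of 100-nanosecond intervals between 1582-10-15 (the epoch used
  # by RFC 4122 timestamps) and 1970-01-01 (the Unix epoch).
  gregorian_offset = 0x01B21DD213814000

  time_low, time_mid, time_hi_and_version =
    uuid_string.split("-").first(3).map { |part| part.to_i(16) }
  raise ArgumentError, "not a version 1 UUID" unless (time_hi_and_version >> 12) == 1

  # Reassemble the 60-bit timestamp from its three fields.
  ticks = ((time_hi_and_version & 0x0FFF) << 48) | (time_mid << 32) | time_low
  Time.at(Rational(ticks - gregorian_offset, 10_000_000)).utc
end

# Hand-built sample whose timestamp fields encode the Unix epoch.
puts uuid_v1_time("13814000-1dd2-11b2-8000-000000000000")  # 1970-01-01 00:00:00 UTC
```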

Name-based primary keys a.k.a. natural keys (version 5 UUIDs)

They are enabled for a model’s primary key by passing attribute names to the #natural_key method, and a namespace to the #uuid_namespace method. The latter method accepts only UUIDs, either in string format or as a UUIDTools::UUID object. If the #uuid_namespace method is omitted, then the ISO OID namespace is used.

In the following example, a natural key in the a6908e1e-5493-4c55-a11d-cd8445654de6 namespace will be built from the values of the author_id and title attributes.

class Work < ActiveRecord::Base
  include ActiveID::Model
  attribute :id, ActiveID::Type::BinaryUUID.new
  attribute :author_id, ActiveID::Type::BinaryUUID.new
  belongs_to :author
  natural_key :author_id, :title
  uuid_namespace "a6908e1e-5493-4c55-a11d-cd8445654de6"
end
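To show what "generated deterministically via SHA-1 hashing" means, here is a minimal sketch of version 5 UUID generation as described in RFC 4122, in plain Ruby. The `uuid_v5` helper is hypothetical, and the way the attribute values are joined into a single name below is an assumption made for illustration; this README does not specify how ActiveID combines them.

```ruby
require "digest"

# Hypothetical helper: derives a version 5 UUID from a namespace UUID and a name.
def uuid_v5(namespace, name)
  namespace_bytes = [namespace.delete("-")].pack("H*")
  bytes = Digest::SHA1.digest(namespace_bytes + name.b).bytes.first(16)
  bytes[6] = (bytes[6] & 0x0F) | 0x50  # set the version field to 5
  bytes[8] = (bytes[8] & 0x3F) | 0x80  # set the RFC 4122 variant bits
  hex = bytes.pack("C*").unpack1("H*")
  hex.unpack("a8a4a4a4a12").join("-")
end

# ISO OID namespace from RFC 4122, used when no namespace is given.
iso_oid_namespace = "6ba7b812-9dad-11d1-80b4-00c04fd430c8"
namespace = "a6908e1e-5493-4c55-a11d-cd8445654de6"

# Assumption: attribute values are simply concatenated to form the name.
name = ["5e0f3a52-0c0a-4b7c-9f1d-2a6d7f0c1b11", "Moby Dick"].join

puts uuid_v5(namespace, name)
puts uuid_v5(namespace, name) == uuid_v5(namespace, name)          # true
puts uuid_v5(namespace, name) == uuid_v5(iso_oid_namespace, name)  # false
```

The same namespace and name always yield the same UUID, which is what makes these identifiers usable as natural keys.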

Choosing between string and binary serialization

ActiveID allows you to choose between two ways of UUID serialization: as a 36-character string, or as a 16-byte binary.

In PostgreSQL, the answer is easy: you should always choose string serialization. It works perfectly with the native UUID data type, which is a non-standard feature of PostgreSQL. It also works with textual data types (e.g. VARCHAR, TEXT), but the UUID type seems to be a better choice for performance reasons. Because of special syntax requirements in PostgreSQL, it does not work with binary types (e.g. BYTEA); however, this seems to be a negligible issue, as the UUID type is more suitable. Please open an issue if you disagree.

In other RDBMSs, either human-readability or performance must be sacrificed.

With binary serialization, UUIDs are stored in a space-efficient way as 16-byte binaries. This is especially beneficial when the column is indexed, which is a very common case. A smaller value size means that a bigger piece of the index can be kept in RAM, which often leads to a significant performance boost. The downside is that this representation is difficult to read for humans who access serialized values outside Rails (e.g. in a database console, or in database logs). See also the excellent article "Store UUID in an optimized way" on the Percona blog for more information about storing UUIDs as binaries.

With string serialization, UUIDs are stored as 36-character strings, which consist only of lowercase hexadecimal digits and dashes (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx). They are easy for humans to read, but may hamper the performance of indices, especially in the case of large tables.
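As a rough back-of-the-envelope illustration of the trade-off, the following sketch compares the raw key data held by an index on a UUID column under both serializations. The row count is hypothetical, one byte per character is assumed for the string form, and per-entry overhead added by the database engine is ignored, so real index sizes will be larger in both cases.

```ruby
rows = 10_000_000          # hypothetical table size

string_key_bytes = 36      # dashed string form, one byte per character
binary_key_bytes = 16      # raw binary form

string_total = rows * string_key_bytes
binary_total = rows * binary_key_bytes

puts "string keys: #{string_total / 1_000_000} MB"                # string keys: 360 MB
puts "binary keys: #{binary_total / 1_000_000} MB"                # binary keys: 160 MB
puts "ratio: #{string_key_bytes.fdiv(binary_key_bytes)}"          # ratio: 2.25
```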

Reading binary UUIDs in a database console

MySQL features a BIN_TO_UUID() function (available since MySQL 8.0), which converts binary UUIDs to their human-readable string representation. There is a feature request to add a similar function to MariaDB.
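Where no such function is available, one workaround is to print the column as hexadecimal digits (for example with the HEX() SQL function) and format the result yourself. The sketch below does this in plain Ruby; the `format_hex_uuid` helper is hypothetical and not part of ActiveID.

```ruby
# Hypothetical helper: formats 32 hexadecimal digits, as printed by a
# database console, into the dashed lowercase UUID representation.
def format_hex_uuid(hex)
  digits = hex.downcase
  raise ArgumentError, "expected 32 hexadecimal digits" unless digits.match?(/\A\h{32}\z/)
  digits.unpack("a8a4a4a4a12").join("-")
end

puts format_hex_uuid("A6908E1E54934C55A11DCD8445654DE6")
# a6908e1e-5493-4c55-a11d-cd8445654de6
```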

Contributing

First, thank you for contributing! We love pull requests from everyone. By participating in this project, you hereby grant Ribose Inc. the right to grant or transfer an unlimited number of non-exclusive licenses or sub-licenses to third parties, under the copyright covering the contribution to use the contribution by all means.

Here are a few technical guidelines to follow:

  1. Open an issue to discuss a new feature.

  2. Write tests to support your new feature.

  3. Make sure the entire test suite passes locally and on CI.

  4. Open a Pull Request.

  5. After receiving feedback, perform an interactive rebase on your branch, in order to create a series of cohesive commits with descriptive messages.

  6. Party!

Credits

This gem is developed, maintained and funded by Ribose Inc.

The ActiveUUID gem, on which ActiveID was based, was developed by Nate Murray with the notable help of:

  • pyromaniac

  • Andrew Kane

  • Devin Foley

  • Arkadiy Zabazhanov

  • Jean-Denis Koeck

  • Florian Staudacher

  • Schuyler Erle

  • Florian Schwab

  • Thomas Guillory

  • Daniel Blanco Rojas

  • Olivier Amblet

License

The gem is available as open source under the terms of the MIT License.

See also

  • RFC 4122 "A Universally Unique IDentifier (UUID) URN Namespace"

  • ActiveUUID gem (supports Rails < 5)