Upgrading from v2.2 to v3.0

Rainer Simon edited this page Feb 20, 2018 · 1 revision

Recogito v3.0 introduces breaking changes to the system architecture and data model. This guide documents those changes and the steps needed to upgrade the index (ElasticSearch 5.6.5) from Recogito v2.2 to v3.0.

Data model changes

The following index types remain identical to Recogito v2.2:

  • annotation_history
  • contribution
  • visit

The new schema introduces three major breaking changes:

  • in the annotation type, bodies used to have a uri field that stored the URI of the place as a string. In v3.0, the uri field is replaced by a reference field, which stores a nested object with a uri field and, optionally, a union_id field. The union_id holds the union UUID of the entity, if the entity is indexed in Recogito.
  • the geotag type has been dropped.
  • the place type is superseded by a generic entity type. Compared to v2.2, entity introduces the following changes:
    • for clarity, id is replaced by union_id
    • an additional entity_type field (PLACE, PERSON, etc.)
    • a title field at the top level
    • a stored bbox geo_shape field
    • for conflated records, source_gazetteer is replaced by source_authority (expects a URI identifier)
    • for consistency, last_sync_at is replaced by __last_synced_at
    • an added country_code field for records
    • place_types is replaced by subjects
    • an added priority field (type long) to hold a numeric weight or importance score, e.g. a place's population count
    • close_matches and exact_matches are replaced by a generic links field, which contains a list of objects of the form { "uri": "http://www.example.com/entity/1", "link_type": "closeMatch" }
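
As an illustration of the first and last changes listed above, here is a minimal Python sketch of how a v2.2 document could be reshaped for v3.0. It is not the migration utility shipped with Recogito. The function names and the lookup_union_id callback are hypothetical, the sketch assumes that close_matches and exact_matches hold plain URI strings, and the "exactMatch" link type is an assumption by analogy with the "closeMatch" value shown above.

```python
from typing import Callable, Optional


def migrate_body(body: dict, lookup_union_id: Callable[[str], Optional[str]]) -> dict:
    """Return a copy of a v2.2 annotation body in the v3.0 shape.

    lookup_union_id is a hypothetical callback that maps an entity URI to
    its union_id, or returns None if the entity is not indexed.
    """
    # Copy everything except the old top-level uri field.
    migrated = {key: value for key, value in body.items() if key != "uri"}
    uri = body.get("uri")
    if uri is not None:
        reference = {"uri": uri}
        union_id = lookup_union_id(uri)
        if union_id is not None:
            # union_id is optional in the v3.0 reference object.
            reference["union_id"] = union_id
        migrated["reference"] = reference
    return migrated


def merge_links(record: dict) -> dict:
    """Replace close_matches / exact_matches with a single links list."""
    migrated = {
        key: value
        for key, value in record.items()
        if key not in ("close_matches", "exact_matches")
    }
    # Assumption: both v2.2 fields are lists of URI strings.
    links = [
        {"uri": uri, "link_type": "closeMatch"}
        for uri in record.get("close_matches", [])
    ]
    links += [
        {"uri": uri, "link_type": "exactMatch"}  # assumed link_type value
        for uri in record.get("exact_matches", [])
    ]
    migrated["links"] = links
    return migrated
```

In a real migration, lookup_union_id would query the v3.0 entity index by URI; here it is left as a parameter so that the reshaping logic can be tried without a running index.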

Migrating indices

Because of the removal of geotag, migrating annotations between index versions requires more than a simple conversion. For every annotation body, we need to query the index by the bodies.uri field in order to obtain the referenced entity's union_id. Recogito 3.0 includes a utility to perform this migration. The other index types need to be migrated manually:

  1. annotation_history, contribution and visit must be reindexed using the standard ElasticSearch reindex API

  2. the entity index must be rebuilt from scratch, by importing gazetteers via the Recogito admin UI

  3. contents of the annotation index must be migrated last, using the migration utility. (TODO...)
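
For step 1, a reindex request could be issued roughly as sketched below. This is only an illustration under stated assumptions: the index names recogito_v22 and recogito_v30 and the server URL are hypothetical, and the helper functions are not part of Recogito. The reindex API does not set up the destination index, so the v3.0 index and its mappings are assumed to exist before the request is sent.

```python
import json
import urllib.request


def reindex_body(source_index: str, dest_index: str, doc_type: str) -> dict:
    """Build the request body for POST /_reindex, limited to one type."""
    return {
        "source": {"index": source_index, "type": doc_type},
        "dest": {"index": dest_index},
    }


def reindex(es_url: str, source_index: str, dest_index: str, doc_type: str) -> dict:
    """Send the reindex request and return the parsed JSON response."""
    payload = json.dumps(reindex_body(source_index, dest_index, doc_type))
    request = urllib.request.Request(
        es_url.rstrip("/") + "/_reindex",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical index names and URL; adjust to the local setup.
    for doc_type in ("annotation_history", "contribution", "visit"):
        result = reindex("http://localhost:9200", "recogito_v22", "recogito_v30", doc_type)
        print(doc_type, result.get("total"))
```

Running one request per type keeps the three unchanged types separate from annotation and entity documents, which follow the other two steps.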