
Configuring access to a Linked Data authority

E. Lynette Rayle edited this page Mar 4, 2019 · 1 revision

QA_CONFIG_VERSION 2.0

Overview

A configuration controls how QA accesses a linked data authority and how it processes the results that the authority returns. This document describes how to write a configuration.

Existing configurations

QA comes with two configurations:

  • OCLC Fast Linked Data - supports search and term
  • Library of Congress - supports term only

Look for configuration files in /config/authorities/linked_data.

A number of additional authority configurations are available. See ld4p/linked_data_authorities (https://github.com/ld4p/linked_data_authorities) for configurations and instructions on how to use them. These are updated periodically, so check back from time to time to see what's new.

Configuring a Linked Data Authority

The configuration is written in JSON files. The files are placed in the config/authorities/linked_data directory. When the Rails server is restarted, the configuration is loaded and the authority is ready for access through QA.

High Level Parts of the Configuration

There are three top-level parts to the configuration.

  • "prefixes": defines linked data prefixes that can be referenced in other parts of the configuration
  • "term": defines how to fetch a single term and interpret the result
  • "search": defines how to search the authority and interpret results

Defining Prefixes

The "prefixes" section is a simple hash that associates a key (e.g. "schema") with the full URL for the ontology (e.g. "http://www.w3.org/2000/01/rdf-schema#").

Example:

"prefixes": {
  "madsrdf": "http://www.loc.gov/mads/rdf/v1#",
  "schema": "http://www.w3.org/2000/01/rdf-schema#",
  "skos": "http://www.w3.org/2004/02/skos/core#",
  "loc": "http://id.loc.gov/vocabulary/identifiers/"
},

The "prefixes" section is optional. It can be left out altogether.
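
To illustrate what a prefix buys you, here is a minimal Python sketch of expanding a prefixed name such as skos:prefLabel into a full URI using the hash above. The function name expand_prefixed_name is hypothetical and this is not QA's own code; it only shows the lookup the "prefixes" section makes possible.

```python
def expand_prefixed_name(name, prefixes):
    """Expand 'prefix:local' to a full URI; return the name unchanged
    when its prefix is not in the prefixes hash (e.g. a full URL)."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name


# Same hash as the "prefixes" example above.
prefixes = {
    "madsrdf": "http://www.loc.gov/mads/rdf/v1#",
    "schema": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "loc": "http://id.loc.gov/vocabulary/identifiers/",
}

print(expand_prefixed_name("skos:prefLabel", prefixes))
print(expand_prefixed_name("loc:lccn", prefixes))
```

A name whose prefix is not defined is passed through untouched, so full URLs remain usable wherever a prefixed name is accepted in this sketch.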

General URL configuration that applies to term and search

The URLs used to access the external authority's linked data API for term and search are defined using an extended version of IRI Templates.

See https://www.hydra-cg.com/spec/latest/core/#templated-links for information on IRI Templated Links. It defines an IRI Template as...

"An IriTemplate consists of a template literal and a set of mappings. Each IriTemplateMapping maps a variable used in the template to a property and may optionally specify whether that variable is required or not."

An IriTemplate has two parts:

  • the URL template with substitution variables
  • the mappings, one for each of the substitution variables

The parts defined at the URL level include...

| Config Part | Possible Values | Comments |
| --- | --- | --- |
| "@context" | "http://www.w3.org/ns/hydra/context.jsonld" | only supported value |
| "@type" | "IriTemplate" | only supported value |
| "template" | String | This is the template that defines the URL for accessing the external linked data authority. It includes substitution variables that allow setting of values based on values passed to QA. |
| "variableRepresentation" | "BasicRepresentation" | only supported value |
| "mapping" | Array | array describing how to map the values from QA into the template URL |

The mappings include basic information about each variable that will be substituted into the template URL.

| Mapping Part | Possible Values | Comments |
| --- | --- | --- |
| "@type" | "IriTemplateMapping" | only supported value |
| "variable" | String | name of the variable as it appears in the template URL |
| "property" | "hydra:freetextQuery" | only supported value |
| "required" | true, false | true if required in the template URL; otherwise, false |
| "default" | String | value to use if one isn't provided (This is an extension not defined in the IriTemplate spec.) |

The QA configuration requires some variables be defined for search and some for term fetch. Those will be described below when addressing other configuration requirements for search and term.
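
As a rough illustration of how "required" and "default" interact, the Python sketch below resolves the value for each mapped variable from what the caller supplied. The helper resolve_variables is hypothetical and is not part of QA; the mapping mirrors the ID-based term example later in this document. It reports missing required variables rather than raising, since the notes at the end of this page say "required" is not enforced at this time.

```python
def resolve_variables(mapping, supplied):
    """Return (values, missing) for a list of IriTemplateMapping entries.

    values  - variable name => supplied value, or the mapping's default
    missing - names of required variables with no value and no default
    """
    values, missing = {}, []
    for entry in mapping:
        name = entry["variable"]
        if supplied.get(name) not in (None, ""):
            values[name] = supplied[name]
        elif "default" in entry:
            values[name] = entry["default"]
        elif entry.get("required", False):
            missing.append(name)
    return values, missing


mapping = [
    {"@type": "IriTemplateMapping", "variable": "term_id",
     "property": "hydra:freetextQuery", "required": True},
    {"@type": "IriTemplateMapping", "variable": "subauth",
     "property": "hydra:freetextQuery", "required": False, "default": "names"},
]

# subauth is not supplied, so its default is used.
values, missing = resolve_variables(mapping, {"term_id": "sh85118553"})
print(values, missing)
```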

Term specific configurations

If term fetch is not supported, use the following for this configuration...

"term": {}

Defining the access URL for fetching a term

The configuration for the access URL for fetching a single term follows the general URL configuration described above. Two settings outside the URL template affect how template variables are substituted:

  • The template URL must define a variable for the term's ID or URI. The variable can have any name. "qa_replacement_patterns", which sits outside the template, maps the ID/URI from the QA request to that template variable.
  • "term_id" can have one of two values: "ID" or "URI". It states whether the value identifying the term to fetch is expected to be a simple ID (e.g. "sh85118553") or a URI (e.g. "http://sws.geonames.org/261707/").

Typical Example when passing a URI:

{ 
  "term": {
    "url": {
      "@context": "http://www.w3.org/ns/hydra/context.jsonld",
      "@type":    "IriTemplate",
      "template": "{term_uri}.rdf",
      "variableRepresentation": "BasicRepresentation",
      "mapping": [
        {
          "@type":    "IriTemplateMapping",
          "variable": "term_uri",
          "property": "hydra:freetextQuery",
          "required": true
        }
      ]
    },
    "qa_replacement_patterns": {
      "term_id": "term_uri"
    },
    "term_id": "URI",
    ...
  }
}

Typical Example when passing an ID:

{
  "term": {
    "url": {
      "@context": "http://www.w3.org/ns/hydra/context.jsonld",
      "@type": "IriTemplate",
      "template": "http://id.loc.gov/authorities/{subauth}/{term_id}",
      "variableRepresentation": "BasicRepresentation",
      "mapping": [
        {
          "@type": "IriTemplateMapping",
          "variable": "term_id",
          "property": "hydra:freetextQuery",
          "required": true
        },
        {
          "@type": "IriTemplateMapping",
          "variable": "subauth",
          "property": "hydra:freetextQuery",
          "required": false,
          "default": "names"
        }
      ]
    },
    "qa_replacement_patterns": {
      "term_id": "term_id",
      "subauth": "subauth"
    },
    "term_id": "ID",
    ...
  }
}

NOTE: The template URL can have any number of additional variable mappings, depending on the needs of the external authority. Each variable mapping can have a default value that is used if the variable is not passed in. Parameters that always have the same value can be hardcoded into the template URL.
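
The role of "qa_replacement_patterns" in the URI example above can be sketched as follows. This is a simplified Python illustration, not QA's implementation: build_term_url is a hypothetical helper, it handles only the plain {var} placeholder form, and it does no URL encoding.

```python
def build_term_url(term_config, qa_params):
    """Rename QA params to template variables using qa_replacement_patterns,
    then substitute the {var} placeholders in the template."""
    url = term_config["url"]["template"]
    for qa_name, variable in term_config["qa_replacement_patterns"].items():
        if qa_name in qa_params:
            url = url.replace("{" + variable + "}", qa_params[qa_name])
    return url


# Trimmed copy of the URI-based term configuration shown above.
term_config = {
    "url": {"template": "{term_uri}.rdf"},
    "qa_replacement_patterns": {"term_id": "term_uri"},
    "term_id": "URI",
}

# QA passes the value as term_id; the template calls it term_uri.
url = build_term_url(
    term_config,
    {"term_id": "http://id.loc.gov/authorities/subjects/sh85118553"},
)
print(url)  # http://id.loc.gov/authorities/subjects/sh85118553.rdf
```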

Defining the normalization process of result data from fetching a term

The remaining parameters determine how the results are normalized. If the QA request includes format=jsonld, the results are not normalized. If no format is specified, or if format=json, the results are normalized based on the "results" configuration.

In this part of the configuration, predicates are identified that play a common role in the UI. These predicates may be different across various ontologies, but are expected to be used in the same way when presented to a user in the UI. The predicate roles that are currently supported are...

  • id_predicate - if not specified, the subject_uri is used as the ID and the URI
  • label_predicate
  • altlabel_predicate
  • broader_predicate
  • narrower_predicate
  • sameas_predicate

Typical full example:

"results": {
  "id_predicate":       "http://id.loc.gov/vocabulary/identifiers/lccn",
  "label_predicate":    "http://www.w3.org/2004/02/skos/core#prefLabel",
  "altlabel_predicate": "http://www.w3.org/2004/02/skos/core#altLabel",
  "broader_predicate":  "http://www.w3.org/2004/02/skos/core#broader",
  "narrower_predicate": "http://www.w3.org/2004/02/skos/core#narrower",
  "sameas_predicate":   "http://www.w3.org/2004/02/skos/core#exactMatch"
}

Typical minimal example:

"results": {
  "id_predicate":       "http://purl.org/dc/terms/identifier",
  "label_predicate":    "http://www.w3.org/2004/02/skos/core#prefLabel",
  "altlabel_predicate": "http://www.w3.org/2004/02/skos/core#altLabel",
  "sameas_predicate":   "http://schema.org/sameAs"
}

From this, the results passed back from QA will look something like...

{
  "uri":"http://id.loc.gov/authorities/subjects/sh85076841",
  "id":"sh 85076841",
  "label":["Life sciences"],
  "altlabel":["Biosciences","Sciences, Life"],
  "narrower":["http://id.loc.gov/authorities/subjects/sh85083022","http://id.loc.gov/authorities/subjects/sh85002415",etc.],
  "broader":["http://id.loc.gov/authorities/subjects/sh00007934"],
  "sameas":[""],
  "predicates":{
    "http://www.loc.gov/mads/rdf/v1#hasCloseExternalAuthority":["http://id.worldcat.org/fast/998323","http://data.bnf.fr/ark:/12148/cb119716335",etc.],
    "http://www.loc.gov/mads/rdf/v1#isMemberOfMADSCollection":["http://id.loc.gov/authorities/subjects/collection_SubdivideGeographically","http://id.loc.gov/authorities/subjects/collection_LCSH_General",etc.],
    "http://www.loc.gov/mads/rdf/v1#isMemberOfMADSScheme":["http://id.loc.gov/authorities/subjects"],
    "http://www.w3.org/2008/05/skos-xl#altLabel":["Biosciences","Sciences, Life"],
    etc.}
}
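
The mapping from predicate roles to result keys can be sketched in Python as below. This is a deliberately simplified illustration under stated assumptions: the graph is a hand-written list of (subject, predicate, object) tuples mirroring the output above, normalize_term is a hypothetical helper, and the "predicates" catch-all is omitted. QA's own processing works on the RDF returned by the authority and may differ in detail.

```python
ROLES = ["id", "label", "altlabel", "broader", "narrower", "sameas"]


def normalize_term(subject_uri, triples, results_config):
    """Collect the values of each configured predicate role for one subject."""
    values_for = {}
    for subject, predicate, obj in triples:
        if subject == subject_uri:
            values_for.setdefault(predicate, []).append(obj)

    normalized = {"uri": subject_uri}
    for role in ROLES:
        predicate = results_config.get(role + "_predicate")
        if predicate is not None:
            normalized[role] = values_for.get(predicate, [])

    # id is a single value; fall back to the subject URI when no
    # id_predicate is configured or it has no value.
    ids = normalized.get("id") or [subject_uri]
    normalized["id"] = ids[0]
    return normalized


SKOS = "http://www.w3.org/2004/02/skos/core#"
uri = "http://id.loc.gov/authorities/subjects/sh85076841"
triples = [
    (uri, "http://id.loc.gov/vocabulary/identifiers/lccn", "sh 85076841"),
    (uri, SKOS + "prefLabel", "Life sciences"),
    (uri, SKOS + "altLabel", "Biosciences"),
    (uri, SKOS + "altLabel", "Sciences, Life"),
    (uri, SKOS + "broader", "http://id.loc.gov/authorities/subjects/sh00007934"),
]
results_config = {
    "id_predicate": "http://id.loc.gov/vocabulary/identifiers/lccn",
    "label_predicate": SKOS + "prefLabel",
    "altlabel_predicate": SKOS + "altLabel",
    "broader_predicate": SKOS + "broader",
}

term = normalize_term(uri, triples, results_config)
print(term)
```

Roles that are not configured (narrower and sameas here) are simply absent from the result in this sketch.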

Search specific configurations

If searching is not supported, use the following for this configuration...

"search": {}

Defining the access URL for searching

TODO: Add info about search URL

Defining the normalization process of results data from searching

TODO: Add info about search normalization

Full Example including all parts

Example configuration...

{
  "QA_CONFIG_VERSION": "2.0",
  "prefixes": {
    "madsrdf": "http://www.loc.gov/mads/rdf/v1#",
    "schema": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "loc": "http://id.loc.gov/vocabulary/identifiers/"
  },
  "term": {
    "url": {
      "@context": "http://www.w3.org/ns/hydra/context.jsonld",
      "@type":    "IriTemplate",
      "template": "http://services.ld4l.org/ld4l_services/loc_genre_lookup.jsp?uri={term_uri}",
      "variableRepresentation": "BasicRepresentation",
      "mapping": [
        {
          "@type":    "IriTemplateMapping",
          "variable": "term_uri",
          "property": "hydra:freetextQuery",
          "required": true,
          "encode":   true
        }
      ]
    },
    "qa_replacement_patterns": {
      "term_id": "term_uri"
    },
    "term_id": "URI",
    "results": {
      "id_predicate": "http://id.loc.gov/vocabulary/identifiers/lccn",
      "label_predicate": "http://www.w3.org/2004/02/skos/core#prefLabel",
      "altlabel_predicate": "http://www.w3.org/2004/02/skos/core#altLabel",
      "broader_predicate":  "http://www.w3.org/2004/02/skos/core#broader",
      "narrower_predicate": "http://www.w3.org/2004/02/skos/core#narrower",
      "sameas_predicate": "http://www.w3.org/2004/02/skos/core#exactMatch"
    }
  },
  "search": {
    "url": {
      "@context": "http://www.w3.org/ns/hydra/context.jsonld",
      "@type": "IriTemplate",
      "template": "http://services.ld4l.org/ld4l_services/loc_genre_batch.jsp?query={query}&entity={subauth}&maxRecords={maxRecords}&lang={lang}&context={context}",
      "variableRepresentation": "BasicRepresentation",
      "mapping": [
        {
          "@type": "IriTemplateMapping",
          "variable": "query",
          "property": "hydra:freetextQuery",
          "required": true
        },
        {
          "@type": "IriTemplateMapping",
          "variable": "subauth",
          "property": "hydra:freetextQuery",
          "required": false,
          "default": ""
        },
        {
          "@type": "IriTemplateMapping",
          "variable": "maxRecords",
          "property": "hydra:freetextQuery",
          "required": false,
          "default": "20"
        },
        {
          "@type": "IriTemplateMapping",
          "variable": "lang",
          "property": "hydra:freetextQuery",
          "required": false,
          "default": "en"
        },
        {
          "@type": "IriTemplateMapping",
          "variable": "context",
          "property": "hydra:freetextQuery",
          "required": false,
          "default": "false"
        }
      ]
    },
    "qa_replacement_patterns": {
      "query":   "query",
      "subauth": "subauth"
    },
    "results": {
      "id_predicate":       "http://id.loc.gov/vocabulary/identifiers/lccn",
      "label_predicate":    "http://www.loc.gov/mads/rdf/v1#authoritativeLabel",
      "sort_predicate":     "http://vivoweb.org/ontology/core#rank",
      "selector_predicate": "http://vivoweb.org/ontology/core#rank"
    },
    "context": {
      "groups": {
        "hierarchy": {
          "group_label_i18n": "qa.linked_data.authority.locgenres_ld4l_cache.hierarchy",
          "group_label_default": "Hierarchy"
        }
      },
      "properties": [
        {
          "property_label_i18n": "qa.linked_data.authority.locgenres_ld4l_cache.authoritative_label",
          "property_label_default": "Authoritative Label",
          "ldpath": "madsrdf:authoritativeLabel :: xsd:string",
          "selectable": true,
          "drillable": false
        },
        {
          "property_label_i18n": "qa.linked_data.authority.locgenres_ld4l_cache.alt_label",
          "property_label_default": "Variant Label",
          "ldpath": "skos:altLabel :: xsd:string",
          "selectable": false,
          "drillable": false
        },
        {
          "group_id": "hierarchy",
          "property_label_i18n": "qa.linked_data.authority.locgenres_ld4l_cache.narrower",
          "property_label_default": "Narrower",
          "ldpath": "skos:narrower :: xsd:string",
          "selectable": true,
          "drillable": true,
          "expansion_label_ldpath": "skos:prefLabel ::xsd:string",
          "expansion_id_ldpath": "loc:lccn ::xsd:string"
        },
        {
          "group_id": "hierarchy",
          "property_label_i18n": "qa.linked_data.authority.locgenres_ld4l_cache.broader",
          "property_label_default": "Broader",
          "ldpath": "skos:broader :: xsd:string",
          "selectable": true,
          "drillable": true,
          "expansion_label_ldpath": "skos:prefLabel ::xsd:string",
          "expansion_id_ldpath": "loc:lccn ::xsd:string"
        },
        {
          "property_label_i18n": "qa.linked_data.authority.locgenres_ld4l_cache.exact_match",
          "property_label_default": "Exact Match",
          "ldpath": "skos:exactMatch :: xsd:string",
          "selectable": false,
          "drillable": false
        },
        {
          "property_label_i18n": "qa.linked_data.authority.locgenres_ld4l_cache.note",
          "property_label_default": "Note",
          "ldpath": "skos:note :: xsd:string",
          "selectable": false,
          "drillable": false
        }
      ]
    },
    "subauthorities": {
      "person":         "Person",
      "organization":   "Organization",
      "place":          "Place",
      "intangible":     "Intangible",
      "geocoordinates": "GeoCoordinates",
      "work":           "Work"
    }
  }
}

NOTES:

  • term: (optional) is used to define how to request term information from the authority and how to interpret results.

    • url: (required) templated link representation of the authority API URL and mapping of parameters for requesting term information from the authority
      • template: is the authority API URL with placeholders for substitution parameters. In version 2.0 configurations, {var_name} is replaced by the value and {?var_name} is replaced by var_name=value.
        • NOTE: The variables mapped to term_id (required) and subauth (optional) are expected to match QA params (see qa_replacement_patterns to match QA params with mapping variables)
        • Additional substitutions can be made in the authority API URL, if supported by the authority, by adding additional mappings. Search has an example with maxRecords.
          • variable: should match a replacement pattern in the template (e.g. variable: maxRecords ==> {maxRecords})
          • required: true | false (NOTE: Not enforced at this time.)
          • default: provide a default value that will be used if not specified
        • See the documentation of templated links (http://www.hydra-cg.com/spec/latest/core/#templated-links) for more information.
    • qa_replacement_patterns: identifies which mapping variables are being used for term_id and subauth.
      • NOTE: The URL to make a term request via QA always uses term_id and subauth as the param names. qa_replacement_patterns allows the url template to use a different variable name for pattern replacement.
    • language: (optional) values: array of en | fr | etc. -- identifies the languages to include in results, filtering out triples of other languages
      • NOTE: Some authorities' API URLs allow language to be specified as a parameter. In that case, use pattern replacement to add the language to the API URL to prevent alternate languages from being returned in the results.
      • NOTE: At this writing, only label and altlabel are filtered.
    • term_id: (optional) values: ID (default) | URI - This tells apps whether __TERM_ID__ replacement is expecting an ID or URI.
    • results: (required) lists predicates to select out for normalization in the hash results
      • id_predicate: (optional)
      • label_predicate: (required)
      • altlabel_predicate: (optional)
      • sameas_predicate: (optional)
      • narrower_predicate: (optional)
      • broader_predicate: (optional)
    • subauthorities: (optional)
      • subauthority name (e.g. topic:, personal_name:, corporate_name:, etc.) Values for the subauth variable are limited to the values in the list of subauthorities.
  • search: (optional) is used to define how to send a query to the authority and how to interpret results.

    • url: (required) templated link representation of the authority API URL and mapping of parameters for sending a query to the authority
      • template: is the authority API URL with placeholders for substitution parameters. In version 2.0 configurations, {var_name} is replaced by the value and {?var_name} is replaced by var_name=value.
        • NOTE: The variables mapped to query (required) and subauth (optional) are expected to match QA params (see qa_replacement_patterns to match QA params with mapping variables)
        • Additional substitutions can be made in the authority API URL, if supported by the authority, by adding additional mappings. The search example above does this with maxRecords.
          • variable: should match a replacement pattern in the template (e.g. variable: maxRecords ==> {maxRecords})
          • required: true | false (NOTE: Not enforced at this time.)
          • default: provide a default value that will be used if not specified
        • See the documentation of templated links (http://www.hydra-cg.com/spec/latest/core/#templated-links) for more information.
    • qa_replacement_patterns: identifies which mapping variables are being used for query and subauth.
      • NOTE: QA passes the search text and subauthority under the names query and subauth. qa_replacement_patterns allows the url template to use a different variable name for pattern replacement.
    • language: (optional) values: array of en | fr | etc. -- identifies the languages to include in results, filtering out triples of other languages
      • NOTE: Some authorities' API URLs allow language to be specified as a parameter. In that case, use pattern replacement to add the language to the API URL to prevent alternate languages from being returned in the results.
      • NOTE: At this writing, only label and altlabel are filtered.
    • results: (required) lists predicates to normalize and include in json results
      • id_predicate: (optional)
      • label_predicate: (required)
      • altlabel_predicate: (optional)
    • subauthorities: (optional)
      • subauthority name (e.g. topic:, personal_name:, corporate_name:, etc.) Values for the subauth variable are limited to the values in the list of subauthorities.

Add new configuration

You can add linked data authorities by adding configuration files to your Rails app in Rails.root/config/authorities/linked_data/YOUR_AUTH.json
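
Before restarting the server with a new file, it can help to sanity-check the JSON. The Python sketch below is a hypothetical helper, not something QA ships: check_config applies a few rules taken from this page (every template variable has a mapping, replacement patterns point at mapped variables, label_predicate is present). QA may validate configurations differently. The sample config is inline to keep the sketch self-contained; in practice you would load your own file with json.load.

```python
import json
import re


def check_config(config):
    """Return a list of problems found in a parsed authority configuration."""
    problems = []
    for part in ("term", "search"):
        section = config.get(part) or {}
        if not section:
            continue  # an empty hash means the action is not supported
        url = section.get("url", {})
        mapped = {m.get("variable") for m in url.get("mapping", [])}
        # Accept both {var} and {?var} placeholder forms.
        for name in re.findall(r"\{\??(\w+)\}", url.get("template", "")):
            if name not in mapped:
                problems.append(f"{part}: template variable '{name}' has no mapping")
        for variable in section.get("qa_replacement_patterns", {}).values():
            if variable not in mapped:
                problems.append(
                    f"{part}: replacement pattern targets unmapped variable '{variable}'"
                )
        if "label_predicate" not in section.get("results", {}):
            problems.append(f"{part}: results.label_predicate is required")
    return problems


# Sample config with a deliberate mistake: subauth has no mapping entry.
config = json.loads("""
{
  "term": {
    "url": {
      "template": "http://id.loc.gov/authorities/{subauth}/{term_id}",
      "mapping": [{"variable": "term_id", "required": true}]
    },
    "qa_replacement_patterns": {"term_id": "term_id", "subauth": "subauth"},
    "term_id": "ID",
    "results": {"label_predicate": "http://www.w3.org/2004/02/skos/core#prefLabel"}
  },
  "search": {}
}
""")

problems = check_config(config)
for problem in problems:
    print(problem)
```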

Modify existing configuration

To modify one of the QA-supplied configurations, copy it to your app in Rails.root/config/authorities/linked_data/YOUR_AUTH.json. Make your modifications to the JSON configuration file in your app.


Change log and notes of differences from QA CONFIG VERSION 1.0 to 2.0

  • Addition of a QA_CONFIG_VERSION number in the configuration file. Some changes from 1.0 to 2.0 are not backward compatible. The original linked data configuration supported by QA releases prior to QA 4.0 did not include specification of a version number. Any config without a version number will be assumed to be version 1.0.
  • Addition of extended context configuration for searching, which optionally returns basic results plus extended context for each result.
  • Enhanced language processing for literals. Language processing can be turned off for a configuration to avoid processing language in situations where the data may not follow standards or implements language tags on literals inconsistently. The enhancements apply to all config versions.
  • Correct the processing of {?var} to translate to var=_value_ instead of just _value_. Add processing of {var} which translates to _value_. This is not backward compatible with 1.0 configs since the processing of {?var} has changed. The code does check the config version and uses the appropriate approach with no version specified assumed to be 1.0. 1.0 configs are deprecated and should be updated to 2.0.
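
The difference between the two placeholder forms can be sketched in Python as follows. This is only an illustration of the rule described in the last change-log item, not QA's code: expand_template is a hypothetical helper, the example.org URLs are placeholders, values are not URL-encoded, and the handling of {var} under version 1.0 is an assumption because this page only documents {?var} for 1.0.

```python
import re


def expand_template(template, values, config_version="2.0"):
    """Substitute {var} and {?var} placeholders.

    2.0: {?var} => var=value, {var} => value
    1.0: {?var} => value ({var} is treated the same way here, as an assumption)
    """
    def substitute(match):
        is_query_form = match.group(1) == "?"
        name = match.group(2)
        value = values.get(name, "")
        if is_query_form and config_version == "2.0":
            return name + "=" + value
        return value

    return re.sub(r"\{(\??)(\w+)\}", substitute, template)


values = {"query": "cats", "maxRecords": "20", "term_id": "sh85118553"}

v2_search = expand_template("http://example.org/search?{?query}&{?maxRecords}", values)
v2_term = expand_template("http://example.org/term/{term_id}", values)
v1_search = expand_template("http://example.org/search?query={?query}", values, "1.0")

print(v2_search)  # http://example.org/search?query=cats&maxRecords=20
print(v2_term)    # http://example.org/term/sh85118553
print(v1_search)  # http://example.org/search?query=cats
```

A 1.0-style template such as query={?query} would expand to query=query=cats under 2.0 rules, which is why 1.0 templates need updating when the version number is added.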