
Language processing in Linked Data authorities

E. Lynette Rayle edited this page Apr 19, 2019 · 13 revisions


Overview

This document describes language processing for the linked data module in QA.

Reference: W3Schools HTML Language Code Reference

Limitations

Language processing requires either that the authority supports language filtering or that the authority's results contain language-tagged literals. If neither condition holds, no filtering is applied to the results.

Why perform language filtering?

Some linked data authorities tag literals with a language (e.g. 'milk@en', 'Milch@de', 'Lait@fr'). When an authority has literals in multiple languages, it is desirable to be able to request literals for a specific language for two reasons:

  1. provide users terms in their desired language
  2. avoid long results that include the term in multiple languages

Where can language be specified?

Language can be specified in multiple places. They are listed here in priority order, highest priority first. If the language is not specified at a higher-priority location, the next location that does specify a language is used.

  1. Passed as part of the request URL using the lang parameter
  2. Specified in the Accept-Language request header (available to the application as HTTP_ACCEPT_LANGUAGE)
  3. Authority specific default defined in the authority configuration
  4. Site wide default defined in qa initializer

See Configuring and using language processing below for more information on how to set up and use language filtering.
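The priority order above amounts to "use the first source that specifies a language." The Ruby sketch below illustrates only that rule. resolve_language, language_given? and the keyword argument names are hypothetical, not QA's API, and the Accept-Language value is taken as-is (a simplification, since real headers may list several weighted tags).

```ruby
# Hypothetical sketch: return the language from the highest-priority source
# that actually specifies one. Not QA's internal code.
def resolve_language(lang_param: nil, accept_language: nil,
                     authority_default: nil, site_default: nil)
  # Ordered from highest to lowest priority, as in the list above.
  candidates = [lang_param, accept_language, authority_default, site_default]
  candidates.find { |value| language_given?(value) }
end

# Treat nil, empty strings, and empty arrays as "not specified".
def language_given?(value)
  !(value.nil? || (value.respond_to?(:empty?) && value.empty?))
end
```

For example, `resolve_language(accept_language: 'fr', site_default: :en)` yields `'fr'`, because the header outranks the site wide default.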

How is language filtering applied?

Filtering can happen in one of two ways: either the authority performs the filtering, or QA performs it.

Authority filtering

If the authority's API supports a language parameter, QA passes the language to the authority, which performs the filtering. Passing language as a parameter to an authority is limited to a single language (e.g. en). If multiple languages are specified, only the first one is passed to the authority (e.g. for [en, fr], only en is passed).
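The single-language restriction can be sketched as reducing whatever was requested to its first entry. language_for_authority is a hypothetical helper used only to illustrate the rule.

```ruby
# Hypothetical sketch: only one language can be sent to the authority, so a
# list such as [:en, :fr] is reduced to its first entry ("en").
# Accepts nil, a single code (String or Symbol), or an Array of codes.
def language_for_authority(languages)
  Array(languages).first&.to_s
end
```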

Post authority query filtering

QA requests the full set of results that the authority will return. Then QA performs filtering on the full set of results based on the selected language. QA filtering supports filtering for multiple languages (e.g. [:en, :fr]).

NOTE: Some authorities will filter to a default language regardless of what QA requests. In that case, QA filtering will have no effect on the results.

Rules for filtering:

  • if a language is not specified, keep all triples
  • keep triples where the object literal is tagged with the selected language
  • keep triples where the object literal does not have a language tag
  • if there are 0 matches for a predicate, keep that predicate's triples for all languages
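The rules above can be sketched in plain Ruby. This is an illustration under stated assumptions, not QA's implementation: Literal and Triple are hypothetical Structs standing in for real RDF terms, triples are grouped by predicate exactly as the last rule describes, and language tags are compared case-insensitively. It also honors the * wildcard described in the next section.

```ruby
# Hypothetical stand-ins for RDF terms; QA itself works with RDF graphs.
Literal = Struct.new(:value, :language)
Triple  = Struct.new(:subject, :predicate, :object)

# Apply the filtering rules to a list of triples.
# languages may be nil, a single code, or an Array of codes (e.g. [:en, :fr]).
def filter_by_language(triples, languages)
  selected = Array(languages).map { |lang| lang.to_s.downcase }
  # No language specified, or the * wildcard: keep all triples.
  return triples if selected.empty? || selected.include?('*')

  triples.group_by(&:predicate).flat_map do |_predicate, group|
    # Keep literals tagged with a selected language, plus untagged literals.
    matches = group.select do |triple|
      tag = triple.object.language
      tag.nil? || selected.include?(tag.to_s.downcase)
    end
    # 0 matches for this predicate: keep its triples for all languages.
    matches.empty? ? group : matches
  end
end
```

Filtering prefLabels milk@en, Milch@de and lait@fr plus a single altLabel Vollmilch@de for :en keeps milk and, because altLabel has no English match, Vollmilch. Note that the sketch returns triples grouped by predicate, so their order may differ from the input.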

Preventing language processing for a single authority when there is a site wide default

Some authorities may have language-tagged literals that are known to be incorrect, or you may actually want to retrieve literals for all languages. To prevent language filtering, set the language to *, which acts as a wildcard indicating that all languages should be matched.

The most common usage of this is to set the authority default configuration to * to prevent filtering for that authority.

Setting the site default to * means that the default behavior is to not filter for any authority unless it is set individually in the authority or as part of the QA request.

A user can override the authority default and site default by passing in * to prevent filtering for a specific request.

Caveat: If the language is passed to the authority as a parameter and a default value is configured for that parameter, the parameter's default is used when the user passes in *.
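A hypothetical sketch of the value that would reach the authority's language parameter, combining the single-language rule from Authority filtering with the caveat above. authority_lang_value and param_default are illustrative names. Treating an unspecified language the same as *, and omitting the parameter (nil) when no default is configured, are assumptions.

```ruby
# Hypothetical sketch: decide what to send for the authority's lang parameter.
# resolved_language: nil, a code, or an Array of codes (only the first is used).
# param_default: the default configured for the lang parameter, if any.
def authority_lang_value(resolved_language, param_default: nil)
  first = Array(resolved_language).first&.to_s
  # Assumption: "*" (or no language at all) falls back to the parameter's
  # default; nil means the parameter is left out of the authority URL.
  return param_default if first.nil? || first == '*'
  first
end
```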

Configuring and using language processing

lang parameter as part of the QA request URL

Processing

The QA API supports passing a lang= parameter on search and fetch requests. If passed in, it is used as the language for filtering and all other language configurations are ignored.

Configuration

No configuration required.

USAGE

The following is an example QA request passing language as part of the URL.

curl 'http://localhost:3000/qa/search/linked_data/agrovoc_ld4l_cache?q=lait&lang=fr'
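The same request can be built from Ruby with the standard library. This is a sketch: qa_search_uri is a hypothetical helper, and the host, port and authority name are simply those of the curl example.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a QA linked data search URL.
# lang is appended as lang=... only when given.
def qa_search_uri(authority, query, lang: nil, base: 'http://localhost:3000')
  uri = URI("#{base}/qa/search/linked_data/#{authority}")
  params = { q: query }
  params[:lang] = lang if lang
  uri.query = URI.encode_www_form(params)
  uri
end
```

With a QA application running, `Net::HTTP.get_response(qa_search_uri('agrovoc_ld4l_cache', 'lait', lang: 'fr'))` issues the same request as the curl command.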

Language in the HTTP_ACCEPT_LANGUAGE request header

Processing

The QA API supports passing the language code in the Accept-Language request header (seen by the application as HTTP_ACCEPT_LANGUAGE) for a QA search or fetch request. If set, it is used as the language for filtering unless the request URL also includes a lang= parameter.

Configuration

No configuration required.

USAGE

The following is an example QA request passing language in the HTTP header.

curl -H 'Accept-Language: fr' 'http://localhost:3000/qa/search/linked_data/agrovoc_ld4l_cache?q=lait'
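A Ruby equivalent using Net::HTTP, sketched under the same assumptions as the curl example (local server, agrovoc_ld4l_cache authority). qa_search_request is a hypothetical helper; it only builds the request so it can be inspected or sent later.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a QA search request that carries the language
# in the Accept-Language header instead of the lang= URL parameter.
def qa_search_request(authority, query, accept_language:, base: 'http://localhost:3000')
  uri = URI("#{base}/qa/search/linked_data/#{authority}")
  uri.query = URI.encode_www_form(q: query)
  request = Net::HTTP::Get.new(uri)
  request['Accept-Language'] = accept_language
  request
end
```

To send it against a running QA application:

```ruby
request = qa_search_request('agrovoc_ld4l_cache', 'lait', accept_language: 'fr')
response = Net::HTTP.start(request.uri.host, request.uri.port) { |http| http.request(request) }
```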

Authority specific default

Processing

If the language is not passed in through a parameter or the request header, QA will look to see if the authority has a default value to use for the language.

Configuration

{
  "term": {
    ...
    "language": "en",
    ...
  },
  "search": {
    ...
    "language": ["en", "fr"],
    ...
  }
}
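The "language" value may be a single code or a list, and term and search can each have their own default. The sketch below reads those defaults from a trimmed copy of the configuration above. authority_default_language and EXAMPLE_CONFIG are hypothetical, and normalizing to an Array of Symbols is an illustrative choice, not necessarily what QA does internally.

```ruby
require 'json'

# Hypothetical, trimmed version of the configuration above (the "..." entries
# of a real authority file are left out so the JSON parses).
EXAMPLE_CONFIG = JSON.parse(<<~CFG)
  {
    "term":   { "language": "en" },
    "search": { "language": ["en", "fr"] }
  }
CFG

# Hypothetical helper: return the default language(s) for an action
# ("term" or "search") as an Array of Symbols, or nil when none is set.
def authority_default_language(config, action)
  value = config.dig(action.to_s, 'language')
  value.nil? ? nil : Array(value).map(&:to_sym)
end
```

For this configuration, `authority_default_language(EXAMPLE_CONFIG, :search)` returns `[:en, :fr]`.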

USAGE

The following is an example QA request which does not pass in language. The language will be set to the default language configured for the oclc_fast authority if one is defined in the oclc_fast search configuration; otherwise, the site wide default language is used.

curl 'http://localhost:3000/qa/search/linked_data/oclc_fast?q=twain'

Site wide default

Processing

If the language is not passed in through any other means, QA will look to see if there is a site wide default value to use for language.

Configuration

NOTE: This provides examples for configuring a parameter to pass to the authority for the authority to perform the filtering. If this is not an option for the authority, do not provide this configuration and the filtering will happen on the QA side provided the results from the authority include language tagged literals.

Installing qa initializer file...

The site wide language default is configured in the qa initializer. When the qa:install generator is run, the qa initializer is installed into /config/initializers/qa.rb. The generator will also modify routes and perform other actions. If this is a new installation of qa, you can run the installer using...

$ rails generate qa:install

OR you can manually copy the qa initializer from /lib/generators/qa/install/templates/config/initializers/qa.rb to /config/initializers/qa.rb.

Configuring site wide default

Edit /config/initializers/qa.rb and modify the value for default_language (uncommenting it if needed)...

config.default_language = :en

USAGE

The following is an example QA request which does not pass in language. The language will be set to the default language configured for the oclc_fast authority if one is defined in the oclc_fast search configuration; otherwise, the site wide default language is used.

curl 'http://localhost:3000/qa/search/linked_data/oclc_fast?q=twain'

Configuring a language parameter to send language to an authority

Processing

If the configuration defines a language parameter for the authority's search URL or term URL, the language value is passed to the authority, which performs the language filtering. If no such parameter is defined, QA filters the results returned from the authority itself.

The language used as the value of the language parameter is chosen by the prioritization process described in Where can language be specified?. See the other sections in Configuring and using language processing for more details on configuration and on how the language is determined.
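The choice between authority filtering and QA filtering can be sketched as a check on the action's configuration. Treating the "lang" entry of qa_replacement_patterns (shown in the configuration examples below) as the signal is an assumption made for illustration, and filtering_strategy is a hypothetical name.

```ruby
# Hypothetical sketch: an action ("term" or "search") sends language to the
# authority only when its configuration maps a "lang" replacement pattern;
# otherwise QA filters the authority's results itself.
def filtering_strategy(config, action)
  lang_param = config.dig(action.to_s, 'qa_replacement_patterns', 'lang')
  lang_param ? :authority : :qa
end
```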

Limitations

Requires the authority to support language filtering.

Configuration

Configure parameter to pass to authority for fetching a single term

Configure a parameter to pass to the authority when fetching a single term. Note that lang appears in the "template" and that there is a mapping for the lang parameter.

{
  "term": {
    "url": {
      "@context": "http://www.w3.org/ns/hydra/context.jsonld",
      "@type":    "IriTemplate",
      "template": "http://api.library.cornell.edu/skosmos/rest/v1/nalt/data?{?lang}&uri={term_uri}",
      "variableRepresentation": "BasicRepresentation",
      "mapping": [
        {
          "@type":    "IriTemplateMapping",
          "variable": "term_uri",
          "property": "hydra:freetextQuery",
          "required": true,
          "encode":   false
        },
        {
          "@type":    "IriTemplateMapping",
          "variable": "lang",
          "property": "hydra:freetextQuery",
          "required": false
        }
      ]
    },
    ...
  },
  ...
}

Identify the parameter used by the authority for language. Many authorities support the commonly used lang parameter, but QA does not assume this. It allows you to specify a different parameter to use in the authority's URL.

NOTE: The key in this hash is always "lang". The value for "lang" identifies the name of the parameter in the authority URL.

{
  "term": {
    ...
    "qa_replacement_patterns": {
      "term_id": "term_uri",
      "lang": "lang"
    },
    ...
  },
  ...
}
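To illustrate the note above: because the value under "lang" names the authority's own parameter, an authority expecting, say, language=fr would be configured with "lang": "language". The sketch below is a simplification that builds only the query fragment rather than expanding the full IriTemplate; authority_language_query is a hypothetical helper.

```ruby
require 'uri'

# Hypothetical sketch: build the authority's language query fragment from
# qa_replacement_patterns. The key is always "lang"; its value is the name
# of the parameter in the authority URL. Returns nil when either is missing.
def authority_language_query(replacement_patterns, language)
  param_name = replacement_patterns['lang']
  return nil if param_name.nil? || language.nil?
  URI.encode_www_form({ param_name => language })
end
```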

Configure parameter to pass to authority for searching for terms

Similarly, you can define a parameter to use for the search template URL. Again, you see lang defined in the template and a mapping for the lang parameter.

{
  ...
  "search": {
    "url": {
      "@context": "http://www.w3.org/ns/hydra/context.jsonld",
      "@type":    "IriTemplate",
      "template": "http://services.ld4l.org/ld4l_services/agrovoc_batch.jsp?{?query}&{?maxRecords}&{?lang}",
      "variableRepresentation": "BasicRepresentation",
      "mapping": [
        {
          "@type":    "IriTemplateMapping",
          "variable": "query",
          "property": "hydra:freetextQuery",
          "required": true
        },
        {
          "@type":    "IriTemplateMapping",
          "variable": "maxRecords",
          "property": "hydra:freetextQuery",
          "required": false,
          "default":  "20"
        },
        {
          "@type":    "IriTemplateMapping",
          "variable": "lang",
          "property": "hydra:freetextQuery",
          "required": false
        }
      ]
    },
    ...
  }
}

As with term fetch, you can identify the parameter used by the authority for language.

{
  "search": {
    ...
    "qa_replacement_patterns": {
      "query":   "query",
      "lang":    "lang"
    },
    ...
  },
  ...
}

USAGE

All previous examples of QA requests work for passing language to the authority for filtering. The key requirement to use authority filtering is that the authority supports a language parameter which can be used to pass language as part of the request to the authority.
