Skip to content

driebit/mod_elasticsearch2

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

29 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

mod_elasticsearch2

This Zotonic module gives you more relevant search results by making resources searchable through
Elasticsearch.

Configuration

To configure the Elasticsearch host and port, edit your zotonic.config file:

[
    %% ...
    {elasticsearch2_host, <<"elasticsearch">>},  %% Defaults to 127.0.0.1
    {elasticsearch2_port, 9200},                 %% Defaults to 9200
    %% ...
].

Or in your site config:

  • mod_elasticsearch2.host
  • mod_elasticsearch2.port

Config keys:

  • mod_elasticsearch2.track_total_hits Elastic search normally doesn't count hits beyond 10K. To enable or disable counting the real total number of hits set this config. Defaults to true (track total hits).

  • mod_elasticsearch2.default_operator The default text operator for words in a query string. This is either AND or OR. Defaults to 'OR'.

  • mod_elasticsearch2.add_search_wildcards Automatically add wildcards to the words in a text search string. This rewrites "The quick fox" to ((the | the*) (quick | quick*) (fox | fox*)) | "the quick fox" Defaults to true (do add wildcards).

  • mod_elasticsearch2.log_scores Set to true to add an info level log messages with the document scores after each search. Useful for understanding search results. Defaults to false.

Elasticsearch security config

Out of the box Elastic is configured to use TLS and passwords. Both are not (yet) supported by this library, and also don't have any purpose on localhost.

The following change must be made to config/elasticsearch.yml:

# Disable security features
xpack.security.enabled: false
xpack.security.enrollment.enabled: false

# Disable encryption for HTTP API client connections, such as Kibana, Logstash, and Agents
xpack.security.http.ssl:
  enabled: false
  keystore.path: certs/http.p12

Search queries

When mod_elasticsearch is enabled, it will direct all search operations of the ‘query’ type to elasticsearch2:

z_search:search({query, Args}, Context).

For Args, you can pass all regular Zotonic query arguments, such as:

z_search:search({query, [{hasobject, 507}]}, Context).

Query context filters

The filter search argument that you know from Zotonic will be used in Elasticsearch’s filter context. To add filters that influence score (ranking), use the query_context_filter instead. The syntax is identical to that of filter:

z_search:search({query, [{query_context_filter, [["some_field", "value"]]}]}, Context).

Extra query arguments

This module adds some extra query arguments on top of Zotonic’s default ones.

To find documents that have a field, whatever its value (make sure to pass exists as atom):

{filter, [<<"some_field">>, exists]}

To find documents that do not have a field (make sure to pass missing as atom):

{filter, [<<"some_field">>, missing]}

For a match phrase prefix query, use the prefix argument:

z_search:search({query, [{prefix, <<"Match this pref">>}]}, Context).

To exclude a document:

{exclude_document, [Type, Id]}

To supply a custom function_score clause, supply one or more score_functions. For instance, to rank recent articles above older ones:

z_search:search(
    {query, [
        {text, "Search this"},
        {score_function, #{
            <<"filter">> => [{cat, "article"}],
            <<"exp">> => #{
                <<"publication_start">> => #{
                    <<"scale">> => <<"30d">>
                }
            }
        }}
    ]},
    Context
).

Wildcards

The text query term is modified to search for prefix strings by appending * operators to the words in the query string.

This is not done if either:

  • the search text contains simple query string operators, especially the "; or
  • the config mod_elasticsearch2.no_automatic_wildcard is set to a true-ish value.

See https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html for the query string operators.

Buffered put/delete

Use the mod_elasticsearch2:put_doc and mod_elasticsearch2:delete_doc routines to perform bulk put and delete requests.

The requests are buffered for one second, or 500 commands, whatever comes first.

After a command has been executed the following notification is emitted:

-record(elasticsearch_bulk_result, {
    action :: put | delete,
    doc_id :: binary(),
    index :: binary(),
    result :: binary(),
    status :: 100..500,
    error :: map() | undefined
}).

Notifications

elasticsearch_fields

Observe this foldr notification to change the document fields that are queried. You can use Elasticsearch multi_match syntax for boosting fields:

%% your_site.erl

-export([
    % ...
    observe_elasticsearch_fields/3
]).

observe_elasticsearch_fields({elasticsearch_fields, QueryText}, Fields, Context) ->
    %% QueryText is the search query text

    %% Add or remove fields: 
    [<<"some_field">>, <<"boosted_field^2">>|Fields].   

elasticsearch_put

Observe this notification to change the resource properties before they are stored in Elasticsearch. For instance, to store their zodiac sign alongside person resources:

%% your_site.erl

-include_lib("mod_elasticsearch2/include/elasticsearch.hrl").

-export([
    % ...
    observe_elasticsearch_put/3
]).

-spec observe_elasticsearch_put(#elasticsearch_put{}, map(), z:context()) -> map().
observe_elasticsearch_put(#elasticsearch_put{ index = _, type = <<"resource">>, id = Id }, Data, Context) ->
    case m_rsc:is_a(Id, person, Context) of
        true ->
            Data#{ zodiac => calculate_zodiac(Id, Context) };
        false ->
            Data
    end;
observe_elasticsearch_put(#elasticsearch_put{}, Data, Context) ->
    Data.

Type

In Elastic 5.x a document was associated with a type.

This type has been removed in Elastic 7+.

In this library we still use the type, it is stored as es_type and it set to resource for all Zotonic resources.

On fetch of a document record the _source.es_type is copied to _type. This for compatibility with software written for Elastic 5.x.

Likewise references to _type in queries are mapped to es_type.

As the document types are often used to distinguish ids between sources (for example Adlib databases) there are routines to combine the type and id:

DocId = mod_elasticsearch2:typed_id(Id, Type),
{Id, Type} = mod_elasticsearch2:type_id_split(DocId)

The empty type and the resource type are not appended to the document id.

Logging

By default, mod_elasticsearch2 logs outgoing queries at the debug log level. To see them in your Zotonic console, change the minimum log level to debug:

lager:set_loglevel(lager_console_backend, debug).

How resources are indexed

Content in all languages is stored in the index, following the one language per field strategy:

Each translation is stored in a separate field, which is analyzed according to the language it contains. At query time, the user’s language is used to boost fields in that particular language.

Development and low disk space

If you happen to be low on disk space then Elastic will become read only.

To disable this, especially on development machines, perform the following two commands:

curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_cluster/settings -d '{ "transient": { "cluster.routing.allocation.disk.threshold_enabled": false } }'
curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published