
dev to master #701

Merged
merged 10 commits into from
Sep 26, 2024
78 changes: 78 additions & 0 deletions docs/customize/vocabularies/names.md
@@ -120,3 +120,81 @@ invenio vocabularies import \
--vocabulary names \
--origin /path/to/ORCID_2021_10_summaries.tar.gz
```

### Using ORCiD Public Data Sync

*Introduced in InvenioRDM v13*

#### Installing Required Dependencies

First, install the required `s3fs` extra by adding the following to the `Pipfile` in your instance:

```toml
[packages]
...
invenio-vocabularies = {extras = ["s3fs"]}
...
```

#### Configuring ORCiD Public Data Sync

InvenioRDM supports loading names using the ORCiD Public Data Sync. To set this up, you need to create a definition file named `names-orcid.yaml` with the following content:

```yaml
names:
readers:
- type: orcid-data-sync
- type: xml
transformers:
- type: orcid
writers:
- type: async
args:
writer:
type: names-service
batch_size: 1000
write_many: true
```
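The definition file above wires readers, a transformer, and a writer into a datastream. As a rough illustration of that pattern (the class and method names below are hypothetical, not the actual invenio-vocabularies implementation), the flow can be sketched as:

```python
# Illustrative sketch of the reader -> transformer -> writer datastream that
# the YAML above configures. All names here are hypothetical stand-ins, not
# the real invenio-vocabularies classes.
class Reader:
    """Yields parsed entries from a source (XML parsing elided)."""
    def read(self, source):
        for given_names, family_name in source:
            yield {"given_names": given_names, "family_name": family_name}

class Transformer:
    """Maps a parsed ORCID entry onto the names vocabulary shape."""
    def transform(self, entry):
        return {"name": f"{entry['family_name']}, {entry['given_names']}"}

class Writer:
    """Collects transformed entries in batches (stands in for names-service)."""
    def __init__(self, batch_size=1000):
        self.batch_size = batch_size
        self.written = []
    def write_many(self, entries):
        self.written.extend(entries)

def run(reader, transformer, writer, source):
    writer.write_many(transformer.transform(e) for e in reader.read(source))

run(Reader(), Transformer(), writer := Writer(), [("Josiah", "Carberry")])
# writer.written == [{"name": "Carberry, Josiah"}]
```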

#### Customizing the Sync Interval

Optionally, you can set the sync interval for the `orcid-data-sync` reader by passing arguments. If not specified, the default sync interval is one day. The supported arguments for defining the interval are:

- `years`
- `months`
- `weeks`
- `days`
- `hours`
- `minutes`
- `seconds`
- `microseconds`

Here is an example of how to set a custom sync interval of 10 days:

```yaml
names:
readers:
- type: orcid-data-sync
args:
since:
days: 10
- type: xml
transformers:
- type: orcid
writers:
- type: async
args:
writer:
type: names-service
batch_size: 1000
write_many: true
```

#### Running the Import Command

To run an import using the `names-orcid.yaml` file, use the `vocabularies import` command as shown below:

```shell
invenio vocabularies import \
--vocabulary names \
--filepath ./names-orcid.yaml
```
102 changes: 102 additions & 0 deletions docs/reference/search.md
@@ -0,0 +1,102 @@
# Searching in InvenioRDM
_Introduced in InvenioRDM v13_

## InvenioRDM Suggest API

The suggest API endpoint (`/api/{resource}?suggest={search_input}`) provides an interface for real-time search suggestions. It leverages OpenSearch's `multi_match` query to search across multiple fields within a specified index, returning relevant suggestions based on user input.

### Endpoint Structure

**URL:** `/api/{resource}?suggest={search_input}`
**Method:** GET

Each index in InvenioRDM can have its own configuration to customize how the suggest API behaves. This includes defining which fields are searchable and other settings provided by the `multi_match` query API.
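Conceptually, a suggest request translates the user's input into an OpenSearch `multi_match` query body. The sketch below builds such a body by hand; the exact options InvenioRDM sets (such as the match `type` or per-index field lists) are configuration-dependent, so this is only an illustration:

```python
def build_suggest_query(search_input, fields):
    """Build a multi_match query body of the kind a suggest request issues.

    Entries in `fields` may carry boost factors, e.g. "name^80".
    """
    return {
        "query": {
            "multi_match": {
                "query": search_input,
                "fields": fields,
            }
        }
    }

body = build_suggest_query("cer", ["name^80", "acronym^40", "title.*^20"])
```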

## How to use the suggest API

InvenioRDM's suggest API provides search suggestions by issuing a `multi_match` query. It can be configured per index using the `SuggestQueryParser` class, which can be imported from the `invenio-records-resources` module. The fields are analyzed at index time using custom analyzers, which apply filters such as `asciifolding` for accent-insensitive search and `edge_ngram` for prefix search.

Check the [official documentation](https://opensearch.org/docs/2.0/opensearch/ux/) and the [reference](#reference) section below for more context on the `edge_ngram` filter and custom analyzers.

### When to Use the Suggest API

- **Typo Tolerance & Auto-completion:** Helps correct typos (using `fuzziness` in the search-time analysis) and completes partial inputs.
- **Large, Diverse Datasets:** Useful for datasets with a wide variety of terms, like names or titles.
- **Pre-query Optimization:** Reduces unnecessary searches by suggesting relevant terms.

### When Not to Use the Suggest API

- **Small or Specific Datasets:** Less beneficial for well-defined datasets.
- **Performance Constraints:** Because the suggest API generates a large number of tokens through the `edge_ngram` filter, it is important to monitor how it affects index size.
- A reasonable trade-off might involve an index size increase of up to 20-30% if it significantly improves search speed and relevance.
- A 10-20% improvement in response times might justify a moderate increase in index size.

For more information check the [official documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-disk-usage.html).

## Key Considerations for Customizing Index Mappings

### Size

- **Field Type Selection:** Use lightweight field types (e.g., `keyword` over `text` where appropriate) to minimize index size.
- **Custom analyzers and filters:** Use them as sparingly as possible to prevent index bloat.

### Speed

- **Search Performance:** Keeping size in mind, apply custom analyzers that include the `edge_ngram` filter to provide quick suggestions, and optimize frequently queried fields to enhance search speed.
- **Analyzer and filter selection:** Configure them only when necessary to improve search time.

## Fine-tuning the search

Boosting affects the relevance score of documents. A higher boost value means a stronger influence on the search ranking. Determine which fields are most critical for your search relevance (e.g., titles, authors, keywords).

- **Relevance Adjustment:** Boost a field using the caret operator **(^)** followed by a number. For example:
    * `name^100` boosts the `name` field by a factor of 100.
    * An asterisk **(\*)** applies the boost to all subfields, e.g. `i18n_titles.*^50`.

- **Balance and Tuning:** Use boosting judiciously to avoid skewing results too heavily towards particular fields. Assign boost factors based on the importance of each field. Higher values increase the influence of matches in that field.

When multiple fields are searched, tuning is essential to ensure that the most relevant results are returned first. Taking the affiliations index as an example, the key fields are `name`, `acronym`, and `title.{subfields}`. Since affiliations are usually searched by name, `name` is given more weight to boost its relevance.

```
"name^80", "acronym^40", "title.*^20"
```
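A boosted field list like the one above is typically wired into an index's search options through the `SuggestQueryParser` factory. The following is a configuration sketch only; verify the import path and the surrounding search-options class against your `invenio-records-resources` version:

```python
# Configuration sketch: assign this as `suggest_parser_cls` on the service's
# SearchOptions class. The import path matches recent invenio-records-resources
# releases, but check it against the version your instance uses.
from invenio_records_resources.services.records.queryparser import SuggestQueryParser

suggest_parser_cls = SuggestQueryParser.factory(
    fields=["name^80", "acronym^40", "title.*^20"],
)
```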

## Reference

### Analyzers

Analyzers allow searches to match documents that are not exact matches: different letter cases, missing accents, partial words, misspellings, and so on. Fundamentally, every analyzer must contain exactly one tokenizer, which splits the search input into parts. Additionally, an analyzer may have one or more character filters and/or token filters.

- A **character filter** is applied first and takes the entire input and adds, removes or changes any characters depending on our needs.
- The [**tokenizer**](https://opensearch.org/docs/latest/analyzers/tokenizers/index/) then splits the input into parts (words).
- Finally, the [**token filter**](https://opensearch.org/docs/latest/analyzers/token-filters/index/) works much like a character filter, but is applied to each token ("word") produced by the tokenizer.

Read more about analyzers on [the OpenSearch official docs](https://opensearch.org/docs/latest/analyzers/).

Analyzers can be applied both to the search input and when the document is indexed. In most cases, we want to apply the same analyzer to the search input and during indexing so that there is no unexpected behaviour.

- [**Normalizers**](https://opensearch.org/docs/latest/analyzers/normalizers/) — Simpler and mainly used to improve the matching of keyword search. The `keyword` type is the simplest way in which data can be stored and by default works as an exact match search. Using a normalizer you can add, remove and alter the input into exactly one other token which is stored and searched for.

### Character filters

Character filters take the stream of characters before tokenization and can add, remove or replace characters according to the rules and type of filter. There are three types: the mapping filter, the pattern replace filter, and the HTML stripping filter.

For our indices in InvenioRDM, we are currently using a custom pattern replace filter that uses a regex to remove special characters:

```
"char_filter": {
"strip_special_chars": {
"type": "pattern_replace",
"pattern": "[\\p{Punct}\\p{S}]",
"replacement": ""
}
}
```
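As a rough illustration of what this filter does: Python's `re` module does not support the Java-style `\p{Punct}`/`\p{S}` classes, so the sketch below uses `string.punctuation` as an ASCII approximation, which is not the exact OpenSearch behaviour:

```python
import string

def strip_special_chars(text):
    """ASCII approximation of the pattern_replace char_filter above:
    drop punctuation characters before tokenization."""
    return text.translate(str.maketrans("", "", string.punctuation))

strip_special_chars("C.E.R.N.!")  # -> "CERN"
```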

### Tokenizers and token filters

We are using the following tokenizers and token filters in some of our indices in InvenioRDM:

- **[ngram and edge_ngram](https://opensearch.org/docs/latest/analyzers/tokenizers/index/#partial-word-tokenizers)** — Both are used for matching parts of words. N-gram creates n-sized chunks ("car" with ngram(1,2) -> "c", "ca", "a", "ar", "r") and edge_ngram creates chunks from the beginning of the word ("dog" with edge_ngram(1,3) -> "d", "do", "dog"). Edge n-gram enables prefix searching and is preferred because it produces fewer tokens. It is also recommended to use these as token filters so that they produce tokens within each word rather than across word boundaries.
- **[uax_url_email](https://opensearch.org/docs/latest/analyzers/tokenizers/index/#word-tokenizers)** — If searches and/or documents are likely to contain URLs or emails, it is better to use this tokenizer. With the standard tokenizer, a URL/email is split on its special characters, which can produce unexpected behaviour (for example, searching for tim@apple.com would also return documents that merely contain apple).
- **[asciifolding](https://opensearch.org/docs/latest/analyzers/token-filters/index/)** — Allows characters to match with many different representations, especially relevant for non-English languages. For example ä -> a, Å -> A, etc.
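Putting these pieces together, a custom autocomplete analyzer might combine a standard tokenizer with `lowercase`, `asciifolding`, and an `edge_ngram` token filter. The fragment below is illustrative only; the names and gram sizes are examples, not the exact InvenioRDM mappings:

```
"analysis": {
  "filter": {
    "autocomplete_filter": {
      "type": "edge_ngram",
      "min_gram": 1,
      "max_gram": 20
    }
  },
  "analyzer": {
    "autocomplete": {
      "type": "custom",
      "tokenizer": "standard",
      "filter": ["lowercase", "asciifolding", "autocomplete_filter"]
    }
  }
}
```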
160 changes: 160 additions & 0 deletions docs/releases/upgrading/upgrade-v13.0.md
@@ -0,0 +1,160 @@
# Upgrading from v12 to v13.0

!!! warning "THIS RECIPE IS A WORK IN PROGRESS"

## Prerequisites

The steps listed in this article require an existing local installation of InvenioRDM v12.

!!! warning "Backup"

Always backup your database and files before you try to perform an upgrade.

!!! info "Older Versions"

In case you have an InvenioRDM installation older than v12, you can gradually upgrade
to v12 and afterwards continue from here.

## Upgrade Steps

Make sure you have the latest `invenio-cli` installed. For InvenioRDM v13, it
should be v1.5.0 or newer.

```bash
$ invenio-cli --version
invenio-cli, version 1.5.0
```

!!! info "Virtual environments"

In case you are not inside a virtual environment, make sure that you prefix each `invenio`
command with `pipenv run`.

**Local development**

Changing the Python version in your development environment depends highly
on your setup, so we won't cover it here.
One way would be to use [PyEnv](https://github.com/pyenv/pyenv).

!!! warning "Risk of losing data"

    Your virtual environment folder, a.k.a. the `venv` folder, may contain uploaded files. If you kept the default
location, it is in `<venv folder>/var/instance/data`. If you need to keep those files,
make sure you copy them over to the new `venv` folder in the same location.
The command `invenio files location list` shows the file upload location.

If you upgraded your Python version, you should recreate your virtual environment before
running the `invenio-cli` or `pipenv` commands below.


### Upgrade InvenioRDM

Python 3.9, 3.11, or 3.12 is required to run InvenioRDM v13.

There are two options to upgrade your system:

#### Upgrade option 1: In-place

This approach upgrades the dependencies in place. Your virtual environment for the
v12 version will be gone afterwards.

```bash
cd <my-site>

# Upgrade to InvenioRDM v13
invenio-cli packages update 13.0.0

# Re-build assets
invenio-cli assets build
```

#### Upgrade option 2: New virtual environment

This approach creates a new virtual environment and leaves the v12 one as-is.
If you are using a Docker image on your production instance, this is the
option to choose.

##### Step 1
- Create a new virtual environment.
- Activate your new virtual environment.
- Install `invenio-cli` with `pip install invenio-cli`.
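
For example (the path and Python version below are placeholders; adapt them to your setup):

```shell
# Create and activate a fresh virtual environment, then install invenio-cli.
# The directory and interpreter are examples only.
python3 -m venv ~/.virtualenvs/my-site-v13
source ~/.virtualenvs/my-site-v13/bin/activate
pip install invenio-cli
```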

##### Step 2
Update the file `<my-site>/Pipfile`.

```diff
[packages]
-invenio-app-rdm = {extras = [...], version = "~=12.0.0"}
+invenio-app-rdm = {extras = [...], version = "~=13.0.0"}
```

##### Step 3
Update the `Pipfile.lock` file:

```bash
invenio-cli packages lock
```

##### Step 4
Install InvenioRDM v13:

```bash
invenio-cli install
```

### Database migration

Execute the database migration:

```bash
invenio alembic upgrade
```

### Data migration


Execute the data migration:

### TODO


### Rebuild search indices

```bash
invenio index destroy --yes-i-know
invenio index init
invenio rdm rebuild-all-indices
```

From v12 onwards, record statistics are stored in search indices rather than the
database. These indices are created through *index templates* rather than being
registered directly in `Invenio-Search`. As such, the search indices for
statistics are not affected by `invenio index destroy --yes-i-know` and remain
fully functional after the rebuild step.

### New roles

### TODO

### New configuration variables

```python
from invenio_app_rdm import __version__
ADMINISTRATION_DISPLAY_VERSIONS = [
("invenio-app-rdm", f"v{__version__}"),
("{{ cookiecutter.project_shortname }}", "v1.0.0"),
]
```

## Big Changes

- feature: invenio jobs module, periodic tasks administration panel
- feature: invenio vocabularies entries deprecation
- improvement: search mappings and analyzers to improve performance

### TODO

## OPEN PROBLEMS


### TODO
3 changes: 3 additions & 0 deletions docs/releases/version-v13.0.0.md
@@ -0,0 +1,3 @@
# InvenioRDM v13.0

Draft
2 changes: 1 addition & 1 deletion docs/releases/versions/version-v10.0.0.md
@@ -18,7 +18,7 @@ In addition to the many bugs fixed, this release introduces custom fields both f

### Custom Fields

You can now add custom fields to [bibliographic records](https://inveniordm.docs.cern.ch/customize/metadata/custom_fields/records/) and [communities](https://inveniordm.docs.cern.ch/customize/metadata/custom_fields/communities/) data models. InvenioRDM supports a wide variety of field types and UI widgets: you can find the full list in the [custom fields](https://inveniordm.docs.cern.ch/customize/custom_fields/records/#reference) and the [UI widgets](https://inveniordm.docs.cern.ch/reference/widgets/) documentation pages.
You can now add custom fields to [bibliographic records](../../customize/metadata/custom_fields/records.md) and [communities](../../customize/metadata/custom_fields/communities.md) data models. InvenioRDM supports a wide variety of field types and UI widgets: you can find the full list in the [custom fields](../../customize/metadata/custom_fields/records.md#reference) and the [UI widgets](../../reference/custom_fields/widgets.md) documentation pages.

You can also extend the default components or implement your owns. To get more information, refer to the [custom fields development section](../../develop/howtos/custom_fields.md) in the documentation.
