diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index bbf3b8d035..21b6fbfea6 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -7,6 +7,9 @@ _List any issues this PR will resolve, e.g. Closes [...]._ ### Version _List the OpenSearch version to which this PR applies, e.g. 2.14, 2.12--2.14, or all._ +### Frontend features +_If you're submitting documentation for an OpenSearch Dashboards feature, add a video that shows how a user will interact with the UI step by step. A voiceover is optional._ + ### Checklist - [ ] By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the [Developers Certificate of Origin](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin). For more information on following Developer Certificate of Origin and signing off your commits, please check [here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin). diff --git a/.github/workflows/pr-checklist.yml b/.github/workflows/pr-checklist.yml new file mode 100644 index 0000000000..83705f9efa --- /dev/null +++ b/.github/workflows/pr-checklist.yml @@ -0,0 +1,30 @@ +name: PR Checklist + +on: + pull_request: + types: [opened] + +jobs: + add-checklist-and-assignees: + runs-on: ubuntu-latest + + steps: + - name: Comment PR with checklist + uses: peter-evans/create-or-update-comment@v3 + with: + token: ${{ secrets.GITHUB_TOKEN }} + issue-number: ${{ github.event.pull_request.number }} + body: | + Thank you for submitting your PR. The PR states are Tech review -> Doc review -> Editorial review. If you're a developer submitting documentation for a feature you implemented, have the documentation reviewed by your team. If you need a tech review, let us know. Here's a checklist of the PR progression: + + ### PR Checklist + - [x] Tech Review + - [ ] Doc Review + - [ ] Editorial Review + + - name: Add assignees to the PR + uses: peter-evans/create-or-update-comment@v3 + with: + token: ${{ secrets.GITHUB_TOKEN }} + issue-number: ${{ github.event.pull_request.number }} + assignees: ${{ github.actor }}, kolchfa-aws diff --git a/_about/version-history.md b/_about/version-history.md index 6a0938541a..0d6d844951 100644 --- a/_about/version-history.md +++ b/_about/version-history.md @@ -9,6 +9,7 @@ permalink: /version-history/ OpenSearch version | Release highlights | Release date :--- | :--- | :--- +[2.15.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.15.0.md) | Includes parallel ingestion processing, SIMD support for exact search, and the ability to disable doc values for the k-NN field. Adds wildcard and derived field types. Improves performance for single-cardinality aggregations, rolling upgrades to remote-backed clusters, and more metrics for top N queries. For a full list of release highlights, see the Release Notes. | 25 June 2024 [2.14.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.14.0.md) | Includes performance improvements to hybrid search and date histogram queries with multi-range traversal, ML model integration within the Ingest API, semantic cache for LangChain applications, low-level vector query interface for neural sparse queries, and improved k-NN search filtering. Provides an experimental tiered cache feature. 
For a full list of release highlights, see the Release Notes. | 14 May 2024 [2.13.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.13.0.md) | Makes agents and tools and the OpenSearch Assistant Toolkit generally available. Introduces vector quantization within OpenSearch. Adds LLM guardrails and hybrid search with aggregations. Adds the Bloom filter skipping index for Apache Spark data sources, I/O-based admission control, and the ability to add an alerting cluster that manages all alerting tasks. For a full list of release highlights, see the Release Notes. | 2 April 2024 [2.12.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.12.0.md) | Makes concurrent segment search and conversational search generally available. Provides an experimental OpenSearch Assistant Toolkit, including agents and tools, workflow automation, and OpenSearch Assistant for OpenSearch Dashboards UI. Adds a new match-only text field, query insights to monitor top N queries, and k-NN search on nested fields. For a full list of release highlights, see the Release Notes. | 20 February 2024 diff --git a/_api-reference/cat/index.md b/_api-reference/cat/index.md index 0ddaf1e0a7..7454a4cf39 100644 --- a/_api-reference/cat/index.md +++ b/_api-reference/cat/index.md @@ -24,17 +24,54 @@ GET _cat ``` {% include copy-curl.html %} +The response is an ASCII cat (`=^.^=`) and a list of operations: + +``` +=^.^= +/_cat/allocation +/_cat/segment_replication +/_cat/segment_replication/{index} +/_cat/shards +/_cat/shards/{index} +/_cat/cluster_manager +/_cat/nodes +/_cat/tasks +/_cat/indices +/_cat/indices/{index} +/_cat/segments +/_cat/segments/{index} +/_cat/count +/_cat/count/{index} +/_cat/recovery +/_cat/recovery/{index} +/_cat/health +/_cat/pending_tasks +/_cat/aliases +/_cat/aliases/{alias} +/_cat/thread_pool +/_cat/thread_pool/{thread_pools} +/_cat/plugins +/_cat/fielddata +/_cat/fielddata/{fields} +/_cat/nodeattrs +/_cat/repositories +/_cat/snapshots/{repository} +/_cat/templates +/_cat/pit_segments +/_cat/pit_segments/{pit_id} +``` + ## Optional query parameters -You can use the following query parameters with any CAT API to filter your results. +The root `_cat` API does not take any parameters, but individual APIs, such as `/_cat/nodes` accept the following query parameters. Parameter | Description :--- | :--- | `v` | Provides verbose output by adding headers to the columns. It also adds some formatting to help align each of the columns together. All examples in this section include the `v` parameter. `help` | Lists the default and other available headers for a given operation. `h` | Limits the output to specific headers. -`format` | Returns the result in JSON, YAML, or CBOR formats. -`sort` | Sorts the output by the specified columns. +`format` | The format in which to return the result. Valid values are `json`, `yaml`, `cbor`, and `smile`. +`s` | Sorts the output by the specified columns. 
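These parameters can also be combined in a single request. For example, the following request (an illustrative sketch, assuming your cluster contains at least one index) limits the CAT indices output to the index name, document count, and store size, sorts the results by document count, and returns them in JSON format:

```json
GET _cat/indices?h=index,docs.count,store.size&s=docs.count&format=json
```
{% include copy-curl.html %}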
### Query parameter usage examples @@ -59,7 +96,6 @@ sample-alias1 sample-index-1 - - - - Without the verbose parameter, `v`, the response simply returns the alias names: ``` - .kibana .kibana_1 - - - - sample-alias1 sample-index-1 - - - - ``` @@ -72,6 +108,24 @@ To see all the available headers, use the `help` parameter: GET _cat/<operation_name>?help ``` +For example, to see the available headers for the CAT aliases operation, send the following request: + +```json +GET _cat/aliases?help +``` +{% include copy-curl.html %} + +The response contains the available headers: + +``` +alias | a | alias name +index | i,idx | index alias points to +filter | f,fi | filter +routing.index | ri,routingIndex | index routing +routing.search | rs,routingSearch | search routing +is_write_index | w,isWriteIndex | write index +``` + ### Get a subset of headers To limit the output to a subset of headers, use the `h` parameter: @@ -80,7 +134,71 @@ To limit the output to a subset of headers, use the `h` parameter: GET _cat/<operation_name>?h=<header_name_1>,<header_name_2>&v ``` +For example, to limit aliases to only the alias name and index, send the following request: + +```json +GET _cat/aliases?h=alias,index +``` +{% include copy-curl.html %} + +The response contains the requested information: + +``` +.kibana .kibana_1 +sample-alias1 sample-index-1 +``` + Typically, for any operation you can find out what headers are available using the `help` parameter, and then use the `h` parameter to limit the output to only the headers that you care about. +### Sort by a header + +To sort the output by a header, use the `s` parameter: + +```json +GET _cat/<operation_name>?s=<header_name_1>,<header_name_2> +``` + +For example, to sort aliases by index and then by alias name, send the following request: + +```json +GET _cat/aliases?s=i,a +``` +{% include copy-curl.html %} + +The response contains the requested information: + +``` +sample-alias2 sample-index-1 +sample-alias1 sample-index-2 +``` + +### Retrieve data in JSON format + +By default, CAT APIs return data in `text/plain` format. + +To retrieve data in JSON format, use the `format=json` parameter: + +```json +GET _cat/<operation_name>?format=json +``` + +For example, to retrieve aliases in JSON format, send the following request: + +```json +GET _cat/aliases?format=json +``` +{% include copy-curl.html %} + +The response contains data in JSON format: + +```json +[ + {"alias":".kibana","index":".kibana_1","filter":"-","routing.index":"-","routing.search":"-","is_write_index":"-"}, + {"alias":"sample-alias1","index":"sample-index-1","filter":"-","routing.index":"-","routing.search":"-","is_write_index":"-"} +] +``` + +Other supported formats are [YAML](https://yaml.org/), [CBOR](https://cbor.io/), and [Smile](https://github.com/FasterXML/smile-format-specification). + If you use the Security plugin, make sure you have the appropriate permissions. {: .note } diff --git a/_api-reference/common-parameters.md b/_api-reference/common-parameters.md index 347d38a0de..5b536ad992 100644 --- a/_api-reference/common-parameters.md +++ b/_api-reference/common-parameters.md @@ -90,3 +90,37 @@ The following request specifies filters to limit the fields returned in the resp GET _search?filter_path=.*,- ``` + +## Units + +OpenSearch APIs support the following units. + +### Time units + +The following table lists all supported time units. + +Units | Specify as +:--- | :--- +Days | `d` +Hours | `h` +Minutes | `m` +Seconds | `s` +Milliseconds | `ms` +Microseconds | `micros` +Nanoseconds | `nanos` + +### Distance units + +The following table lists all supported distance units.
+ +Units | Specify as +:--- | :--- +Miles | `mi` or `miles` +Yards | `yd` or `yards` +Feet | `ft` or `feet` +Inches | `in` or `inch` +Kilometers | `km` or `kilometers` +Meters | `m` or `meters` +Centimeters | `cm` or `centimeters` +Millimeters | `mm` or `millimeters` +Nautical miles | `NM`, `nmi`, or `nauticalmiles` \ No newline at end of file diff --git a/_api-reference/document-apis/bulk.md b/_api-reference/document-apis/bulk.md index 2ff4318d56..a020fc459d 100644 --- a/_api-reference/document-apis/bulk.md +++ b/_api-reference/document-apis/bulk.md @@ -124,7 +124,9 @@ All actions support the same metadata: `_index`, `_id`, and `_require_alias`. If { "doc" : { "title": "World War Z" }, "doc_as_upsert": true } ``` - You can specify a script for more complex document updates: + You can specify a script for more complex document updates by providing an inline script in the `source` field or by referencing a stored script by its `id`: + + - Script ```json diff --git a/_api-reference/index-apis/refresh.md b/_api-reference/index-apis/refresh.md index b72a6c7470..4d75060087 100644 --- a/_api-reference/index-apis/refresh.md +++ b/_api-reference/index-apis/refresh.md @@ -45,7 +45,7 @@ The following table lists the available query parameters. All query parameters a | :--- | :--- | :--- | | `ignore_unavailable` | Boolean | When `false`, the request returns an error when it targets a missing or closed index. Default is `false`. | `allow_no_indices` | Boolean | When `false`, the Refresh Index API returns an error when a wildcard expression, index alias, or `_all` targets only closed or missing indexes, even when the request is made against open indexes. Default is `true`. | -| `expand_wildcard` | String | The type of index that the wildcard patterns can match. If the request targets data streams, this argument determines whether the wildcard expressions match any hidden data streams. Supports comma-separated values, such as `open,hidden`. Valid values are `all`, `open`, `closed`, `hidden`, and `none`. +| `expand_wildcards` | String | The type of index that the wildcard patterns can match. If the request targets data streams, this argument determines whether the wildcard expressions match any hidden data streams. Supports comma-separated values, such as `open,hidden`. Valid values are `all`, `open`, `closed`, `hidden`, and `none`. diff --git a/_api-reference/render-template.md b/_api-reference/render-template.md new file mode 100644 index 0000000000..16bada0290 --- /dev/null +++ b/_api-reference/render-template.md @@ -0,0 +1,114 @@ +--- +layout: default +title: Render Template +nav_order: 82 +--- + +# Render Template + +The Render Template API renders a [search template]({{site.url}}{{site.baseurl}}/search-plugins/search-template/) as a search query. + +## Paths and HTTP methods + +``` +GET /_render/template +POST /_render/template +GET /_render/template/<id> +POST /_render/template/<id> +``` + +## Path parameters + +The Render Template API supports the following optional path parameter. + +| Parameter | Type | Description | +| :--- | :--- | :--- | +| `id` | String | The ID of the search template to render. | + +## Request options + +The following options are supported in the request body of the Render Template API. + +| Parameter | Required | Type | Description | +| :--- | :--- | :--- | :--- | +| `id` | Conditional | String | The ID of the search template to render. Not required if the ID is provided in the path or if an inline template is specified by the `source`.
| +| `params` | No | Object | A list of key-value pairs that replace Mustache variables found in the search template. The key-value pairs must exist in the documents being searched. | +| `source` | Conditional | Object | An inline search template to render if a search template is not specified. Supports the same parameters as a [Search]({{site.url}}{{site.baseurl}}/api-reference/search/) API request and [Mustache](https://mustache.github.io/mustache.5.html) variables. | + +## Example request + +Both of the following request examples use the search template with the template ID `play_search_template`: + +```json +{ + "source": { + "query": { + "match": { + "play_name": "{{play_name}}" + } + } + }, + "params": { + "play_name": "Henry IV" + } +} +``` + +### Render template using template ID + +The following example request validates a search template with the ID `play_search_template`: + +```json +POST _render/template +{ + "id": "play_search_template", + "params": { + "play_name": "Henry IV" + } +} +``` +{% include copy.html %} + +### Render template using `_source` + +If you don't want to use a saved template, or want to test a template before saving, you can test a template with the `_source` parameter using [Mustache](https://mustache.github.io/mustache.5.html) variables, as shown in the following example: + +``` +{ + "source": { + "from": "{{from}}{{^from}}10{{/from}}", + "size": "{{size}}{{^size}}10{{/size}}", + "query": { + "match": { + "play_name": "{{play_name}}" + } + } + }, + "params": { + "play_name": "Henry IV" + } +} +``` +{% include copy.html %} + +## Example response + +OpenSearch responds with information about the template's output: + +```json +{ + "template_output": { + "from": "0", + "size": "10", + "query": { + "match": { + "play_name": "Henry IV" + } + } + } +} +``` + + + + diff --git a/_benchmark/index.md b/_benchmark/index.md index 1a71d57de9..6d343b908a 100644 --- a/_benchmark/index.md +++ b/_benchmark/index.md @@ -24,13 +24,12 @@ The following diagram visualizes how OpenSearch Benchmark works when run against ![Benchmark workflow]({{site.url}}{{site.baseurl}}/images/benchmark/osb-workflow.jpg). -The OpenSearch Benchmark documentation is split into five sections: +The OpenSearch Benchmark documentation is split into four sections: - [Quickstart]({{site.url}}{{site.baseurl}}/benchmark/quickstart/): Learn how to quickly run and install OpenSearch Benchmark. - [User guide]({{site.url}}{{site.baseurl}}/benchmark/user-guide/index/): Dive deep into how OpenSearch Benchmark can help you track the performance of your cluster. - [Tutorials]({{site.url}}{{site.baseurl}}/benchmark/tutorials/index/): Use step-by-step guides for more advanced benchmarking configurations and functionality. -- [Commands]({{site.url}}{{site.baseurl}}/benchmark/commands/index/): A detailed reference of commands and command options supported by OpenSearch. -- [Workloads]({{site.url}}{{site.baseurl}}/benchmark/workloads/index/): A detailed reference of options available for both default and custom workloads. +- [Reference]({{site.url}}{{site.baseurl}}/benchmark/reference/index/): A detailed reference of metrics, commands, telemetry devices, and workloads. 
diff --git a/_clients/python-low-level.md b/_clients/python-low-level.md index 894bef0e38..ba40fa3f45 100644 --- a/_clients/python-low-level.md +++ b/_clients/python-low-level.md @@ -8,9 +8,15 @@ redirect_from: # Low-level Python client -The OpenSearch low-level Python client (`opensearch-py`) provides wrapper methods for the OpenSearch REST API so that you can interact with your cluster more naturally in Python. Rather than sending raw HTTP requests to a given URL, you can create an OpenSearch client for your cluster and call the client's built-in functions. For the client's complete API documentation and additional examples, see the [`opensearch-py` API documentation](https://opensearch-project.github.io/opensearch-py/). +The OpenSearch low-level Python client (`opensearch-py`) provides wrapper methods for the OpenSearch REST API so that you can interact with your cluster more naturally in Python. Rather than sending raw HTTP requests to a given URL, you can create an OpenSearch client for your cluster and call the client's built-in functions. -This getting started guide illustrates how to connect to OpenSearch, index documents, and run queries. For the client source code, see the [`opensearch-py` repo](https://github.com/opensearch-project/opensearch-py). +This getting started guide illustrates how to connect to OpenSearch, index documents, and run queries. For additional information, see the following resources: +- [OpenSearch Python repo](https://github.com/opensearch-project/opensearch-py) +- [API reference](https://opensearch-project.github.io/opensearch-py/api-ref.html) +- [User guides](https://github.com/opensearch-project/opensearch-py/tree/main/guides) +- [Samples](https://github.com/opensearch-project/opensearch-py/tree/main/samples) + +If you have any questions or would like to contribute, you can [create an issue](https://github.com/opensearch-project/opensearch-py/issues) to interact with the OpenSearch Python team directly. ## Setup diff --git a/_config.yml b/_config.yml index e5cce4e34f..be015cec06 100644 --- a/_config.yml +++ b/_config.yml @@ -5,9 +5,9 @@ baseurl: "/docs/latest" # the subpath of your site, e.g. /blog url: "https://opensearch.org" # the base hostname & protocol for your site, e.g. http://example.com permalink: /:path/ -opensearch_version: '2.14.0' -opensearch_dashboards_version: '2.14.0' -opensearch_major_minor_version: '2.14' +opensearch_version: '2.15.0' +opensearch_dashboards_version: '2.15.0' +opensearch_major_minor_version: '2.15' lucene_version: '9_10_0' # Build settings diff --git a/_dashboards/visualize/area.md b/_dashboards/visualize/area.md index 0f3b7863d3..5df59579ec 100644 --- a/_dashboards/visualize/area.md +++ b/_dashboards/visualize/area.md @@ -1,6 +1,6 @@ --- layout: default -title: Using area charts +title: Area charts parent: Building data visualizations nav_order: 5 --- diff --git a/_dashboards/visualize/gantt.md b/_dashboards/visualize/gantt.md index 875e35c127..3a9814465a 100644 --- a/_dashboards/visualize/gantt.md +++ b/_dashboards/visualize/gantt.md @@ -1,6 +1,6 @@ --- layout: default -title: Using Gantt charts +title: Gantt charts parent: Building data visualizations nav_order: 30 redirect_from: @@ -18,7 +18,7 @@ To create a Gantt chart, perform the following steps: 1. In the visualizations menu, choose **Create visualization** and **Gantt Chart**. 1. Choose a source for the chart (e.g. some log data). 1. Under **Metrics**, choose **Event**. For log data, each log is an event. -1. 
Select the **Start Time** and **Duration** fields from your data set. The start time is the timestamp for the beginning of an event. The duration is the amount of time to add to the start time. +1. Select the **Start Time** and **Duration** fields from your dataset. The start time is the timestamp for the beginning of an event. The duration is the amount of time to add to the start time. 1. Under **Results**, choose the number of events to display on the chart. Gantt charts sequence events from earliest to latest based on start time. 1. Choose **Panel settings** to adjust axis labels, time format, and colors. 1. Choose **Update**. diff --git a/_dashboards/visualize/geojson-regionmaps.md b/_dashboards/visualize/geojson-regionmaps.md index 663c4c2f39..aa006e0a24 100644 --- a/_dashboards/visualize/geojson-regionmaps.md +++ b/_dashboards/visualize/geojson-regionmaps.md @@ -1,6 +1,6 @@ --- layout: default -title: Using coordinate and region maps +title: Coordinate and region maps parent: Building data visualizations has_children: true nav_order: 15 @@ -12,7 +12,7 @@ redirect_from: OpenSearch has a standard set of GeoJSON files that provide a vector map with each region map. OpenSearch Dashboards also provides basic map tiles with a standard vector map to create region maps. You can configure the base map tiles using [Web Map Service (WMS)](https://www.ogc.org/standards/wms). For more information, see [Configuring WMS in OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/maptiles/). -For air gapped environments, OpenSearch Dashboards provides a self-host maps server. For more information, see [Using the self-host maps server]({{site.url}}{{site.baseurl}}/dashboards/selfhost-maps-server/) +For air-gapped environments, OpenSearch Dashboards provides a self-host maps server. For more information, see [Using the self-host maps server]({{site.url}}{{site.baseurl}}/dashboards/selfhost-maps-server/). While you can't configure a server to support user-defined vector map layers, you can configure your own GeoJSON file and upload it for this purpose. {: .note} @@ -35,7 +35,7 @@ You can use [geojson.io](https://geojson.io/#map=2/20.0/0.0) to extract GeoJSON To create your own custom vector map, upload a JSON file that contains GEO data for your customized regional maps. The JSON file contains vector layers for visualization. -1. Prepare a JSON file to upload. Make sure the file has either a .geojson or .json extension. +1. Prepare a JSON file to upload. Make sure the file has either a `.geojson` or `.json` extension. 1. On the top menu bar, go to **OpenSearch Dashboards > Visualize**. 1. Select the **Create Visualization** button. 1. Select **Region Map**. 
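For reference, the file prepared in step 1 is a standard GeoJSON `FeatureCollection`. The following is a minimal, hypothetical example; the region name and coordinates are placeholders rather than an official OpenSearch sample:

```json
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": { "name": "Sample Region" },
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]]
      }
    }
  ]
}
```

When you use the uploaded map in a visualization, a property such as `name` typically serves as the join field that is matched against values in your index.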
diff --git a/_dashboards/visualize/maps-stats-api.md b/_dashboards/visualize/maps-stats-api.md index 7939a4e732..f81c7e6ac4 100644 --- a/_dashboards/visualize/maps-stats-api.md +++ b/_dashboards/visualize/maps-stats-api.md @@ -3,7 +3,7 @@ layout: default title: Maps Stats API nav_order: 20 grand_parent: Building data visualizations -parent: Using coordinate and region maps +parent: Coordinate and region maps has_children: false --- diff --git a/_dashboards/visualize/maps.md b/_dashboards/visualize/maps.md index 23e14d41c3..5728fd9092 100644 --- a/_dashboards/visualize/maps.md +++ b/_dashboards/visualize/maps.md @@ -2,7 +2,7 @@ layout: default title: Using maps grand_parent: Building data visualizations -parent: Using coordinate and region maps +parent: Coordinate and region maps nav_order: 10 redirect_from: - /dashboards/maps-plugin/ diff --git a/_dashboards/visualize/maptiles.md b/_dashboards/visualize/maptiles.md index 6b8cc06ef3..6c7afc7462 100644 --- a/_dashboards/visualize/maptiles.md +++ b/_dashboards/visualize/maptiles.md @@ -2,7 +2,7 @@ layout: default title: Configuring a Web Map Service (WMS) grand_parent: Building data visualizations -parent: Using coordinate and region maps +parent: Coordinate and region maps nav_order: 30 redirect_from: - /dashboards/maptiles/ diff --git a/_dashboards/visualize/selfhost-maps-server.md b/_dashboards/visualize/selfhost-maps-server.md index 925c5449fe..439f9b634a 100644 --- a/_dashboards/visualize/selfhost-maps-server.md +++ b/_dashboards/visualize/selfhost-maps-server.md @@ -1,14 +1,14 @@ --- layout: default -title: Using the self-host maps server +title: Using self-hosted map servers grand_parent: Building data visualizations -parent: Using coordinate and region maps +parent: Coordinate and region maps nav_order: 40 redirect_from: - /dashboards/selfhost-maps-server/ --- -# Using the self-host maps server +# Using self-hosted map servers The self-host maps server for OpenSearch Dashboards allows users to access the default maps service in air-gapped environments. OpenSearch-compatible map URLs include a map manifest with map tiles and vectors, the map tiles, and the map vectors. diff --git a/_dashboards/visualize/tsvb.md b/_dashboards/visualize/tsvb.md new file mode 100644 index 0000000000..d845dea58a --- /dev/null +++ b/_dashboards/visualize/tsvb.md @@ -0,0 +1,70 @@ +--- +layout: default +title: TSVB +parent: Building data visualizations +nav_order: 45 +--- + +# TSVB + +The Time-Series Visual Builder (TSVB) is a powerful data visualization tool in OpenSearch Dashboards that allows you to create detailed time-series visualizations. One of its key features is the ability to add annotations or markers at specific time points based on index data. This feature is particularly useful for making connections between multiple indexes and building visualizations that display data over time, such as flight status, delays by type, and more. TSVB currently supports the following visualization types: Area, Line, Metric, Gauge, Markdown, and Data Table. 
+ +## Creating TSVB visualizations from multiple data sources +Introduced 2.14 +{: .label .label-purple } + +Before proceeding, ensure that the following configuration settings are enabled in the `config/opensearch_dashboards.yml` file: + +```yaml +data_source.enabled: true +vis_type_timeseries.enabled: true +``` +{% include copy.html %} + +Once you have configured [multiple data sources]({{site.url}}{{site.baseurl}}/dashboards/management/multi-data-sources/) in OpenSearch Dashboards, you can use TSVB to query those data sources. The following GIF shows the process of creating TSVB visualizations in OpenSearch Dashboards. + +![Process of creating TSVB visualizations in OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/images/dashboards/configure-tsvb.gif) + +**Step 1: Set up and connect data sources** + +Open OpenSearch Dashboards and follow these steps: + +1. Select **Dashboards Management** from the main menu on the left. +2. Select **Data sources** and then select the **Create data source** button. +3. On the **Create data source** page, enter the connection details and endpoint URL. +4. On the **Home** page, select **Add sample data** and then select the **Add data** button for the **Sample web logs** dataset. + +The following GIF shows the steps required to set up and connect a data source. + +![Create data source]({{site.url}}{{site.baseurl}}/images/dashboards/create-datasource.gif) + +**Step 2: Create the visualization** + +Follow these steps to create the visualization: + +1. From the menu on the left, select **Visualize**. +2. On the **Visualizations** page, select **Create Visualization** and then select **TSVB** in the pop-up window. + +**Step 3: Specify data sources** + +After creating a TSVB visualization, data may appear based on your default index pattern. To change the index pattern or configure additional settings, follow these steps: + +1. In the **Create** window, select **Panel options**. +2. Under **Data source**, select the OpenSearch cluster from which to pull data. In this case, choose your newly created data source. +3. Under **Index name**, enter `opensearch_dashboards_sample_data_logs`. +4. Under **Time field**, select `@timestamp`. This setting specifies the time range for rendering the visualization. + +**(Optional) Step 4: Add annotations** + +Annotations are markers that can be added to time-series visualizations. Follow these steps to add annotations: + +1. In the upper-left corner of the page, select **Time Series**. +2. Select the **Annotations** tab and then **Add data source**. +3. In the **Index name** field, specify the appropriate index. In this case, continue using the same index from the previous steps, that is, `opensearch_dashboards_sample_data_logs`. +4. Under **Time field**, select `@timestamp`. +5. In the **Fields** field, enter `timestamp`. +6. In the **Row template** field, enter `timestamp`. + +The visualization automatically updates to display your annotations, as shown in the following image.
+ + TSVB visualization with annotations diff --git a/_dashboards/visualize/vega.md b/_dashboards/visualize/vega.md index 7764d583a6..3a9f6aad4f 100644 --- a/_dashboards/visualize/vega.md +++ b/_dashboards/visualize/vega.md @@ -1,192 +1,137 @@ --- layout: default -title: Using Vega +title: Vega parent: Building data visualizations -nav_order: 45 +nav_order: 50 --- -# Using Vega +# Vega -[Vega](https://vega.github.io/vega/) and [Vega-Lite](https://vega.github.io/vega-lite/) are open-source, declarative language visualization tools that you can use to create custom data visualizations with your OpenSearch data and [Vega Data](https://vega.github.io/vega/docs/data/). These tools are ideal for advanced users comfortable with writing OpenSearch queries directly. Enable the `vis_type_vega` plugin in your `opensearch_dashboards.yml` file to write your [Vega specifications](https://vega.github.io/vega/docs/specification/) in either JSON or [HJSON](https://hjson.github.io/) format or to specify one or more OpenSearch queries within your Vega specification. By default, the plugin is set to `true`. The configuration is shown in the following example. For configuration details, refer to the `vis_type_vega` [README](https://github.com/opensearch-project/OpenSearch-Dashboards/blob/main/src/plugins/vis_type_vega/README.md). +[Vega](https://vega.github.io/vega/) and [Vega-Lite](https://vega.github.io/vega-lite/) are open-source, declarative language visualization tools that you can use to create custom data visualizations with your OpenSearch data and [Vega data](https://vega.github.io/vega/docs/data/). These tools are ideal for advanced users comfortable with writing OpenSearch queries directly. Enable the `vis_type_vega` plugin in your `opensearch_dashboards.yml` file to write your [Vega specifications](https://vega.github.io/vega/docs/specification/) in either JSON or [HJSON](https://hjson.github.io/) format or to specify one or more OpenSearch queries in your Vega specification. By default, the plugin is set to `true`. + +## Creating Vega visualizations from multiple data sources +Introduced 2.13 +{: .label .label-purple } + +Before proceeding, ensure that the following configuration settings are enabled in the `config/opensearch_dasboards.yaml` file. For configuration details, refer to the `vis_type_vega` [README](https://github.com/opensearch-project/OpenSearch-Dashboards/blob/main/src/plugins/vis_type_vega/README.md). ``` +data_source.enabled: true vis_type_vega.enabled: true ``` -The following image shows a custom Vega map created in OpenSearch. +After you have configured [multiple data sources]({{site.url}}{{site.baseurl}}/dashboards/management/multi-data-sources/) in OpenSearch Dashboards, you can use Vega to query those data sources. The following GIF shows the process of creating Vega visualizations in OpenSearch Dashboards. -Map created using Vega visualization in OpenSearch Dashboards +![Process of creating Vega visualizations in OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/images/dashboards/configure-vega.gif) -## Querying from multiple data sources +### Step 1: Set up and connect data sources -If you have configured [multiple data sources]({{site.url}}{{site.baseurl}}/dashboards/management/multi-data-sources/) in OpenSearch Dashboards, you can use Vega to query those data sources. Within your Vega specification, add the `data_source_name` field under the `url` property to target a specific data source by name. By default, queries use data from the local cluster. 
You can assign individual `data_source_name` values to each OpenSearch query within your Vega specification. This allows you to query multiple indexes across different data sources in a single visualization. +Open OpenSearch Dashboards and follow these steps: -The following is an example Vega specification with `Demo US Cluster` as the specified `data_source_name`: +1. Select **Dashboards Management** from the menu on the left. +2. Select **Data sources** and then select the **Create data source** button. +3. On the **Create data source** page, enter the connection details and endpoint URL, as shown in the following GIF. +4. On the **Home page**, select **Add sample data**. Under **Data source**, select your newly created data source, and then select the **Add data button** for the **Sample web logs** dataset. -``` +The following GIF shows the steps required for setting up and connecting a data source. + +![Setting up and connecting data sources with OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/images/dashboards/Add_datasource.gif) + +### Step 2: Create the visualization + +1. From the menu on the left, select **Visualize**. +2. On the **Visualizations** page, select **Create Visualization** and then select **Vega** in the pop-up window. + +### Step 3: Add the Vega specification + +By default, queries use data from the local cluster. You can assign individual `data_source_name` values to each OpenSearch query in your Vega specification. This allows you to query multiple indexes across different data sources in a single visualization. + +1. Verify that the data source you created is specified under `data_source_name`. Alternatively, in your Vega specification, add the `data_source_name` field under the `url` property to target a specific data source by name. +2. Copy the following Vega specification and then select the **Update** button in the lower-right corner. The visualization should appear. 
+ +```json { - $schema: https://vega.github.io/schema/vega/v5.json - config: { - kibana: {type: "map", latitude: 25, longitude: -70, zoom: 3} - } - data: [ - { - name: table - url: { - index: opensearch_dashboards_sample_data_flights - // This OpenSearchQuery will query from the Demo US Cluster datasource - data_source_name: Demo US Cluster - %context%: true - // Uncomment to enable time filtering - // %timefield%: timestamp - body: { - size: 0 - aggs: { - origins: { - terms: {field: "OriginAirportID", size: 10000} - aggs: { - originLocation: { - top_hits: { - size: 1 - _source: { - includes: ["OriginLocation", "Origin"] - } - } - } - distinations: { - terms: {field: "DestAirportID", size: 10000} - aggs: { - destLocation: { - top_hits: { - size: 1 - _source: { - includes: ["DestLocation"] - } - } - } - } + $schema: https://vega.github.io/schema/vega-lite/v5.json + data: { + url: { + %context%: true + %timefield%: @timestamp + index: opensearch_dashboards_sample_data_logs + data_source_name: YOUR_DATA_SOURCE_TITLE + body: { + aggs: { + 1: { + date_histogram: { + field: @timestamp + fixed_interval: 3h + time_zone: America/Los_Angeles + min_doc_count: 1 + } + aggs: { + 2: { + avg: { + field: bytes } } } } } + size: 0 } - format: {property: "aggregations.origins.buckets"} - transform: [ - { - type: geopoint - projection: projection - fields: [ - originLocation.hits.hits[0]._source.OriginLocation.lon - originLocation.hits.hits[0]._source.OriginLocation.lat - ] - } - ] } - { - name: selectedDatum - on: [ - {trigger: "!selected", remove: true} - {trigger: "selected", insert: "selected"} - ] + format: { + property: aggregations.1.buckets } - ] - signals: [ + } + transform: [ { - name: selected - value: null - on: [ - {events: "@airport:mouseover", update: "datum"} - {events: "@airport:mouseout", update: "null"} - ] + calculate: datum.key + as: timestamp } - ] - scales: [ { - name: airportSize - type: linear - domain: {data: "table", field: "doc_count"} - range: [ - {signal: "zoom*zoom*0.2+1"} - {signal: "zoom*zoom*10+1"} - ] + calculate: datum[2].value + as: bytes } ] - marks: [ + layer: [ { - type: group - from: { - facet: { - name: facetedDatum - data: selectedDatum - field: distinations.buckets - } + mark: { + type: line } - data: [ - { - name: facetDatumElems - source: facetedDatum - transform: [ - { - type: geopoint - projection: projection - fields: [ - destLocation.hits.hits[0]._source.DestLocation.lon - destLocation.hits.hits[0]._source.DestLocation.lat - ] - } - {type: "formula", expr: "{x:parent.x, y:parent.y}", as: "source"} - {type: "formula", expr: "{x:datum.x, y:datum.y}", as: "target"} - {type: "linkpath", shape: "diagonal"} - ] - } - ] - scales: [ - { - name: lineThickness - type: log - clamp: true - range: [1, 8] - } - { - name: lineOpacity - type: log - clamp: true - range: [0.2, 0.8] - } - ] - marks: [ - { - from: {data: "facetDatumElems"} - type: path - interactive: false - encode: { - update: { - path: {field: "path"} - stroke: {value: "black"} - strokeWidth: {scale: "lineThickness", field: "doc_count"} - strokeOpacity: {scale: "lineOpacity", field: "doc_count"} - } - } - } - ] } { - name: airport - type: symbol - from: {data: "table"} - encode: { - update: { - size: {scale: "airportSize", field: "doc_count"} - xc: {signal: "datum.x"} - yc: {signal: "datum.y"} - tooltip: { - signal: "{title: datum.originLocation.hits.hits[0]._source.Origin + ' (' + datum.key + ')', connnections: length(datum.distinations.buckets), flights: datum.doc_count}" - } - } + mark: { + type: circle + 
tooltip: true } } ] + encoding: { + x: { + field: timestamp + type: temporal + axis: { + title: @timestamp + } + } + y: { + field: bytes + type: quantitative + axis: { + title: Average bytes + } + } + color: { + datum: Average bytes + type: nominal + } + } } ``` {% include copy-curl.html %} + +## Additional resources + +The following resources provide additional information about Vega visualizations in OpenSearch Dashboards: + +- [Improving ease of use in OpenSearch Dashboards with Vega visualizations](https://opensearch.org/blog/Improving-Dashboards-usability-with-Vega/) diff --git a/_dashboards/visualize/visbuilder.md b/_dashboards/visualize/visbuilder.md index de4dfb1666..51ce5b1e46 100644 --- a/_dashboards/visualize/visbuilder.md +++ b/_dashboards/visualize/visbuilder.md @@ -1,13 +1,13 @@ --- layout: default -title: Using VisBuilder +title: VisBuilder parent: Building data visualizations nav_order: 100 redirect_from: - /dashboards/drag-drop-wizard/ --- -# Using VisBuilder +# VisBuilder You can use the VisBuilder visualization type in OpenSearch Dashboards to create data visualizations by using a drag-and-drop gesture. With VisBuilder you have: @@ -19,7 +19,7 @@ You can use the VisBuilder visualization type in OpenSearch Dashboards to create ## Try VisBuilder in the OpenSearch Dashboards playground -If you'd like to try out VisBuilder without installing OpenSearch locally, you can do so in the [Dashboards playground](https://playground.opensearch.org/app/vis-builder#/). +You can try VisBuilder without installing OpenSearch locally by using [OpenSearch Dashboards Playground](https://playground.opensearch.org/app/vis-builder#/). VisBuilder is enabled by default. ## Try VisBuilder locally @@ -37,4 +37,4 @@ Follow these steps to create a new visualization using VisBuilder in your enviro Here’s an example visualization. Your visualization will look different depending on your data and the fields you select. -Visualization generated using sample data \ No newline at end of file +Visualization generated using sample data diff --git a/_data-prepper/common-use-cases/s3-logs.md b/_data-prepper/common-use-cases/s3-logs.md index 7986a7eef8..8d5a9ce967 100644 --- a/_data-prepper/common-use-cases/s3-logs.md +++ b/_data-prepper/common-use-cases/s3-logs.md @@ -9,7 +9,6 @@ nav_order: 40 Data Prepper allows you to load logs from [Amazon Simple Storage Service](https://aws.amazon.com/s3/) (Amazon S3), including traditional logs, JSON documents, and CSV logs. - ## Architecture Data Prepper can read objects from S3 buckets using an [Amazon Simple Queue Service (SQS)](https://aws.amazon.com/sqs/) (Amazon SQS) queue and [Amazon S3 Event Notifications](https://docs.aws.amazon.com/AmazonS3/latest/userguide/NotificationHowTo.html). @@ -20,7 +19,7 @@ The following diagram shows the overall architecture of the components involved. S3 source architecture{: .img-fluid} -The flow of data is as follows. +The component data flow is as follows: 1. A system produces logs into the S3 bucket. 2. S3 creates an S3 event notification in the SQS queue. @@ -28,7 +27,6 @@ The flow of data is as follows. 4. Data Prepper downloads the content from the S3 object. 5. Data Prepper sends a document to OpenSearch for the content in the S3 object. - ## Pipeline overview Data Prepper supports reading data from S3 using the [`s3` source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3/). @@ -44,7 +42,6 @@ Before Data Prepper can read log data from S3, you need the following prerequisi - An S3 bucket. 
- A log producer that writes logs to S3. The exact log producer will vary depending on your specific use case, but could include writing logs to S3 or a service such as Amazon CloudWatch. - ## Getting started Use the following steps to begin loading logs from S3 with Data Prepper. @@ -57,8 +54,7 @@ Use the following steps to begin loading logs from S3 with Data Prepper. ### Setting permissions for Data Prepper -To view S3 logs, Data Prepper needs access to Amazon SQS and S3. -Use the following example to set up permissions: +To view S3 logs, Data Prepper needs access to Amazon SQS and S3. Use the following example to set up permissions: ```json { @@ -88,12 +84,13 @@ Use the following example to set up permissions: ] } ``` +{% include copy-curl.html %} If your S3 objects or SQS queues do not use KMS, you can remove the `kms:Decrypt` permission. ### SQS dead-letter queue -The are two options for how to handle errors resulting from processing S3 objects. +The following two options can be used to handle S3 object processing errors: - Use an SQS dead-letter queue (DLQ) to track the failure. This is the recommended approach. - Delete the message from SQS. You must manually find the S3 object and correct the error. @@ -104,8 +101,8 @@ The following diagram shows the system architecture when using SQS with DLQ. To use an SQS dead-letter queue, perform the following steps: -1. Create a new SQS standard queue to act as your DLQ. -2. Configure your SQS's redrive policy [to use your DLQ](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-configure-dead-letter-queue.html). Consider using a low value such as 2 or 3 for the "Maximum Receives" setting. +1. Create a new SQS standard queue to act as the DLQ. +2. Configure your SQS re-drive policy [to use DLQ](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-configure-dead-letter-queue.html). Consider using a low value such as 2 or 3 for the **Maximum Receives** setting. 3. Configure the Data Prepper `s3` source to use `retain_messages` for `on_error`. This is the default behavior. ## Pipeline design @@ -125,6 +122,7 @@ s3-log-pipeline: queue_url: "arn:aws:sqs::<123456789012>:" visibility_timeout: "2m" ``` +{% include copy-curl.html %} Configure the following options according to your use case: @@ -164,10 +162,11 @@ s3-log-pipeline: password: "admin" index: s3_logs ``` +{% include copy-curl.html %} ## Multiple Data Prepper pipelines -We recommend that you have one SQS queue per Data Prepper pipeline. In addition, you can have multiple nodes in the same cluster reading from the same SQS queue, which doesn't require additional configuration with Data Prepper. +It is recommended that you have one SQS queue per Data Prepper pipeline. In addition, you can have multiple nodes in the same cluster reading from the same SQS queue, which doesn't require additional Data Prepper configuration. If you have multiple pipelines, you must create multiple SQS queues for each pipeline, even if both pipelines use the same S3 bucket. @@ -175,6 +174,55 @@ If you have multiple pipelines, you must create multiple SQS queues for each pip To meet the scale of logs produced by S3, some users require multiple SQS queues for their logs. You can use [Amazon Simple Notification Service](https://docs.aws.amazon.com/sns/latest/dg/welcome.html) (Amazon SNS) to route event notifications from S3 to an SQS [fanout pattern](https://docs.aws.amazon.com/sns/latest/dg/sns-common-scenarios.html). 
Using SNS, all S3 event notifications are sent directly to a single SNS topic, where you can subscribe to multiple SQS queues. -To make sure that Data Prepper can directly parse the event from the SNS topic, configure [raw message delivery](https://docs.aws.amazon.com/sns/latest/dg/sns-large-payload-raw-message-delivery.html) on the SNS to SQS subscription. Setting this option will not affect other SQS queues that are subscribed to that SNS topic. +To make sure that Data Prepper can directly parse the event from the SNS topic, configure [raw message delivery](https://docs.aws.amazon.com/sns/latest/dg/sns-large-payload-raw-message-delivery.html) on the SNS-to-SQS subscription. Applying this option does not affect other SQS queues subscribed to the SNS topic. + +## Filtering and retrieving data using Amazon S3 Select + +If a pipeline uses an S3 source, you can use SQL expressions to perform filtering and computations on the contents of S3 objects before ingesting them into the pipeline. +The `s3_select` option supports objects in the [Parquet File Format](https://parquet.apache.org/docs/). It also works with objects that are compressed with GZIP or BZIP2 (for CSV and JSON objects only) and supports columnar compression for the Parquet File Format using GZIP and Snappy. +Refer to [Filtering and retrieving data using Amazon S3 Select](https://docs.aws.amazon.com/AmazonS3/latest/userguide/selecting-content-from-objects.html) and [SQL reference for Amazon S3 Select](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-select-sql-reference.html) for comprehensive information about using Amazon S3 Select. +{: .note} + +The following example pipeline retrieves all data from S3 objects encoded in the Parquet File Format: + +```json +pipeline: + source: + s3: + s3_select: + expression: "select * from s3object s" + input_serialization: parquet + notification_type: "sqs" +... +``` +{% include copy-curl.html %} + +The following example pipeline retrieves only the first 10,000 records in the objects: + +```json +pipeline: + source: + s3: + s3_select: + expression: "select * from s3object s LIMIT 10000" + input_serialization: parquet + notification_type: "sqs" +... +``` +{% include copy-curl.html %} + +The following example pipeline retrieves records from S3 objects that have a `data_value` in the given range of 200--500: + +```json +pipeline: + source: + s3: + s3_select: + expression: "select s.* from s3object s where s.data_value > 200 and s.data_value < 500 " + input_serialization: parquet + notification_type: "sqs" +... +``` +{% include copy-curl.html %} diff --git a/_data-prepper/pipelines/configuration/processors/add-entries.md b/_data-prepper/pipelines/configuration/processors/add-entries.md index d28f2d8f6f..26b95c7b64 100644 --- a/_data-prepper/pipelines/configuration/processors/add-entries.md +++ b/_data-prepper/pipelines/configuration/processors/add-entries.md @@ -10,55 +10,215 @@ nav_order: 40 The `add_entries` processor adds entries to an event. -### Configuration +## Configuration You can configure the `add_entries` processor with the following options. | Option | Required | Description | | :--- | :--- | :--- | | `entries` | Yes | A list of entries to add to an event. | -| `key` | Yes | The key of the new entry to be added. Some examples of keys include `my_key`, `myKey`, and `object/sub_Key`. | -| `metadata_key` | Yes | The key for the new metadata attribute. The argument must be a literal string key and not a JSON Pointer. Either one string key or `metadata_key` is required. 
| +| `key` | No | The key of the new entry to be added. Some examples of keys include `my_key`, `myKey`, and `object/sub_Key`. The key can also be a format expression, for example, `${/key1}` to use the value of field `key1` as the key. | +| `metadata_key` | No | The key for the new metadata attribute. The argument must be a literal string key and not a JSON Pointer. Either one string key or `metadata_key` is required. | +| `value` | No | The value of the new entry to be added, which can be used with any of the following data types: strings, Booleans, numbers, null, nested objects, and arrays. | | `format` | No | A format string to use as the value of the new entry, for example, `${key1}-${key2}`, where `key1` and `key2` are existing keys in the event. Required if neither `value` nor `value_expression` is specified. | | `value_expression` | No | An expression string to use as the value of the new entry. For example, `/key` is an existing key in the event with a type of either a number, a string, or a Boolean. Expressions can also contain functions returning number/string/integer. For example, `length(/key)` will return the length of the key in the event when the key is a string. For more information about keys, see [Expression syntax](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/). | | `add_when` | No | A [conditional expression](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/), such as `/some-key == "test"'`, that will be evaluated to determine whether the processor will be run on the event. | -| `value` | Yes | The value of the new entry to be added. You can use the following data types: strings, Booleans, numbers, null, nested objects, and arrays. | | `overwrite_if_key_exists` | No | When set to `true`, the existing value is overwritten if `key` already exists in the event. The default value is `false`. | +| `append_if_key_exists` | No | When set to `true`, the existing value will be appended if a `key` already exists in the event. An array will be created if the existing value is not an array. Default is `false`. | -### Usage -To get started, create the following `pipeline.yaml` file: +## Usage + +The following examples show how the `add_entries` processor can be used in different cases. + +### Example: Add entries with simple values + +The following example shows you how to configure the processor to add entries with simple values: ```yaml -pipeline: - source: - ... - .... +... processor: - add_entries: entries: - - key: "newMessage" - value: 3 - overwrite_if_key_exists: true - - metadata_key: myMetadataKey - value_expression: 'length("newMessage")' - add_when: '/some_key == "test"' - sink: + - key: "name" + value: "John" + - key: "age" + value: 20 +... ``` {% include copy.html %} +When the input event contains the following data: + +```json +{"message": "hello"} +``` -For example, when your source contains the following event record: +The processed event will contain the following data: + +```json +{"message": "hello", "name": "John", "age": 20} +``` + +### Example: Add entries using format strings + +The following example shows you how to configure the processor to add entries with values from other fields: + +```yaml +... + processor: + - add_entries: + entries: + - key: "date" + format: "${month}-${day}" +... 
+``` +{% include copy.html %} + +When the input event contains the following data: + +```json +{"month": "Dec", "day": 1} +``` + +The processed event will contain the following data: + +```json +{"month": "Dec", "day": 1, "date": "Dec-1"} +``` + +### Example: Add entries using value expressions + +The following example shows you how to configure the processor to use the `value_expression` option: + +```yaml +... + processor: + - add_entries: + entries: + - key: "length" + value_expression: "length(/message)" +... +``` +{% include copy.html %} + +When the input event contains the following data: ```json {"message": "hello"} ``` -And then you run the `add_entries` processor using the example pipeline, it adds a new entry, `{"newMessage": 3}`, to the existing event, `{"message": "hello"}`, so that the new event contains two entries in the final output: +The processed event will contain the following data: + +```json +{"message": "hello", "length": 5} +``` + +### Example: Add metadata + +The following example shows you how to configure the processor to add metadata to events: + +```yaml +... + processor: + - add_entries: + entries: + - metadata_key: "length" + value_expression: "length(/message)" +... +``` +{% include copy.html %} + +When the input event contains the following data: ```json -{"message": "hello", "newMessage": 3} +{"message": "hello"} ``` -If `newMessage` already exists, its existing value is overwritten with a value of `3`. +The processed event will have the same data, with the metadata, `{"length": 5}`, attached. You can subsequently use expressions like `getMetadata("length")` in the pipeline. For more information, see the [`getMetadata` function](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/#getmetadata) documentation. + + +### Example: Add a dynamic key +The following example shows you how to configure the processor to add metadata to events using a dynamic key: + +```yaml +... + processor: + - add_entries: + entries: + - key: "${/param_name}" + value_expression: "/param_value" +... +``` +{% include copy.html %} + +When the input event contains the following data: + +```json +{"param_name": "cpu", "param_value": 50} +``` + +The processed event will contain the following data: + +```json +{"param_name": "cpu", "param_value": 50, "cpu": 50} +``` + +### Example: Overwrite existing entries + +The following example shows you how to configure the processor to overwrite existing entries: + +```yaml +... + processor: + - add_entries: + entries: + - key: "message" + value: "bye" + overwrite_if_key_exists: true +... +``` +{% include copy.html %} + +When the input event contains the following data: + +```json +{"message": "hello"} +``` + +The processed event will contain the following data: + +```json +{"message": "bye"} +``` + +If `overwrite_if_key_exists` is not set to `true`, then the input event will not be changed after processing. + +### Example: Append values to existing entries + +The following example shows you how to configure the processor to append values to existing entries: + +```yaml +... + processor: + - add_entries: + entries: + - key: "message" + value: "world" + append_if_key_exists: true +... 
+``` +{% include copy.html %} + +When the input event contains the following data: + +```json +{"message": "hello"} +``` + +The processed event will contain the following data: + +```json +{"message": ["hello", "world"]} +``` diff --git a/_data-prepper/pipelines/configuration/processors/date.md b/_data-prepper/pipelines/configuration/processors/date.md index 7ac1040c26..c44a10ba16 100644 --- a/_data-prepper/pipelines/configuration/processors/date.md +++ b/_data-prepper/pipelines/configuration/processors/date.md @@ -15,7 +15,7 @@ The `date` processor adds a default timestamp to an event, parses timestamp fiel The following table describes the options you can use to configure the `date` processor. - + Option | Required | Type | Description :--- | :--- | :--- | :--- `match` | Conditionally | [Match](#Match) | The date match configuration. This option cannot be defined at the same time as `from_time_received`. There is no default value. @@ -27,7 +27,7 @@ Option | Required | Type | Description `source_timezone` | No | String | The time zone used to parse dates, including when the zone or offset cannot be extracted from the value. If the zone or offset are part of the value, then the time zone is ignored. A list of all the available time zones is contained in the **TZ database name** column of [the list of database time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List). `destination_timezone` | No | String | The time zone used for storing the timestamp in the `destination` field. A list of all the available time zones is contained in the **TZ database name** column of [the list of database time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List). `locale` | No | String | The location used for parsing dates. Commonly used for parsing month names (`MMM`). The value can contain language, country, or variant fields in IETF BCP 47, such as `en-US`, or a string representation of the [locale](https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html) object, such as `en_US`. A full list of locale fields, including language, country, and variant, can be found in [the language subtag registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry). Default is `Locale.ROOT`. - + ### Match diff --git a/_data-prepper/pipelines/configuration/processors/key-value.md b/_data-prepper/pipelines/configuration/processors/key-value.md index 58b19d95a5..aedc1f8822 100644 --- a/_data-prepper/pipelines/configuration/processors/key-value.md +++ b/_data-prepper/pipelines/configuration/processors/key-value.md @@ -35,6 +35,9 @@ You can use the `key_value` processor to parse the specified field into key-valu | tags_on_failure | When a `kv` operation causes a runtime exception within the processor, the operation is safely stopped without crashing the processor, and the event is tagged with the provided tags. | If `tags_on_failure` is set to `["keyvalueprocessor_failure"]`, `{"tags": ["keyvalueprocessor_failure"]}` will be added to the event's metadata in the event of a runtime exception. | | value_grouping | Specifies whether to group values using predefined value grouping delimiters: `{...}`, `[...]', `<...>`, `(...)`, `"..."`, `'...'`, `http://... (space)`, and `https:// (space)`. If this flag is enabled, then the content between the delimiters is considered to be one entity and is not parsed for key-value pairs. Default is `false`. 
If `value_grouping` is `true`, then `{"key1=[a=b,c=d]&key2=value2"}` parses to `{"key1": "[a=b,c=d]", "key2": "value2"}`. | | drop_keys_with_no_value | Specifies whether keys should be dropped if they have a null value. Default is `false`. If `drop_keys_with_no_value` is set to `true`, then `{"key1=value1&key2"}` parses to `{"key1": "value1"}`. | +| strict_grouping | Specifies whether strict grouping should be enabled when the `value_grouping` or `string_literal_character` options are used. Default is `false`. | When enabled, groups with unmatched end characters yield errors. The event is ignored after the errors are logged. | +| string_literal_character | Can be set to either a single quotation mark (`'`) or a double quotation mark (`"`). Default is `null`. | When this option is used, any text contained within the specified quotation mark character will be ignored and excluded from key-value parsing. For example, `text1 "key1=value1" text2 key2=value2` would parse to `{"key2": "value2"}`. | +| key_value_when | Allows you to specify a [conditional expression](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/), such as `/some-key == "test"`, that will be evaluated to determine whether the processor should be applied to the event. | diff --git a/_data-prepper/pipelines/configuration/sinks/s3.md b/_data-prepper/pipelines/configuration/sinks/s3.md index 71cb7b1f70..d1413f6ffc 100644 --- a/_data-prepper/pipelines/configuration/sinks/s3.md +++ b/_data-prepper/pipelines/configuration/sinks/s3.md @@ -15,19 +15,20 @@ The `s3` sink uses the following format when batching events: ``` ${pathPrefix}events-%{yyyy-MM-dd'T'HH-mm-ss'Z'}-${currentTimeInNanos}-${uniquenessId}.${codecSuppliedExtension} ``` +{% include copy-curl.html %} -When a batch of objects is written to S3, the objects are formatted similarly to the following: +When a batch of objects is written to Amazon S3, the objects are formatted similarly to the following: ``` my-logs/2023/06/09/06/events-2023-06-09T06-00-01-1686290401871214927-ae15b8fa-512a-59c2-b917-295a0eff97c8.json ``` +{% include copy-curl.html %} - -For more information about how to configure an object, see the [Object key](#object-key-configuration) section. +For more information about how to configure an object, refer to [Object key](#object-key-configuration). ## Usage -The following example creates a pipeline configured with an s3 sink. It contains additional options for customizing the event and size thresholds for which the pipeline sends record events and sets the codec type `ndjson`: +The following example creates a pipeline configured with an `s3` sink. It contains additional options for customizing the event and size thresholds for the pipeline and sets the codec type as `ndjson`: ``` pipeline: @@ -49,10 +50,11 @@ pipeline: ndjson: buffer_type: in_memory ``` +{% include copy-curl.html %} ## IAM permissions -In order to use the `s3` sink, configure AWS Identity and Access Management (IAM) to grant Data Prepper permissions to write to Amazon S3. You can use a configuration similar to the following JSON configuration: +To use the `s3` sink, configure AWS Identity and Access Management (IAM) to grant Data Prepper permissions to write to Amazon S3. 
You can use a configuration similar to the following JSON configuration: ```json { @@ -69,36 +71,62 @@ In order to use the `s3` sink, configure AWS Identity and Access Management (IAM ] } ``` +{% include copy-curl.html %} + +## Cross-account S3 access + +When Data Prepper fetches data from an S3 bucket, it verifies bucket ownership using a [bucket owner condition](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-owner-condition.html). + +By default, the S3 sink does not require `bucket_owners`. If `bucket_owners` is configured and a bucket is not included in one of the mapped configurations, `default_bucket_owner` defaults to the account ID in `aws.sts_role_arn`. You can configure both `bucket_owners` and `default_bucket_owner` and apply the settings together. + +When ingesting data from multiple S3 buckets with different account associations, configure Data Prepper for cross-account S3 access based on the following conditions: + +- For S3 buckets belonging to the same account, set `default_bucket_owner` to that account's ID. +- For S3 buckets belonging to multiple accounts, use a `bucket_owners` map. + +A `bucket_owners` map specifies account IDs for buckets belonging to multiple accounts. For example, in the following configuration, `my-bucket-01` is owned by `123456789012` and `my-bucket-02` is owned by `999999999999`: + +``` +sink: + - s3: + default_bucket_owner: 111111111111 + bucket_owners: + my-bucket-01: 123456789012 + my-bucket-02: 999999999999 +``` +{% include copy-curl.html %} ## Configuration Use the following options when customizing the `s3` sink. -Option | Required | Type | Description -:--- | :--- | :--- | :--- -`bucket` | Yes | String | The name of the S3 bucket to which objects are stored. The `name` must match the name of your object store. -`codec` | Yes | [Codec](#codec) | The codec determining the format of output data. -`aws` | Yes | AWS | The AWS configuration. See [aws](#aws) for more information. -`threshold` | Yes | [Threshold](#threshold-configuration) | Configures when to write an object to S3. -`object_key` | No | Sets the `path_prefix` and the `file_pattern` of the object store. The file pattern is always `events-%{yyyy-MM-dd'T'hh-mm-ss}`. By default, those objects are found inside the root directory of the bucket. The `path_prefix` is configurable. -`compression` | No | String | The compression algorithm to apply: `none`, `gzip`, or `snappy`. Default is `none`. -`buffer_type` | No | [Buffer type](#buffer-type) | Determines the buffer type. -`max_retries` | No | Integer | The maximum number of times a single request should retry when ingesting data to S3. Defaults to `5`. - -## aws +Option | Required | Type | Description +:--- |:---------|:------------------------------------------------| :--- +`bucket` | Yes | String | Specifies the sink's S3 bucket name. Supports dynamic bucket naming using [Data Prepper expressions]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/), for example, `test-${/bucket_id}`. If a dynamic bucket is inaccessible and no `default_bucket` is configured, then the object data is dropped. +`default_bucket` | No | String | A static bucket for inaccessible dynamic buckets in `bucket`. +`bucket_owners` | No | Map | A map of bucket names and their account owner IDs for cross-account access. Refer to [Cross-account S3 access](#s3_bucket_ownership). +`default_bucket_owner` | No | String | The AWS account ID for an S3 bucket owner. Refer to [Cross-account S3 access](#s3_bucket_ownership). 
+`codec` | Yes | [Codec](#codec) | Serializes data in S3 objects. +`aws` | Yes | AWS | The AWS configuration. Refer to [aws](#aws). +`threshold` | Yes | [Threshold](#threshold-configuration) | Condition for writing objects to S3. +`aggregate_threshold` | No | [Aggregate threshold](#threshold-configuration) | A condition for flushing objects with a dynamic `path_prefix`. +`object_key` | No | [Object key](#object-key-configuration) | Sets `path_prefix` and `file_pattern` for object storage. The file pattern is `events-%{yyyy-MM-dd'T'hh-mm-ss}`. By default, these objects are found in the bucket's root directory. `path_prefix` is configurable. +`compression` | No | String | The compression algorithm: Either `none`, `gzip`, or `snappy`. Default is `none`. +`buffer_type` | No | [Buffer type](#buffer-type) | The buffer type configuration. +`max_retries` | No | Integer | The maximum number of retries for S3 ingestion requests. Default is `5`. + +## `aws` Option | Required | Type | Description :--- | :--- | :--- | :--- `region` | No | String | The AWS Region to use for credentials. Defaults to [standard SDK behavior to determine the Region](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html). -`sts_role_arn` | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to Amazon SQS and Amazon S3. Defaults to `null`, which will use the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html). +`sts_role_arn` | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to Amazon Simple Queue Service (Amazon SQS) and Amazon S3. Defaults to `null`, which uses the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html). `sts_header_overrides` | No | Map | A map of header overrides that the IAM role assumes for the sink plugin. -`sts_external_id` | No | String | An STS external ID used when Data Prepper assumes the role. For more information, see the `ExternalId` documentation in the [STS AssumeRole](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html) API reference. - - +`sts_external_id` | No | String | An AWS STS external ID used when Data Prepper assumes the role. For more information, refer to the `ExternalId` section under [AssumeRole](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html) in the AWS STS API reference. ## Threshold configuration -Use the following options to set ingestion thresholds for the `s3` sink. When any of these conditions are met, Data Prepper will write events to an S3 object. +Use the following options to set ingestion thresholds for the `s3` sink. Data Prepper writes events to an S3 object when any of these conditions occur. Option | Required | Type | Description :--- | :--- | :--- | :--- @@ -106,84 +134,77 @@ Option | Required | Type | Description `maximum_size` | No | String | The maximum number of bytes to accumulate before writing an object to S3. Default is `50mb`. `event_collect_timeout` | Yes | String | The maximum amount of time before Data Prepper writes an event to S3. The value should be either an ISO-8601 duration, such as `PT2M30S`, or a simple notation, such as `60s` or `1500ms`. +## Aggregate threshold configuration + +Use the following options to set rules or limits that trigger certain actions or behavior when an aggregated value crosses a defined threshold. 
+ +Option | Required | Type | Description +:--- |:-----------------------------------|:-------| :--- +`flush_capacity_ratio` | No | Float | The percentage of groups to be force-flushed when `aggregate_threshold maximum_size` is reached. The percentage is expressed as a number between `0.0` and `1.0`. Default is `0.5`. +`maximum_size` | Yes | String | The maximum number of bytes to accumulate before force-flushing objects. For example, `128mb`. ## Buffer type -`buffer_type` is an optional configuration that determines how Data Prepper temporarily stores data before writing an object to S3. The default value is `in_memory`. Use one of the following options: +`buffer_type` is an optional configuration that determines how Data Prepper temporarily stores data before writing an object to S3. The default value is `in_memory`. + +Use one of the following options: - `in_memory`: Stores the record in memory. -- `local_file`: Flushes the record into a file on your local machine. This uses your machine's temporary directory. +- `local_file`: Flushes the record into a file on your local machine. This option uses your machine's temporary directory. - `multipart`: Writes using the [S3 multipart upload](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html). Every 10 MB is written as a part. ## Object key configuration +Use the following options to define how object keys are constructed for objects stored in S3. + Option | Required | Type | Description :--- | :--- | :--- | :--- -`path_prefix` | No | String | The S3 key prefix path to use for objects written to S3. Accepts date-time formatting. For example, you can use `%{yyyy}/%{MM}/%{dd}/%{HH}/` to create hourly folders in S3. The prefix path should end with `/`. By default, Data Prepper writes objects to the root of the S3 bucket. - +`path_prefix` | No | String | The S3 key prefix path to use for objects written to S3. Accepts date-time formatting and dynamic injection of values using [Data Prepper expressions](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/). For example, you can use `/${/my_partition_key}/%{yyyy}/%{MM}/%{dd}/%{HH}/` to create hourly folders in S3 based on the `my_partition_key` value. The prefix path should end with `/`. By default, Data Prepper writes objects to the S3 bucket root. -## codec +## `codec` The `codec` determines how the `s3` source formats data written to each S3 object. -### avro codec +### `avro` codec -The `avro` codec writes an event as an [Apache Avro](https://avro.apache.org/) document. +The `avro` codec writes an event as an [Apache Avro](https://avro.apache.org/) document. Because Avro requires a schema, you may either define the schema or have Data Prepper automatically generate it. Defining your own schema is recommended because this will allow it to be tailored to your particular use case. -Because Avro requires a schema, you may either define the schema yourself, or Data Prepper will automatically generate a schema. -In general, you should define your own schema because it will most accurately reflect your needs. +When you provide your own Avro schema, that schema defines the final structure of your data. Any extra values in any incoming events that are not mapped in the Avro schema will not be included in the final destination. Data Prepper does not allow the use of `include_keys` or `exclude_keys` with a custom schema so as to avoid confusion between a custom Avro schema and the `include_keys` or `exclude_keys` sink configurations. 
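+For example, a custom Avro schema for simple log events might look like the following sketch. The record and field names are illustrative, and each field uses a null union so that missing values are allowed:
+
+```json
+{
+  "type": "record",
+  "name": "LogEvent",
+  "fields": [
+    { "name": "message", "type": ["null", "string"], "default": null },
+    { "name": "status_code", "type": ["null", "int"], "default": null },
+    { "name": "timestamp", "type": ["null", "string"], "default": null }
+  ]
+}
+```
+{% include copy.html %}
+
+Declaring the fields as nullable allows events that omit one of these keys to still be written to the sink.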
-We recommend that you make your Avro fields use a null [union](https://avro.apache.org/docs/current/specification/#unions). -Without the null union, each field must be present or the data will fail to write to the sink. -If you can be certain that each each event has a given field, you can make it non-nullable. +In cases where your data is uniform, you may be able to automatically generate a schema. Automatically generated schemas are based on the first event that the codec receives. The schema will only contain keys from this event, and all keys must be present in all events in order to automatically generate a working schema. Automatically generated schemas make all fields nullable. Use the `include_keys` and `exclude_keys` sink configurations to control which data is included in the automatically generated schema. -When you provide your own Avro schema, that schema defines the final structure of your data. -Therefore, any extra values inside any incoming events that are not mapped in the Arvo schema will not be included in the final destination. -To avoid confusion between a custom Arvo schema and the `include_keys` or `exclude_keys` sink configurations, Data Prepper does not allow the use of the `include_keys` or `exclude_keys` with a custom schema. - -In cases where your data is uniform, you may be able to automatically generate a schema. -Automatically generated schemas are based on the first event received by the codec. -The schema will only contain keys from this event. -Therefore, you must have all keys present in all events in order for the automatically generated schema to produce a working schema. -Automatically generated schemas make all fields nullable. -Use the sink's `include_keys` and `exclude_keys` configurations to control what data is included in the auto-generated schema. +Avro fields should use a null [union](https://avro.apache.org/docs/current/specification/#unions) because this will allow missing values. Otherwise, all required fields must be present for each event. Use non-nullable fields only when you are certain they exist. +Use the following options to configure the codec. Option | Required | Type | Description :--- | :--- | :--- | :--- `schema` | Yes | String | The Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration). Not required if `auto_schema` is set to true. `auto_schema` | No | Boolean | When set to `true`, automatically generates the Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration) from the first event. - -### ndjson codec - -The `ndjson` codec writes each line as a JSON object. +### `ndjson` codec -The `ndjson` codec does not take any configurations. +The `ndjson` codec writes each line as a JSON object. The `ndjson` codec does not take any configurations. +### `json` codec -### json codec - -The `json` codec writes events in a single large JSON file. -Each event is written into an object within a JSON array. +The `json` codec writes events in a single large JSON file. Each event is written into an object within a JSON array. +Use the following options to configure the codec. Option | Required | Type | Description :--- | :--- | :--- | :--- `key_name` | No | String | The name of the key for the JSON array. By default this is `events`. +### `parquet` codec -### parquet codec - -The `parquet` codec writes events into a Parquet file. -When using the Parquet codec, set the `buffer_type` to `in_memory`. +The `parquet` codec writes events into a Parquet file. 
When using the codec, set `buffer_type` to `in_memory`. -The Parquet codec writes data using the Avro schema. -Because Parquet requires an Avro schema, you may either define the schema yourself, or Data Prepper will automatically generate a schema. -However, we generally recommend that you define your own schema so that it can best meet your needs. +The `parquet` codec writes data using the schema. Because Parquet requires an Avro schema, you may either define the schema yourself or have Data Prepper automatically generate it. Defining your own schema is recommended because this will allow it to be tailored to your particular use case. -For details on the Avro schema and recommendations, see the [Avro codec](#avro-codec) documentation. +For more information about the Avro schema, refer to [Avro codec](#avro-codec). +Use the following options to configure the codec. Option | Required | Type | Description :--- | :--- | :--- | :--- @@ -192,7 +213,7 @@ Option | Required | Type | Description ### Setting a schema with Parquet -The following example shows you how to configure the `s3` sink to write Parquet data into a Parquet file using a schema for [VPC Flow Logs](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html#flow-log-records): +The following example pipeline shows how to configure the `s3` sink to write Parquet data into a Parquet file using a schema for [VPC Flow Logs](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html#flow-log-records): ``` pipeline: @@ -235,4 +256,4 @@ pipeline: event_collect_timeout: PT15M buffer_type: in_memory ``` - +{% include copy-curl.html %} diff --git a/_data-prepper/pipelines/configuration/sources/http-source.md b/_data-prepper/pipelines/configuration/sources/http.md similarity index 88% rename from _data-prepper/pipelines/configuration/sources/http-source.md rename to _data-prepper/pipelines/configuration/sources/http.md index b41855cdc2..06933edc1c 100644 --- a/_data-prepper/pipelines/configuration/sources/http-source.md +++ b/_data-prepper/pipelines/configuration/sources/http.md @@ -1,14 +1,16 @@ --- layout: default -title: http_source +title: http parent: Sources grand_parent: Pipelines nav_order: 5 +redirect_from: + - /data-prepper/pipelines/configuration/sources/http-source/ --- -# http_source +# http -`http_source` is a source plugin that supports HTTP. Currently, `http_source` only supports the JSON UTF-8 codec for incoming requests, such as `[{"key1": "value1"}, {"key2": "value2"}]`. The following table describes options you can use to configure the `http_source` source. +The `http` plugin accepts HTTP requests from clients. Currently, `http` only supports the JSON UTF-8 codec for incoming requests, such as `[{"key1": "value1"}, {"key2": "value2"}]`. The following table describes options you can use to configure the `http` source. Option | Required | Type | Description :--- | :--- | :--- | :--- @@ -19,6 +21,7 @@ request_timeout | No | Integer | The request timeout, in milliseconds. Default v thread_count | No | Integer | The number of threads to keep in the ScheduledThreadPool. Default value is `200`. max_connection_count | No | Integer | The maximum allowed number of open connections. Default value is `500`. max_pending_requests | No | Integer | The maximum allowed number of tasks in the `ScheduledThreadPool` work queue. Default value is `1024`. +max_request_length | No | ByteCount | The maximum number of bytes allowed in the payload of a single HTTP request. Default value is `10mb`. 
authentication | No | Object | An authentication configuration. By default, this creates an unauthenticated server for the pipeline. This uses pluggable authentication for HTTPS. To use basic authentication define the `http_basic` plugin with a `username` and `password`. To provide customer authentication, use or create a plugin that implements [ArmeriaHttpAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/1.2.0/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/ArmeriaHttpAuthenticationProvider.java). ssl | No | Boolean | Enables TLS/SSL. Default value is false. ssl_certificate_file | Conditionally | String | SSL certificate chain file path or Amazon Simple Storage Service (Amazon S3) path. Amazon S3 path example `s3:///`. Required if `ssl` is set to true and `use_acm_certificate_for_ssl` is set to false. @@ -35,7 +38,7 @@ Content will be added to this section.---> ## Metrics -The `http_source` source includes the following metrics. +The `http` source includes the following metrics. ### Counters diff --git a/_data-prepper/pipelines/configuration/sources/otel-logs-source.md b/_data-prepper/pipelines/configuration/sources/otel-logs-source.md index 58d8a2b059..068369efaf 100644 --- a/_data-prepper/pipelines/configuration/sources/otel-logs-source.md +++ b/_data-prepper/pipelines/configuration/sources/otel-logs-source.md @@ -21,7 +21,8 @@ You can configure the `otel_logs_source` source with the following options. | Option | Type | Description | | :--- | :--- | :--- | | port | int | Represents the port that the `otel_logs_source` source is running on. Default value is `21892`. | -| path | string | Represents the path for sending unframed HTTP requests. You can use this option to support an unframed gRPC request with an HTTP idiomatic path to a configurable path. The path should start with `/`, and its length should be at least 1. The `/opentelemetry.proto.collector.logs.v1.LogsService/Export` endpoint is disabled for both gRPC and HTTP requests if the path is configured. The path can contain a `${pipelineName}` placeholder, which is replaced with the pipeline name. If the value is empty and `unframed_requests` is `true`, then the path that the source provides is `/opentelemetry.proto.collector.logs.v1.LogsService/Export`. | +| path | string | Represents the path for sending unframed HTTP requests. You can use this option to support an unframed gRPC request with an HTTP idiomatic path to a configurable path. The path should start with `/`, and its length should be at least 1. The `/opentelemetry.proto.collector.logs.v1.LogsService/Export` endpoint is disabled for both gRPC and HTTP requests if the path is configured. The path can contain a `${pipelineName}` placeholder, which is replaced with the pipeline name. If the value is empty and `unframed_requests` is `true`, then the source provides the path `/opentelemetry.proto.collector.logs.v1.LogsService/Export`. | +| max_request_length | No | ByteCount | The maximum number of bytes allowed in the payload of a single gRPC or HTTP request. Default value is `10mb`. | request_timeout | int | Represents the request timeout duration in milliseconds. Default value is `10000`. | | health_check_service | Boolean | Enables the gRPC health check service under `grpc.health.v1/Health/Check`. Default value is `false`. 
| | proto_reflection_service | Boolean | Enables a reflection service for Protobuf services (see [ProtoReflectionService](https://grpc.github.io/grpc-java/javadoc/io/grpc/protobuf/services/ProtoReflectionService.html) and [gRPC reflection](https://github.com/grpc/grpc-java/blob/master/documentation/server-reflection-tutorial.md)). Default value is `false`. | diff --git a/_data-prepper/pipelines/configuration/sources/otel-metrics-source.md b/_data-prepper/pipelines/configuration/sources/otel-metrics-source.md index 0301963538..bea74a96d3 100644 --- a/_data-prepper/pipelines/configuration/sources/otel-metrics-source.md +++ b/_data-prepper/pipelines/configuration/sources/otel-metrics-source.md @@ -19,6 +19,7 @@ proto_reflection_service | No | Boolean | Enables a reflection service for Proto unframed_requests | No | Boolean | Enables requests not framed using the gRPC wire protocol. thread_count | No | Integer | The number of threads to keep in the `ScheduledThreadPool`. Default value is `200`. max_connection_count | No | Integer | The maximum allowed number of open connections. Default value is `500`. +max_request_length | No | ByteCount | The maximum number of bytes allowed in the payload of a single gRPC or HTTP request. Default value is `10mb`. ssl | No | Boolean | Enables connections to the OpenTelemetry source port over TLS/SSL. Default value is `true`. sslKeyCertChainFile | Conditionally | String | File-system path or Amazon Simple Storage Service (Amazon S3) path to the security certificate (for example, `"config/demo-data-prepper.crt"` or `"s3://my-secrets-bucket/demo-data-prepper.crt"`). Required if `ssl` is set to `true`. sslKeyFile | Conditionally | String | File-system path or Amazon S3 path to the security key (for example, `"config/demo-data-prepper.key"` or `"s3://my-secrets-bucket/demo-data-prepper.key"`). Required if `ssl` is set to `true`. diff --git a/_data-prepper/pipelines/configuration/sources/otel-trace-source.md b/_data-prepper/pipelines/configuration/sources/otel-trace-source.md index 137592bbe8..1be7864c33 100644 --- a/_data-prepper/pipelines/configuration/sources/otel-trace-source.md +++ b/_data-prepper/pipelines/configuration/sources/otel-trace-source.md @@ -24,6 +24,7 @@ proto_reflection_service | No | Boolean | Enables a reflection service for Proto unframed_requests | No | Boolean | Enable requests not framed using the gRPC wire protocol. thread_count | No | Integer | The number of threads to keep in the ScheduledThreadPool. Default value is `200`. max_connection_count | No | Integer | The maximum allowed number of open connections. Default value is `500`. +max_request_length | No | ByteCount | The maximum number of bytes allowed in the payload of a single gRPC or HTTP request. Default value is `10mb`. ssl | No | Boolean | Enables connections to the OTel source port over TLS/SSL. Defaults to `true`. sslKeyCertChainFile | Conditionally | String | File system path or Amazon Simple Storage Service (Amazon S3) path to the security certificate (for example, `"config/demo-data-prepper.crt"` or `"s3://my-secrets-bucket/demo-data-prepper.crt"`). Required if `ssl` is set to `true`. sslKeyFile | Conditionally | String | File system path or Amazon S3 path to the security key (for example, `"config/demo-data-prepper.key"` or `"s3://my-secrets-bucket/demo-data-prepper.key"`). Required if `ssl` is set to `true`. 
diff --git a/_data/versions.json b/_data/versions.json index 969f93681b..0c99ed871e 100644 --- a/_data/versions.json +++ b/_data/versions.json @@ -1,10 +1,11 @@ { - "current": "2.14", + "current": "2.15", "all": [ - "2.14", + "2.15", "1.3" ], "archived": [ + "2.14", "2.13", "2.12", "2.11", @@ -23,7 +24,7 @@ "1.1", "1.0" ], - "latest": "2.14" + "latest": "2.15" } diff --git a/_field-types/supported-field-types/index.md b/_field-types/supported-field-types/index.md index dbff7c30f2..7c7b7375f9 100644 --- a/_field-types/supported-field-types/index.md +++ b/_field-types/supported-field-types/index.md @@ -23,7 +23,7 @@ Boolean | [`boolean`]({{site.url}}{{site.baseurl}}/field-types/supported-field-t IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/): An IP address in IPv4 or IPv6 format. [Range]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/range/) | A range of values (`integer_range`, `long_range`, `double_range`, `float_range`, `date_range`, `ip_range`). [Object]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/object-fields/)| [`object`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/object/): A JSON object.
[`nested`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/nested/): Used when objects in an array need to be indexed independently as separate documents.
[`flat_object`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/flat-object/): A JSON object treated as a string.
[`join`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/): Establishes a parent-child relationship between documents in the same index. -[String]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/string/)|[`keyword`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/keyword/): Contains a string that is not analyzed.
[`text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/): Contains a string that is analyzed.
[`match_only_text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/match-only-text/): A space-optimized version of a `text` field.
[`token_count`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/token-count/): Stores the number of analyzed tokens in a string.
[`wildcard`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/token-count/): A variation of `keyword` with efficient substring and regular expression matching. +[String]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/string/)|[`keyword`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/keyword/): Contains a string that is not analyzed.
[`text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/): Contains a string that is analyzed.
[`match_only_text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/match-only-text/): A space-optimized version of a `text` field.
[`token_count`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/token-count/): Stores the number of analyzed tokens in a string.
[`wildcard`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/wildcard/): A variation of `keyword` with efficient substring and regular expression matching. [Autocomplete]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/autocomplete/) |[`completion`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/completion/): Provides autocomplete functionality through a completion suggester.
[`search_as_you_type`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/search-as-you-type/): Provides search-as-you-type functionality using both prefix and infix completion. [Geographic]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geographic/)| [`geo_point`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-point/): A geographic point.
[`geo_shape`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-shape/): A geographic shape. [Rank]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/rank/) | Boosts or decreases the relevance score of documents (`rank_feature`, `rank_features`). diff --git a/_getting-started/intro.md b/_getting-started/intro.md index 272d8d6981..edd178a23f 100644 --- a/_getting-started/intro.md +++ b/_getting-started/intro.md @@ -106,7 +106,6 @@ in | 1 the | 1, 2 eye | 1 of | 1 -the | 1 beholder | 1 and | 2 beast | 2 @@ -158,4 +157,4 @@ In OpenSearch, a shard is a Lucene index, which consists of _segments_ (or segme ## Next steps -- Learn how to install OpenSearch within minutes in [Installation quickstart]({{site.url}}{{site.baseurl}}/getting-started/quickstart/). \ No newline at end of file +- Learn how to install OpenSearch within minutes in [Installation quickstart]({{site.url}}{{site.baseurl}}/getting-started/quickstart/). diff --git a/_ingest-pipelines/processors/index-processors.md b/_ingest-pipelines/processors/index-processors.md index 79f30524d6..4b229f0a61 100644 --- a/_ingest-pipelines/processors/index-processors.md +++ b/_ingest-pipelines/processors/index-processors.md @@ -71,3 +71,7 @@ Processor type | Description ## Batch-enabled processors Some processors support batch ingestion---they can process multiple documents at the same time as a batch. These batch-enabled processors usually provide better performance when using batch processing. For batch processing, use the [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) and provide a `batch_size` parameter. All batch-enabled processors have a batch mode and a single-document mode. When you ingest documents using the `PUT` method, the processor functions in single-document mode and processes documents in series. Currently, only the `text_embedding` and `sparse_encoding` processors are batch enabled. All other processors process documents one at a time. + +## Selectively enabling processors + +Processors defined by the [ingest-common module](https://github.com/opensearch-project/OpenSearch/blob/2.x/modules/ingest-common/src/main/java/org/opensearch/ingest/common/IngestCommonPlugin.java) can be selectively enabled by providing the `ingest-common.processors.allowed` cluster setting. If not provided, then all processors are enabled by default. Specifying an empty list disables all processors. If the setting is changed to remove previously enabled processors, then any pipeline using a disabled processor will fail after node restart when the new setting takes effect. diff --git a/_install-and-configure/configuring-opensearch/index-settings.md b/_install-and-configure/configuring-opensearch/index-settings.md index 34b1829b78..a1894a0d2c 100644 --- a/_install-and-configure/configuring-opensearch/index-settings.md +++ b/_install-and-configure/configuring-opensearch/index-settings.md @@ -54,6 +54,8 @@ OpenSearch supports the following dynamic cluster-level index settings: - `indices.fielddata.cache.size` (String): The maximum size of the field data cache. May be specified as an absolute value (for example, `8GB`) or a percentage of the node heap (for example, `50%`). This value is static so you must specify it in the `opensearch.yml` file. If you don't specify this setting, the maximum size is unlimited. This value should be smaller than the `indices.breaker.fielddata.limit`. 
For more information, see [Field data circuit breaker]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/circuit-breaker/#field-data-circuit-breaker-settings). +- `indices.query.bool.max_clause_count` (Integer): Defines the maximum product of fields and terms that are queryable simultaneously. Before OpenSearch 2.16, applying this static setting required a cluster restart. The setting is now dynamic; however, existing search thread pools may initially continue to use the previous static value, causing `TooManyClauses` exceptions. New thread pools use the updated value. Default is `1024`. + - `cluster.remote_store.index.path.type` (String): The path strategy for the data stored in the remote store. This setting is effective only for remote-store-enabled clusters. This setting supports the following values: - `fixed`: Stores the data in path structure `///`. - `hashed_prefix`: Stores the data in path structure `hash()////`. diff --git a/_layouts/default.html b/_layouts/default.html index 8ba6bd4703..d4d40d8cc4 100755 --- a/_layouts/default.html +++ b/_layouts/default.html @@ -165,9 +165,9 @@
{% if page.section == "opensearch" %} {% if site.doc_version == "supported" %} -

This is an earlier version of the OpenSearch documentation. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.

+

You're viewing version {{site.opensearch_major_minor_version}} of the OpenSearch documentation. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.

{% elsif site.doc_version == "unsupported" %} -

This version of the OpenSearch documentation is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.

+

You're viewing version {{site.opensearch_major_minor_version}} of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.

{% endif %} {% endif %} {% if site.heading_anchors != false %} diff --git a/_ml-commons-plugin/api/model-apis/register-model.md b/_ml-commons-plugin/api/model-apis/register-model.md index 61d821419e..ec830a7821 100644 --- a/_ml-commons-plugin/api/model-apis/register-model.md +++ b/_ml-commons-plugin/api/model-apis/register-model.md @@ -84,7 +84,7 @@ Field | Data type | Required/Optional | Description `name`| String | Required | The model name. | `version` | String | Required | The model version. | `model_format` | String | Required | The portable format of the model file. Valid values are `TORCH_SCRIPT` and `ONNX`. | -`function_name` | String | Required | For text embedding models, set this parameter to `TEXT_EMBEDDING`. For sparse encoding models, set this parameter to `SPARSE_ENCODING` or `SPARSE_TOKENIZE`. For cross-encoder models, set this parameter to `TEXT_SIMILARITY`. +`function_name` | String | Required | For text embedding models, set this parameter to `TEXT_EMBEDDING`. For sparse encoding models, set this parameter to `SPARSE_ENCODING` or `SPARSE_TOKENIZE`. For cross-encoder models, set this parameter to `TEXT_SIMILARITY`. For question answering models, set this parameter to `QUESTION_ANSWERING`. `model_content_hash_value` | String | Required | The model content hash generated using the SHA-256 hashing algorithm. `url` | String | Required | The URL that contains the model. | `description` | String | Optional| The model description. | diff --git a/_ml-commons-plugin/cluster-settings.md b/_ml-commons-plugin/cluster-settings.md index ebc9b92531..0c1f433bf2 100644 --- a/_ml-commons-plugin/cluster-settings.md +++ b/_ml-commons-plugin/cluster-settings.md @@ -108,7 +108,7 @@ plugins.ml_commons.sync_up_job_interval_in_seconds: 3 - Default value: `3` - Value range: [0, 86,400] -## Predict monitoring requests +## Monitoring predict requests Controls how many predict requests are monitored on one node. If set to `0`, OpenSearch clears all monitoring predict requests in cache and does not monitor for new predict requests. @@ -468,7 +468,7 @@ When set to `true`, this setting enables the search processors for retrieval-aug ### Setting ``` -plugins.ml_commons.agent_framework_enabled: true +plugins.ml_commons.rag_pipeline_feature_enabled: true ``` ### Values diff --git a/_ml-commons-plugin/custom-local-models.md b/_ml-commons-plugin/custom-local-models.md index a265d8804a..c2866938f6 100644 --- a/_ml-commons-plugin/custom-local-models.md +++ b/_ml-commons-plugin/custom-local-models.md @@ -109,7 +109,11 @@ To learn more about model groups, see [Model access control]({{site.url}}{{site. ## Step 2: Register a local model -To register a remote model to the model group created in step 1, provide the model group ID from step 1 in the following request: +To register a local model to the model group created in step 1, send a Register Model API request. For descriptions of Register Model API parameters, see [Register a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/register-model/). + +The `function_name` corresponds to the model type. For text embedding models, set this parameter to `TEXT_EMBEDDING`. For sparse encoding models, set this parameter to `SPARSE_ENCODING` or `SPARSE_TOKENIZE`. For cross-encoder models, set this parameter to `TEXT_SIMILARITY`. For question answering models, set this parameter to `QUESTION_ANSWERING`. In this example, set `function_name` to `TEXT_EMBEDDING` because you're registering a text embedding model. 
+ +Provide the model group ID from step 1 and send the following request: ```json POST /_plugins/_ml/models/_register @@ -118,7 +122,7 @@ POST /_plugins/_ml/models/_register "version": "1.0.1", "model_group_id": "wlcnb4kBJ1eYAeTMHlV6", "description": "This is a port of the DistilBert TAS-B Model to sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and is optimized for the task of semantic search.", - "model_task_type": "TEXT_EMBEDDING", + "function_name": "TEXT_EMBEDDING", "model_format": "TORCH_SCRIPT", "model_content_size_in_bytes": 266352827, "model_content_hash_value": "acdc81b652b83121f914c5912ae27c0fca8fabf270e6f191ace6979a19830413", @@ -143,7 +147,7 @@ POST /_plugins/_ml/models/_register "version": "1.0.1", "model_group_id": "wlcnb4kBJ1eYAeTMHlV6", "description": "This is a port of the DistilBert TAS-B Model to sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and is optimized for the task of semantic search.", - "model_task_type": "TEXT_EMBEDDING", + "function_name": "TEXT_EMBEDDING", "model_format": "TORCH_SCRIPT", "model_content_size_in_bytes": 266352827, "model_content_hash_value": "acdc81b652b83121f914c5912ae27c0fca8fabf270e6f191ace6979a19830413", @@ -159,8 +163,6 @@ POST /_plugins/_ml/models/_register ``` {% include copy.html %} -For descriptions of Register API parameters, see [Register a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/register-model/). The `model_task_type` corresponds to the model type. For text embedding models, set this parameter to `TEXT_EMBEDDING`. For sparse encoding models, set this parameter to `SPARSE_ENCODING` or `SPARSE_TOKENIZE`. For cross-encoder models, set this parameter to `TEXT_SIMILARITY`. For question answering models, set this parameter to `QUESTION_ANSWERING`. - OpenSearch returns the task ID of the register operation: ```json @@ -183,7 +185,7 @@ When the operation is complete, the state changes to `COMPLETED`: { "model_id": "cleMb4kBJ1eYAeTMFFg4", "task_type": "REGISTER_MODEL", - "function_name": "REMOTE", + "function_name": "TEXT_EMBEDDING", "state": "COMPLETED", "worker_node": [ "XPcXLV7RQoi5m8NI_jEOVQ" @@ -229,7 +231,7 @@ When the operation is complete, the state changes to `COMPLETED`: { "model_id": "cleMb4kBJ1eYAeTMFFg4", "task_type": "DEPLOY_MODEL", - "function_name": "REMOTE", + "function_name": "TEXT_EMBEDDING", "state": "COMPLETED", "worker_node": [ "n-72khvBTBi3bnIIR8FTTw" @@ -379,4 +381,4 @@ The response provides the answer based on the context: } } } -``` \ No newline at end of file +``` diff --git a/_ml-commons-plugin/pretrained-models.md b/_ml-commons-plugin/pretrained-models.md index 8847d36291..30540cfe49 100644 --- a/_ml-commons-plugin/pretrained-models.md +++ b/_ml-commons-plugin/pretrained-models.md @@ -126,7 +126,7 @@ To learn more about model groups, see [Model access control]({{site.url}}{{site. ## Step 2: Register a local OpenSearch-provided model -To register a remote model to the model group created in step 1, provide the model group ID from step 1 in the following request. +To register an OpenSearch-provided model to the model group created in step 1, provide the model group ID from step 1 in the following request. 
Because pretrained models originate from the ML Commons model repository, you only need to provide the `name`, `version`, `model_group_id`, and `model_format` in the register API request: @@ -163,7 +163,7 @@ When the operation is complete, the state changes to `COMPLETED`: { "model_id": "cleMb4kBJ1eYAeTMFFg4", "task_type": "REGISTER_MODEL", - "function_name": "REMOTE", + "function_name": "TEXT_EMBEDDING", "state": "COMPLETED", "worker_node": [ "XPcXLV7RQoi5m8NI_jEOVQ" @@ -209,7 +209,7 @@ When the operation is complete, the state changes to `COMPLETED`: { "model_id": "cleMb4kBJ1eYAeTMFFg4", "task_type": "DEPLOY_MODEL", - "function_name": "REMOTE", + "function_name": "TEXT_EMBEDDING", "state": "COMPLETED", "worker_node": [ "n-72khvBTBi3bnIIR8FTTw" diff --git a/_observing-your-data/metricsanalytics.md b/_observing-your-data/prometheusmetrics.md similarity index 98% rename from _observing-your-data/metricsanalytics.md rename to _observing-your-data/prometheusmetrics.md index 7c31e1cc33..0f9043d815 100644 --- a/_observing-your-data/metricsanalytics.md +++ b/_observing-your-data/prometheusmetrics.md @@ -2,8 +2,6 @@ layout: default title: Metric analytics nav_order: 40 -redirect_from: - - /observing-your-data/metricsanalytics/ --- # Metric analytics @@ -165,7 +163,7 @@ You can view metrics from remote OpenSearch clusters by using the **Metrics** to You can also view metric visualizations from other sources alongside local metric visualizations. From the **DATA SOURCES** dropdown menu, choose the remote metric visualization to add it to the group of visualizations already shown on the dashboard. An example dashboard is shown in the following image. -Metrics dashboard +Metrics dashboard To learn about multi-cluster support for data sources, see [Enable OpenSearch Dashboards to support multiple OpenSearch clusters](https://github.com/opensearch-project/OpenSearch-Dashboards/issues/1388). 
diff --git a/_query-dsl/geo-and-xy/geo-bounding-box.md b/_query-dsl/geo-and-xy/geo-bounding-box.md index a0ee85f093..df697e2ce5 100644 --- a/_query-dsl/geo-and-xy/geo-bounding-box.md +++ b/_query-dsl/geo-and-xy/geo-bounding-box.md @@ -31,6 +31,7 @@ PUT testindex1 } } ``` +{% include copy-curl.html %} Index three geopoints as objects with latitudes and longitudes: @@ -42,7 +43,10 @@ PUT testindex1/_doc/1 "lon": 40.71 } } +``` +{% include copy-curl.html %} +```json PUT testindex1/_doc/2 { "point": { @@ -50,7 +54,10 @@ PUT testindex1/_doc/2 "lon": 22.62 } } +``` +{% include copy-curl.html %} +```json PUT testindex1/_doc/3 { "point": { @@ -59,6 +66,7 @@ PUT testindex1/_doc/3 } } ``` +{% include copy-curl.html %} Search for all documents and filter the documents whose points lie within the rectangle defined in the query: @@ -88,6 +96,7 @@ GET testindex1/_search } } ``` +{% include copy-curl.html %} The response contains the matching document: @@ -163,6 +172,7 @@ GET testindex1/_search } } ``` +{% include copy-curl.html %} ## Request fields @@ -205,6 +215,7 @@ GET testindex1/_search } } ``` +{% include copy-curl.html %} To specify a bounding box that covers the whole area of a geohash, provide that geohash as both `top_left` and `bottom_right` parameters of the bounding box: @@ -227,4 +238,5 @@ GET testindex1/_search } } } -``` \ No newline at end of file +``` +{% include copy-curl.html %} \ No newline at end of file diff --git a/_query-dsl/geo-and-xy/geodistance.md b/_query-dsl/geo-and-xy/geodistance.md new file mode 100644 index 0000000000..7a36b0c933 --- /dev/null +++ b/_query-dsl/geo-and-xy/geodistance.md @@ -0,0 +1,121 @@ +--- +layout: default +title: Geodistance +parent: Geographic and xy queries +grand_parent: Query DSL +nav_order: 20 +--- + +# Geodistance query + +A geodistance query returns documents with geopoints that are within a specified distance from the provided geopoint. A document with multiple geopoints matches the query if at least one geopoint matches the query. + +The searched document field must be mapped as `geo_point`. +{: .note} + +## Example + +Create a mapping with the `point` field mapped as `geo_point`: + +```json +PUT testindex1 +{ + "mappings": { + "properties": { + "point": { + "type": "geo_point" + } + } + } +} +``` +{% include copy-curl.html %} + +Index a geopoint, specifying its latitude and longitude: + +```json +PUT testindex1/_doc/1 +{ + "point": { + "lat": 74.00, + "lon": 40.71 + } +} +``` +{% include copy-curl.html %} + +Search for documents whose `point` objects are within the specified `distance` from the specified `point`: + +```json +GET /testindex1/_search +{ + "query": { + "bool": { + "must": { + "match_all": {} + }, + "filter": { + "geo_distance": { + "distance": "50mi", + "point": { + "lat": 73.5, + "lon": 40.5 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +The response contains the matching document: + +```json +{ + "took": 5, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": "testindex1", + "_id": "1", + "_score": 1, + "_source": { + "point": { + "lat": 74, + "lon": 40.71 + } + } + } + ] + } +} +``` + +## Request fields + +Geodistance queries accept the following fields. + +Field | Data type | Description +:--- | :--- | :--- +`_name` | String | The name of the filter. Optional. +`distance` | String | The distance within which to match the points. 
This distance is the radius of a circle centered at the specified point. For supported distance units, see [Distance units]({{site.url}}{{site.baseurl}}/api-reference/common-parameters/#distance-units). Required. +`distance_type` | String | Specifies how to calculate the distance. Valid values are `arc` or `plane` (faster but inaccurate for long distances or points close to the poles). Optional. Default is `arc`. +`validation_method` | String | The validation method. Valid values are `IGNORE_MALFORMED` (accept geopoints with invalid coordinates), `COERCE` (try to coerce coordinates to valid values), and `STRICT` (return an error when coordinates are invalid). Optional. Default is `STRICT`. +`ignore_unmapped` | Boolean | Specifies whether to ignore an unmapped field. If set to `true`, then the query does not return any documents that contain an unmapped field. If set to `false`, then an exception is thrown when the field is unmapped. Optional. Default is `false`. + +## Accepted formats + +You can specify the geopoint coordinates when indexing a document and searching for documents in any [format]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point#formats) accepted by the geopoint field type. \ No newline at end of file diff --git a/_query-dsl/geo-and-xy/index.md b/_query-dsl/geo-and-xy/index.md index 44e2df9b49..cb0559927d 100644 --- a/_query-dsl/geo-and-xy/index.md +++ b/_query-dsl/geo-and-xy/index.md @@ -12,7 +12,7 @@ redirect_from: # Geographic and xy queries -Geographic and xy queries let you search fields that contain points and shapes on a map or coordinate plane. Geographic queries work on geospatial data, while xy queries work on two-dimensional coordinate data. Out of all geographic queries, the geoshape query is very similar to the xy query, but the former searches [geographic fields]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geographic), while the latter searches [Cartesian fields]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/xy). +Geographic and xy queries let you search fields that contain points and shapes on a map or coordinate plane. Geographic queries work on geospatial data, while xy queries work on two-dimensional coordinate data. Out of all geographic queries, the geoshape query is very similar to the xy query, but the former searches [geographic fields]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geographic/), while the latter searches [Cartesian fields]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/xy). ## xy queries @@ -24,13 +24,13 @@ xy queries return documents that contain: ## Geographic queries -Geographic queries search for documents that contain geospatial geometries. These geometries can be specified in [`geo_point`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point) fields, which support points on a map, and [`geo_shape`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-shape) fields, which support points, lines, circles, and polygons. +Geographic queries search for documents that contain geospatial geometries. These geometries can be specified in [`geo_point`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point/) fields, which support points on a map, and [`geo_shape`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-shape/) fields, which support points, lines, circles, and polygons. 
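+For example, a mapping sketch that defines one field of each type might look like the following. The index and field names are illustrative:
+
+```json
+PUT example-geo-index
+{
+  "mappings": {
+    "properties": {
+      "location": { "type": "geo_point" },
+      "boundary": { "type": "geo_shape" }
+    }
+  }
+}
+```
+{% include copy-curl.html %}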
OpenSearch provides the following geographic query types: - [**Geo-bounding box queries**]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/geo-and-xy/geo-bounding-box/): Return documents with geopoint field values that are within a bounding box. -- **Geodistance queries** return documents with geopoints that are within a specified distance from the provided geopoint. -- **Geopolygon queries** return documents with geopoints that are within a polygon. -- **Geoshape queries** return documents that contain: - - geoshapes and geopoints that have one of four spatial relations to the provided shape: `INTERSECTS`, `DISJOINT`, `WITHIN`, or `CONTAINS`. - - geopoints that intersect the provided shape. \ No newline at end of file +- [**Geodistance queries**]({{site.url}}{{site.baseurl}}/query-dsl/geo-and-xy/geodistance/): Return documents with geopoints that are within a specified distance from the provided geopoint. +- **Geopolygon queries**: Return documents with geopoints that are within a polygon. +- **Geoshape queries**: Return documents that contain: + - Geoshapes and geopoints that have one of four spatial relations to the provided shape: `INTERSECTS`, `DISJOINT`, `WITHIN`, or `CONTAINS`. + - Geopoints that intersect the provided shape. \ No newline at end of file diff --git a/_query-dsl/geo-and-xy/xy.md b/_query-dsl/geo-and-xy/xy.md index 3db05c01f2..88a22448c3 100644 --- a/_query-dsl/geo-and-xy/xy.md +++ b/_query-dsl/geo-and-xy/xy.md @@ -12,13 +12,13 @@ redirect_from: # xy query -To search for documents that contain [xy point]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/xy-point) and [xy shape]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/xy-shape) fields, use an xy query. +To search for documents that contain [xy point]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/xy-point/) or [xy shape]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/xy-shape/) fields, use an xy query. ## Spatial relations When you provide an xy shape to the xy query, the xy fields are matched using the following spatial relations to the provided shape. -Relation | Description | Supporting xy Field Type +Relation | Description | Supporting xy field type :--- | :--- | :--- `INTERSECTS` | (Default) Matches documents whose xy point or xy shape intersects the shape provided in the query. | `xy_point`, `xy_shape` `DISJOINT` | Matches documents whose xy shape does not intersect with the shape provided in the query. | `xy_shape` @@ -51,6 +51,7 @@ PUT testindex } } ``` +{% include copy-curl.html %} Index a document with a point and a document with a polygon: @@ -62,7 +63,10 @@ PUT testindex/_doc/1 "coordinates": [0.5, 3.0] } } +``` +{% include copy-curl.html %} +```json PUT testindex/_doc/2 { "geometry" : { @@ -77,6 +81,7 @@ PUT testindex/_doc/2 } } ``` +{% include copy-curl.html %} Define an [`envelope`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/xy-shape#envelope)—a bounding rectangle in the `[[minX, maxY], [maxX, minY]]` format. Search for documents with xy points or shapes that intersect that envelope: @@ -96,6 +101,7 @@ GET testindex/_search } } ``` +{% include copy-curl.html %} The following image depicts the example. Both the point and the polygon are within the bounding envelope. 
@@ -200,6 +206,7 @@ PUT pre-indexed-shapes } } ``` +{% include copy-curl.html %} Index an envelope that specifies the boundaries and name it `rectangle`: @@ -212,6 +219,7 @@ PUT pre-indexed-shapes/_doc/rectangle } } ``` +{% include copy-curl.html %} Index a document with a point and a document with a polygon into the index `testindex`: @@ -223,7 +231,10 @@ PUT testindex/_doc/1 "coordinates": [0.5, 3.0] } } +``` +{% include copy-curl.html %} +```json PUT testindex/_doc/2 { "geometry" : { @@ -238,6 +249,7 @@ PUT testindex/_doc/2 } } ``` +{% include copy-curl.html %} Search for documents with shapes that intersect `rectangle` in the index `testindex` using a filter: @@ -261,6 +273,7 @@ GET testindex/_search } } ``` +{% include copy-curl.html %} The preceding query uses the default spatial relation `INTERSECTS` and returns both the point and the polygon: @@ -352,6 +365,7 @@ PUT testindex1 } } ``` +{% include copy-curl.html %} Index three points: @@ -360,17 +374,24 @@ PUT testindex1/_doc/1 { "point": "1.0, 1.0" } +``` +{% include copy-curl.html %} +```json PUT testindex1/_doc/2 { "point": "2.0, 0.0" } +``` +{% include copy-curl.html %} +```json PUT testindex1/_doc/3 { "point": "-2.0, 2.0" } ``` +{% include copy-curl.html %} Search for points that lie within the circle with the center at (0, 0) and a radius of 2: @@ -390,6 +411,7 @@ GET testindex1/_search } } ``` +{% include copy-curl.html %} xy point only supports the default `INTERSECTS` spatial relation, so you don't need to provide the `relation` parameter. {: .note} diff --git a/_search-plugins/concurrent-segment-search.md b/_search-plugins/concurrent-segment-search.md index 4bc0e5ca76..9c0e2da7c6 100644 --- a/_search-plugins/concurrent-segment-search.md +++ b/_search-plugins/concurrent-segment-search.md @@ -82,6 +82,14 @@ The `search.concurrent.max_slice_count` setting can take the following valid val - `0`: Use the default Lucene mechanism. - Positive integer: Use the max target slice count mechanism. Usually, a value between 2 and 8 should be sufficient. +## General guidelines +Concurrent segment search helps to improve the performance of search requests at the cost of consuming more resources, such as CPU or JVM heap. It is important to test your workload in order to understand whether the cluster is sized correctly for concurrent segment search. We recommend adhering to the following concurrent segment search guidelines: + +* Start with a slice count of 2 and measure the performance of your workload. If resource utilization exceeds the recommended values, then consider scaling your cluster. Based on our testing, we have observed that if your workload is already consuming more than 50% of your CPU resources, then you need to scale your cluster for concurrent segment search. +* If your slice count is 2 and you still have available resources in the cluster, then you can increase the slice count to a higher number, such as 4 or 6, while monitoring search latency and resource utilization in the cluster. +* When many clients send search requests in parallel, a lower slice count usually works better. This is reflected in CPU utilization because a higher number of clients leads to more queries per second, which translates to higher resource usage. + + ## Limitations The following aggregations do not support the concurrent search model. If a search request contains one of these aggregations, the request will be executed using the non-concurrent path even if concurrent segment search is enabled at the cluster level or index level. 
diff --git a/_search-plugins/search-pipelines/search-processors.md b/_search-plugins/search-pipelines/search-processors.md index 5e53cf5615..4630ab950c 100644 --- a/_search-plugins/search-pipelines/search-processors.md +++ b/_search-plugins/search-pipelines/search-processors.md @@ -121,3 +121,7 @@ The response contains the `search_pipelines` object that lists the available req In addition to the processors provided by OpenSearch, additional processors may be provided by plugins. {: .note} + +## Selectively enabling processors + +Processors defined by the [search-pipeline-common module](https://github.com/opensearch-project/OpenSearch/blob/2.x/modules/search-pipeline-common/src/main/java/org/opensearch/search/pipeline/common/SearchPipelineCommonModulePlugin.java) are selectively enabled through the following cluster settings: `search.pipeline.common.request.processors.allowed`, `search.pipeline.common.response.processors.allowed`, or `search.pipeline.common.search.phase.results.processors.allowed`. If unspecified, then all processors are enabled. An empty list disables all processors. Removing enabled processors causes pipelines using them to fail after a node restart. \ No newline at end of file diff --git a/_search-plugins/search-relevance/compare-search-results.md b/_search-plugins/search-relevance/compare-search-results.md index 9e34b7cfd7..962442cd31 100644 --- a/_search-plugins/search-relevance/compare-search-results.md +++ b/_search-plugins/search-relevance/compare-search-results.md @@ -3,7 +3,7 @@ layout: default title: Comparing search results nav_order: 55 parent: Search relevance -has_children: true +has_children: false has_toc: false redirect_from: - /search-plugins/search-relevance/ diff --git a/_search-plugins/search-relevance/reranking-search-results.md b/_search-plugins/search-relevance/reranking-search-results.md index 14c418020d..4b4deaeb92 100644 --- a/_search-plugins/search-relevance/reranking-search-results.md +++ b/_search-plugins/search-relevance/reranking-search-results.md @@ -115,4 +115,19 @@ POST /my-index/_search ``` {% include copy-curl.html %} -Alternatively, you can provide the full path to the field containing the context. For more information, see [Rerank processor example]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/#example). \ No newline at end of file +Alternatively, you can provide the full path to the field containing the context. For more information, see [Rerank processor example]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/#example). + +## Using rerank and normalization processors together + +When you use a rerank processor in conjunction with a [normalization processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/) and a hybrid query, the rerank processor alters the final document scores. This is because the rerank processor operates after the normalization processor in the search pipeline. +{: .note} + +The processing order is as follows: + +- Normalization processor: This processor normalizes the document scores based on the configured normalization method. For more information, see [Normalization processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/). +- Rerank processor: Following normalization, the rerank processor further adjusts the document scores. This adjustment can significantly impact the final ordering of search results. 
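To make this ordering concrete, the following pipeline definition is a sketch only (the model ID and document field are placeholders and are not part of the original example). It registers the normalization processor as a phase results processor and the rerank processor as a response processor, so normalization runs first and reranking runs last:

```json
PUT /_search/pipeline/normalization-rerank-pipeline
{
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean"
        }
      }
    }
  ],
  "response_processors": [
    {
      "rerank": {
        "ml_opensearch": {
          "model_id": "<rerank model ID>"
        },
        "context": {
          "document_fields": [ "passage_text" ]
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}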
+ +This processing order has the following implications: + +- Score modification: The rerank processor modifies the scores that were initially adjusted by the normalization processor, potentially leading to different ranking results than initially expected. +- Hybrid queries: In the context of hybrid queries, where multiple types of queries and scoring mechanisms are combined, this behavior is particularly noteworthy. The combined scores from the initial query are normalized first and then reranked, resulting in a two-stage scoring modification. \ No newline at end of file diff --git a/_search-plugins/sql/ppl/syntax.md b/_search-plugins/sql/ppl/syntax.md index 45eeb3aed2..22d6beaf26 100644 --- a/_search-plugins/sql/ppl/syntax.md +++ b/_search-plugins/sql/ppl/syntax.md @@ -8,10 +8,12 @@ nav_order: 1 # PPL syntax -Every PPL query starts with the `search` command. It specifies the index to search and retrieve documents from. Subsequent commands can follow in any order. +Every PPL query starts with the `search` command. It specifies the index to search and retrieve documents from. + +`PPL` supports exactly one `search` command per query, and it must always be the first command. The word `search` can be omitted. + +Subsequent commands can follow in any order. -Currently, `PPL` supports only one `search` command, which can be omitted to simplify the query. -{ : .note} ## Syntax source=<index> [boolean-expression] Field | Description | Required :--- | :--- |:--- -`search` | Specifies search keywords. | Yes -`index` | Specifies which index to query from. | No +`index` | Specifies the index to query. | No `bool-expression` | Specifies an expression that evaluates to a Boolean value. | No ## Examples diff --git a/_security/authentication-backends/jwt.md b/_security/authentication-backends/jwt.md index afcd4c78ee..3f28dfecfd 100644 --- a/_security/authentication-backends/jwt.md +++ b/_security/authentication-backends/jwt.md @@ -122,7 +122,7 @@ Name | Description `jwt_url_parameter` | If the token is not transmitted in the HTTP header but rather as an URL parameter, define the name of the parameter here. `subject_key` | The key in the JSON payload that stores the username. If not set, the [subject](https://tools.ietf.org/html/rfc7519#section-4.1.2) registered claim is used. `roles_key` | The key in the JSON payload that stores the user's roles. The value of this key must be a comma-separated list of roles. -`required_audience` | The name of the audience which the JWT must specify. This corresponds [`aud` claim of the JWT](https://datatracker.ietf.org/doc/html/rfc7519#section-4.1.3). +`required_audience` | The name of the audience that the JWT must specify. You can set a single value (for example, `project1`) or multiple comma-separated values (for example, `project1,admin`). If you set multiple values, the JWT must contain at least one of the specified audiences. This parameter corresponds to the [`aud` claim of the JWT](https://datatracker.ietf.org/doc/html/rfc7519#section-4.1.3). `required_issuer` | The target issuer of JWT stored in the JSON payload. This corresponds to the [`iss` claim of the JWT](https://datatracker.ietf.org/doc/html/rfc7519#section-4.1.1). `jwt_clock_skew_tolerance_seconds` | Sets a window of time, in seconds, to compensate for any disparity between the JWT authentication server and OpenSearch node clock times, thereby preventing authentication failures due to the misalignment. Security sets 30 seconds as the default. Use this setting to apply a custom value.
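As a sketch only (the signing key, issuer, and audience values below are placeholders), a JWT authentication domain in `config.yml` that accepts multiple comma-separated audiences might look like the following; a token whose `aud` claim contains either `project1` or `admin` would satisfy the audience check:

```yml
jwt_auth_domain:
  http_enabled: true
  transport_enabled: true
  order: 0
  http_authenticator:
    type: jwt
    challenge: false
    config:
      signing_key: "base64-encoded HMAC key or public RSA/ECDSA PEM key"
      jwt_header: "Authorization"
      roles_key: "roles"
      required_audience: "project1,admin"
      required_issuer: "https://idp.example.com"
  authentication_backend:
    type: noop
```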
@@ -235,7 +235,7 @@ openid_auth_domain: transport_enabled: true order: 0 http_authenticator: - type: openid + type: openid # Use the OpenID Connect domain because JWT is part of this authentication type. challenge: false config: subject_key: preferred_username diff --git a/_security/authentication-backends/openid-connect.md b/_security/authentication-backends/openid-connect.md index 8fc29e262f..8efb66fbb6 100755 --- a/_security/authentication-backends/openid-connect.md +++ b/_security/authentication-backends/openid-connect.md @@ -413,3 +413,21 @@ config: authentication_backend: type: noop ``` + +## Docker example with Keycloak + +The following steps use Docker and the [Keycloak IdP](https://www.keycloak.org/) to set up a basic authentication backend: + + +1. Download and unzip the [example OpenID Connect zip file]({{site.url}}{{site.baseurl}}/assets/examples/oidc_example.zip). +2. Update the `.env` file with a strong password for the `admin` user. +3. Substitute the `{IP}` placeholders in `config.yml` and `opensearch_dashboards.yml` with the IP address of the local machine. +4. Review the following files: + - `docker-compose.yml` defines a single OpenSearch node, OpenSearch Dashboards, and a Keycloak server. + - `new-realm.json` specifies the details of the [realm](https://www.keycloak.org/docs/latest/server_admin/#core-concepts-and-terms). In this example, the realm is named `new`. + - `config.yml` configures `basic_internal_auth_domain` and `oidc_auth_domain`. + - `opensearch_dashboards.yml` should point to Keycloak for authentication. Make sure that the `opensearch_security.openid.connect_url` setting points to the URL of the realm. +5. At the command line, run `docker-compose up`. +6. Access OpenSearch Dashboards at `http://localhost:5601` and log in with the username `testuser` and password `testpassword`, as configured in the `new-realm.json` file. + +After logging in, `testuser` receives the `admin` backend role from Keycloak, which is mapped to the `all_access` OpenSearch role. You can manage backend roles in the Keycloak Administrative Console at `http://localhost:8080` using the username `admin` and password `admin`. diff --git a/_security/configuration/configuration.md b/_security/configuration/configuration.md index d4f6a47cde..2a038b7fb9 100755 --- a/_security/configuration/configuration.md +++ b/_security/configuration/configuration.md @@ -11,7 +11,7 @@ redirect_from: One of the first steps when setting up the Security plugin is deciding which authentication backend to use. The role played by the backend in authentication is covered in [steps 2 and 3 of the authentication flow]({{site.url}}{{site.baseurl}}/security/authentication-backends/authc-index/#authentication-flow). The plugin has an internal user database, but many people prefer to use an existing authentication backend, such as an LDAP server, or some combination of the two. -The primary file used to configure an authentication and authorization backend is `config/opensearch-security/config.yml`. This file defines how the Security plugin retrieves user credentials, how it verifies the credentials, and how it fetches additional roles when the backend selected for authentication and authorization supports this feature. This topic provides a basic overview of the configuration file and its requirements for setting up security. For information about configuring a specific backend, see [Authentication backends]({{site.url}}{{site.baseurl}}/security/authentication-backends/authc-index/).
+The primary file used to configure the authentication and authorization backend is `/usr/share/opensearch/config/opensearch-security/config.yml`. This file defines how the Security plugin retrieves user credentials, how the plugin verifies the credentials, and how the plugin fetches additional roles when the backend selected for authentication and authorization supports this feature. This topic provides a basic overview of the configuration file and its requirements for setting up security. For information about configuring a specific backend, see [Authentication backends]({{site.url}}{{site.baseurl}}/security/authentication-backends/authc-index/). The `config.yml` file includes three main parts: diff --git a/assets/examples/oidc_example.zip b/assets/examples/oidc_example.zip new file mode 100644 index 0000000000..e2d3cbf951 Binary files /dev/null and b/assets/examples/oidc_example.zip differ diff --git a/images/dashboards/Add_datasource.gif b/images/dashboards/Add_datasource.gif new file mode 100644 index 0000000000..789e1a2128 Binary files /dev/null and b/images/dashboards/Add_datasource.gif differ diff --git a/images/dashboards/add-sample-data.gif b/images/dashboards/add-sample-data.gif new file mode 100644 index 0000000000..6e569d704d Binary files /dev/null and b/images/dashboards/add-sample-data.gif differ diff --git a/images/dashboards/configure-tsvb.gif b/images/dashboards/configure-tsvb.gif new file mode 100644 index 0000000000..fc91e9e669 Binary files /dev/null and b/images/dashboards/configure-tsvb.gif differ diff --git a/images/dashboards/configure-vega.gif b/images/dashboards/configure-vega.gif new file mode 100644 index 0000000000..290ad51416 Binary files /dev/null and b/images/dashboards/configure-vega.gif differ diff --git a/images/dashboards/create-datasource.gif b/images/dashboards/create-datasource.gif new file mode 100644 index 0000000000..789e1a2128 Binary files /dev/null and b/images/dashboards/create-datasource.gif differ diff --git a/images/dashboards/make_tsvb.gif b/images/dashboards/make_tsvb.gif new file mode 100644 index 0000000000..fc91e9e669 Binary files /dev/null and b/images/dashboards/make_tsvb.gif differ diff --git a/images/dashboards/tsvb-viz.png b/images/dashboards/tsvb-viz.png new file mode 100644 index 0000000000..efdf12484c Binary files /dev/null and b/images/dashboards/tsvb-viz.png differ diff --git a/images/dashboards/tsvb-with-annotations.png b/images/dashboards/tsvb-with-annotations.png new file mode 100644 index 0000000000..6cb35632b8 Binary files /dev/null and b/images/dashboards/tsvb-with-annotations.png differ diff --git a/images/dashboards/tsvb.png b/images/dashboards/tsvb.png new file mode 100644 index 0000000000..85f55cc3ad Binary files /dev/null and b/images/dashboards/tsvb.png differ diff --git a/images/make_vega.gif b/images/make_vega.gif new file mode 100644 index 0000000000..290ad51416 Binary files /dev/null and b/images/make_vega.gif differ diff --git a/images/vega.png b/images/vega.png new file mode 100644 index 0000000000..ae7ea76c9d Binary files /dev/null and b/images/vega.png differ diff --git a/release-notes/opensearch-documentation-release-notes-2.15.0.md b/release-notes/opensearch-documentation-release-notes-2.15.0.md new file mode 100644 index 0000000000..5f7ab9b049 --- /dev/null +++ b/release-notes/opensearch-documentation-release-notes-2.15.0.md @@ -0,0 +1,42 @@ +# OpenSearch Documentation Website 2.15.0 Release Notes + +The OpenSearch 2.15.0 documentation includes the following additions and updates. 
+ +## New documentation for 2.15.0 + +- Alerts in correlations feature documentation [#7410](https://github.com/opensearch-project/documentation-website/pull/7410) +- Add documentations for batch ingestion feature [#7408](https://github.com/opensearch-project/documentation-website/pull/7408) +- Changed VisBuilder status from experimental to GA [#7405](https://github.com/opensearch-project/documentation-website/pull/7405) +- Add documentation for innerHit on knn nested field [#7404](https://github.com/opensearch-project/documentation-website/pull/7404) +- AD Enhancements in Version 2.15 [#7388](https://github.com/opensearch-project/documentation-website/pull/7388) +- Add connector tool [#7384](https://github.com/opensearch-project/documentation-website/pull/7384) +- Add remote guardrails model support [#7377](https://github.com/opensearch-project/documentation-website/pull/7377) +- Update documentation of ml inference processors to support for local models [#7368](https://github.com/opensearch-project/documentation-website/pull/7368) +- Trace analytics update [#7362](https://github.com/opensearch-project/documentation-website/pull/7362) +- Add doc for alerting comments [#7360](https://github.com/opensearch-project/documentation-website/pull/7360) +- Add documentation related to removal of source and recovery source in k-NN performance tuning section [#7359](https://github.com/opensearch-project/documentation-website/pull/7359) +- Added documentation for new default workflow templates [#7346](https://github.com/opensearch-project/documentation-website/pull/7346) +- Mark docrep to remote migration as GA and modify settings names [#7342](https://github.com/opensearch-project/documentation-website/pull/7342) +- Add documentation for the new setting of cardinality aggregation dynamic pruning [#7341](https://github.com/opensearch-project/documentation-website/pull/7341) +- Add documentation for wildcard field type [#7339](https://github.com/opensearch-project/documentation-website/pull/7339) +- Update document for handle SageMaker throttling [#7331](https://github.com/opensearch-project/documentation-website/pull/7331) +- Add documentation related to new settings for segment upload timeout [#7330](https://github.com/opensearch-project/documentation-website/pull/7330) +- Add documentation of derived fields [#7329](https://github.com/opensearch-project/documentation-website/pull/7329) +- [MDS] Add security analytics, alerting, feature anaywhere in the multiple data source document [#7328](https://github.com/opensearch-project/documentation-website/pull/7328) +- Add document for top n queries improvements in 2.15 [#7326](https://github.com/opensearch-project/documentation-website/pull/7326) +- Update the integration page to reflect new integration catalog features [#7324](https://github.com/opensearch-project/documentation-website/pull/7324) +- Add doc for neural-sparse-query-two-phase-processor. [#7306](https://github.com/opensearch-project/documentation-website/pull/7306) +- Add documentation for Indices Request Cache Overview and its settings [#7288](https://github.com/opensearch-project/documentation-website/pull/7288) +- Added documentation for Reindex workflow step [#7271](https://github.com/opensearch-project/documentation-website/pull/7271) +- Document optional clear_status query parameter for Delete Workflow API [#7268](https://github.com/opensearch-project/documentation-website/pull/7268) +- Update field-masking.md. Configure default masking algorithm. 
[#7162](https://github.com/opensearch-project/documentation-website/pull/7162) +- add documentation for use compound file setting [#7092](https://github.com/opensearch-project/documentation-website/pull/7092) +- Added documentation for managed identity support in repository-azure plugin [#7068](https://github.com/opensearch-project/documentation-website/pull/7068) + +## In progress documentation for 2.15.0 + +- Initial UBI documentation [#7284](https://github.com/opensearch-project/documentation-website/pull/7284) + +## Documentation for 2.15.0 experimental features + +- Add remote state publication [#7364](https://github.com/opensearch-project/documentation-website/pull/7364)