From 8c00d4e31766214ef2a93a9905f94874bb8ecae0 Mon Sep 17 00:00:00 2001 From: RasonJ <145287540+RasonJ@users.noreply.github.com> Date: Fri, 19 Apr 2024 07:05:43 -0700 Subject: [PATCH 01/12] Schema & documentation working updates --- documentation/schemas.md | 99 +++++++++++++++++ src/main/resources/events-mapping.json | 145 ++++++------------------- 2 files changed, 131 insertions(+), 113 deletions(-) create mode 100644 documentation/schemas.md diff --git a/documentation/schemas.md b/documentation/schemas.md new file mode 100644 index 0000000..cfcb10c --- /dev/null +++ b/documentation/schemas.md @@ -0,0 +1,99 @@ + +# Key UBI concepts + +Although the named fields below follow a schema lends to easier analytics, the schema is dynamic and allows for users to add new dynamic fields where there is need. + +[`user_id`](#user_id) represents a user. When UBI is active, any query that this user issues will generate a new `query_id` for this `user_id`. + +The purpose of [`query_id`](#query_id) is to help link the user's raw query string to the results, as well as to any subsequent action that the UBI client logs. +When UBI is turned on, a *search client* will get a `query_id` back from OpenSearch and pass it to the UBI client. The UBI client then associates each subsequent event with this query until it receives a new `query_id`. + +[`action_name`](#action_name) gives the name of the event. It can be any name, such as *login*, *logout*, *save*, *post*, *add_to_cart*... + + [`event_attributes`](#event_attributes) is where any relevant information about the event can be stored. + The two primary, predefined objects in the attributes are [`event_attributes.position`](#position), which contains + information on what part of the application the user is interacting with, + and [`event_attributes.object`](#object), which contains identifying information of the object returned from the query that the user interacts with (e.g., a book, a product, a post, etc.).
+ +# TODO: `key_field` rename? +The `object` structure has two ways to refer to the object: +- `event_attributes.object.object_id` is the unique id that OpenSearch can use internally to index the object. +- `event_attributes.object.catalog_id` is the id that a user could look up the object in a *catelog* + + + Therefore, the `query_id` signals the beginning of a user's *Search Journey*, +`action_name` tells us how the user is interacting with the query results within the application, +and `event_attributes.object` is referring to the precise query result that the user interacts with. + +### OpenSearch Data Mappings + +#### Schema for events: + +The current event mappings file can be found [here](../src/main/resources/events-mapping.json). + +**Primary fields include:** +- `application`

+   (size 100) - name of application tracking UBI events +- `action_name`

+   (size 100) - any name you want to call your event +- `timestamp`: \ +   Unix epoch time. If not set, it will be set by the plugin when the event is received +- `query_id`

+   (size 100) - ID for some query. Note that it could be a unique search string, or it could represent a cluster of related searches (e.g., *dress*, *red dress*, *long dress* could all have the same `query_id`). Either the client could control these, or the `query_id` could be retrieved from the API's response headers as it keeps track of queries on the node +- `user_id`, `session_id`, `source_id`

+   (size 100) - IDs largely at the calling client's discretion for tracking users, sessions and sources (i.e. pages) of the event. + The `user_id` must be consistent in both the `query` and `event` stores. +- `message_type` \ +   (size 100) - originally thought of in terms of ERROR, INFO, WARN, but could be anything useful such as `QUERY` or `CONVERSION`. + Can be used to group `action_name` values together in logical bins. + +- `message` \ +   (size 256) - optional text for the log entry + +**Other attribute fields & data objects**

+- `event_attributes.object` \ +   represents the search result object (e.g., books, products, user info, etc.), if there is one + - `event_attributes.object.object_id` - points to a unique, internal id representing an instance of that object + - `event_attributes.object.catalog_id` \ +   points to a unique, external key, matching the item that the user searched for, found and acted upon (e.g., SKU, ISBN, EAN, etc.). + **This field's value should match the object's value in the `catalog_field` [below](#catalog_field) from the search store.** + It is possible that the `object_id` and `catalog_id` match if the same id is used both internally for indexing and externally for the users. + - `event_attributes.object.object_type` \ +   indicates the type/class of object + - `event_attributes.object.description` \ +   optional description of the object + - `event_attributes.object.transaction_id` \ +   optionally points to a unique id representing a successful transaction + - `event_attributes.object.to_user_id` \ +   optionally points to another user, if they are the recipient of this object, perhaps as a gift, from the user's `user_id` + - `event_attributes.object.object_detail` \ +   optional data object/map of further data details +- `event_attributes.position` \ +   nested object that tracks where in the application the event originated + - `event_attributes.position.ordinal` \ +   tracks the nth item within a list that a user could select or click + - `event_attributes.position.{x,y}` \ +   tracks x and y values that the client defines + - `event_attributes.position.page_depth` \ +   tracks page depth + - `event_attributes.position.scroll_depth` \ +   tracks scroll depth + - `event_attributes.position.trail` \ +   text field for tracking the path/trail that a user took to get to this location + +* Note that developers can add optional, dynamic fields like `user_name`, `email`, `price` per individual use case.
+ +#### Schema for queries: + +The current query mappings file can be found [here](../src/main/resources/queries-mapping.json). + +- `timestamp` \ +   A unix timestamp of when the query was received +- `query_id` \ +   A unique ID of the query provided by the client or generated automatically +- `query_response_id` \ +   A unique ID for the collection of results for the query +- `user_id` \ +   A user ID provided by the client +- `session_id` \ +   An optional session ID provided by the client diff --git a/src/main/resources/events-mapping.json b/src/main/resources/events-mapping.json index fd6d60c..ac7e1e1 100644 --- a/src/main/resources/events-mapping.json +++ b/src/main/resources/events-mapping.json @@ -1,130 +1,49 @@ { "properties": { - "action_name": { - "type": "keyword", - "ignore_above": 100 - }, - "user_id": { - "type": "keyword", - "ignore_above": 100 - }, - "session_id": { - "type": "keyword", - "ignore_above": 100 - }, - "query_id": { - "type": "keyword", - "ignore_above": 100 - }, - "page_id": { - "type": "keyword", - "ignore_above": 256 - }, - "message": { - "type": "keyword", - "ignore_above": 256 - }, - "message_type": { - "type": "keyword", - "ignore_above": 100 - }, - "timestamp": { - "type": "date", - "doc_values": true - }, + "application": { "type": "keyword", "ignore_above": 100 }, + "action_name": { "type": "keyword", "ignore_above": 100 }, + "query_id": { "type": "keyword", "ignore_above": 100 }, + "user_id": { "type": "keyword", "ignore_above": 100 }, + "session_id": { "type": "keyword", "ignore_above": 100 }, + "source_id": { "type": "keyword", "ignore_above": 256 }, + "message": { "type": "keyword", "ignore_above": 256 }, + "message_type": { "type": "keyword", "ignore_above": 100 }, + "timestamp": { "type": "date", "doc_values": true }, "event_attributes": { "properties": { - "user_name": { - "type": "keyword", - "ignore_above": 256 - }, - "user_id": { - "type": "keyword", - "ignore_above": 100 - }, - "email": { - "type": "keyword" - }, - 
"price": { - "type": "float" - }, - "ip": { - "type": "ip", - "ignore_malformed": true - }, - "browser": { - "type": "text", - "fields": { - "keyword": { - "type": "keyword", - "ignore_above": 256 - } - } + "ip": { "type": "ip", "ignore_malformed": true }, + "browser": { "type": "text", + "fields": + { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "position": { "properties": { - "ordinal": { - "type": "integer" - }, - "x": { - "type": "integer" - }, - "y": { - "type": "integer" - }, - "page_depth": { - "type": "integer" - }, - "scroll_depth": { - "type": "integer" - }, - "trail": { - "type": "text", - "fields": { - "keyword": { - "type": "keyword", - "ignore_above": 256 - } - } + "ordinal": { "type": "integer" }, + "x": { "type": "integer" }, + "y": { "type": "integer" }, + "page_depth": { "type": "integer" }, + "scroll_depth": { "type": "integer" }, + "trail": { "type": "text", + "fields": + { "keyword": { "type": "keyword", "ignore_above": 256 } } } } }, "object": { "properties": { - "key_value": { - "type": "keyword" - }, - "object_id": { - "type": "keyword", - "ignore_above": 256 - }, - "object_type": { - "type": "keyword", - "ignore_above": 100 - }, - "transaction_id": { - "type": "keyword", - "ignore_above": 100 - }, - "name": { - "type": "keyword", - "ignore_above": 256 - }, - "description": { - "type": "text", + "catalog_id": { "type": "keyword" }, + "object_id": { "type": "keyword", "ignore_above": 256 }, + "object_type": { "type": "keyword", "ignore_above": 100 }, + "transaction_id": { "type": "keyword", "ignore_above": 100 }, + "name": { "type": "keyword", "ignore_above": 256 }, + "description": { "type": "text", "fields": { - "keyword": { - "type": "keyword", - "ignore_above": 256 - } - } - }, - "to_user_id": { - "type": "keyword", - "ignore_above": 100 + "keyword": { "type": "keyword", "ignore_above": 256 } } }, - "object_detail": { - "type": "object" + "to_user_id": { "type": "keyword", "ignore_above": 100 }, + "object_details": { "type": 
"text", + "fields": + { "json": { "type": "object"} } } } } From 48fa7198c092d2c827c280d6f103b57bf00678cf Mon Sep 17 00:00:00 2001 From: RasonJ <145287540+RasonJ@users.noreply.github.com> Date: Fri, 19 Apr 2024 07:13:45 -0700 Subject: [PATCH 02/12] forgot to mention the new object.object_details.json field. --- documentation/schemas.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/documentation/schemas.md b/documentation/schemas.md index cfcb10c..2c506e2 100644 --- a/documentation/schemas.md +++ b/documentation/schemas.md @@ -18,8 +18,7 @@ When UBI is turned on, a *search client* will get a `query_id` back from OpenSea # TODO: `key_field` rename? The `object` structure has two ways to refer to the object: - `event_attributes.object.object_id` is the unique id that OpenSearch can use internally to index the object. -- `event_attributes.object.catalog_id` is the id that a user could look up the object in a *catelog* - +- `event_attributes.object.catalog_id` is the id that a user could look up the object in a *catalog* Therefore, the `query_id` signals the beginning of a user's *Search Journey*, `action_name` tells us how the user is interacting with the query results within the application, @@ -67,7 +66,9 @@ The current event mappings file can be found [here](../src/main/resources/events - `event_attributes.object.to_user_id` \   optionally points to another user, if they are the recipient of this object, perhaps as a gift, from the user's `user_id` - `event_attributes.object.object_detail` \ -   optional data object/map of further data details +   optional text for further data object details + - `event_attributes.object.object_detail.json` \ +   if the user has a json object representing what was acted upon, it can be stored here; however, note that that could lead to index bloat if the json objects are large. 
- `event_attributes.position` \   nested object to track user events to the location of the event origins - `event_attributes.position.ordinal` \ From 168649b068daa6b2c6924b9ecf8176920b745dea Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Fri, 19 Apr 2024 16:42:41 -0400 Subject: [PATCH 03/12] track some changes based on team discussions --- documentation/schemas.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/documentation/schemas.md b/documentation/schemas.md index 2c506e2..89fe88f 100644 --- a/documentation/schemas.md +++ b/documentation/schemas.md @@ -1,7 +1,7 @@ # Key UBI concepts -Although the named fields below follow a schema lends to easier analytics, the schema is dynamic and allows for users to add new dynamic fields where there is need. +Although the named fields below follow a schema which lends to easier analytics, the schema is dynamic and allows for users to add new dynamic fields where there is need. [`user_id`](#user_id) represents a user. When UBI is active, any query that this user does, will generate a new `query_id` for this `user_id`. @@ -17,7 +17,7 @@ When UBI is turned on, a *search client* will get a `query_id` back from OpenSea # TODO: `key_field` rename? The `object` structure has two ways to refer to the object: -- `event_attributes.object.object_id` is the unique id that OpenSearch can use internally to index the object. +- `event_attributes.object.object_id` is the unique id that OpenSearch uses internally to index the object, think the `_id` field in the indices. - `event_attributes.object.catalog_id` is the id that a user could look up the object in a *catalog* Therefore, the `query_id` signals the beginning of a user's *Search Journey*, @@ -38,7 +38,7 @@ The current event mappings file can be found [here](../src/main/resources/events - `timestamp`: \   Unix epoch time. If not set , will be set by the plugin when the event is received - `query_id`

-   (size 100) - ID for some query. Note that it could be a unique search string, or it could represent a cluster of related searches (e.g., *dress*, *red dress*, *long dress* could all have the same `query_id`). Either the client could control these, or the `query_id` could be retrieved from the API's response headers as it keeps track of queries on the node +   (size 100) - ID for some query. Either the client provides this, or the `query_id` is generated by the server. - `user_id`, `session_id`, `source_id`

  (size 100) - are id's largely at the calling client's discretion for tracking users, sessions and sources (i.e. pages) of the event. The `user_id` must be consistent in both the `query` and `event` stores. @@ -91,9 +91,9 @@ The current query mappings file can be found [here](../src/main/resources/querie - `timestamp` \   A unix timestamp of when the query was received - `query_id` \ -   A unique ID of the query provided by the client or generated automatically -- `query_response_id` \ -   A unique ID for the collection of results for the query +   A unique ID of the query provided by the client or generated automatically. The same query text issued multiple times would generate different `query_id`. +- `query_response_objects_ids` \ +   This is an array of the `object_id`. The size - `user_id` \   A user ID provided by the client - `session_id` \ From 5f61fcdc3c2300b9973ae93771e63e434619ed39 Mon Sep 17 00:00:00 2001 From: RasonJ <145287540+RasonJ@users.noreply.github.com> Date: Sat, 20 Apr 2024 11:14:19 -0700 Subject: [PATCH 04/12] Updating new schema tweaks + a new diagram --- documentation/documentation.md | 6 +- documentation/queries/sql_queries.md | 6 +- documentation/schemas.md | 155 ++++++++++++++---- .../o19s/ubi/UserBehaviorInsightsPlugin.java | 2 +- .../UserBehaviorInsightsActionFilter.java | 8 +- .../o19s/ubi/data/OpenSearchDataManager.java | 2 +- .../com/o19s/ubi/model/SettingsConstants.java | 2 +- .../rest/UserBehaviorInsightsRestHandler.java | 4 +- src/main/resources/events-mapping.json | 4 +- src/main/resources/queries-mapping.json | 2 +- .../rest-api-spec/api/ubi.create_store.json | 2 +- .../test/_plugins.ubi/20_manage_stores.yml | 12 +- 12 files changed, 150 insertions(+), 55 deletions(-) diff --git a/documentation/documentation.md b/documentation/documentation.md index 2bb18ce..49b6a16 100644 --- a/documentation/documentation.md +++ b/documentation/documentation.md @@ -27,7 +27,7 @@ docker compose -f docker-compose-cluster.yaml up Initialize the 
`awesome` UBI store: ``` -curl -X PUT "http://localhost:9200/_plugins/ubi/awesome?index=ecommerce&id_field=id" +curl -X PUT "http://localhost:9200/_plugins/ubi/awesome?index=ecommerce&object_id=id" ``` Send an event to the `awesome` store: @@ -88,7 +88,7 @@ The current event mappings file can be found [here](https://github.com/o19s/open - `event_attributes.object` - contains an associated JSONified data object (i.e. books, products, user info, etc) if there are any - `event_attributes.object.object_id` - points to a unique, internal, id representing and instance of that object - `event_attributes.object.key_value` - points to a unique, external key, matching the item that the user searched for, found and acted upon (i.e. sku, isbn, ean, etc.). - **This field value should match the value in for the object's value in the `id_field` [below](#id_field) from the search store** + **This field value should match the value in for the object's value in the `object_id` [below](#object_id) from the search store** It is possible that the `object_id` and `key_value` match if the same id is used both internally for indexing and externally for the users. - `event_attributes.object.object_type` - indicates the type/class of object - `event_attributes.object.description` - optional description of the object @@ -122,7 +122,7 @@ The plugin exposes a REST API for managing UBI stores and persisting events. | Method | Endpoint | Purpose | |--------|-----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `PUT` | `/_plugins/ubi/{store}?index={index}&id_field={id_field}` |

Initialize a new UBI store for the given index. The `id_field` is optional and allows for providing the name of a field in the `index`'s schema to be used as the unique result/item ID for each search result. If not provided, the `_id` field is used.

| +| `PUT` | `/_plugins/ubi/{store}?index={index}&object_id={object_id}` |

Initialize a new UBI store for the given index. The `object_id` is optional and allows for providing the name of a field in the `index`'s schema to be used as the unique result/item ID for each search result. If not provided, the `_id` field is used.

| | `DELETE` | `/_plugins/ubi/{store}` | Delete a UBI store | | `GET` | `/_plugins/ubi` | Get a list of all UBI stores | | `POST` | `/_plugins/ubi/{store}` | Index an event into the UBI store | diff --git a/documentation/queries/sql_queries.md b/documentation/queries/sql_queries.md index a2dec00..68096c8 100644 --- a/documentation/queries/sql_queries.md +++ b/documentation/queries/sql_queries.md @@ -9,7 +9,7 @@ Although it's trivial on the server side to find queries with no results, we can select count(0) from .ubi_log_queries -where query_response_hit_ids is null +where query_response_objects_ids is null order by user_id ``` @@ -18,7 +18,7 @@ order by user_id select count(0) from .ubi_log_events -where action_name='on_search' and event_attributes.data.data_detail.query_data.query_response_hit_ids is null +where action_name='on_search' and event_attributes.data.data_detail.query_data.query_response_objects_ids is null order by timestamp ``` @@ -113,7 +113,7 @@ where query_id ='1065c70f-d46a-442f-8ce4-0b5e7a71a892' order by timestamp ``` (In this generated data, the `query` field is plain text; however in the real implementation the query will be in the internal DSL of the query and parameters.) 
-query_response_id|query_id|user_id|query|query_response_objects_ids|session_id|timestamp ---|---|---|---|---|---|--- 1065c70f-d46a-442f-8ce4-0b5e7a71a892|1065c70f-d46a-442f-8ce4-0b5e7a71a892|155_7e3471ff-14c8-45cb-bc49-83a056c37192|Blanditiis quo sint repudiandae a sit.|8659955|fa6e3b1c-3212-44d2-b16b-690b4aeddbba_1975|2027-04-17 10:16:45 diff --git a/documentation/schemas.md b/documentation/schemas.md index 89fe88f..12aa1ad 100644 --- a/documentation/schemas.md +++ b/documentation/schemas.md @@ -1,5 +1,60 @@ # Key UBI concepts +## Ubi Roles +- **User Behavior Insights** module: once activated, is in charge of indexing a user's queries and results in the **query store** with a unique [`query_id`](#query_id), and passing that `query_id` back to the search client. + +- **Search Client**: in charge of searching and receiving the `query_id` from **User Behavior Insights**. This `query_id` is then passed to the **Ubi Logging Client**. + +- **Ubi Logging Client**: is in charge of indexing user events, such as onClick, in the **event store** along with the `query_id` that links to the underlying, technical query DSL and the results' `object_id`s. + +*Note:* We break out the roles of "search" and "Ubi logging" here, but many implementations will likely use the same OpenSearch client instance for both roles of searching and index writing.
+ +```mermaid +%%{init: { + "flowchart": {"htmlLabels": false}, + + } +}%% +graph TB + + style OS stroke-width:2px, stroke:#0A1CCF, fill:#62affb, opacity:.5 +subgraph OS[OpenSearch Cluster fa:fa-database] + E[( Ubi Events )] + Docs[(Document Index)] --3) DSL & object_id's--> Q[( Ubi Queries )]; + Q -."4) query_id".-> Docs ; + + end + style *client-side* stroke-width:2px, stroke:#EC6363 +subgraph "`*client-side*`" + style User stroke-width:4px, stroke:#EC636 + User["`*User*`" fa:fa-user] + App + Search + U + style App fill:#EC6363,opacity:.5 + subgraph App[UserApp fa:fa-store] + Search( Search Client ) + U( Ubi Client ) + end + User--1) raw search string-->Search; + +end + +Search--2) search string-->Docs + +Docs -. 6) query_id & objects...->Search ; +Search --results--> User +Search-.7) query_id.->U; + +User--8) selects + object_id:123-->U; +U-."9) index event:{query_id, onClick, object_id:123}".->E; + +linkStyle 3,0,5 stroke-width:2px,fill:none,stroke:#0A1CCF +linkStyle 1,4,6,8 stroke-width:2px,fill:none,stroke:red +``` + + Although the named fields below follow a schema which lends to easier analytics, the schema is dynamic and allows for users to add new dynamic fields where there is need. @@ -15,7 +70,6 @@ When UBI is turned on, a *search client* will get a `query_id` back from OpenSea information on what part of the application the user is interacting with, and [`event_attributes.object`](#object), which contains identifying information of the object returned from the query that the user interacts with (i.e.: a book, a product, a post, etc..). -# TODO: `key_field` rename? The `object` structure has two ways to refer to the object: - `event_attributes.object.object_id` is the unique id that OpenSearch uses internally to index the object, think the `_id` field in the indices. 
- `event_attributes.object.catalog_id` is the id that a user could look up the object in a *catalog* @@ -31,55 +85,87 @@ and `event_attributes.object` is referring to the precise query result that the The current event mappings file can be found [here](../src/main/resources/events-mapping.json). **Primary fields include:** -- `application`

+- `application` +

  (size 100) - name of application tracking UBI events -- `action_name`

+- `action_name` +

  (size 100) - any name you want to call your event -- `timestamp`: \ +- `timestamp`: +   Unix epoch time. If not set, it will be set by the plugin when the event is received -- `query_id`

+- `query_id` +

  (size 100) - ID for some query. Either the client provides this, or the `query_id` is generated by the server. - `user_id`, `session_id`, `source_id`

  (size 100) - are id's largely at the calling client's discretion for tracking users, sessions and sources (i.e. pages) of the event. The `user_id` must be consistent in both the `query` and `event` stores. -- `message_type` \ +- `message_type` +   (size 100) - originally thought of in terms of ERROR, INFO, WARN, but could be anything useful such as `QUERY` or `CONVERSION`. Can be used to group `action_name` together in logical bins. -- `message` \ +- `message` +   (size 256) - optional text for the log entry **Other attribute fields & data objects**

-- `event_attributes.object` \ +- `event_attributes.object` + +   represents the search result object (i.e. books, products, user info, etc) if there are any - - `event_attributes.object.object_id` - points to a unique, internal, id representing and instance of that object - - `event_attributes.object.catalog_id` \ + + - `event_attributes.object.internal_id` - points to a unique, internal id representing an instance of that object + + - `event_attributes.object.object_id` +

  points to a unique, external key, matching the item that the user searched for, found and acted upon (i.e. sku, isbn, ean, etc.). - **This field value should match the value in for the object's value in the `catalog_field` [below](#catalog_field) from the search store** - It is possible that the `object_id` and `key_value` match if the same id is used both internally for indexing and externally for the users. - - `event_attributes.object.object_type` \ + **This field's value should match the object's value in the `object_id` field [below](#object_id) from the search store.** + It is possible that the `object_id` and `internal_id` match if the same id is used both internally for indexing and externally for the users. + + - `event_attributes.object.object_type` + +   indicates the type/class of object - - `event_attributes.object.description` \ + + - `event_attributes.object.description` + +   optional description of the object - - `event_attributes.object.transaction_id` \ + + - `event_attributes.object.transaction_id` + +   optionally points to a unique id representing a successful transaction - - `event_attributes.object.to_user_id` \ + + - `event_attributes.object.to_user_id` + +   optionally points to another user, if they are the recipient of this object, perhaps as a gift, from the user's `user_id` - - `event_attributes.object.object_detail` \ + - `event_attributes.object.object_detail` + +   optional text for further data object details - - `event_attributes.object.object_detail.json` \ + + - `event_attributes.object.object_detail.json` + +   if the user has a json object representing what was acted upon, it can be stored here; however, note that that could lead to index bloat if the json objects are large.
-- `event_attributes.position` \ + +- `event_attributes.position` +   nested object to track user events to the location of the event origins - - `event_attributes.position.ordinal` \ + - `event_attributes.position.ordinal` +   tracks the nth item within a list that a user could select, click - - `event_attributes.position.{x,y}` \ + + - `event_attributes.position.{x,y}` +   tracks x and y values, that the client defines - - `event_attributes.position.page_depth` \ + + - `event_attributes.position.page_depth` +   tracks page depth - - `event_attributes.position.scroll_depth` \ + + - `event_attributes.position.scroll_depth` +   tracks scroll depth - - `event_attributes.position.trail` \ + + - `event_attributes.position.trail` +   text field for tracking the path/trail that a user took to get to this location * Note the developers can add optional, dynamic fields like `user_name`, `email`, `price` per individual use-cases. @@ -88,13 +174,22 @@ The current event mappings file can be found [here](../src/main/resources/events The current query mappings file can be found [here](../src/main/resources/queries-mapping.json). -- `timestamp` \ +- `timestamp` +   A unix timestamp of when the query was received -- `query_id` \ + +- `query_id` +   A unique ID of the query provided by the client or generated automatically. The same query text issued multiple times would generate different `query_id`. -- `query_response_objects_ids` \ -   This is an array of the `object_id`. The size -- `user_id` \ + +- `query_response_objects_ids` + +   This is an array of the `object_id`'s. 
+ +- `user_id` +   A user ID provided by the client -- `session_id` \ + +- `session_id` +   An optional session ID provided by the client diff --git a/src/main/java/com/o19s/ubi/UserBehaviorInsightsPlugin.java b/src/main/java/com/o19s/ubi/UserBehaviorInsightsPlugin.java index 3b451a4..8ca10a0 100644 --- a/src/main/java/com/o19s/ubi/UserBehaviorInsightsPlugin.java +++ b/src/main/java/com/o19s/ubi/UserBehaviorInsightsPlugin.java @@ -95,7 +95,7 @@ public List> getSettings() { settings.add(Setting.intSetting(SettingsConstants.VERSION_SETTING, 1, -1, Integer.MAX_VALUE, Setting.Property.IndexScope)); settings.add(Setting.simpleString(SettingsConstants.INDEX, "", Setting.Property.IndexScope)); - settings.add(Setting.simpleString(SettingsConstants.ID_FIELD, "", Setting.Property.IndexScope)); + settings.add(Setting.simpleString(SettingsConstants.object_id, "", Setting.Property.IndexScope)); return settings; diff --git a/src/main/java/com/o19s/ubi/action/UserBehaviorInsightsActionFilter.java b/src/main/java/com/o19s/ubi/action/UserBehaviorInsightsActionFilter.java index f6f367b..c79879f 100644 --- a/src/main/java/com/o19s/ubi/action/UserBehaviorInsightsActionFilter.java +++ b/src/main/java/com/o19s/ubi/action/UserBehaviorInsightsActionFilter.java @@ -103,9 +103,9 @@ public void onResponse(Response response) { if(!"".equals(storeName)) { final String index = getStoreSettings(storeName, SettingsConstants.INDEX); - final String idField = getStoreSettings(storeName, SettingsConstants.ID_FIELD); + final String idField = getStoreSettings(storeName, SettingsConstants.object_id); - LOGGER.debug("Using id_field [{}] of index [{}] for UBI query.", idField, index); + LOGGER.debug("Using object_id [{}] of index [{}] for UBI query.", idField, index); // Only consider this search if the index being searched matches the store's index setting. 
if (Arrays.asList(searchRequest.indices()).contains(index)) { @@ -124,7 +124,7 @@ public void onResponse(Response response) { if (idField == null || "".equals(idField) || idField.equals("null")) { - // Use the _id since there is no id_field setting for this index. + // Use the _id since there is no object_id setting for this index. queryResponseHitIds.add(String.valueOf(hit.docId())); } else { @@ -240,7 +240,7 @@ private void persistQuery(final String storeName, final QueryRequest queryReques source.put("query_id", queryRequest.getQueryId()); source.put("query", queryRequest.getQuery()); source.put("query_response_id", queryResponse.getQueryResponseId()); - source.put("query_response_hit_ids", queryResponse.getQueryResponseHitIds()); + source.put("query_response_objects_ids", queryResponse.getQueryResponseHitIds()); source.put("user_id", queryRequest.getUserId()); source.put("session_id", queryRequest.getSessionId()); diff --git a/src/main/java/com/o19s/ubi/data/OpenSearchDataManager.java b/src/main/java/com/o19s/ubi/data/OpenSearchDataManager.java index 200bb7a..6ad3d2e 100644 --- a/src/main/java/com/o19s/ubi/data/OpenSearchDataManager.java +++ b/src/main/java/com/o19s/ubi/data/OpenSearchDataManager.java @@ -160,7 +160,7 @@ private Map buildQueryRequestMap(final QueryRequest queryRequest source.put("query_id", queryRequest.getQueryId()); source.put("query", queryRequest.getQuery()); source.put("query_response_id", queryRequest.getQueryResponse().getQueryResponseId()); - source.put("query_response_hit_ids", queryRequest.getQueryResponse().getQueryResponseHitIds()); + source.put("query_response_objects_ids", queryRequest.getQueryResponse().getQueryResponseHitIds()); source.put("user_id", queryRequest.getUserId()); source.put("session_id", queryRequest.getSessionId()); diff --git a/src/main/java/com/o19s/ubi/model/SettingsConstants.java b/src/main/java/com/o19s/ubi/model/SettingsConstants.java index 469d830..8ae2b2b 100644 --- 
a/src/main/java/com/o19s/ubi/model/SettingsConstants.java +++ b/src/main/java/com/o19s/ubi/model/SettingsConstants.java @@ -26,6 +26,6 @@ public class SettingsConstants { /** * The field in an index's mapping that will be used as the unique identifier for a query result item. */ - public static final String ID_FIELD = "index.ubi.id_field"; + public static final String object_id = "index.ubi.object_id"; } diff --git a/src/main/java/com/o19s/ubi/rest/UserBehaviorInsightsRestHandler.java b/src/main/java/com/o19s/ubi/rest/UserBehaviorInsightsRestHandler.java index 90758c7..f94c539 100644 --- a/src/main/java/com/o19s/ubi/rest/UserBehaviorInsightsRestHandler.java +++ b/src/main/java/com/o19s/ubi/rest/UserBehaviorInsightsRestHandler.java @@ -84,7 +84,7 @@ protected RestChannelConsumer prepareRequest(RestRequest restRequest, NodeClient final String storeName = restRequest.param("store"); final String index = restRequest.param("index"); - final String idField = restRequest.param("id_field"); + final String idField = restRequest.param("object_id"); LOGGER.info("Received PUT for store {}", storeName); @@ -191,7 +191,7 @@ private RestChannelConsumer create(final NodeClient nodeClient, final String sto .put(IndexMetadata.INDEX_AUTO_EXPAND_REPLICAS_SETTING.getKey(), "0-2") .put(IndexMetadata.SETTING_PRIORITY, Integer.MAX_VALUE) .put(SettingsConstants.INDEX, index) - .put(SettingsConstants.ID_FIELD, idField) + .put(SettingsConstants.object_id, idField) .put(SettingsConstants.VERSION_SETTING, VERSION) .build(); diff --git a/src/main/resources/events-mapping.json b/src/main/resources/events-mapping.json index ac7e1e1..07a1caf 100644 --- a/src/main/resources/events-mapping.json +++ b/src/main/resources/events-mapping.json @@ -31,8 +31,8 @@ }, "object": { "properties": { - "catalog_id": { "type": "keyword" }, - "object_id": { "type": "keyword", "ignore_above": 256 }, + "internal_id": { "type": "keyword", "ignore_above": 256 }, + "object_id": { "type": "keyword" }, "object_type": { 
"type": "keyword", "ignore_above": 100 }, "transaction_id": { "type": "keyword", "ignore_above": 100 }, "name": { "type": "keyword", "ignore_above": 256 }, diff --git a/src/main/resources/queries-mapping.json b/src/main/resources/queries-mapping.json index dc5fc22..e69cdbb 100644 --- a/src/main/resources/queries-mapping.json +++ b/src/main/resources/queries-mapping.json @@ -9,7 +9,7 @@ "type": "text" }, "query_response_id": { "type": "keyword", "ignore_above": 100 }, - "query_response_hit_ids": { "type": "keyword" }, + "query_response_objects_ids": { "type": "keyword" }, "user_id": { "type": "keyword", "ignore_above": 100 }, "session_id": { "type": "keyword", "ignore_above": 100 } } diff --git a/src/yamlRestTest/resources/rest-api-spec/api/ubi.create_store.json b/src/yamlRestTest/resources/rest-api-spec/api/ubi.create_store.json index bfa909d..ea4514b 100644 --- a/src/yamlRestTest/resources/rest-api-spec/api/ubi.create_store.json +++ b/src/yamlRestTest/resources/rest-api-spec/api/ubi.create_store.json @@ -24,7 +24,7 @@ "type": "string", "description": "The name of the index being searched" }, - "id_field": { + "object_id": { "required": false, "type": "string", "description": "The name of the field to use for the doc ID field" diff --git a/src/yamlRestTest/resources/rest-api-spec/test/_plugins.ubi/20_manage_stores.yml b/src/yamlRestTest/resources/rest-api-spec/test/_plugins.ubi/20_manage_stores.yml index aff029f..9bd5b34 100644 --- a/src/yamlRestTest/resources/rest-api-spec/test/_plugins.ubi/20_manage_stores.yml +++ b/src/yamlRestTest/resources/rest-api-spec/test/_plugins.ubi/20_manage_stores.yml @@ -6,7 +6,7 @@ ubi.create_store: store: mystore index: ecommerce - id_field: name + object_id: name - do: cluster.health: @@ -60,14 +60,14 @@ index: "" --- -"Create a store without specifying an id_field": -# A missing id_field is allowed - the doc ID will be used instead. 
+"Create a store without specifying an object_id": +# A missing object_id is allowed - the doc ID will be used instead. - do: ubi.create_store: store: invalid_store index: some_index - id_field: "" + object_id: "" --- "Delete a store that does not exist": @@ -86,7 +86,7 @@ ubi.create_store: store: mystore index: ecommerce - id_field: name + object_id: name - do: cluster.health: @@ -97,6 +97,6 @@ ubi.create_store: store: mystore index: ecommerce - id_field: name + object_id: name - match: { status: initialized } From 8490804c6ae3b653270d51c59b1670c2f7ae2a0e Mon Sep 17 00:00:00 2001 From: RasonJ <145287540+RasonJ@users.noreply.github.com> Date: Sat, 20 Apr 2024 12:00:10 -0700 Subject: [PATCH 05/12] aesthetics --- documentation/schemas.md | 49 ++++++++++++++++++++++++++++------------ 1 file changed, 34 insertions(+), 15 deletions(-) diff --git a/documentation/schemas.md b/documentation/schemas.md index 12aa1ad..9cebeb0 100644 --- a/documentation/schemas.md +++ b/documentation/schemas.md @@ -8,7 +8,27 @@ - **Ubi Logging Client**: is in charge of indexing user events, such as onClick, in the **event store** along with the `query_id` that links to the underlying, technical query DSL and the results' `object_id`'s. *Note:* We break out the roles of "search" and "Ubi logging" here, but many implementations will likely use the same OpenSearch client instance for both roles of searching and index writing. 
- + +```mermaid +graph TB +style L fill:none +subgraph L["`*Legend*`"] + subgraph ss[Standard Search] + direction LR + style ln1a fill:blue + ln1a[ ]--->ln1b[ ]; + end + subgraph Ubi flow + direction LR + ln2a[ ].->|new|ln2b[ ]; + style ln1c fill:red + ln1c[ ]-->|query_id|ln1d[ ]; + end +end +linkStyle 0 stroke-width:2px,stroke:#0A1CCF + +``` + ```mermaid %%{init: { "flowchart": {"htmlLabels": false}, @@ -17,14 +37,14 @@ }%% graph TB - style OS stroke-width:2px, stroke:#0A1CCF, fill:#62affb, opacity:.5 +style OS stroke-width:2px, stroke:#0A1CCF, fill:#62affb, opacity:.5 subgraph OS[OpenSearch Cluster fa:fa-database] E[( Ubi Events )] Docs[(Document Index)] --3) DSL & object_id's--> Q[( Ubi Queries )]; - Q -."4) query_id".-> Docs ; - - end - style *client-side* stroke-width:2px, stroke:#EC6363 + Q -."4) query_id".-> Docs ; +end + +style *client-side* stroke-width:2px, stroke:#EC6363 subgraph "`*client-side*`" style User stroke-width:4px, stroke:#EC636 User["`*User*`" fa:fa-user] @@ -32,30 +52,29 @@ subgraph "`*client-side*`" Search U style App fill:#EC6363,opacity:.5 - subgraph App[UserApp fa:fa-store] + subgraph App[       UserApp fa:fa-store] Search( Search Client ) U( Ubi Client ) end - User--1) raw search string-->Search; - + User--1) raw search string-->Search; end Search--2) search string-->Docs - -Docs -. 
6) query_id & objects...->Search ; +Docs -- 6) query_id & objects--->Search ; Search --results--> User Search-.7) query_id.->U; - -User--8) selects - object_id:123-->U; +User -.8) selects object_id:123.->U; U-."9) index event:{query_id, onClick, object_id:123}".->E; -linkStyle 3,0,5 stroke-width:2px,fill:none,stroke:#0A1CCF +linkStyle 2,3,5,0 stroke-width:2px,fill:none,stroke:#0A1CCF linkStyle 1,4,6,8 stroke-width:2px,fill:none,stroke:red ``` +linkStyle 2,4,6,8,10 stroke-width:2px,fill:none,stroke:red + + Although the named fields below follow a schema which lends to easier analytics, the schema is dynamic and allows for users to add new dynamic fields where there is need. [`user_id`](#user_id) represents a user. When UBI is active, any query that this user does, will generate a new `query_id` for this `user_id`. From f4d9a41bd87a81b18215b1c60ddd9f7fd41ec0a3 Mon Sep 17 00:00:00 2001 From: RasonJ <145287540+RasonJ@users.noreply.github.com> Date: Sat, 20 Apr 2024 12:03:59 -0700 Subject: [PATCH 06/12] tweakage --- documentation/schemas.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/documentation/schemas.md b/documentation/schemas.md index 9cebeb0..01cbf35 100644 --- a/documentation/schemas.md +++ b/documentation/schemas.md @@ -10,7 +10,7 @@ *Note:* We break out the roles of "search" and "Ubi logging" here, but many implementations will likely use the same OpenSearch client instance for both roles of searching and index writing. 
```mermaid -graph TB +graph LR style L fill:none subgraph L["`*Legend*`"] subgraph ss[Standard Search] @@ -18,14 +18,15 @@ subgraph L["`*Legend*`"] style ln1a fill:blue ln1a[ ]--->ln1b[ ]; end - subgraph Ubi flow + subgraph Ubi data flow direction LR - ln2a[ ].->|new|ln2b[ ]; + ln2a[ ].->|Ubi interaction|ln2b[ ]; style ln1c fill:red - ln1c[ ]-->|query_id|ln1d[ ]; + ln1c[ ]-->|query_id passing|ln1d[ ]; end end linkStyle 0 stroke-width:2px,stroke:#0A1CCF +linkStyle 2 stroke-width:2px,stroke:red ``` From 0c9430260d24218440d09cbad7f05939170e0544 Mon Sep 17 00:00:00 2001 From: RasonJ <145287540+RasonJ@users.noreply.github.com> Date: Sat, 20 Apr 2024 12:25:12 -0700 Subject: [PATCH 07/12] meh --- documentation/schemas.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/documentation/schemas.md b/documentation/schemas.md index 01cbf35..ab1f3f2 100644 --- a/documentation/schemas.md +++ b/documentation/schemas.md @@ -13,13 +13,16 @@ graph LR style L fill:none subgraph L["`*Legend*`"] - subgraph ss[Standard Search] + style ss height:150px + subgraph ss["Standard Search"] direction LR + style ln1a fill:blue ln1a[ ]--->ln1b[ ]; end subgraph Ubi data flow direction LR + ln2a[ ].->|Ubi interaction|ln2b[ ]; style ln1c fill:red ln1c[ ]-->|query_id passing|ln1d[ ]; From 6f59baec648eefec9d0a3b1c3fa2954d68ae3575 Mon Sep 17 00:00:00 2001 From: RasonJ <145287540+RasonJ@users.noreply.github.com> Date: Sat, 20 Apr 2024 13:34:48 -0700 Subject: [PATCH 08/12] reorder numbers --- documentation/schemas.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/documentation/schemas.md b/documentation/schemas.md index ab1f3f2..6e86e1f 100644 --- a/documentation/schemas.md +++ b/documentation/schemas.md @@ -11,6 +11,7 @@ ```mermaid graph LR + style L fill:none subgraph L["`*Legend*`"] style ss height:150px @@ -64,10 +65,10 @@ subgraph "`*client-side*`" end Search--2) search string-->Docs -Docs -- 6) query_id & objects--->Search ; -Search --results--> User 
-Search-.7) query_id.->U; +Docs -- 5) query_id & objects--->Search ; +Search-.6) query_id.->U; User -.8) selects object_id:123.->U; +Search --7) results--> User U-."9) index event:{query_id, onClick, object_id:123}".->E; linkStyle 2,3,5,0 stroke-width:2px,fill:none,stroke:#0A1CCF From 2ae05e887fd5df95c20a7eea94aa66f539f7a02d Mon Sep 17 00:00:00 2001 From: RasonJ <145287540+RasonJ@users.noreply.github.com> Date: Sun, 21 Apr 2024 08:46:09 -0700 Subject: [PATCH 09/12] latest --- documentation/schemas.md | 74 +++++++++++++++++++++------------------- 1 file changed, 38 insertions(+), 36 deletions(-) diff --git a/documentation/schemas.md b/documentation/schemas.md index 6e86e1f..8452010 100644 --- a/documentation/schemas.md +++ b/documentation/schemas.md @@ -1,18 +1,22 @@ -# Key UBI concepts -## Ubi Roles -- **User Behavior Insights** module: once activated, is in charge of indexing a user's queries and results in the **query store** with a unique [`query_id`](#query_id), and passing that `query_id` back to the search client. - -- **Search Client**: in charge of searching and recieving the `query_id` from **User Behavior Insights**. This `query_id` is then passed to the **Ubi Logging Client** - -- **Ubi Logging Client**: is in charge of indexing user events, such as onClick, in the **event store** along with the `query_id` that links to the underlying, technical query DSL and the results' `object_id`'s. +# Key User Behavior Insights concepts +**User Behavior Insights** (Ubi) **Logging** is really just a matter of linking and indexing queries, results and events within OpenSearch. -*Note:* We break out the roles of "search" and "Ubi logging" here, but many implementations will likely use the same OpenSearch client instance for both roles of searching and index writing. +## Ubi Roles +- **Search Client**: in charge of searching, and then recieving *objects* from some document index in OpenSearch. 
+  (1, 2, *5* & 7, below) +- **User Behavior Insights** module: once activated, manages the **Ubi Queries** store in the background, indexing each underlying, technical, DSL, index query with a unique [`query_id`](#query_id) along with all returned resultant [`object_id`](#object_id)'s, and then passing the `query_id` back to the **Search Client**. +  (3, 4 & *5*, below) +- The **Search Client**, if separate from the **Ubi Client**, forwards the [`query_id`](#query_id) to the **Ubi Client**. +   *Note:* We break out the roles of *search* and *Ubi event indexing* here, but many implementations will likely use the same OpenSearch client instance for both roles of searching and index writing. +  (6, below) +- The **Ubi Client** then indexes all user events with this [`query_id`](#query_id) until a new search is performed, and a new `query_id` is generated by **User Behavior Insights** +- If the **Ubi Client** interacts with a result *object*, such as `onClick`, that [`object_id`](#object_id), `onClick` and `query_id` are all indexed together, signalling the causal link between the *search* and the *object*. 
+  (8 & 9, below) ```mermaid graph LR - -style L fill:none +style L fill:none,stroke-dasharray: 5 5 subgraph L["`*Legend*`"] style ss height:150px subgraph ss["Standard Search"] @@ -21,19 +25,17 @@ subgraph L["`*Legend*`"] style ln1a fill:blue ln1a[ ]--->ln1b[ ]; end - subgraph Ubi data flow + subgraph ubi-leg["Ubi data flow"] direction LR - ln2a[ ].->|Ubi interaction|ln2b[ ]; + ln2a[ ].->|"`**Ubi interaction**`"|ln2b[ ]; style ln1c fill:red - ln1c[ ]-->|query_id passing|ln1d[ ]; + ln1c[ ]-->|query_id flow|ln1d[ ]; end end linkStyle 0 stroke-width:2px,stroke:#0A1CCF linkStyle 2 stroke-width:2px,stroke:red - ``` - ```mermaid %%{init: { "flowchart": {"htmlLabels": false}, @@ -42,44 +44,44 @@ linkStyle 2 stroke-width:2px,stroke:red }%% graph TB +User--1) raw search string-->Search; +Search--2) search string-->Docs style OS stroke-width:2px, stroke:#0A1CCF, fill:#62affb, opacity:.5 subgraph OS[OpenSearch Cluster fa:fa-database] - E[( Ubi Events )] - Docs[(Document Index)] --3) DSL & object_id's--> Q[( Ubi Queries )]; - Q -."4) query_id".-> Docs ; + style E stroke-width:1px,stroke:red + E[( Ubi Events )] + style Docs stroke-width:1px,stroke:#0A1CCF + style Q stroke-width:1px,stroke:red + Docs[(Document Index)] -."3) {DSL...} & [object_id's,...]".-> Q[( Ubi Queries )]; + Q -.4) query_id.-> Docs ; end +Docs -- "5) query_id & [objects,...]" --->Search ; +Search-.6) query_id.->U; +Search --7) [results, ...]--> User + style *client-side* stroke-width:2px, stroke:#EC6363 subgraph "`*client-side*`" style User stroke-width:4px, stroke:#EC636 - User["`*User*`" fa:fa-user] + User["`**User**`" fa:fa-user] App - Search + Search U style App fill:#EC6363,opacity:.5 subgraph App[       UserApp fa:fa-store] - Search( Search Client ) - U( Ubi Client ) + style Search stroke-width:2px, stroke:#0A1CCF + Search( Search Client ) + U( Ubi Client ) end - User--1) raw search string-->Search; end -Search--2) search string-->Docs -Docs -- 5) query_id & objects--->Search ; -Search-.6) 
query_id.->U; -User -.8) selects object_id:123.->U; -Search --7) results--> User -U-."9) index event:{query_id, onClick, object_id:123}".->E; +User -.8) selects object_id:123.->U; +U-."9) index event:{query_id, onClick, object_id:123}".->E; -linkStyle 2,3,5,0 stroke-width:2px,fill:none,stroke:#0A1CCF -linkStyle 1,4,6,8 stroke-width:2px,fill:none,stroke:red +linkStyle 1,2,0,6 stroke-width:2px,fill:none,stroke:#0A1CCF +linkStyle 3,4,5,8 stroke-width:2px,fill:none,stroke:red ``` - - -linkStyle 2,4,6,8,10 stroke-width:2px,fill:none,stroke:red - - Although the named fields below follow a schema which lends to easier analytics, the schema is dynamic and allows for users to add new dynamic fields where there is need. [`user_id`](#user_id) represents a user. When UBI is active, any query that this user does, will generate a new `query_id` for this `user_id`. From 101a4bae7d5afb9963ff9db51056f39b3f09d03f Mon Sep 17 00:00:00 2001 From: RasonJ <145287540+RasonJ@users.noreply.github.com> Date: Sun, 21 Apr 2024 11:03:27 -0700 Subject: [PATCH 10/12] finalizing draft and linking to the main documentation screen. --- documentation/documentation.md | 48 +------- documentation/schemas.md | 216 ++++++++++++++++----------------- 2 files changed, 112 insertions(+), 152 deletions(-) diff --git a/documentation/documentation.md b/documentation/documentation.md index 49b6a16..ad74672 100644 --- a/documentation/documentation.md +++ b/documentation/documentation.md @@ -71,50 +71,10 @@ The plugin has a concept of a "store", which is a logical collection of the even index is used to store events, and the other index is for storing queries. ### OpenSearch Data Mappings - -#### Schema for events: - -The current event mappings file can be found [here](https://github.com/o19s/opensearch-ubi/blob/main/src/main/resources/events-mapping.json). - -**Primary fields include:** -- `action_name` - (size 100) - any name you want to call your event -- `timestamp` - unix epoch time. 
if not set, will be set by the plugin when the event is received -- `user_id`. `session_id`, `page_id` - (size 100) - are id's largely at the calling client's discretion for tracking users, sessions and pages -- `query_id` - (size 100) - ID for some query. Note that it could be a unique search string, or it could represent a cluster of related searches (i.e.: *dress*, *red dress*, *long dress* could all have the same `query_id`). Either the client could control these, or the `query_id` could be retrieved from the API's response headers as it keeps track of queries on the node -- `message_type` - (size 100) - originally thought of in terms of ERROR, INFO, WARN, but could be anything useful such as `QUERY` or `CONVERSION`. Use to group `action_name` together. -- `message` - (size 256) - optional text for the log entry - -**Other fields & data objects** -- `event_attributes.object` - contains an associated JSONified data object (i.e. books, products, user info, etc) if there are any - - `event_attributes.object.object_id` - points to a unique, internal, id representing and instance of that object - - `event_attributes.object.key_value` - points to a unique, external key, matching the item that the user searched for, found and acted upon (i.e. sku, isbn, ean, etc.). - **This field value should match the value in for the object's value in the `object_id` [below](#object_id) from the search store** - It is possible that the `object_id` and `key_value` match if the same id is used both internally for indexing and externally for the users. 
- - `event_attributes.object.object_type` - indicates the type/class of object - - `event_attributes.object.description` - optional description of the object - - `event_attributes.object.transaction_id` - optionally points to a unique id representing a successful transaction - - `event_attributes.object.to_user_id` - optionally points to another user, if they are the recipient of this object - - `event_attributes.object.object_detail` - optional data object/map of further data details -- `event_attributes.position` - nested object to track user events to the location of the event origins - - `event_attributes.position.ordinal` - tracks the nth item within a list that a user could select, click - - `event_attributes.position.{x,y}` - tracks x and y values, that the client defines - - `event_attributes.position.page_depth` - tracks page depth - - `event_attributes.position.scroll_depth` - tracks scroll depth - - `event_attributes.position.trail` - text field for tracking the path/trail that a user took to get to this location - -* Other mapped fields in the schema are intended to be optional placeholders for common attributes like `user_name`, `email`, `price` - -**the users can dynamically add any further fields to the event mapping - -#### Schema for queries: - -The current query mappings file can be found [here](https://github.com/o19s/opensearch-ubi/blob/main/src/main/resources/queries-mapping.json). - -- `timestamp` - A unix timestamp of when the query was received -- `query_id` - A unique ID of the query provided by the client or generated automatically by the plugin -- `query_response_id` - A unique ID for the collection of results for the query -- `user_id` - A user ID provided by the client -- `session_id` - An optional session ID provided by the client +Ubi has 2 primary indices: +- **UBi Queries** stores all queries and results. +- **UBi Events** store that the Ubi client writes events to. 
+*Please follow the [schema deep dive](./schemas.md) to understand how these two indices make Ubi into a causal framework for search.* ## Plugin API diff --git a/documentation/schemas.md b/documentation/schemas.md index 8452010..56c8481 100644 --- a/documentation/schemas.md +++ b/documentation/schemas.md @@ -1,19 +1,31 @@ # Key User Behavior Insights concepts **User Behavior Insights** (Ubi) **Logging** is really just a matter of linking and indexing queries, results and events within OpenSearch. +## Key ID's +Ubi is not functional unless the links between the following are consistently maintained within your Ubi-enabled application: + +- [`user_id`](#user_id) represents a unique user. +- [`object_id`](#object_id) represents an id for whatever item the user is searching for, such as *epc*, *isbn*, *ssn*, *handle*, etc. +- [`query_id`](#query_id) is a unique id for the raw query language executed and the resultant `object_id`'s that the query returned. +- [`action_name`](#action_name), though not technically an *id*, the `action_name` tells us what exact action was taken (or not) with this `object_id` + +To summarize: the `query_id` signals the beginning of a `user_id`'s *Search Journey*, the `action_name` tells us how the user is interacting with the query results within the application, and [`event_attributes.object.object_id`](#object_id) is referring to the precise query result that the user interacts with. ## Ubi Roles - **Search Client**: in charge of searching, and then recieving *objects* from some document index in OpenSearch.  (1, 2, *5* & 7, below) -- **User Behavior Insights** module: once activated, manages the **Ubi Queries** store in the background, indexing each underlying, technical, DSL, index query with a unique [`query_id`](#query_id) along with all returned resultant [`object_id`](#object_id)'s, and then passing the `query_id` back to the **Search Client**. 
+- **User Behavior Insights** module: once activated, manages the **Ubi Queries** store in the background, indexing each underlying, technical, DSL, index query with a unique [`query_id`](#query_id) along with all returned resultant [`object_id`](#object_id)'s, and then passing the `query_id` back to the **Search Client** so that events can be linked to this query.  (3, 4 & *5*, below) -- The **Search Client**, if separate from the **Ubi Client**, forwards the [`query_id`](#query_id) to the **Ubi Client**. +- **objects**: are whatever items the user is searching for with the queries. Activating Ubi involves mapping your real-world objects (via its *isbn*, etc...) to the [`object_id`](#object_id) fields in the schemas below. +- The **Search Client**, if separate from the **Ubi Client**, forwards the indexed [`query_id`](#query_id) to the **Ubi Client**.   *Note:* We break out the roles of *search* and *Ubi event indexing* here, but many implementations will likely use the same OpenSearch client instance for both roles of searching and index writing.  (6, below) -- The **Ubi Client** then indexes all user events with this [`query_id`](#query_id) until a new search is performed, and a new `query_id` is generated by **User Behavior Insights** -- If the **Ubi Client** interacts with a result *object*, such as `onClick`, that [`object_id`](#object_id), `onClick` and `query_id` are all indexed together, signalling the causal link between the *search* and the *object*. +- The **Ubi Client** then indexes all user events with this [`query_id`](#query_id) until a new search is performed, and a new `query_id` is generated by **User Behavior Insights** and passed back to the **Ubi Client** +- If the **Ubi Client** interacts with a result *object*, such as `onClick`, that [`object_id`](#object_id), *onClick* [`action_name`](#action_name) and `query_id` are all indexed together, signalling the causal link between the *search* and the *object*.  
(8 & 9, below) + + ```mermaid graph LR style L fill:none,stroke-dasharray: 5 5 @@ -56,21 +68,22 @@ subgraph OS[OpenSearch Cluster fa:fa-database] Q -.4) query_id.-> Docs ; end -Docs -- "5) query_id & [objects,...]" --->Search ; +Docs -- "5) return both query_id & [objects,...]" --->Search ; Search-.6) query_id.->U; Search --7) [results, ...]--> User -style *client-side* stroke-width:2px, stroke:#EC6363 +style *client-side* stroke-width:1px, stroke:#D35400 subgraph "`*client-side*`" style User stroke-width:4px, stroke:#EC636 User["`**User**`" fa:fa-user] App Search U - style App fill:#EC6363,opacity:.5 + style App fill:#D35400,opacity:.35, stroke:#0A1CCF, stroke-width:2px subgraph App[       UserApp fa:fa-store] style Search stroke-width:2px, stroke:#0A1CCF Search( Search Client ) + style U stroke-width:1px,stroke:red U( Ubi Client ) end end @@ -82,50 +95,54 @@ linkStyle 1,2,0,6 stroke-width:2px,fill:none,stroke:#0A1CCF linkStyle 3,4,5,8 stroke-width:2px,fill:none,stroke:red ``` -Although the named fields below follow a schema which lends to easier analytics, the schema is dynamic and allows for users to add new dynamic fields where there is need. +## Ubi Stores +There are 2 separate stores for Ubi: +### 1) **Ubi Queries** +All underlying query information and results ([`object_id`](#object_id)'s) are stored in the **Ubi Queries** store, and remains largely invisible in the background. +The only obvious difference will be in the `ubi` stanze of the json response, *which could cause index bloat if one forgets that this is enabled*. -[`user_id`](#user_id) represents a user. When UBI is active, any query that this user does, will generate a new `query_id` for this `user_id`. +**Ubi Queries** [schema](../src/main/resources/queries-mapping.json): +Since Ubi manages the **Ubi Queries** store, the developer should never have to write directly to this store (except for importing data). 
-The purpose of the [`query_id`](#query_id)'s help link the user's raw query string to the results, as well as any subsequent action that the UBI client logs. -When UBI is turned on, a *search client* will get a `query_id` back from OpenSearch, and is passed to the UBI client. The UBI client then associates each subsequent event with this query until it receives a new query_id. - -[`action_name`](#action_name) says what the name of the event is. It can be any name, such as *login*, *logout*, *save*, *post*, *add_to_cart*... - - [`event_attributes`](#event_attributes)'s is where any relevant information about the event can be stored. - The two primary, predefined objects in the attributes are [`event_attributes.position`](#position), which contains - information on what part of the application the user is interacting with, - and [`event_attributes.object`](#object), which contains identifying information of the object returned from the query that the user interacts with (i.e.: a book, a product, a post, etc..). - -The `object` structure has two ways to refer to the object: -- `event_attributes.object.object_id` is the unique id that OpenSearch uses internally to index the object, think the `_id` field in the indices. -- `event_attributes.object.catalog_id` is the id that a user could look up the object in a *catalog* +- `timestamp` +   A unix timestamp of when the query was received - Therefore, the `query_id` signals the beginning of a user's *Search Journey*, -`action_name` tells us how the user is interacting with the query results within the application, -and `event_attributes.object` is referring to the precise query result that the user interacts with. +- `query_id` +   A unique ID of the query provided by the client or generated automatically. The same query text issued multiple times would generate different `query_id`. -### OpenSearch Data Mappings +- `query_response_objects_ids` +   This is an array of the `object_id`'s. 
-#### Schema for events: +- `user_id` +   A user ID provided by the client -The current event mappings file can be found [here](../src/main/resources/events-mapping.json). +- `session_id` +   An optional session ID provided by the client -**Primary fields include:** +### 2) **Ubi Events** +This is the event store that the client side directly indexes events to, linking the event [`action_name`](#action_name), [`object_id`](#object_id)'s and [`query_id`](#query_id)'s together with any other important event information. +Since this schema is dynamic, the developer can add any new fields and structures (such as *user* information, *geo-location* information, etc.) at index time that are not in the current **Ubi Events** [schema](../src/main/resources/events-mapping.json): - `application`

-   (size 100) - name of application tracking UBI events + +   (size 100) - name of the application tracking UBI events (e.g. *amazon-shop*, *ABC-microservice*) - `action_name`

-   (size 100) - any name you want to call your event -- `timestamp`: + +   (size 100) - any name you want to call your event. For example, with *javascript* events, you could include `on_click`, `logon`, `add_to_cart`, `page_scroll`.... -   Unix epoch time. If not set , will be set by the plugin when the event is received - `query_id`

-   (size 100) - ID for some query. Either the client provides this, or the `query_id` is generated by the server. + +   (size 100) - ID for some query. Either the client provides this, or the `query_id` is generated at index time by **Ubi Queries**. - `user_id`. `session_id`, `source_id`

+   (size 100) - are id's largely at the calling client's discretion for tracking users, sessions and sources (i.e. pages) of the event. - The `user_id` must be consistent in both the `query` and `event` stores. + The `user_id` must be consistent in both the **Ubi Queries** and **Ubi Events** stores. + +- `timestamp`: +   UTC-based, unix epoch time. + - `message_type`   (size 100) - originally thought of in terms of ERROR, INFO, WARN, but could be anything useful such as `QUERY` or `CONVERSION`. @@ -133,89 +150,72 @@ The current event mappings file can be found [here](../src/main/resources/events - `message` -   (size 256) - optional text for the log entry +   (size 256) - optional text message for the log entry. For example, with a `message_type` of `INFO`, readers might expect informational or debug text in this field, whereas with a `message_type` of `QUERY`, the text would more likely describe what the user is searching for. -**Other attribute fields & data objects**

-- `event_attributes.object` - -   represents the search result object (i.e. books, products, user info, etc) if there are any - - `event_attributes.object.internal_id` - points to a unique, internal, id representing and instance of that object +- The `event_attributes` structure is where any relevant information about the event can be stored. + There are two primary structures in `event_attributes`: + - **`event_attributes.position`** - structure that contains information on the location of the event origin, such as screen *x,y* coordinates, or the *n*th object out of 10 results. - - `event_attributes.object.object_id` -

-   points to a unique, external key, matching the item that the user searched for, found and acted upon (i.e. sku, isbn, ean, etc.). - **This field value should match the value in for the object's value in the `Object_id` [below](#object_id) from the search store** - It is possible that the `object_id` and `internal_id` match if the same id is used both internally for indexing and externally for the users. - - - `event_attributes.object.object_type` - -   indicates the type/class of object + - `event_attributes.position.ordinal` + +   tracks the *n*th item within a list that a user could select or click (e.g. with zero-based indexing, selecting the 3rd element could be event{`onClick, results[2]`}) + + - `event_attributes.position.{x,y}` + +   tracks x and y values that the client defines + + - `event_attributes.position.page_depth` + +   tracks page depth of results + + - `event_attributes.position.scroll_depth` + +   tracks scroll depth of page results + + - `event_attributes.position.trail` + +   text field for tracking the path/trail that a user took to get to this location + +

+ + - **`event_attributes.object`**, which contains identifying information of the object returned from the query that the user interacts with (e.g. a book, a product, a post, etc.). + The `object` structure has two ways to refer to the object, with `object_id` being the id that links prior queries to this object: + + - `event_attributes.object.internal_id` is a unique id that OpenSearch can use internally to index the object (think of the `_id` field in the indices). + - `event_attributes.object.object_id` +   is the id that a user could look up and find the object instance within the **document index**. Examples include: *ssn*, *isbn*, *primary_ean*, etc. + Initializing Ubi requires mapping from the **Document Index**'s primary key to this `object_id` + + - `event_attributes.object.object_type` + +   indicates the type/class of object - - `event_attributes.object.description` - -   optional description of the object - - - `event_attributes.object.transaction_id` - -   optionally points to a unique id representing a successful transaction - - - `event_attributes.object.to_user_id` - -   optionally points to another user, if they are the recipient of this object, perhaps as a gift, from the user's `user_id` - - `event_attributes.object.object_detail` - -   optional text for further data object details - - - `event_attributes.object.object_detail.json` + - `event_attributes.object.description` + +   optional description of the object + + - `event_attributes.object.transaction_id` + +   optionally points to a unique id representing a successful transaction + + - `event_attributes.object.to_user_id` + +   optionally points to another user, if they are the recipient of this object, perhaps as a gift, from the user's `user_id` + - `event_attributes.object.object_detail` + +   optional text for further data object details + + - `event_attributes.object.object_detail.json` + +   if the user has a json object representing what was acted upon, it can be stored here; 
however, note that that could lead to index bloat if the json objects are large. +- *dynamic fields*: any new fields by any other names in the json objects that one indexes will dynamically expand this schema to that use-case. -   if the user has a json object representing what was acted upon, it can be stored here; however, note that that could lead to index bloat if the json objects are large. -- `event_attributes.position` - -   nested object to track user events to the location of the event origins - - `event_attributes.position.ordinal` - -   tracks the nth item within a list that a user could select, click - - `event_attributes.position.{x,y}` - -   tracks x and y values, that the client defines - - `event_attributes.position.page_depth` - -   tracks page depth - - `event_attributes.position.scroll_depth` - -   tracks scroll depth - - `event_attributes.position.trail` - -   text field for tracking the path/trail that a user took to get to this location -* Note the developers can add optional, dynamic fields like `user_name`, `email`, `price` per individual use-cases. -#### Schema for queries: -The current query mappings file can be found [here](../src/main/resources/queries-mapping.json). - -- `timestamp` - -   A unix timestamp of when the query was received - -- `query_id` - -   A unique ID of the query provided by the client or generated automatically. The same query text issued multiple times would generate different `query_id`. - -- `query_response_objects_ids` - -   This is an array of the `object_id`'s. 
- -- `user_id` - -   A user ID provided by the client - -- `session_id` - -   An optional session ID provided by the client From 5f8ebad5b678b8401e55a66162a3efa0b3dc9cca Mon Sep 17 00:00:00 2001 From: RasonJ <145287540+RasonJ@users.noreply.github.com> Date: Sun, 21 Apr 2024 11:08:12 -0700 Subject: [PATCH 11/12] meh 2 --- documentation/schemas.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/documentation/schemas.md b/documentation/schemas.md index 56c8481..afdc84c 100644 --- a/documentation/schemas.md +++ b/documentation/schemas.md @@ -7,7 +7,7 @@ Ubi is not functional unless the links between the following are consistently ma - [`user_id`](#user_id) represents a unique user. - [`object_id`](#object_id) represents an id for whatever item the user is searching for, such as *epc*, *isbn*, *ssn*, *handle*, etc. - [`query_id`](#query_id) is a unique id for the raw query language executed and the resultant `object_id`'s that the query returned. -- [`action_name`](#action_name), though not technically an *id*, the `action_name` tells us what exact action was taken (or not) with this `object_id` +- [`action_name`](#action_name), though not technically an *id*, the `action_name` tells us what exact action (such as `click` or `add_to_cart`) was taken (or not) with this `object_id`. To summarize: the `query_id` signals the beginning of a `user_id`'s *Search Journey*, the `action_name` tells us how the user is interacting with the query results within the application, and [`event_attributes.object.object_id`](#object_id) is referring to the precise query result that the user interacts with. 
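The summary above (a `query_id` opens a search journey, `action_name` records the interaction, and `event_attributes.object.object_id` names the result acted upon) can be illustrated with a minimal Python sketch. It assumes the two stores have already been read into plain lists of dicts; the function name `build_search_journeys` and the `in_results` flag are hypothetical and not part of Ubi.

```python
from collections import defaultdict


def build_search_journeys(queries, events):
    """Group events under the query they belong to, keyed by query_id.

    `queries` and `events` are plain lists of dicts standing in for
    documents from the Ubi Queries and Ubi Events stores (an assumption
    made for this sketch, not how Ubi itself reads them).
    """
    events_by_query = defaultdict(list)
    for event in events:
        events_by_query[event["query_id"]].append(event)

    journeys = []
    for query in queries:
        returned = set(query.get("query_response_objects_ids", []))
        actions = []
        for event in events_by_query.get(query["query_id"], []):
            # Not every event refers to an object (e.g. a logout), so
            # walk the nested structure defensively.
            obj = event.get("event_attributes", {}).get("object", {})
            object_id = obj.get("object_id")
            actions.append({
                "action_name": event["action_name"],
                "object_id": object_id,
                # True when the acted-upon object was among the query's results.
                "in_results": object_id in returned,
            })
        journeys.append({
            "query_id": query["query_id"],
            "user_id": query.get("user_id"),
            "actions": actions,
        })
    return journeys
```

Events whose `query_id` matches no stored query are simply left out here; a real analysis might want to report them instead, since they suggest the links between the two stores are not being maintained.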
From cb0297f0ab94913cd3f2b18796bd33f9058e8195 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Mon, 22 Apr 2024 20:08:55 -0400 Subject: [PATCH 12/12] doc changes from review --- documentation/schemas.md | 33 +++++++++++++-------------------- 1 file changed, 13 insertions(+), 20 deletions(-) diff --git a/documentation/schemas.md b/documentation/schemas.md index afdc84c..234e013 100644 --- a/documentation/schemas.md +++ b/documentation/schemas.md @@ -6,7 +6,7 @@ Ubi is not functional unless the links between the following are consistently ma - [`user_id`](#user_id) represents a unique user. - [`object_id`](#object_id) represents an id for whatever item the user is searching for, such as *epc*, *isbn*, *ssn*, *handle*, etc. -- [`query_id`](#query_id) is a unique id for the raw query language executed and the resultant `object_id`'s that the query returned. +- [`query_id`](#query_id) is a unique id for the raw query language executed and the resultant `object_id`'s that the query returned. \ - [`action_name`](#action_name), though not technically an *id*, the `action_name` tells us what exact action (such as `click` or `add_to_cart`) was taken (or not) with this `object_id`. To summarize: the `query_id` signals the beginning of a `user_id`'s *Search Journey*, the `action_name` tells us how the user is interacting with the query results within the application, and [`event_attributes.object.object_id`](#object_id) is referring to the precise query result that the user interacts with. @@ -109,15 +109,17 @@ Since Ubi manages the **Ubi Queries** store, the developer should never have to - `query_id`   A unique ID of the query provided by the client or generated automatically. The same query text issued multiple times would generate different `query_id`. + + - `user_id` +   A user ID provided by the client + +- `session_id` +   An optional session ID provided by the client. _This is currently under review of if we keep this_. 
- `query_response_objects_ids` -   This is an array of the `object_id`'s. +   This is an array of the `object_id`'s. This *could* be the same id as the `_id` but is meant to be the externally valid id of document/item/product. -- `user_id` -   A user ID provided by the client -- `session_id` -   An optional session ID provided by the client ### 2) **Ubi Events** This is the event store that the client side directly indexes events to, linking the event [`action_name`](#action_name), [`object_id`](#object_id)'s and [`query_id`](#query_id)'s together with any other important event information. @@ -129,7 +131,7 @@ Since this schema is dynamic, the developer can add any new fields and structure - `action_name`

-   (size 100) - any name you want to call your event. For example, with *javascript* events, you could include `on_click`, `logon`, `add_to_cart`, `page_scroll`.... +   (size 100) - any name you want to call your event. For example, with *javascript* events, you could include `on_click`, `logon`, `add_to_cart`, `page_scroll`.... _This should be formalized. A list of standard ones and then custom ones._ - `query_id`

@@ -146,7 +148,7 @@ Since this schema is dynamic, the developer can add any new fields and structure - `message_type`   (size 100) - originally thought of in terms of ERROR, INFO, WARN, but could be anything useful such as `QUERY` or `CONVERSION`. - Can be used to group `action_name` together in logical bins. + Can be used to group `action_name` together in logical bins. _Thinking this should be backend logic in analysis_ - `message` @@ -184,12 +186,12 @@ Since this schema is dynamic, the developer can add any new fields and structure - `event_attributes.object.internal_id` is a unique id that OpenSearch can use to internally to index the object, think the `_id` field in the indices. - `event_attributes.object.object_id` -   is the id that a user could look up amd find the object instance within the **document index**. Examples include: *ssn*, *isbn*, *primary_ean*, etc. +   is the id that a user could look up and find the object instance within the **document corpus**. Examples include: *ssn*, *isbn*, *primary_ean*, etc. Variants need to be incorporated in the `object_id`, so for a t-shirt that is red, you would need SKU level as the `object_id`. Initializing Ubi requires mapping from the **Document Index**'s primary key to this `object_id` - `event_attributes.object.object_type` -   indicates the type/class of object +   indicates the type/class of object. - `event_attributes.object.description` @@ -209,13 +211,4 @@ Since this schema is dynamic, the developer can add any new fields and structure - `event_attributes.object.object_detail.json`   if the user has a json object representing what was acted upon, it can be stored here; however, note that that could lead to index bloat if the json objects are large. -- *dynamic fields*: any new fields by any other names in the json objects that one indexes will dynamically expand this schema to that use-case. 
- - - - - - - - - +- *extensible fields*: any new fields by any other names in the json objects that one indexes will dynamically expand this schema to that use-case.
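As a rough illustration of the event fields described in these hunks, the following Python sketch assembles one event document and checks the sizes quoted above (100 for `action_name` and `message_type`, 256 for `message`). The helper `build_event` is hypothetical, the timestamp is assumed to be in whole seconds since the docs only say "UTC-based, unix epoch time", and treating the quoted sizes as hard limits is also an assumption.

```python
import time

# Sizes quoted in the schema notes; enforcing them client-side is an assumption.
MAX_LEN = {"action_name": 100, "message_type": 100, "message": 256}


def build_event(action_name, query_id, user_id, object_id, ordinal, **extra):
    """Assemble one event document (hypothetical helper, not part of Ubi)."""
    event = {
        "action_name": action_name,
        "query_id": query_id,
        "user_id": user_id,
        # Unix epoch time; seconds are assumed here.
        "timestamp": int(time.time()),
        "event_attributes": {
            "position": {"ordinal": ordinal},
            "object": {"object_id": object_id},
        },
    }
    # Extensible fields: any extra keys are carried along unchanged.
    event.update(extra)

    for field, limit in MAX_LEN.items():
        value = event.get(field)
        if value is not None and len(value) > limit:
            raise ValueError(f"{field} exceeds {limit} characters")
    return event


if __name__ == "__main__":
    import json

    example = build_event(
        "add_to_cart", "q-1", "u-1", "sku-123", 3,
        message_type="CONVERSION", price=9.99,
    )
    print(json.dumps(example, indent=2))
```

Passing `price` as an extra keyword mirrors the note on extensible fields: the key is simply included in the document, and the index mapping would expand to accommodate it.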