From 8c00d4e31766214ef2a93a9905f94874bb8ecae0 Mon Sep 17 00:00:00 2001
From: RasonJ <145287540+RasonJ@users.noreply.github.com>
Date: Fri, 19 Apr 2024 07:05:43 -0700
Subject: [PATCH 01/12] Schema & documentation working updates
---
documentation/schemas.md | 99 +++++++++++++++++
src/main/resources/events-mapping.json | 145 ++++++-------------------
2 files changed, 131 insertions(+), 113 deletions(-)
create mode 100644 documentation/schemas.md
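
Reviewer note (not part of the patch): a minimal sketch of an event document shaped like the mapping introduced here, assuming a client builds it as a plain dictionary. The helper names `build_event` and `oversized_fields`, the `demo_store` application name, and the choice of epoch milliseconds for `timestamp` are illustrative assumptions. The field names and `ignore_above` limits are copied from `events-mapping.json` in this patch. Keyword values longer than `ignore_above` are kept in `_source` but not indexed, so the second helper flags them before sending.

```python
# Limits copied from events-mapping.json in this patch (ignore_above values).
KEYWORD_LIMITS = {
    "application": 100,
    "action_name": 100,
    "query_id": 100,
    "user_id": 100,
    "session_id": 100,
    "source_id": 256,
    "message": 256,
    "message_type": 100,
}


def build_event(action_name, query_id, user_id, object_id, catalog_id,
                ordinal, timestamp_ms):
    """Build one UBI event; hypothetical helper, not part of the plugin."""
    return {
        "application": "demo_store",  # assumed application name
        "action_name": action_name,
        "query_id": query_id,
        "user_id": user_id,
        "timestamp": timestamp_ms,  # assumption: epoch milliseconds
        "event_attributes": {
            "object": {
                "object_id": object_id,    # internal id used for indexing
                "catalog_id": catalog_id,  # external key such as a SKU
            },
            "position": {"ordinal": ordinal},
        },
    }


def oversized_fields(event):
    """Return top-level keyword fields whose values exceed ignore_above."""
    return sorted(
        name
        for name, limit in KEYWORD_LIMITS.items()
        if isinstance(event.get(name), str) and len(event[name]) > limit
    )
```

Checking lengths on the client is a design choice: the mapping itself would accept the document either way, and the over-long value would simply not be searchable.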
diff --git a/documentation/schemas.md b/documentation/schemas.md
new file mode 100644
index 0000000..cfcb10c
--- /dev/null
+++ b/documentation/schemas.md
@@ -0,0 +1,99 @@
+
+# Key UBI concepts
+
+Although the named fields below follow a schema lends to easier analytics, the schema is dynamic and allows for users to add new dynamic fields where there is need.
+
+[`user_id`](#user_id) represents a user. When UBI is active, every query that this user issues generates a new `query_id` for this `user_id`.
+
+The [`query_id`](#query_id) links the user's raw query string to the results, as well as to any subsequent actions that the UBI client logs.
+When UBI is turned on, a *search client* will get a `query_id` back from OpenSearch and pass it to the UBI client. The UBI client then associates each subsequent event with this query until it receives a new `query_id`.
+
+[`action_name`](#action_name) is the name of the event. It can be any name, such as *login*, *logout*, *save*, *post*, *add_to_cart*...
+
+ [`event_attributes`](#event_attributes) is where any relevant information about the event can be stored.
+ The two primary, predefined objects in the attributes are [`event_attributes.position`](#position), which contains
+ information on what part of the application the user is interacting with,
+ and [`event_attributes.object`](#object), which contains identifying information of the object returned from the query that the user interacts with (i.e.: a book, a product, a post, etc..).
+
+# TODO: `key_field` rename?
+The `object` structure has two ways to refer to the object:
+- `event_attributes.object.object_id` is the unique id that OpenSearch can use internally to index the object.
+- `event_attributes.object.catalog_id` is the id that a user could look up the object in a *catelog*
+
+
+ Therefore, the `query_id` signals the beginning of a user's *Search Journey*,
+`action_name` tells us how the user is interacting with the query results within the application,
+and `event_attributes.object` is referring to the precise query result that the user interacts with.
+
+### OpenSearch Data Mappings
+
+#### Schema for events:
+
+The current event mappings file can be found [here](../src/main/resources/events-mapping.json).
+
+**Primary fields include:**
+- `application`
+ (size 100) - name of application tracking UBI events
+- `action_name`
+ (size 100) - any name you want to call your event
+- `timestamp`: \
+ Unix epoch time. If not set , will be set by the plugin when the event is received
+- `query_id`
+ (size 100) - ID for some query. Note that it could be a unique search string, or it could represent a cluster of related searches (i.e.: *dress*, *red dress*, *long dress* could all have the same `query_id`). Either the client could control these, or the `query_id` could be retrieved from the API's response headers as it keeps track of queries on the node
+- `user_id`. `session_id`, `source_id`
+ (size 100) - are id's largely at the calling client's discretion for tracking users, sessions and sources (i.e. pages) of the event.
+ The `user_id` must be consistent in both the `query` and `event` stores.
+- `message_type` \
+ (size 100) - originally thought of in terms of ERROR, INFO, WARN, but could be anything useful such as `QUERY` or `CONVERSION`.
+ Can be used to group `action_name` together in logical bins.
+
+- `message` \
+ (size 256) - optional text for the log entry
+
+**Other attribute fields & data objects**
+- `event_attributes.object` \
+ represents the search result object (i.e. books, products, user info, etc) if there are any
+ - `event_attributes.object.object_id` - points to a unique, internal, id representing and instance of that object
+ - `event_attributes.object.catalog_id` \
+ points to a unique, external key, matching the item that the user searched for, found and acted upon (i.e. sku, isbn, ean, etc.).
+ **This field value should match the value in for the object's value in the `catalog_field` [below](#catalog_field) from the search store**
+ It is possible that the `object_id` and `key_value` match if the same id is used both internally for indexing and externally for the users.
+ - `event_attributes.object.object_type` \
+ indicates the type/class of object
+ - `event_attributes.object.description` \
+ optional description of the object
+ - `event_attributes.object.transaction_id` \
+ optionally points to a unique id representing a successful transaction
+ - `event_attributes.object.to_user_id` \
+ optionally points to another user, if they are the recipient of this object, perhaps as a gift, from the user's `user_id`
+ - `event_attributes.object.object_detail` \
+ optional data object/map of further data details
+- `event_attributes.position` \
+ nested object to track user events to the location of the event origins
+ - `event_attributes.position.ordinal` \
+ tracks the nth item within a list that a user could select, click
+ - `event_attributes.position.{x,y}` \
+ tracks x and y values, that the client defines
+ - `event_attributes.position.page_depth` \
+ tracks page depth
+ - `event_attributes.position.scroll_depth` \
+ tracks scroll depth
+ - `event_attributes.position.trail` \
+ text field for tracking the path/trail that a user took to get to this location
+
+* Note the developers can add optional, dynamic fields like `user_name`, `email`, `price` per individual use-cases.
+
+#### Schema for queries:
+
+The current query mappings file can be found [here](../src/main/resources/queries-mapping.json).
+
+- `timestamp` \
+ A unix timestamp of when the query was received
+- `query_id` \
+ A unique ID of the query provided by the client or generated automatically
+- `query_response_id` \
+ A unique ID for the collection of results for the query
+- `user_id` \
+ A user ID provided by the client
+- `session_id` \
+ An optional session ID provided by the client
diff --git a/src/main/resources/events-mapping.json b/src/main/resources/events-mapping.json
index fd6d60c..ac7e1e1 100644
--- a/src/main/resources/events-mapping.json
+++ b/src/main/resources/events-mapping.json
@@ -1,130 +1,49 @@
{
"properties": {
- "action_name": {
- "type": "keyword",
- "ignore_above": 100
- },
- "user_id": {
- "type": "keyword",
- "ignore_above": 100
- },
- "session_id": {
- "type": "keyword",
- "ignore_above": 100
- },
- "query_id": {
- "type": "keyword",
- "ignore_above": 100
- },
- "page_id": {
- "type": "keyword",
- "ignore_above": 256
- },
- "message": {
- "type": "keyword",
- "ignore_above": 256
- },
- "message_type": {
- "type": "keyword",
- "ignore_above": 100
- },
- "timestamp": {
- "type": "date",
- "doc_values": true
- },
+ "application": { "type": "keyword", "ignore_above": 100 },
+ "action_name": { "type": "keyword", "ignore_above": 100 },
+ "query_id": { "type": "keyword", "ignore_above": 100 },
+ "user_id": { "type": "keyword", "ignore_above": 100 },
+ "session_id": { "type": "keyword", "ignore_above": 100 },
+ "source_id": { "type": "keyword", "ignore_above": 256 },
+ "message": { "type": "keyword", "ignore_above": 256 },
+ "message_type": { "type": "keyword", "ignore_above": 100 },
+ "timestamp": { "type": "date", "doc_values": true },
"event_attributes": {
"properties": {
- "user_name": {
- "type": "keyword",
- "ignore_above": 256
- },
- "user_id": {
- "type": "keyword",
- "ignore_above": 100
- },
- "email": {
- "type": "keyword"
- },
- "price": {
- "type": "float"
- },
- "ip": {
- "type": "ip",
- "ignore_malformed": true
- },
- "browser": {
- "type": "text",
- "fields": {
- "keyword": {
- "type": "keyword",
- "ignore_above": 256
- }
- }
+ "ip": { "type": "ip", "ignore_malformed": true },
+ "browser": { "type": "text",
+ "fields":
+ { "keyword": { "type": "keyword", "ignore_above": 256 } }
},
"position": {
"properties": {
- "ordinal": {
- "type": "integer"
- },
- "x": {
- "type": "integer"
- },
- "y": {
- "type": "integer"
- },
- "page_depth": {
- "type": "integer"
- },
- "scroll_depth": {
- "type": "integer"
- },
- "trail": {
- "type": "text",
- "fields": {
- "keyword": {
- "type": "keyword",
- "ignore_above": 256
- }
- }
+ "ordinal": { "type": "integer" },
+ "x": { "type": "integer" },
+ "y": { "type": "integer" },
+ "page_depth": { "type": "integer" },
+ "scroll_depth": { "type": "integer" },
+ "trail": { "type": "text",
+ "fields":
+ { "keyword": { "type": "keyword", "ignore_above": 256 } }
}
}
},
"object": {
"properties": {
- "key_value": {
- "type": "keyword"
- },
- "object_id": {
- "type": "keyword",
- "ignore_above": 256
- },
- "object_type": {
- "type": "keyword",
- "ignore_above": 100
- },
- "transaction_id": {
- "type": "keyword",
- "ignore_above": 100
- },
- "name": {
- "type": "keyword",
- "ignore_above": 256
- },
- "description": {
- "type": "text",
+ "catalog_id": { "type": "keyword" },
+ "object_id": { "type": "keyword", "ignore_above": 256 },
+ "object_type": { "type": "keyword", "ignore_above": 100 },
+ "transaction_id": { "type": "keyword", "ignore_above": 100 },
+ "name": { "type": "keyword", "ignore_above": 256 },
+ "description": { "type": "text",
"fields": {
- "keyword": {
- "type": "keyword",
- "ignore_above": 256
- }
- }
- },
- "to_user_id": {
- "type": "keyword",
- "ignore_above": 100
+ "keyword": { "type": "keyword", "ignore_above": 256 } }
},
- "object_detail": {
- "type": "object"
+ "to_user_id": { "type": "keyword", "ignore_above": 100 },
+ "object_details": { "type": "text",
+ "fields":
+ { "json": { "type": "object"} }
}
}
}
From 48fa7198c092d2c827c280d6f103b57bf00678cf Mon Sep 17 00:00:00 2001
From: RasonJ <145287540+RasonJ@users.noreply.github.com>
Date: Fri, 19 Apr 2024 07:13:45 -0700
Subject: [PATCH 02/12] forgot to mention the new object.object_details.json
field.
---
documentation/schemas.md | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/documentation/schemas.md b/documentation/schemas.md
index cfcb10c..2c506e2 100644
--- a/documentation/schemas.md
+++ b/documentation/schemas.md
@@ -18,8 +18,7 @@ When UBI is turned on, a *search client* will get a `query_id` back from OpenSea
# TODO: `key_field` rename?
The `object` structure has two ways to refer to the object:
- `event_attributes.object.object_id` is the unique id that OpenSearch can use internally to index the object.
-- `event_attributes.object.catalog_id` is the id that a user could look up the object in a *catelog*
-
+- `event_attributes.object.catalog_id` is the id that a user could look up the object in a *catalog*
Therefore, the `query_id` signals the beginning of a user's *Search Journey*,
`action_name` tells us how the user is interacting with the query results within the application,
@@ -67,7 +66,9 @@ The current event mappings file can be found [here](../src/main/resources/events
- `event_attributes.object.to_user_id` \
optionally points to another user, if they are the recipient of this object, perhaps as a gift, from the user's `user_id`
- `event_attributes.object.object_detail` \
- optional data object/map of further data details
+ optional text for further data object details
+ - `event_attributes.object.object_detail.json` \
+ if the user has a json object representing what was acted upon, it can be stored here; however, note that that could lead to index bloat if the json objects are large.
- `event_attributes.position` \
nested object to track user events to the location of the event origins
- `event_attributes.position.ordinal` \
From 168649b068daa6b2c6924b9ecf8176920b745dea Mon Sep 17 00:00:00 2001
From: Eric Pugh
Date: Fri, 19 Apr 2024 16:42:41 -0400
Subject: [PATCH 03/12] track some changes based on team discussions
---
documentation/schemas.md | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
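
Reviewer note (not part of the patch): the reworded `query_id` text says the ID is either provided by the client or generated by the server, and that the same query text issued twice gets different IDs. A minimal sketch of that rule, assuming a random UUID as the generated value; the function name `resolve_query_id` is hypothetical and the plugin's actual generation scheme is not shown in this patch.

```python
import uuid


def resolve_query_id(client_query_id=None):
    """Use the client's query_id when given, otherwise generate a fresh one."""
    if client_query_id:
        return client_query_id
    # Assumption: a random UUID stands in for the server-generated ID.
    return str(uuid.uuid4())
```

Because the generated value does not depend on the query text, two identical searches end up with distinct `query_id` values, which matches the wording added in this patch.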
diff --git a/documentation/schemas.md b/documentation/schemas.md
index 2c506e2..89fe88f 100644
--- a/documentation/schemas.md
+++ b/documentation/schemas.md
@@ -1,7 +1,7 @@
# Key UBI concepts
-Although the named fields below follow a schema lends to easier analytics, the schema is dynamic and allows for users to add new dynamic fields where there is need.
+Although the named fields below follow a schema which lends to easier analytics, the schema is dynamic and allows for users to add new dynamic fields where there is need.
[`user_id`](#user_id) represents a user. When UBI is active, any query that this user does, will generate a new `query_id` for this `user_id`.
@@ -17,7 +17,7 @@ When UBI is turned on, a *search client* will get a `query_id` back from OpenSea
# TODO: `key_field` rename?
The `object` structure has two ways to refer to the object:
-- `event_attributes.object.object_id` is the unique id that OpenSearch can use internally to index the object.
+- `event_attributes.object.object_id` is the unique id that OpenSearch uses internally to index the object, think the `_id` field in the indices.
- `event_attributes.object.catalog_id` is the id that a user could look up the object in a *catalog*
Therefore, the `query_id` signals the beginning of a user's *Search Journey*,
@@ -38,7 +38,7 @@ The current event mappings file can be found [here](../src/main/resources/events
- `timestamp`: \
Unix epoch time. If not set , will be set by the plugin when the event is received
- `query_id`
- (size 100) - ID for some query. Note that it could be a unique search string, or it could represent a cluster of related searches (i.e.: *dress*, *red dress*, *long dress* could all have the same `query_id`). Either the client could control these, or the `query_id` could be retrieved from the API's response headers as it keeps track of queries on the node
+ (size 100) - ID for some query. Either the client provides this, or the `query_id` is generated by the server.
- `user_id`. `session_id`, `source_id`
(size 100) - are id's largely at the calling client's discretion for tracking users, sessions and sources (i.e. pages) of the event.
The `user_id` must be consistent in both the `query` and `event` stores.
@@ -91,9 +91,9 @@ The current query mappings file can be found [here](../src/main/resources/querie
- `timestamp` \
A unix timestamp of when the query was received
- `query_id` \
- A unique ID of the query provided by the client or generated automatically
-- `query_response_id` \
- A unique ID for the collection of results for the query
+ A unique ID of the query provided by the client or generated automatically. The same query text issued multiple times would generate different `query_id`.
+- `query_response_objects_ids` \
+ This is an array of the `object_id`. The size
- `user_id` \
A user ID provided by the client
- `session_id` \
From 5f61fcdc3c2300b9973ae93771e63e434619ed39 Mon Sep 17 00:00:00 2001
From: RasonJ <145287540+RasonJ@users.noreply.github.com>
Date: Sat, 20 Apr 2024 11:14:19 -0700
Subject: [PATCH 04/12] Updating new schema tweaks + a new diagram
---
documentation/documentation.md | 6 +-
documentation/queries/sql_queries.md | 6 +-
documentation/schemas.md | 155 ++++++++++++++----
.../o19s/ubi/UserBehaviorInsightsPlugin.java | 2 +-
.../UserBehaviorInsightsActionFilter.java | 8 +-
.../o19s/ubi/data/OpenSearchDataManager.java | 2 +-
.../com/o19s/ubi/model/SettingsConstants.java | 2 +-
.../rest/UserBehaviorInsightsRestHandler.java | 4 +-
src/main/resources/events-mapping.json | 4 +-
src/main/resources/queries-mapping.json | 2 +-
.../rest-api-spec/api/ubi.create_store.json | 2 +-
.../test/_plugins.ubi/20_manage_stores.yml | 12 +-
12 files changed, 150 insertions(+), 55 deletions(-)
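
Reviewer note (not part of the patch): a minimal sketch of how a client might follow the `id_field` to `object_id` rename, assuming it builds the store-creation URL itself. The helper names `store_url` and `result_object_id` are hypothetical. The endpoint shape and the fallback to `_id` when no `object_id` field is configured are taken from the documentation table changed in this patch, and the hit layout (`_id`, `_source`) is the usual OpenSearch search response shape.

```python
from urllib.parse import urlencode


def store_url(base_url, store, index, object_id=None):
    """Build the PUT URL that initializes a UBI store for an index."""
    params = {"index": index}
    if object_id:
        params["object_id"] = object_id  # renamed from id_field in this patch
    return f"{base_url}/_plugins/ubi/{store}?{urlencode(params)}"


def result_object_id(hit, object_id_field=None):
    """Pick the ID recorded for a search hit, falling back to _id."""
    if object_id_field in (None, "", "null"):
        return str(hit["_id"])
    return str(hit["_source"][object_id_field])
```

Calling `store_url("http://localhost:9200", "awesome", "ecommerce", "id")` yields the same URL as the updated `curl` example in `documentation.md`.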
diff --git a/documentation/documentation.md b/documentation/documentation.md
index 2bb18ce..49b6a16 100644
--- a/documentation/documentation.md
+++ b/documentation/documentation.md
@@ -27,7 +27,7 @@ docker compose -f docker-compose-cluster.yaml up
Initialize the `awesome` UBI store:
```
-curl -X PUT "http://localhost:9200/_plugins/ubi/awesome?index=ecommerce&id_field=id"
+curl -X PUT "http://localhost:9200/_plugins/ubi/awesome?index=ecommerce&object_id=id"
```
Send an event to the `awesome` store:
@@ -88,7 +88,7 @@ The current event mappings file can be found [here](https://github.com/o19s/open
- `event_attributes.object` - contains an associated JSONified data object (i.e. books, products, user info, etc) if there are any
- `event_attributes.object.object_id` - points to a unique, internal, id representing and instance of that object
- `event_attributes.object.key_value` - points to a unique, external key, matching the item that the user searched for, found and acted upon (i.e. sku, isbn, ean, etc.).
- **This field value should match the value in for the object's value in the `id_field` [below](#id_field) from the search store**
+ **This field's value should match the object's value in the `object_id` field [below](#object_id) from the search store**
It is possible that the `object_id` and `key_value` match if the same id is used both internally for indexing and externally for the users.
- `event_attributes.object.object_type` - indicates the type/class of object
- `event_attributes.object.description` - optional description of the object
@@ -122,7 +122,7 @@ The plugin exposes a REST API for managing UBI stores and persisting events.
| Method | Endpoint | Purpose |
|--------|-----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `PUT` | `/_plugins/ubi/{store}?index={index}&id_field={id_field}` | Initialize a new UBI store for the given index. The `id_field` is optional and allows for providing the name of a field in the `index`'s schema to be used as the unique result/item ID for each search result. If not provided, the `_id` field is used. |
+| `PUT` | `/_plugins/ubi/{store}?index={index}&object_id={object_id}` | Initialize a new UBI store for the given index. The `object_id` is optional and allows for providing the name of a field in the `index`'s schema to be used as the unique result/item ID for each search result. If not provided, the `_id` field is used. |
| `DELETE` | `/_plugins/ubi/{store}` | Delete a UBI store |
| `GET` | `/_plugins/ubi` | Get a list of all UBI stores |
| `POST` | `/_plugins/ubi/{store}` | Index an event into the UBI store |
diff --git a/documentation/queries/sql_queries.md b/documentation/queries/sql_queries.md
index a2dec00..68096c8 100644
--- a/documentation/queries/sql_queries.md
+++ b/documentation/queries/sql_queries.md
@@ -9,7 +9,7 @@ Although it's trivial on the server side to find queries with no results, we can
select
count(0)
from .ubi_log_queries
-where query_response_hit_ids is null
+where query_response_objects_ids is null
order by user_id
```
@@ -18,7 +18,7 @@ order by user_id
select
count(0)
from .ubi_log_events
-where action_name='on_search' and event_attributes.data.data_detail.query_data.query_response_hit_ids is null
+where action_name='on_search' and event_attributes.data.data_detail.query_data.query_response_objects_ids is null
order by timestamp
```
@@ -113,7 +113,7 @@ where query_id ='1065c70f-d46a-442f-8ce4-0b5e7a71a892'
order by timestamp
```
(In this generated data, the `query` field is plain text; however in the real implementation the query will be in the internal DSL of the query and parameters.)
-query_response_id|query_id|user_id|query|query_response_hit_ids|session_id|timestamp
+query_response_id|query_id|user_id|query|query_response_objects_ids|session_id|timestamp
---|---|---|---|---|---|---
1065c70f-d46a-442f-8ce4-0b5e7a71a892|1065c70f-d46a-442f-8ce4-0b5e7a71a892|155_7e3471ff-14c8-45cb-bc49-83a056c37192|Blanditiis quo sint repudiandae a sit.|8659955|fa6e3b1c-3212-44d2-b16b-690b4aeddbba_1975|2027-04-17 10:16:45
diff --git a/documentation/schemas.md b/documentation/schemas.md
index 89fe88f..12aa1ad 100644
--- a/documentation/schemas.md
+++ b/documentation/schemas.md
@@ -1,5 +1,60 @@
# Key UBI concepts
+## UBI Roles
+- **User Behavior Insights** module: once activated, it is in charge of indexing a user's queries and results in the **query store** with a unique [`query_id`](#query_id), and passing that `query_id` back to the search client.
+
+- **Search Client**: in charge of searching and receiving the `query_id` from **User Behavior Insights**. This `query_id` is then passed to the **UBI Logging Client**.
+
+- **UBI Logging Client**: in charge of indexing user events, such as onClick, in the **event store** along with the `query_id` that links to the underlying, technical query DSL and the results' `object_id`s.
+
+*Note:* We break out the roles of "search" and "UBI logging" here, but many implementations will likely use the same OpenSearch client instance for both searching and index writing.
+
+```mermaid
+%%{init: {
+ "flowchart": {"htmlLabels": false},
+
+ }
+}%%
+graph TB
+
+ style OS stroke-width:2px, stroke:#0A1CCF, fill:#62affb, opacity:.5
+subgraph OS[OpenSearch Cluster fa:fa-database]
+ E[( Ubi Events )]
+ Docs[(Document Index)] --3) DSL & object_id's--> Q[( Ubi Queries )];
+ Q -."4) query_id".-> Docs ;
+
+ end
+ style *client-side* stroke-width:2px, stroke:#EC6363
+subgraph "`*client-side*`"
+ style User stroke-width:4px, stroke:#EC6363
+ User["`*User*`" fa:fa-user]
+ App
+ Search
+ U
+ style App fill:#EC6363,opacity:.5
+ subgraph App[UserApp fa:fa-store]
+ Search( Search Client )
+ U( Ubi Client )
+ end
+ User--1) raw search string-->Search;
+
+end
+
+Search--2) search string-->Docs
+
+Docs -. 6) query_id & objects...->Search ;
+Search --results--> User
+Search-.7) query_id.->U;
+
+User--8) selects
+ object_id:123-->U;
+U-."9) index event:{query_id, onClick, object_id:123}".->E;
+
+linkStyle 3,0,5 stroke-width:2px,fill:none,stroke:#0A1CCF
+linkStyle 1,4,6,8 stroke-width:2px,fill:none,stroke:red
+```
+
+
Although the named fields below follow a schema which lends to easier analytics, the schema is dynamic and allows for users to add new dynamic fields where there is need.
@@ -15,7 +70,6 @@ When UBI is turned on, a *search client* will get a `query_id` back from OpenSea
information on what part of the application the user is interacting with,
and [`event_attributes.object`](#object), which contains identifying information of the object returned from the query that the user interacts with (i.e.: a book, a product, a post, etc..).
-# TODO: `key_field` rename?
The `object` structure has two ways to refer to the object:
- `event_attributes.object.object_id` is the unique id that OpenSearch uses internally to index the object, think the `_id` field in the indices.
- `event_attributes.object.catalog_id` is the id that a user could look up the object in a *catalog*
@@ -31,55 +85,87 @@ and `event_attributes.object` is referring to the precise query result that the
The current event mappings file can be found [here](../src/main/resources/events-mapping.json).
**Primary fields include:**
-- `application`
+- `application`
+
(size 100) - name of application tracking UBI events
-- `action_name`
+- `action_name`
+
(size 100) - any name you want to call your event
-- `timestamp`: \
+- `timestamp`:
+
Unix epoch time. If not set , will be set by the plugin when the event is received
-- `query_id`
+- `query_id`
+
(size 100) - ID for some query. Either the client provides this, or the `query_id` is generated by the server.
- `user_id`. `session_id`, `source_id`
(size 100) - are id's largely at the calling client's discretion for tracking users, sessions and sources (i.e. pages) of the event.
The `user_id` must be consistent in both the `query` and `event` stores.
-- `message_type` \
+- `message_type`
+
(size 100) - originally thought of in terms of ERROR, INFO, WARN, but could be anything useful such as `QUERY` or `CONVERSION`.
Can be used to group `action_name` together in logical bins.
-- `message` \
+- `message`
+
(size 256) - optional text for the log entry
**Other attribute fields & data objects**
-- `event_attributes.object` \
+- `event_attributes.object`
+
represents the search result object (i.e. books, products, user info, etc) if there are any
- - `event_attributes.object.object_id` - points to a unique, internal, id representing and instance of that object
- - `event_attributes.object.catalog_id` \
+
+ - `event_attributes.object.internal_id` - points to a unique, internal id representing an instance of that object
+
+ - `event_attributes.object.object_id`
+
points to a unique, external key, matching the item that the user searched for, found and acted upon (i.e. sku, isbn, ean, etc.).
- **This field value should match the value in for the object's value in the `catalog_field` [below](#catalog_field) from the search store**
- It is possible that the `object_id` and `key_value` match if the same id is used both internally for indexing and externally for the users.
- - `event_attributes.object.object_type` \
+ **This field's value should match the object's value in the `object_id` field [below](#object_id) from the search store**
+ It is possible that the `object_id` and `internal_id` match if the same id is used both internally for indexing and externally for the users.
+
+ - `event_attributes.object.object_type`
+
indicates the type/class of object
- - `event_attributes.object.description` \
+
+ - `event_attributes.object.description`
+
optional description of the object
- - `event_attributes.object.transaction_id` \
+
+ - `event_attributes.object.transaction_id`
+
optionally points to a unique id representing a successful transaction
- - `event_attributes.object.to_user_id` \
+
+ - `event_attributes.object.to_user_id`
+
optionally points to another user, if they are the recipient of this object, perhaps as a gift, from the user's `user_id`
- - `event_attributes.object.object_detail` \
+ - `event_attributes.object.object_detail`
+
optional text for further data object details
- - `event_attributes.object.object_detail.json` \
+
+ - `event_attributes.object.object_detail.json`
+
if the user has a json object representing what was acted upon, it can be stored here; however, note that that could lead to index bloat if the json objects are large.
-- `event_attributes.position` \
+
+- `event_attributes.position`
+
nested object to track user events to the location of the event origins
- - `event_attributes.position.ordinal` \
+ - `event_attributes.position.ordinal`
+
tracks the nth item within a list that a user could select, click
- - `event_attributes.position.{x,y}` \
+
+ - `event_attributes.position.{x,y}`
+
tracks x and y values, that the client defines
- - `event_attributes.position.page_depth` \
+
+ - `event_attributes.position.page_depth`
+
tracks page depth
- - `event_attributes.position.scroll_depth` \
+
+ - `event_attributes.position.scroll_depth`
+
tracks scroll depth
- - `event_attributes.position.trail` \
+
+ - `event_attributes.position.trail`
+
text field for tracking the path/trail that a user took to get to this location
* Note the developers can add optional, dynamic fields like `user_name`, `email`, `price` per individual use-cases.
@@ -88,13 +174,22 @@ The current event mappings file can be found [here](../src/main/resources/events
The current query mappings file can be found [here](../src/main/resources/queries-mapping.json).
-- `timestamp` \
+- `timestamp`
+
A unix timestamp of when the query was received
-- `query_id` \
+
+- `query_id`
+
A unique ID of the query provided by the client or generated automatically. The same query text issued multiple times would generate different `query_id`.
-- `query_response_objects_ids` \
- This is an array of the `object_id`. The size
-- `user_id` \
+
+- `query_response_objects_ids`
+
+ This is an array of the `object_id`'s.
+
+- `user_id`
+
A user ID provided by the client
-- `session_id` \
+
+- `session_id`
+
An optional session ID provided by the client
diff --git a/src/main/java/com/o19s/ubi/UserBehaviorInsightsPlugin.java b/src/main/java/com/o19s/ubi/UserBehaviorInsightsPlugin.java
index 3b451a4..8ca10a0 100644
--- a/src/main/java/com/o19s/ubi/UserBehaviorInsightsPlugin.java
+++ b/src/main/java/com/o19s/ubi/UserBehaviorInsightsPlugin.java
@@ -95,7 +95,7 @@ public List<Setting<?>> getSettings() {
settings.add(Setting.intSetting(SettingsConstants.VERSION_SETTING, 1, -1, Integer.MAX_VALUE, Setting.Property.IndexScope));
settings.add(Setting.simpleString(SettingsConstants.INDEX, "", Setting.Property.IndexScope));
- settings.add(Setting.simpleString(SettingsConstants.ID_FIELD, "", Setting.Property.IndexScope));
+ settings.add(Setting.simpleString(SettingsConstants.object_id, "", Setting.Property.IndexScope));
return settings;
diff --git a/src/main/java/com/o19s/ubi/action/UserBehaviorInsightsActionFilter.java b/src/main/java/com/o19s/ubi/action/UserBehaviorInsightsActionFilter.java
index f6f367b..c79879f 100644
--- a/src/main/java/com/o19s/ubi/action/UserBehaviorInsightsActionFilter.java
+++ b/src/main/java/com/o19s/ubi/action/UserBehaviorInsightsActionFilter.java
@@ -103,9 +103,9 @@ public void onResponse(Response response) {
if(!"".equals(storeName)) {
final String index = getStoreSettings(storeName, SettingsConstants.INDEX);
- final String idField = getStoreSettings(storeName, SettingsConstants.ID_FIELD);
+ final String idField = getStoreSettings(storeName, SettingsConstants.object_id);
- LOGGER.debug("Using id_field [{}] of index [{}] for UBI query.", idField, index);
+ LOGGER.debug("Using object_id [{}] of index [{}] for UBI query.", idField, index);
// Only consider this search if the index being searched matches the store's index setting.
if (Arrays.asList(searchRequest.indices()).contains(index)) {
@@ -124,7 +124,7 @@ public void onResponse(Response response) {
if (idField == null || "".equals(idField) || idField.equals("null")) {
- // Use the _id since there is no id_field setting for this index.
+ // Use the _id since there is no object_id setting for this index.
queryResponseHitIds.add(String.valueOf(hit.docId()));
} else {
@@ -240,7 +240,7 @@ private void persistQuery(final String storeName, final QueryRequest queryReques
source.put("query_id", queryRequest.getQueryId());
source.put("query", queryRequest.getQuery());
source.put("query_response_id", queryResponse.getQueryResponseId());
- source.put("query_response_hit_ids", queryResponse.getQueryResponseHitIds());
+ source.put("query_response_objects_ids", queryResponse.getQueryResponseHitIds());
source.put("user_id", queryRequest.getUserId());
source.put("session_id", queryRequest.getSessionId());
diff --git a/src/main/java/com/o19s/ubi/data/OpenSearchDataManager.java b/src/main/java/com/o19s/ubi/data/OpenSearchDataManager.java
index 200bb7a..6ad3d2e 100644
--- a/src/main/java/com/o19s/ubi/data/OpenSearchDataManager.java
+++ b/src/main/java/com/o19s/ubi/data/OpenSearchDataManager.java
@@ -160,7 +160,7 @@ private Map buildQueryRequestMap(final QueryRequest queryRequest
source.put("query_id", queryRequest.getQueryId());
source.put("query", queryRequest.getQuery());
source.put("query_response_id", queryRequest.getQueryResponse().getQueryResponseId());
- source.put("query_response_hit_ids", queryRequest.getQueryResponse().getQueryResponseHitIds());
+ source.put("query_response_objects_ids", queryRequest.getQueryResponse().getQueryResponseHitIds());
source.put("user_id", queryRequest.getUserId());
source.put("session_id", queryRequest.getSessionId());
diff --git a/src/main/java/com/o19s/ubi/model/SettingsConstants.java b/src/main/java/com/o19s/ubi/model/SettingsConstants.java
index 469d830..8ae2b2b 100644
--- a/src/main/java/com/o19s/ubi/model/SettingsConstants.java
+++ b/src/main/java/com/o19s/ubi/model/SettingsConstants.java
@@ -26,6 +26,6 @@ public class SettingsConstants {
/**
* The field in an index's mapping that will be used as the unique identifier for a query result item.
*/
- public static final String ID_FIELD = "index.ubi.id_field";
+ public static final String object_id = "index.ubi.object_id";
}
diff --git a/src/main/java/com/o19s/ubi/rest/UserBehaviorInsightsRestHandler.java b/src/main/java/com/o19s/ubi/rest/UserBehaviorInsightsRestHandler.java
index 90758c7..f94c539 100644
--- a/src/main/java/com/o19s/ubi/rest/UserBehaviorInsightsRestHandler.java
+++ b/src/main/java/com/o19s/ubi/rest/UserBehaviorInsightsRestHandler.java
@@ -84,7 +84,7 @@ protected RestChannelConsumer prepareRequest(RestRequest restRequest, NodeClient
final String storeName = restRequest.param("store");
final String index = restRequest.param("index");
- final String idField = restRequest.param("id_field");
+ final String idField = restRequest.param("object_id");
LOGGER.info("Received PUT for store {}", storeName);
@@ -191,7 +191,7 @@ private RestChannelConsumer create(final NodeClient nodeClient, final String sto
.put(IndexMetadata.INDEX_AUTO_EXPAND_REPLICAS_SETTING.getKey(), "0-2")
.put(IndexMetadata.SETTING_PRIORITY, Integer.MAX_VALUE)
.put(SettingsConstants.INDEX, index)
- .put(SettingsConstants.ID_FIELD, idField)
+ .put(SettingsConstants.object_id, idField)
.put(SettingsConstants.VERSION_SETTING, VERSION)
.build();
diff --git a/src/main/resources/events-mapping.json b/src/main/resources/events-mapping.json
index ac7e1e1..07a1caf 100644
--- a/src/main/resources/events-mapping.json
+++ b/src/main/resources/events-mapping.json
@@ -31,8 +31,8 @@
},
"object": {
"properties": {
- "catalog_id": { "type": "keyword" },
- "object_id": { "type": "keyword", "ignore_above": 256 },
+ "internal_id": { "type": "keyword", "ignore_above": 256 },
+ "object_id": { "type": "keyword" },
"object_type": { "type": "keyword", "ignore_above": 100 },
"transaction_id": { "type": "keyword", "ignore_above": 100 },
"name": { "type": "keyword", "ignore_above": 256 },
diff --git a/src/main/resources/queries-mapping.json b/src/main/resources/queries-mapping.json
index dc5fc22..e69cdbb 100644
--- a/src/main/resources/queries-mapping.json
+++ b/src/main/resources/queries-mapping.json
@@ -9,7 +9,7 @@
"type": "text"
},
"query_response_id": { "type": "keyword", "ignore_above": 100 },
- "query_response_hit_ids": { "type": "keyword" },
+ "query_response_objects_ids": { "type": "keyword" },
"user_id": { "type": "keyword", "ignore_above": 100 },
"session_id": { "type": "keyword", "ignore_above": 100 }
}
diff --git a/src/yamlRestTest/resources/rest-api-spec/api/ubi.create_store.json b/src/yamlRestTest/resources/rest-api-spec/api/ubi.create_store.json
index bfa909d..ea4514b 100644
--- a/src/yamlRestTest/resources/rest-api-spec/api/ubi.create_store.json
+++ b/src/yamlRestTest/resources/rest-api-spec/api/ubi.create_store.json
@@ -24,7 +24,7 @@
"type": "string",
"description": "The name of the index being searched"
},
- "id_field": {
+ "object_id": {
"required": false,
"type": "string",
"description": "The name of the field to use for the doc ID field"
diff --git a/src/yamlRestTest/resources/rest-api-spec/test/_plugins.ubi/20_manage_stores.yml b/src/yamlRestTest/resources/rest-api-spec/test/_plugins.ubi/20_manage_stores.yml
index aff029f..9bd5b34 100644
--- a/src/yamlRestTest/resources/rest-api-spec/test/_plugins.ubi/20_manage_stores.yml
+++ b/src/yamlRestTest/resources/rest-api-spec/test/_plugins.ubi/20_manage_stores.yml
@@ -6,7 +6,7 @@
ubi.create_store:
store: mystore
index: ecommerce
- id_field: name
+ object_id: name
- do:
cluster.health:
@@ -60,14 +60,14 @@
index: ""
---
-"Create a store without specifying an id_field":
-# A missing id_field is allowed - the doc ID will be used instead.
+"Create a store without specifying an object_id":
+# A missing object_id is allowed - the doc ID will be used instead.
- do:
ubi.create_store:
store: invalid_store
index: some_index
- id_field: ""
+ object_id: ""
---
"Delete a store that does not exist":
@@ -86,7 +86,7 @@
ubi.create_store:
store: mystore
index: ecommerce
- id_field: name
+ object_id: name
- do:
cluster.health:
@@ -97,6 +97,6 @@
ubi.create_store:
store: mystore
index: ecommerce
- id_field: name
+ object_id: name
- match: { status: initialized }
From 8490804c6ae3b653270d51c59b1670c2f7ae2a0e Mon Sep 17 00:00:00 2001
From: RasonJ <145287540+RasonJ@users.noreply.github.com>
Date: Sat, 20 Apr 2024 12:00:10 -0700
Subject: [PATCH 05/12] aesthetics
---
documentation/schemas.md | 49 ++++++++++++++++++++++++++++------------
1 file changed, 34 insertions(+), 15 deletions(-)
diff --git a/documentation/schemas.md b/documentation/schemas.md
index 12aa1ad..9cebeb0 100644
--- a/documentation/schemas.md
+++ b/documentation/schemas.md
@@ -8,7 +8,27 @@
- **Ubi Logging Client**: is in charge of indexing user events, such as onClick, in the **event store** along with the `query_id` that links to the underlying, technical query DSL and the results' `object_id`'s.
*Note:* We break out the roles of "search" and "Ubi logging" here, but many implementations will likely use the same OpenSearch client instance for both roles of searching and index writing.
-
+
+```mermaid
+graph TB
+style L fill:none
+subgraph L["`*Legend*`"]
+ subgraph ss[Standard Search]
+ direction LR
+ style ln1a fill:blue
+ ln1a[ ]--->ln1b[ ];
+ end
+ subgraph Ubi flow
+ direction LR
+ ln2a[ ].->|new|ln2b[ ];
+ style ln1c fill:red
+ ln1c[ ]-->|query_id|ln1d[ ];
+ end
+end
+linkStyle 0 stroke-width:2px,stroke:#0A1CCF
+
+```
+
```mermaid
%%{init: {
"flowchart": {"htmlLabels": false},
@@ -17,14 +37,14 @@
}%%
graph TB
- style OS stroke-width:2px, stroke:#0A1CCF, fill:#62affb, opacity:.5
+style OS stroke-width:2px, stroke:#0A1CCF, fill:#62affb, opacity:.5
subgraph OS[OpenSearch Cluster fa:fa-database]
E[( Ubi Events )]
Docs[(Document Index)] --3) DSL & object_id's--> Q[( Ubi Queries )];
- Q -."4) query_id".-> Docs ;
-
- end
- style *client-side* stroke-width:2px, stroke:#EC6363
+ Q -."4) query_id".-> Docs ;
+end
+
+style *client-side* stroke-width:2px, stroke:#EC6363
subgraph "`*client-side*`"
style User stroke-width:4px, stroke:#EC636
User["`*User*`" fa:fa-user]
@@ -32,30 +52,29 @@ subgraph "`*client-side*`"
Search
U
style App fill:#EC6363,opacity:.5
- subgraph App[UserApp fa:fa-store]
+ subgraph App[ UserApp fa:fa-store]
Search( Search Client )
U( Ubi Client )
end
- User--1) raw search string-->Search;
-
+ User--1) raw search string-->Search;
end
Search--2) search string-->Docs
-
-Docs -. 6) query_id & objects...->Search ;
+Docs -- 6) query_id & objects--->Search ;
Search --results--> User
Search-.7) query_id.->U;
-
-User--8) selects
- object_id:123-->U;
+User -.8) selects object_id:123.->U;
U-."9) index event:{query_id, onClick, object_id:123}".->E;
-linkStyle 3,0,5 stroke-width:2px,fill:none,stroke:#0A1CCF
+linkStyle 2,3,5,0 stroke-width:2px,fill:none,stroke:#0A1CCF
linkStyle 1,4,6,8 stroke-width:2px,fill:none,stroke:red
```
+linkStyle 2,4,6,8,10 stroke-width:2px,fill:none,stroke:red
+
+
Although the named fields below follow a schema which lends to easier analytics, the schema is dynamic and allows for users to add new dynamic fields where there is need.
[`user_id`](#user_id) represents a user. When UBI is active, any query that this user does, will generate a new `query_id` for this `user_id`.
From f4d9a41bd87a81b18215b1c60ddd9f7fd41ec0a3 Mon Sep 17 00:00:00 2001
From: RasonJ <145287540+RasonJ@users.noreply.github.com>
Date: Sat, 20 Apr 2024 12:03:59 -0700
Subject: [PATCH 06/12] tweakage
---
documentation/schemas.md | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/documentation/schemas.md b/documentation/schemas.md
index 9cebeb0..01cbf35 100644
--- a/documentation/schemas.md
+++ b/documentation/schemas.md
@@ -10,7 +10,7 @@
*Note:* We break out the roles of "search" and "Ubi logging" here, but many implementations will likely use the same OpenSearch client instance for both roles of searching and index writing.
```mermaid
-graph TB
+graph LR
style L fill:none
subgraph L["`*Legend*`"]
subgraph ss[Standard Search]
@@ -18,14 +18,15 @@ subgraph L["`*Legend*`"]
style ln1a fill:blue
ln1a[ ]--->ln1b[ ];
end
- subgraph Ubi flow
+ subgraph Ubi data flow
direction LR
- ln2a[ ].->|new|ln2b[ ];
+ ln2a[ ].->|Ubi interaction|ln2b[ ];
style ln1c fill:red
- ln1c[ ]-->|query_id|ln1d[ ];
+ ln1c[ ]-->|query_id passing|ln1d[ ];
end
end
linkStyle 0 stroke-width:2px,stroke:#0A1CCF
+linkStyle 2 stroke-width:2px,stroke:red
```
From 0c9430260d24218440d09cbad7f05939170e0544 Mon Sep 17 00:00:00 2001
From: RasonJ <145287540+RasonJ@users.noreply.github.com>
Date: Sat, 20 Apr 2024 12:25:12 -0700
Subject: [PATCH 07/12] meh
---
documentation/schemas.md | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/documentation/schemas.md b/documentation/schemas.md
index 01cbf35..ab1f3f2 100644
--- a/documentation/schemas.md
+++ b/documentation/schemas.md
@@ -13,13 +13,16 @@
graph LR
style L fill:none
subgraph L["`*Legend*`"]
- subgraph ss[Standard Search]
+ style ss height:150px
+ subgraph ss["Standard Search"]
direction LR
+
style ln1a fill:blue
ln1a[ ]--->ln1b[ ];
end
subgraph Ubi data flow
direction LR
+
ln2a[ ].->|Ubi interaction|ln2b[ ];
style ln1c fill:red
ln1c[ ]-->|query_id passing|ln1d[ ];
From 6f59baec648eefec9d0a3b1c3fa2954d68ae3575 Mon Sep 17 00:00:00 2001
From: RasonJ <145287540+RasonJ@users.noreply.github.com>
Date: Sat, 20 Apr 2024 13:34:48 -0700
Subject: [PATCH 08/12] reorder numbers
---
documentation/schemas.md | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/documentation/schemas.md b/documentation/schemas.md
index ab1f3f2..6e86e1f 100644
--- a/documentation/schemas.md
+++ b/documentation/schemas.md
@@ -11,6 +11,7 @@
```mermaid
graph LR
+
style L fill:none
subgraph L["`*Legend*`"]
style ss height:150px
@@ -64,10 +65,10 @@ subgraph "`*client-side*`"
end
Search--2) search string-->Docs
-Docs -- 6) query_id & objects--->Search ;
-Search --results--> User
-Search-.7) query_id.->U;
+Docs -- 5) query_id & objects--->Search ;
+Search-.6) query_id.->U;
User -.8) selects object_id:123.->U;
+Search --7) results--> User
U-."9) index event:{query_id, onClick, object_id:123}".->E;
linkStyle 2,3,5,0 stroke-width:2px,fill:none,stroke:#0A1CCF
From 2ae05e887fd5df95c20a7eea94aa66f539f7a02d Mon Sep 17 00:00:00 2001
From: RasonJ <145287540+RasonJ@users.noreply.github.com>
Date: Sun, 21 Apr 2024 08:46:09 -0700
Subject: [PATCH 09/12] latest
---
documentation/schemas.md | 74 +++++++++++++++++++++-------------------
1 file changed, 38 insertions(+), 36 deletions(-)
diff --git a/documentation/schemas.md b/documentation/schemas.md
index 6e86e1f..8452010 100644
--- a/documentation/schemas.md
+++ b/documentation/schemas.md
@@ -1,18 +1,22 @@
-# Key UBI concepts
-## Ubi Roles
-- **User Behavior Insights** module: once activated, is in charge of indexing a user's queries and results in the **query store** with a unique [`query_id`](#query_id), and passing that `query_id` back to the search client.
-
-- **Search Client**: in charge of searching and recieving the `query_id` from **User Behavior Insights**. This `query_id` is then passed to the **Ubi Logging Client**
-
-- **Ubi Logging Client**: is in charge of indexing user events, such as onClick, in the **event store** along with the `query_id` that links to the underlying, technical query DSL and the results' `object_id`'s.
+# Key User Behavior Insights concepts
+**User Behavior Insights** (Ubi) **Logging** is really just a matter of linking and indexing queries, results and events within OpenSearch.
-*Note:* We break out the roles of "search" and "Ubi logging" here, but many implementations will likely use the same OpenSearch client instance for both roles of searching and index writing.
+## Ubi Roles
+- **Search Client**: in charge of searching, and then receiving *objects* from some document index in OpenSearch.
+ (1, 2, *5* & 7, below)
+- **User Behavior Insights** module: once activated, manages the **Ubi Queries** store in the background, indexing each underlying, technical, DSL, index query with a unique [`query_id`](#query_id) along with all returned resultant [`object_id`](#object_id)'s, and then passing the `query_id` back to the **Search Client**.
+ (3, 4 & *5*, below)
+- The **Search Client**, if separate from the **Ubi Client**, forwards the [`query_id`](#query_id) to the **Ubi Client**.
+ *Note:* We break out the roles of *search* and *Ubi event indexing* here, but many implementations will likely use the same OpenSearch client instance for both roles of searching and index writing.
+ (6, below)
+- The **Ubi Client** then indexes all user events with this [`query_id`](#query_id) until a new search is performed, and a new `query_id` is generated by **User Behavior Insights**
+- If the **Ubi Client** interacts with a result *object*, such as `onClick`, that [`object_id`](#object_id), `onClick` and `query_id` are all indexed together, signalling the causal link between the *search* and the *object*.
+ (8 & 9, below)
```mermaid
graph LR
-
-style L fill:none
+style L fill:none,stroke-dasharray: 5 5
subgraph L["`*Legend*`"]
style ss height:150px
subgraph ss["Standard Search"]
@@ -21,19 +25,17 @@ subgraph L["`*Legend*`"]
style ln1a fill:blue
ln1a[ ]--->ln1b[ ];
end
- subgraph Ubi data flow
+ subgraph ubi-leg["Ubi data flow"]
direction LR
- ln2a[ ].->|Ubi interaction|ln2b[ ];
+ ln2a[ ].->|"`**Ubi interaction**`"|ln2b[ ];
style ln1c fill:red
- ln1c[ ]-->|query_id passing|ln1d[ ];
+ ln1c[ ]-->|query_id flow|ln1d[ ];
end
end
linkStyle 0 stroke-width:2px,stroke:#0A1CCF
linkStyle 2 stroke-width:2px,stroke:red
-
```
-
```mermaid
%%{init: {
"flowchart": {"htmlLabels": false},
@@ -42,44 +44,44 @@ linkStyle 2 stroke-width:2px,stroke:red
}%%
graph TB
+User--1) raw search string-->Search;
+Search--2) search string-->Docs
style OS stroke-width:2px, stroke:#0A1CCF, fill:#62affb, opacity:.5
subgraph OS[OpenSearch Cluster fa:fa-database]
- E[( Ubi Events )]
- Docs[(Document Index)] --3) DSL & object_id's--> Q[( Ubi Queries )];
- Q -."4) query_id".-> Docs ;
+ style E stroke-width:1px,stroke:red
+ E[( Ubi Events )]
+ style Docs stroke-width:1px,stroke:#0A1CCF
+ style Q stroke-width:1px,stroke:red
+ Docs[(Document Index)] -."3) {DSL...} & [object_id's,...]".-> Q[( Ubi Queries )];
+ Q -.4) query_id.-> Docs ;
end
+Docs -- "5) query_id & [objects,...]" --->Search ;
+Search-.6) query_id.->U;
+Search --7) [results, ...]--> User
+
style *client-side* stroke-width:2px, stroke:#EC6363
subgraph "`*client-side*`"
style User stroke-width:4px, stroke:#EC636
- User["`*User*`" fa:fa-user]
+ User["`**User**`" fa:fa-user]
App
- Search
+ Search
U
style App fill:#EC6363,opacity:.5
subgraph App[ UserApp fa:fa-store]
- Search( Search Client )
- U( Ubi Client )
+ style Search stroke-width:2px, stroke:#0A1CCF
+ Search( Search Client )
+ U( Ubi Client )
end
- User--1) raw search string-->Search;
end
-Search--2) search string-->Docs
-Docs -- 5) query_id & objects--->Search ;
-Search-.6) query_id.->U;
-User -.8) selects object_id:123.->U;
-Search --7) results--> User
-U-."9) index event:{query_id, onClick, object_id:123}".->E;
+User -.8) selects object_id:123.->U;
+U-."9) index event:{query_id, onClick, object_id:123}".->E;
-linkStyle 2,3,5,0 stroke-width:2px,fill:none,stroke:#0A1CCF
-linkStyle 1,4,6,8 stroke-width:2px,fill:none,stroke:red
+linkStyle 1,2,0,6 stroke-width:2px,fill:none,stroke:#0A1CCF
+linkStyle 3,4,5,8 stroke-width:2px,fill:none,stroke:red
```
-
-
-linkStyle 2,4,6,8,10 stroke-width:2px,fill:none,stroke:red
-
-
Although the named fields below follow a schema which lends to easier analytics, the schema is dynamic and allows for users to add new dynamic fields where there is need.
[`user_id`](#user_id) represents a user. When UBI is active, any query that this user does, will generate a new `query_id` for this `user_id`.
From 101a4bae7d5afb9963ff9db51056f39b3f09d03f Mon Sep 17 00:00:00 2001
From: RasonJ <145287540+RasonJ@users.noreply.github.com>
Date: Sun, 21 Apr 2024 11:03:27 -0700
Subject: [PATCH 10/12] finalizing draft and linking to the main documentation
screen.
---
documentation/documentation.md | 48 +-------
documentation/schemas.md | 216 ++++++++++++++++-----------------
2 files changed, 112 insertions(+), 152 deletions(-)
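For reviewers, the event structure documented in schemas.md below can be sketched as a small Java helper. This is a minimal sketch under stated assumptions, not plugin code: the class `UbiEventSketch`, the method `buildClickEvent`, and every sample value (`example-shop`, `book`, the ids) are hypothetical. Only the field names are taken from the schema documentation in this patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of the plugin): builds an illustrative Ubi event document. */
final class UbiEventSketch {

    private UbiEventSketch() {
    }

    /**
     * Builds an event map using the field names described in schemas.md.
     * The timestamp is passed in by the caller; the docs only say "UTC-based, unix epoch time".
     */
    static Map<String, Object> buildClickEvent(String queryId, String userId,
                                               String objectId, int ordinal,
                                               long timestampEpoch) {
        // event_attributes.position: where in the result list the interaction happened
        Map<String, Object> position = new LinkedHashMap<>();
        position.put("ordinal", ordinal);

        // event_attributes.object: object_id is the id that links the event to the query results
        Map<String, Object> object = new LinkedHashMap<>();
        object.put("object_id", objectId);
        object.put("object_type", "book"); // illustrative value, an assumption

        Map<String, Object> attributes = new LinkedHashMap<>();
        attributes.put("position", position);
        attributes.put("object", object);

        Map<String, Object> event = new LinkedHashMap<>();
        event.put("application", "example-shop"); // illustrative value, an assumption
        event.put("action_name", "on_click");
        event.put("query_id", queryId);
        event.put("user_id", userId);
        event.put("timestamp", timestampEpoch);
        event.put("message_type", "INFO");
        event.put("event_attributes", attributes);
        return event;
    }
}
```

Calling `UbiEventSketch.buildClickEvent("q-001", "u-42", "123", 3, 1713700000L)` yields a map whose `query_id` and `event_attributes.object.object_id` would be expected to match a `query_id` and one of the `query_response_objects_ids` in the Ubi Queries store.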
diff --git a/documentation/documentation.md b/documentation/documentation.md
index 49b6a16..ad74672 100644
--- a/documentation/documentation.md
+++ b/documentation/documentation.md
@@ -71,50 +71,10 @@ The plugin has a concept of a "store", which is a logical collection of the even
index is used to store events, and the other index is for storing queries.
### OpenSearch Data Mappings
-
-#### Schema for events:
-
-The current event mappings file can be found [here](https://github.com/o19s/opensearch-ubi/blob/main/src/main/resources/events-mapping.json).
-
-**Primary fields include:**
-- `action_name` - (size 100) - any name you want to call your event
-- `timestamp` - unix epoch time. if not set, will be set by the plugin when the event is received
-- `user_id`. `session_id`, `page_id` - (size 100) - are id's largely at the calling client's discretion for tracking users, sessions and pages
-- `query_id` - (size 100) - ID for some query. Note that it could be a unique search string, or it could represent a cluster of related searches (i.e.: *dress*, *red dress*, *long dress* could all have the same `query_id`). Either the client could control these, or the `query_id` could be retrieved from the API's response headers as it keeps track of queries on the node
-- `message_type` - (size 100) - originally thought of in terms of ERROR, INFO, WARN, but could be anything useful such as `QUERY` or `CONVERSION`. Use to group `action_name` together.
-- `message` - (size 256) - optional text for the log entry
-
-**Other fields & data objects**
-- `event_attributes.object` - contains an associated JSONified data object (i.e. books, products, user info, etc) if there are any
- - `event_attributes.object.object_id` - points to a unique, internal, id representing and instance of that object
- - `event_attributes.object.key_value` - points to a unique, external key, matching the item that the user searched for, found and acted upon (i.e. sku, isbn, ean, etc.).
- **This field value should match the value in for the object's value in the `object_id` [below](#object_id) from the search store**
- It is possible that the `object_id` and `key_value` match if the same id is used both internally for indexing and externally for the users.
- - `event_attributes.object.object_type` - indicates the type/class of object
- - `event_attributes.object.description` - optional description of the object
- - `event_attributes.object.transaction_id` - optionally points to a unique id representing a successful transaction
- - `event_attributes.object.to_user_id` - optionally points to another user, if they are the recipient of this object
- - `event_attributes.object.object_detail` - optional data object/map of further data details
-- `event_attributes.position` - nested object to track user events to the location of the event origins
- - `event_attributes.position.ordinal` - tracks the nth item within a list that a user could select, click
- - `event_attributes.position.{x,y}` - tracks x and y values, that the client defines
- - `event_attributes.position.page_depth` - tracks page depth
- - `event_attributes.position.scroll_depth` - tracks scroll depth
- - `event_attributes.position.trail` - text field for tracking the path/trail that a user took to get to this location
-
-* Other mapped fields in the schema are intended to be optional placeholders for common attributes like `user_name`, `email`, `price`
-
-**the users can dynamically add any further fields to the event mapping
-
-#### Schema for queries:
-
-The current query mappings file can be found [here](https://github.com/o19s/opensearch-ubi/blob/main/src/main/resources/queries-mapping.json).
-
-- `timestamp` - A unix timestamp of when the query was received
-- `query_id` - A unique ID of the query provided by the client or generated automatically by the plugin
-- `query_response_id` - A unique ID for the collection of results for the query
-- `user_id` - A user ID provided by the client
-- `session_id` - An optional session ID provided by the client
+Ubi has 2 primary indices:
+- **Ubi Queries** stores all queries and results.
+- **Ubi Events** stores the events that the Ubi client writes.
+*Please follow the [schema deep dive](./schemas.md) to understand how these two indices make Ubi into a causal framework for search.*
## Plugin API
diff --git a/documentation/schemas.md b/documentation/schemas.md
index 8452010..56c8481 100644
--- a/documentation/schemas.md
+++ b/documentation/schemas.md
@@ -1,19 +1,31 @@
# Key User Behavior Insights concepts
**User Behavior Insights** (Ubi) **Logging** is really just a matter of linking and indexing queries, results and events within OpenSearch.
+## Key ID's
+Ubi is not functional unless the links between the following are consistently maintained within your Ubi-enabled application:
+
+- [`user_id`](#user_id) represents a unique user.
+- [`object_id`](#object_id) represents an id for whatever item the user is searching for, such as *epc*, *isbn*, *ssn*, *handle*, etc.
+- [`query_id`](#query_id) is a unique id for the raw query language executed and the resultant `object_id`'s that the query returned.
+- [`action_name`](#action_name), though not technically an *id*, tells us exactly what action was taken (or not) with this `object_id`.
+
+To summarize: the `query_id` signals the beginning of a `user_id`'s *Search Journey*, the `action_name` tells us how the user is interacting with the query results within the application, and [`event_attributes.object.object_id`](#object_id) is referring to the precise query result that the user interacts with.
## Ubi Roles
- **Search Client**: in charge of searching, and then receiving *objects* from some document index in OpenSearch.
(1, 2, *5* & 7, below)
-- **User Behavior Insights** module: once activated, manages the **Ubi Queries** store in the background, indexing each underlying, technical, DSL, index query with a unique [`query_id`](#query_id) along with all returned resultant [`object_id`](#object_id)'s, and then passing the `query_id` back to the **Search Client**.
+- **User Behavior Insights** module: once activated, manages the **Ubi Queries** store in the background, indexing each underlying, technical, DSL, index query with a unique [`query_id`](#query_id) along with all returned resultant [`object_id`](#object_id)'s, and then passing the `query_id` back to the **Search Client** so that events can be linked to this query.
(3, 4 & *5*, below)
-- The **Search Client**, if separate from the **Ubi Client**, forwards the [`query_id`](#query_id) to the **Ubi Client**.
+- **objects**: whatever items the user is searching for with the queries. Activating Ubi involves mapping your real-world objects (via their *isbn*, etc.) to the [`object_id`](#object_id) fields in the schemas below.
+- The **Search Client**, if separate from the **Ubi Client**, forwards the indexed [`query_id`](#query_id) to the **Ubi Client**.
*Note:* We break out the roles of *search* and *Ubi event indexing* here, but many implementations will likely use the same OpenSearch client instance for both roles of searching and index writing.
(6, below)
-- The **Ubi Client** then indexes all user events with this [`query_id`](#query_id) until a new search is performed, and a new `query_id` is generated by **User Behavior Insights**
-- If the **Ubi Client** interacts with a result *object*, such as `onClick`, that [`object_id`](#object_id), `onClick` and `query_id` are all indexed together, signalling the causal link between the *search* and the *object*.
+- The **Ubi Client** then indexes all user events with this [`query_id`](#query_id) until a new search is performed, and a new `query_id` is generated by **User Behavior Insights** and passed back to the **Ubi Client**.
+- If the user interacts with a result *object*, such as with an `onClick`, the **Ubi Client** indexes that [`object_id`](#object_id), the *onClick* [`action_name`](#action_name) and the `query_id` together, signalling the causal link between the *search* and the *object*.
(8 & 9, below)
+
+
```mermaid
graph LR
style L fill:none,stroke-dasharray: 5 5
@@ -56,21 +68,22 @@ subgraph OS[OpenSearch Cluster fa:fa-database]
Q -.4) query_id.-> Docs ;
end
-Docs -- "5) query_id & [objects,...]" --->Search ;
+Docs -- "5) return both query_id & [objects,...]" --->Search ;
Search-.6) query_id.->U;
Search --7) [results, ...]--> User
-style *client-side* stroke-width:2px, stroke:#EC6363
+style *client-side* stroke-width:1px, stroke:#D35400
subgraph "`*client-side*`"
style User stroke-width:4px, stroke:#EC636
User["`**User**`" fa:fa-user]
App
Search
U
- style App fill:#EC6363,opacity:.5
+ style App fill:#D35400,opacity:.35, stroke:#0A1CCF, stroke-width:2px
subgraph App[ UserApp fa:fa-store]
style Search stroke-width:2px, stroke:#0A1CCF
Search( Search Client )
+ style U stroke-width:1px,stroke:red
U( Ubi Client )
end
end
@@ -82,50 +95,54 @@ linkStyle 1,2,0,6 stroke-width:2px,fill:none,stroke:#0A1CCF
linkStyle 3,4,5,8 stroke-width:2px,fill:none,stroke:red
```
-Although the named fields below follow a schema which lends to easier analytics, the schema is dynamic and allows for users to add new dynamic fields where there is need.
+## Ubi Stores
+There are 2 separate stores for Ubi:
+### 1) **Ubi Queries**
+All underlying query information and results ([`object_id`](#object_id)'s) are stored in the **Ubi Queries** store, which remains largely invisible in the background.
+The only obvious difference will be in the `ubi` stanza of the JSON response. *Note that this query logging could cause index bloat if one forgets that it is enabled*.
-[`user_id`](#user_id) represents a user. When UBI is active, any query that this user does, will generate a new `query_id` for this `user_id`.
+**Ubi Queries** [schema](../src/main/resources/queries-mapping.json):
+Since Ubi manages the **Ubi Queries** store, the developer should never have to write directly to this store (except for importing data).
-The purpose of the [`query_id`](#query_id)'s help link the user's raw query string to the results, as well as any subsequent action that the UBI client logs.
-When UBI is turned on, a *search client* will get a `query_id` back from OpenSearch, and is passed to the UBI client. The UBI client then associates each subsequent event with this query until it receives a new query_id.
-
-[`action_name`](#action_name) says what the name of the event is. It can be any name, such as *login*, *logout*, *save*, *post*, *add_to_cart*...
-
- [`event_attributes`](#event_attributes)'s is where any relevant information about the event can be stored.
- The two primary, predefined objects in the attributes are [`event_attributes.position`](#position), which contains
- information on what part of the application the user is interacting with,
- and [`event_attributes.object`](#object), which contains identifying information of the object returned from the query that the user interacts with (i.e.: a book, a product, a post, etc..).
-
-The `object` structure has two ways to refer to the object:
-- `event_attributes.object.object_id` is the unique id that OpenSearch uses internally to index the object, think the `_id` field in the indices.
-- `event_attributes.object.catalog_id` is the id that a user could look up the object in a *catalog*
+- `timestamp`
+ A unix timestamp of when the query was received
- Therefore, the `query_id` signals the beginning of a user's *Search Journey*,
-`action_name` tells us how the user is interacting with the query results within the application,
-and `event_attributes.object` is referring to the precise query result that the user interacts with.
+- `query_id`
+ A unique ID of the query provided by the client or generated automatically. The same query text issued multiple times would generate a different `query_id` each time.
-### OpenSearch Data Mappings
+- `query_response_objects_ids`
+ This is an array of the `object_id`'s.
-#### Schema for events:
+- `user_id`
+ A user ID provided by the client
-The current event mappings file can be found [here](../src/main/resources/events-mapping.json).
+- `session_id`
+ An optional session ID provided by the client
-**Primary fields include:**
+### 2) **Ubi Events**
+This is the event store that the client side directly indexes events to, linking the event [`action_name`](#action_name), [`object_id`](#object_id)'s and [`query_id`](#query_id)'s together with any other important event information.
+Since this schema is dynamic, the developer can add any new fields and structures (such as *user* information, *geo-location* information, etc.) at index time that are not in the current **Ubi Events** [schema](../src/main/resources/events-mapping.json):
- `application`
- (size 100) - name of application tracking UBI events
+
+ (size 100) - name of the application tracking UBI events (e.g. *amazon-shop*, *ABC-microservice*)
- `action_name`
- (size 100) - any name you want to call your event
-- `timestamp`:
+
+ (size 100) - any name you want to call your event. For example, with *javascript* events, you could include `on_click`, `logon`, `add_to_cart`, `page_scroll`....
- Unix epoch time. If not set , will be set by the plugin when the event is received
- `query_id`
- (size 100) - ID for some query. Either the client provides this, or the `query_id` is generated by the server.
+
+ (size 100) - ID for some query. Either the client provides this, or the `query_id` is generated at index time by **Ubi Queries**.
- `user_id`. `session_id`, `source_id`
+
(size 100) - are id's largely at the calling client's discretion for tracking users, sessions and sources (i.e. pages) of the event.
- The `user_id` must be consistent in both the `query` and `event` stores.
+ The `user_id` must be consistent in both the **Ubi Queries** and **Ubi Events** stores.
+
+- `timestamp`:
+ UTC-based, unix epoch time.
+
- `message_type`
(size 100) - originally thought of in terms of ERROR, INFO, WARN, but could be anything useful such as `QUERY` or `CONVERSION`.
@@ -133,89 +150,72 @@ The current event mappings file can be found [here](../src/main/resources/events
- `message`
- (size 256) - optional text for the log entry
+ (size 256) - optional text message for the log entry. For example, with a `message_type` of `INFO`, people might expect an informational or debug type text for this field, but with a `message_type` of `QUERY`, we would expect the text to be more about what the user is searching on.
-**Other attribute fields & data objects**
-- `event_attributes.object`
-
- represents the search result object (i.e. books, products, user info, etc) if there are any
- - `event_attributes.object.internal_id` - points to a unique, internal, id representing and instance of that object
+- `event_attributes`'s structure is where any relevant information about the event can be stored.
+ There are two primary structures in the `event_attributes`:
+ - **`event_attributes.position`** - structure that contains information on the location of the event origin, such as screen *x,y* coordinates, or the *n*th object out of 10 results, ....
- - `event_attributes.object.object_id`
-
- points to a unique, external key, matching the item that the user searched for, found and acted upon (i.e. sku, isbn, ean, etc.).
- **This field value should match the value in for the object's value in the `Object_id` [below](#object_id) from the search store**
- It is possible that the `object_id` and `internal_id` match if the same id is used both internally for indexing and externally for the users.
-
- - `event_attributes.object.object_type`
-
- indicates the type/class of object
+ - `event_attributes.position.ordinal`
+
+ tracks the *n*th item within a list that a user could select or click (e.g. selecting the 3rd element could be event{`onClick, results[2]`}, assuming zero-based indexing)
+
+ - `event_attributes.position.{x,y}`
+
+ tracks x and y values, that the client defines
+
+ - `event_attributes.position.page_depth`
+
+ tracks page depth of results
+
+ - `event_attributes.position.scroll_depth`
+
+ tracks scroll depth of page results
+
+ - `event_attributes.position.trail`
+
+ text field for tracking the path/trail that a user took to get to this location
+
+
+
+ - **`event_attributes.object`**, which contains identifying information about the object returned from the query that the user interacts with (e.g. a book, a product, a post).
+ The `object` structure has two ways to refer to the object, with `object_id` being the id that links prior queries to this object:
+
+ - `event_attributes.object.internal_id` is a unique id that OpenSearch can use internally to index the object; think of the `_id` field in the indices.
+ - `event_attributes.object.object_id`
+ is the id that a user could look up and use to find the object instance within the **document index**. Examples include: *ssn*, *isbn*, *primary_ean*, etc.
+ Initializing Ubi requires mapping from the **Document Index**'s primary key to this `object_id`.
+
+ - `event_attributes.object.object_type`
+
+ indicates the type/class of object
- - `event_attributes.object.description`
-
- optional description of the object
-
- - `event_attributes.object.transaction_id`
-
- optionally points to a unique id representing a successful transaction
-
- - `event_attributes.object.to_user_id`
-
- optionally points to another user, if they are the recipient of this object, perhaps as a gift, from the user's `user_id`
- - `event_attributes.object.object_detail`
-
- optional text for further data object details
-
- - `event_attributes.object.object_detail.json`
+ - `event_attributes.object.description`
+
+ optional description of the object
+
+ - `event_attributes.object.transaction_id`
+
+ optionally points to a unique id representing a successful transaction
+
+ - `event_attributes.object.to_user_id`
+
+ optionally points to another user, if they are the recipient of this object, perhaps as a gift, from the user's `user_id`
+ - `event_attributes.object.object_detail`
+
+ optional text for further data object details
+
+ - `event_attributes.object.object_detail.json`
+
+ if the user has a json object representing what was acted upon, it can be stored here; however, note that that could lead to index bloat if the json objects are large.
+- *dynamic fields*: any new fields by any other names in the json objects that one indexes will dynamically expand this schema to that use-case.
- if the user has a json object representing what was acted upon, it can be stored here; however, note that that could lead to index bloat if the json objects are large.
-- `event_attributes.position`
-
- nested object to track user events to the location of the event origins
- - `event_attributes.position.ordinal`
-
- tracks the nth item within a list that a user could select, click
- - `event_attributes.position.{x,y}`
-
- tracks x and y values, that the client defines
- - `event_attributes.position.page_depth`
-
- tracks page depth
- - `event_attributes.position.scroll_depth`
-
- tracks scroll depth
- - `event_attributes.position.trail`
-
- text field for tracking the path/trail that a user took to get to this location
-* Note the developers can add optional, dynamic fields like `user_name`, `email`, `price` per individual use-cases.
-#### Schema for queries:
-The current query mappings file can be found [here](../src/main/resources/queries-mapping.json).
-
-- `timestamp`
-
- A unix timestamp of when the query was received
-
-- `query_id`
-
- A unique ID of the query provided by the client or generated automatically. The same query text issued multiple times would generate different `query_id`.
-
-- `query_response_objects_ids`
-
- This is an array of the `object_id`'s.
-
-- `user_id`
-
- A user ID provided by the client
-
-- `session_id`
-
- An optional session ID provided by the client
From 5f8ebad5b678b8401e55a66162a3efa0b3dc9cca Mon Sep 17 00:00:00 2001
From: RasonJ <145287540+RasonJ@users.noreply.github.com>
Date: Sun, 21 Apr 2024 11:08:12 -0700
Subject: [PATCH 11/12] meh 2
---
documentation/schemas.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/documentation/schemas.md b/documentation/schemas.md
index 56c8481..afdc84c 100644
--- a/documentation/schemas.md
+++ b/documentation/schemas.md
@@ -7,7 +7,7 @@ Ubi is not functional unless the links between the following are consistently ma
- [`user_id`](#user_id) represents a unique user.
- [`object_id`](#object_id) represents an id for whatever item the user is searching for, such as *epc*, *isbn*, *ssn*, *handle*, etc.
- [`query_id`](#query_id) is a unique id for the raw query language executed and the resultant `object_id`'s that the query returned.
-- [`action_name`](#action_name), though not technically an *id*, the `action_name` tells us what exact action was taken (or not) with this `object_id`
+- [`action_name`](#action_name), though not technically an *id*, tells us what exact action (such as `click` or `add_to_cart`) was taken (or not) with this `object_id`.
To summarize: the `query_id` signals the beginning of a `user_id`'s *Search Journey*, the `action_name` tells us how the user is interacting with the query results within the application, and [`event_attributes.object.object_id`](#object_id) is referring to the precise query result that the user interacts with.
From cb0297f0ab94913cd3f2b18796bd33f9058e8195 Mon Sep 17 00:00:00 2001
From: Eric Pugh
Date: Mon, 22 Apr 2024 20:08:55 -0400
Subject: [PATCH 12/12] doc changes from review
---
documentation/schemas.md | 33 +++++++++++++--------------------
1 file changed, 13 insertions(+), 20 deletions(-)
diff --git a/documentation/schemas.md b/documentation/schemas.md
index afdc84c..234e013 100644
--- a/documentation/schemas.md
+++ b/documentation/schemas.md
@@ -6,7 +6,7 @@ Ubi is not functional unless the links between the following are consistently ma
- [`user_id`](#user_id) represents a unique user.
- [`object_id`](#object_id) represents an id for whatever item the user is searching for, such as *epc*, *isbn*, *ssn*, *handle*, etc.
-- [`query_id`](#query_id) is a unique id for the raw query language executed and the resultant `object_id`'s that the query returned.
+- [`query_id`](#query_id) is a unique id for the raw query language executed and the resultant `object_id`'s that the query returned. \
- [`action_name`](#action_name), though not technically an *id*, the `action_name` tells us what exact action (such as `click` or `add_to_cart`) was taken (or not) with this `object_id`.
To summarize: the `query_id` signals the beginning of a `user_id`'s *Search Journey*, the `action_name` tells us how the user is interacting with the query results within the application, and [`event_attributes.object.object_id`](#object_id) is referring to the precise query result that the user interacts with.
@@ -109,15 +109,17 @@ Since Ubi manages the **Ubi Queries** store, the developer should never have to
- `query_id`
A unique ID of the query provided by the client or generated automatically. The same query text issued multiple times would generate different `query_id`.
+
+ - `user_id`
+ A user ID provided by the client
+
+- `session_id`
+ An optional session ID provided by the client. _Whether to keep this field is currently under review_.
- `query_response_objects_ids`
- This is an array of the `object_id`'s.
+ This is an array of the `object_id`'s. Each *could* be the same id as the `_id` but is meant to be the externally valid id of the document/item/product.
-- `user_id`
- A user ID provided by the client
-- `session_id`
- An optional session ID provided by the client
### 2) **Ubi Events**
This is the event store that the client side directly indexes events to, linking the event [`action_name`](#action_name), [`object_id`](#object_id)'s and [`query_id`](#query_id)'s together with any other important event information.
@@ -129,7 +131,7 @@ Since this schema is dynamic, the developer can add any new fields and structure
- `action_name`
- (size 100) - any name you want to call your event. For example, with *javascript* events, you could include `on_click`, `logon`, `add_to_cart`, `page_scroll`....
+ (size 100) - any name you want to call your event. For example, with *javascript* events, you could include `on_click`, `logon`, `add_to_cart`, `page_scroll`.... _This should be formalized: a list of standard names, plus custom ones._
- `query_id`
@@ -146,7 +148,7 @@ Since this schema is dynamic, the developer can add any new fields and structure
- `message_type`
(size 100) - originally thought of in terms of ERROR, INFO, WARN, but could be anything useful such as `QUERY` or `CONVERSION`.
- Can be used to group `action_name` together in logical bins.
+ Can be used to group `action_name` together in logical bins. _This grouping should probably be backend logic at analysis time._
- `message`
@@ -184,12 +186,12 @@ Since this schema is dynamic, the developer can add any new fields and structure
- `event_attributes.object.internal_id` is a unique id that OpenSearch can use to internally to index the object, think the `_id` field in the indices.
- `event_attributes.object.object_id`
- is the id that a user could look up amd find the object instance within the **document index**. Examples include: *ssn*, *isbn*, *primary_ean*, etc.
+ is the id that a user could look up and use to find the object instance within the **document corpus**. Examples include: *ssn*, *isbn*, *primary_ean*, etc. Variants need to be reflected in the `object_id`: for a red t-shirt, the `object_id` would need to be the SKU-level id.
Initializing Ubi requires mapping from the **Document Index**'s primary key to this `object_id`
- `event_attributes.object.object_type`
- indicates the type/class of object
+ indicates the type/class of object.
- `event_attributes.object.description`
@@ -209,13 +211,4 @@ Since this schema is dynamic, the developer can add any new fields and structure
- `event_attributes.object.object_detail.json`
if the user has a json object representing what was acted upon, it can be stored here; however, note that that could lead to index bloat if the json objects are large.
-- *dynamic fields*: any new fields by any other names in the json objects that one indexes will dynamically expand this schema to that use-case.
-
-
-
-
-
-
-
-
-
+- *extensible fields*: any new fields, under any other names, in the JSON objects that one indexes will dynamically extend this schema for that use case.
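
To make the field layout in `schemas.md` concrete, here is a minimal Python sketch of an event document as a client might assemble it before indexing into **Ubi Events**, plus a check of the id links back to a **Ubi Queries** record. Only the field names come from the docs; the helper names (`build_ubi_event`, `links_to_query`), the sample ids, and the use of whole epoch seconds for `timestamp` are assumptions for illustration and are not part of Ubi.

```python
import json
import time


def build_ubi_event(user_id, query_id, action_name, object_id,
                    ordinal=None, object_type=None, message_type=None,
                    message=None, timestamp=None):
    """Assemble one event document using the field names from schemas.md.

    Hypothetical helper: Ubi does not ship this function.
    """
    event = {
        "action_name": action_name,
        "query_id": query_id,
        "user_id": user_id,
        # UTC-based unix epoch time; whole seconds are an assumption here.
        "timestamp": int(time.time()) if timestamp is None else timestamp,
        "event_attributes": {"object": {"object_id": object_id}},
    }
    # Optional fields are only added when the caller supplies them.
    if ordinal is not None:
        event["event_attributes"]["position"] = {"ordinal": ordinal}
    if object_type is not None:
        event["event_attributes"]["object"]["object_type"] = object_type
    if message_type is not None:
        event["message_type"] = message_type
    if message is not None:
        event["message"] = message
    return event


def links_to_query(query_doc, event_doc):
    """Check the id links that tie an event on a query result to its query."""
    object_id = event_doc["event_attributes"]["object"]["object_id"]
    return (
        query_doc["user_id"] == event_doc["user_id"]
        and query_doc["query_id"] == event_doc["query_id"]
        and object_id in query_doc["query_response_objects_ids"]
    )


if __name__ == "__main__":
    # Sample ids are made up for the example.
    query = {
        "query_id": "q-1",
        "user_id": "u-1",
        "query_response_objects_ids": ["isbn-a", "isbn-b"],
    }
    event = build_ubi_event("u-1", "q-1", "add_to_cart", "isbn-b",
                            ordinal=2, object_type="book",
                            message_type="CONVERSION")
    print(json.dumps(event, indent=2))
    print(links_to_query(query, event))
```

Because the schema is dynamic, any extra keys added to the returned dict (for example a hypothetical `price`) would simply extend the mapping when the document is indexed.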