diff --git a/.chloggen/1482.yaml b/.chloggen/1482.yaml new file mode 100644 index 0000000000..a327bd7b4d --- /dev/null +++ b/.chloggen/1482.yaml @@ -0,0 +1,18 @@ +# Use this changelog template to create an entry for release notes. +# +# If your change doesn't affect end users you should instead start +# your pull request title with [chore] or use the "Skip Changelog" label. + +# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix' +change_type: bug_fix +component: db +note: | + Fix telemetry for complex queries: + + - introduce the `db.query.summary` attribute to provide a concise, low-cardinality + representation of the query text. + - use `db.query.summary` as the span name and as a recommended attribute on metrics. + - avoid capturing `db.operation.name` and `db.collection.name` when the query + involves multiple operations or collections, to prevent ambiguity. + +issues: [521, 805,1159] diff --git a/docs/attributes-registry/db.md b/docs/attributes-registry/db.md index 4a6505b2b0..173e6f2655 100644 --- a/docs/attributes-registry/db.md +++ b/docs/attributes-registry/db.md @@ -17,22 +17,34 @@ This group defines the attributes used to describe telemetry in the context of databases. -| Attribute | Type | Description | Examples | Stability | -| -------------------------------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- | ---------------------------------------------------------------- | -| `db.client.connection.pool.name` | string | The name of the connection pool; unique within the instrumented application. In case the connection pool implementation doesn't provide a name, instrumentation SHOULD use a combination of parameters that would make the name unique, for example, combining attributes `server.address`, `server.port`, and `db.namespace`, formatted as `server.address:server.port/db.namespace`. Instrumentations that generate connection pool name following different patterns SHOULD document it. | `myDataSource` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `db.client.connection.state` | string | The state of a connection in the pool | `idle` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `db.collection.name` | string | The name of a collection (table, container) within the database. [1] | `public.users`; `customers` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `db.namespace` | string | The name of the database, fully qualified within the server address and port. [2] | `customers`; `test.users` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `db.operation.batch.size` | int | The number of queries included in a batch operation. [3] | `2`; `3`; `4` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `db.operation.name` | string | The name of the operation or command being executed. [4] | `findAndModify`; `HMSET`; `SELECT` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `db.query.parameter.` | string | A query parameter used in `db.query.text`, with `` being the parameter name, and the attribute value being a string representation of the parameter value. [5] | `someval`; `55` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `db.query.text` | string | The database query being executed. [6] | `SELECT * FROM wuser_table where username = ?`; `SET mykey "WuValue"` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `db.response.status_code` | string | Database response status code. [7] | `102`; `ORA-17002`; `08P01`; `404` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `db.system` | string | The database management system (DBMS) product as identified by the client instrumentation. [8] | `other_sql`; `adabas`; `cache` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| Attribute | Type | Description | Examples | Stability | +| -------------------------------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------- | ---------------------------------------------------------------- | +| `db.client.connection.pool.name` | string | The name of the connection pool; unique within the instrumented application. In case the connection pool implementation doesn't provide a name, instrumentation SHOULD use a combination of parameters that would make the name unique, for example, combining attributes `server.address`, `server.port`, and `db.namespace`, formatted as `server.address:server.port/db.namespace`. Instrumentations that generate connection pool name following different patterns SHOULD document it. | `myDataSource` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `db.client.connection.state` | string | The state of a connection in the pool | `idle` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `db.collection.name` | string | The name of a collection (table, container) within the database. [1] | `public.users`; `customers` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `db.namespace` | string | The name of the database, fully qualified within the server address and port. [2] | `customers`; `test.users` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `db.operation.batch.size` | int | The number of queries included in a batch operation. [3] | `2`; `3`; `4` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `db.operation.name` | string | The name of the operation or command being executed. [4] | `findAndModify`; `HMSET`; `SELECT` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `db.query.parameter.` | string | A query parameter used in `db.query.text`, with `` being the parameter name, and the attribute value being a string representation of the parameter value. [5] | `someval`; `55` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `db.query.summary` | string | Low cardinality representation of a database query text. [6] | `SELECT wuser_table`; `INSERT shipping_details SELECT orders`; `get user by id` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `db.query.text` | string | The database query being executed. [7] | `SELECT * FROM wuser_table where username = ?`; `SET mykey ?` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `db.response.status_code` | string | Database response status code. [8] | `102`; `ORA-17002`; `08P01`; `404` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `db.system` | string | The database management system (DBMS) product as identified by the client instrumentation. [9] | `other_sql`; `adabas`; `cache` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | **[1]:** It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. -If the collection name is parsed from the query text, it SHOULD be the first collection name found in the query and it SHOULD match the value provided in the query text including any schema and database name prefix. -For batch operations, if the individual operations are known to have the same collection name then that collection name SHOULD be used, otherwise `db.collection.name` SHOULD NOT be captured. + +A single database query may involve multiple collections. + +If the collection name is parsed from the query text, it SHOULD only be captured for queries that +contain a single collection and it SHOULD match the value provided in +the query text including any schema and database name prefix. + +For batch operations, if the individual operations are known to have the same collection name +then that collection name SHOULD be used. + +If the operation or query involves multiple collections, `db.collection.name` +SHOULD NOT be captured. + This attribute has stability level RELEASE CANDIDATE. **[2]:** If a database system has multiple namespace components, they SHOULD be concatenated (potentially using database system specific conventions) from most general to most specific namespace component, and more specific namespaces SHOULD NOT be captured without the more general namespaces, to ensure that "startswith" queries for the more general namespaces will be valid. @@ -44,7 +56,7 @@ This attribute has stability level RELEASE CANDIDATE. This attribute has stability level RELEASE CANDIDATE. **[4]:** It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. -If the operation name is parsed from the query text, it SHOULD be the first operation name found in the query. +A single database query may involve multiple operations. If the operation name is parsed from the query text, it SHOULD only be captured for queries that contain a single operation or when the operation name describing the whole query is available by other means (such as SQL comments). For batch operations, if the individual operations are known to have the same operation name then that operation name SHOULD be used prepended by `BATCH `, otherwise `db.operation.name` SHOULD be `BATCH` or some other database system specific term if more applicable. This attribute has stability level RELEASE CANDIDATE. @@ -52,16 +64,20 @@ This attribute has stability level RELEASE CANDIDATE. If a parameter has no name and instead is referenced only by index, then `` SHOULD be the 0-based index. This attribute has stability level RELEASE CANDIDATE. -**[6]:** For sanitization see [Sanitization of `db.query.text`](../../docs/database/database-spans.md#sanitization-of-dbquerytext). +**[6]:** `db.query.summary` provides static summary of the query text. It describes a class of database queries and is useful as a grouping key, especially when analyzing telemetry for database calls involving complex queries. +Summary may be available to the instrumentation through SQL comment, instrumentation hooks, or other means. If it is not available, instrumentations that support query parsing SHOULD generate a summary following [Generating query summary](../../docs/database/database-spans.md#generating-a-summary-of-the-quey-text) section. +This attribute has stability level RELEASE CANDIDATE. + +**[7]:** For sanitization see [Sanitization of `db.query.text`](../../docs/database/database-spans.md#sanitization-of-dbquerytext). For batch operations, if the individual operations are known to have the same query text then that query text SHOULD be used, otherwise all of the individual query texts SHOULD be concatenated with separator `; ` or some other database system specific separator if more applicable. Even though parameterized query text can potentially have sensitive data, by using a parameterized query the user is giving a strong signal that any sensitive data will be passed as parameter values, and the benefit to observability of capturing the static part of the query text by default outweighs the risk. This attribute has stability level RELEASE CANDIDATE. -**[7]:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes. +**[8]:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes. Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system. This attribute has stability level RELEASE CANDIDATE. -**[8]:** The actual DBMS may differ from the one identified by the client. For example, when using PostgreSQL client libraries to connect to a CockroachDB, the `db.system` is set to `postgresql` based on the instrumentation's best knowledge. +**[9]:** The actual DBMS may differ from the one identified by the client. For example, when using PostgreSQL client libraries to connect to a CockroachDB, the `db.system` is set to `postgresql` based on the instrumentation's best knowledge. This attribute has stability level RELEASE CANDIDATE. `db.client.connection.state` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. @@ -206,9 +222,9 @@ This group defines attributes for Elasticsearch. | Attribute | Type | Description | Examples | Stability | | ----------------------------------- | ------ | -------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- | ---------------------------------------------------------------- | | `db.elasticsearch.node.name` | string | Represents the human-readable identifier of the node/instance to which a request was routed. | `instance-0000000001` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `db.elasticsearch.path_parts.` | string | A dynamic value in the url path. [9] | `db.elasticsearch.path_parts.index=test-index`; `db.elasticsearch.path_parts.doc_id=123` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `db.elasticsearch.path_parts.` | string | A dynamic value in the url path. [10] | `db.elasticsearch.path_parts.index=test-index`; `db.elasticsearch.path_parts.doc_id=123` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -**[9]:** Many Elasticsearch url paths allow dynamic values. These SHOULD be recorded in span attributes in the format `db.elasticsearch.path_parts.`, where `` is the url path part name. The implementation SHOULD reference the [elasticsearch schema](https://raw.githubusercontent.com/elastic/elasticsearch-specification/main/output/schema/schema.json) in order to map the path part values to their names. +**[10]:** Many Elasticsearch url paths allow dynamic values. These SHOULD be recorded in span attributes in the format `db.elasticsearch.path_parts.`, where `` is the url path part name. The implementation SHOULD reference the [elasticsearch schema](https://raw.githubusercontent.com/elastic/elasticsearch-specification/main/output/schema/schema.json) in order to map the path part values to their names. ## Deprecated Database Attributes diff --git a/docs/database/database-spans.md b/docs/database/database-spans.md index a0941f0688..f52b6caced 100644 --- a/docs/database/database-spans.md +++ b/docs/database/database-spans.md @@ -14,6 +14,7 @@ linkTitle: Client Calls - [Common attributes](#common-attributes) - [Notes and well-known identifiers for `db.system`](#notes-and-well-known-identifiers-for-dbsystem) - [Sanitization of `db.query.text`](#sanitization-of-dbquerytext) +- [Generating a summary of the query text](#generating-a-summary-of-the-query-text) - [Semantic Conventions for specific database technologies](#semantic-conventions-for-specific-database-technologies)