From e73a88f5d44f272e42363a45e9e94550fdd08a5d Mon Sep 17 00:00:00 2001 From: tiffany76 <30397949+tiffany76@users.noreply.github.com> Date: Thu, 23 May 2024 11:08:13 -0700 Subject: [PATCH 01/15] Copy text from troubleshooting.md --- content/en/docs/collector/troubleshooting.md | 239 ++++++++++++++++++- 1 file changed, 228 insertions(+), 11 deletions(-) diff --git a/content/en/docs/collector/troubleshooting.md b/content/en/docs/collector/troubleshooting.md index 8278d00b678b..5b4bee04bc87 100644 --- a/content/en/docs/collector/troubleshooting.md +++ b/content/en/docs/collector/troubleshooting.md @@ -14,13 +14,6 @@ You can configure and use the Collector's own [internal telemetry](/docs/collector/internal-telemetry/) to monitor its performance. -## Sending test data - -For certain types of issues, particularly verifying configuration and debugging -network issues, it can be helpful to send a small amount of data to a collector -configured to output to local logs. For details, see -[Local exporters](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/troubleshooting.md#local-exporters). - ## Check available components in the Collector Use the following sub-command to list the available components in a Collector @@ -120,6 +113,160 @@ extensions: extension: Beta ``` +## Sending test data + +For certain types of issues, particularly verifying configuration and debugging +network issues, it can be helpful to send a small amount of data to a collector +configured to output to local logs. + +### Local exporters + +[Local exporters](https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter#general-information) +can be configured to inspect the data being processed by the Collector. + +For live troubleshooting purposes consider leveraging the `debug` exporter, +which can be used to confirm that data is being received, processed and exported +by the Collector. + +```yaml +receivers: + zipkin: +exporters: + debug: +service: + pipelines: + traces: + receivers: [zipkin] + processors: [] + exporters: [debug] +``` + +Get a Zipkin payload to test. For example create a file called `trace.json` that +contains: + +```json +[ + { + "traceId": "5982fe77008310cc80f1da5e10147519", + "parentId": "90394f6bcffb5d13", + "id": "67fae42571535f60", + "kind": "SERVER", + "name": "/m/n/2.6.1", + "timestamp": 1516781775726000, + "duration": 26000, + "localEndpoint": { + "serviceName": "api" + }, + "remoteEndpoint": { + "serviceName": "apip" + }, + "tags": { + "data.http_response_code": "201" + } + } +] +``` + +With the Collector running, send this payload to the Collector. 
For example: + +```console +$ curl -X POST localhost:9411/api/v2/spans -H'Content-Type: application/json' -d @trace.json +``` + +You should see a log entry like the following from the Collector: + +``` +2023-09-07T09:57:43.468-0700 info TracesExporter {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 2} +``` + +You can also configure the `debug` exporter so the entire payload is printed: + +```yaml +exporters: + debug: + verbosity: detailed +``` + +With the modified configuration if you re-run the test above the log output +should look like: + +``` +2023-09-07T09:57:12.820-0700 info TracesExporter {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 2} +2023-09-07T09:57:12.821-0700 info ResourceSpans #0 +Resource SchemaURL: https://opentelemetry.io/schemas/1.4.0 +Resource attributes: + -> service.name: Str(telemetrygen) +ScopeSpans #0 +ScopeSpans SchemaURL: +InstrumentationScope telemetrygen +Span #0 + Trace ID : 0c636f29e29816ea76e6a5b8cd6601cf + Parent ID : 1a08eba9395c5243 + ID : 10cebe4b63d47cae + Name : okey-dokey + Kind : Internal + Start time : 2023-09-07 16:57:12.045933 +0000 UTC + End time : 2023-09-07 16:57:12.046058 +0000 UTC + Status code : Unset + Status message : +Attributes: + -> span.kind: Str(server) + -> net.peer.ip: Str(1.2.3.4) + -> peer.service: Str(telemetrygen) +``` + +## Extensions useful for troubleshooting + +### Health Check + +The +[health_check](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/extension/healthcheckextension/README.md) +extension, which by default is available on all interfaces on port `13133`, can +be used to ensure the Collector is functioning properly. + +```yaml +extensions: + health_check: +service: + extensions: [health_check] +``` + +It returns a response like the following: + +```json +{ + "status": "Server available", + "upSince": "2020-11-11T04:12:31.6847174Z", + "uptime": "49.0132518s" +} +``` + +### pprof + +The +[pprof](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/extension/pprofextension/README.md) +extension, which by default is available locally on port `1777`, allows you to +profile the Collector as it runs. This is an advanced use-case that should not +be needed in most circumstances. + +### zPages + +The +[zpages](https://github.com/open-telemetry/opentelemetry-collector/tree/main/extension/zpagesextension/README.md) +extension, which if enabled is exposed locally on port `55679`, can be used to +check receivers and exporters trace operations via `/debug/tracez`. `zpages` may +contain error logs that the Collector does not emit. + +For containerized environments it may be desirable to expose this port on a +public interface instead of just locally. This can be configured via the +extensions configuration section. For example: + +```yaml +extensions: + zpages: + endpoint: 0.0.0.0:55679 +``` + ## Checklist for debugging complex pipelines It can be difficult to isolate problems when telemetry flows through multiple @@ -136,8 +283,78 @@ following: - How is the next hop configured? - Are there any network policies that prevent data from getting in or out? -### More +## Common Issues + + + +### Collector exit/restart + +The Collector may exit/restart because: + +- Memory pressure due to missing or misconfigured + [memory_limiter](https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/memorylimiterprocessor/README.md) + processor. +- Improperly sized for load. 
+- Improperly configured (for example, a queue size configured higher than + available memory). +- Infrastructure resource limits (for example Kubernetes). + +### Data being dropped + +Data may be dropped for a variety of reasons, but most commonly because of an: + +- Improperly sized Collector resulting in Collector being unable to process and + export the data as fast as it is received. +- Exporter destination unavailable or accepting the data too slowly. + +To mitigate drops, it is highly recommended to configure the +[batch](https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/batchprocessor/README.md) +processor. In addition, it may be necessary to configure the +[queued retry options](https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/exporterhelper#configuration) +on enabled exporters. + +### Receiving data not working + +If you are unable to receive data then this is likely because either: + +- There is a network configuration issue +- The receiver configuration is incorrect +- The receiver is defined in the `receivers` section, but not enabled in any + `pipelines` +- The client configuration is incorrect + +Check the Collector logs as well as `zpages` for potential issues. + +### Processing data not working + +Most processing issues are a result of either a misunderstanding of how the +processor works or a misconfiguration of the processor. + +Examples of misunderstanding include: + +- The attributes processors only work for "tags" on spans. Span name is handled + by the span processor. +- Processors for trace data (except tail sampling) work on individual spans. + +### Exporting data not working + +If you are unable to export to a destination then this is likely because either: + +- There is a network configuration issue +- The exporter configuration is incorrect +- The destination is unavailable + +Check the collector logs as well as `zpages` for potential issues. + +More often than not, exporting data does not work because of a network +configuration issue. This could be due to a firewall, DNS, or proxy issue. Note +that the Collector does have +[proxy support](https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter#proxy-support). + +### Startup failing in Windows Docker containers (v0.90.1 and earlier) -For detailed recommendations, including common problems, see -[Troubleshooting](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/troubleshooting.md) -from the Collector repository. +The process may fail to start in a Windows Docker container with the following +error: `The service process could not connect to the service controller`. In +this case the `NO_WINDOWS_SERVICE=1` environment variable should be set to force +the collector to be started as if it were running in an interactive terminal, +without attempting to run as a Windows service. 
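
For illustration only (the image name, tag, and flags below are placeholders and are not part of this patch), the environment variable is typically passed when the container is started, for example with `docker run -e`:

```shell
# Hypothetical example; the image name and tag are placeholders.
docker run -e NO_WINDOWS_SERVICE=1 otel/opentelemetry-collector:<tag>
```
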
From eb1ad6a40392236b2cf6227fbe1b3cd9158bbe2c Mon Sep 17 00:00:00 2001 From: tiffany76 <30397949+tiffany76@users.noreply.github.com> Date: Thu, 23 May 2024 11:34:08 -0700 Subject: [PATCH 02/15] Copy text from monitoring.md --- .../en/docs/collector/internal-telemetry.md | 73 ++++++++++++++++++- 1 file changed, 72 insertions(+), 1 deletion(-) diff --git a/content/en/docs/collector/internal-telemetry.md b/content/en/docs/collector/internal-telemetry.md index 3dfdad230978..37a77404d94e 100644 --- a/content/en/docs/collector/internal-telemetry.md +++ b/content/en/docs/collector/internal-telemetry.md @@ -133,7 +133,7 @@ journalctl | grep otelcol | grep Error {{% /tab %}} {{< /tabpane >}} -## Types of internal observability +## Types of internal telemetry The OpenTelemetry Collector aims to be a model of observable service by clearly exposing its own operational metrics. Additionally, it collects host resource @@ -272,3 +272,74 @@ The Collector logs the following internal events: - Data dropping due to invalid data stops. - A crash is detected, differentiated from a clean stop. Crash data is included if available. + +## Use internal telemetry to monitor the Collector + +This section recommends best practices for alerting and monitoring the Collector +using its own telemetry. + +### Critical monitoring + +#### Data loss + +Use rate of `otelcol_processor_dropped_spans > 0` and +`otelcol_processor_dropped_metric_points > 0` to detect data loss, depending on +the requirements set up a minimal time window before alerting, avoiding +notifications for small losses that are not considered outages or within the +desired reliability level. + +#### Low on CPU resources + +This depends on the CPU metrics available on the deployment, eg.: +`kube_pod_container_resource_limits{resource="cpu", unit="core"}` for +Kubernetes. Let's call it `available_cores` below. The idea here is to have an +upper bound of the number of available cores, and the maximum expected ingestion +rate considered safe, let's call it `safe_rate`, per core. This should trigger +increase of resources/ instances (or raise an alert as appropriate) whenever +`(actual_rate/available_cores) < safe_rate`. + +The `safe_rate` depends on the specific configuration being used. // TODO: +Provide reference `safe_rate` for a few selected configurations. + +### Secondary monitoring + +#### Queue length + +Most exporters offer a +[queue/retry mechanism](../exporter/exporterhelper/README.md) that is +recommended as the retry mechanism for the Collector and as such should be used +in any production deployment. + +The `otelcol_exporter_queue_capacity` indicates the capacity of the retry queue +(in batches). The `otelcol_exporter_queue_size` indicates the current size of +retry queue. So you can use these two metrics to check if the queue capacity is +enough for your workload. + +The `otelcol_exporter_enqueue_failed_spans`, +`otelcol_exporter_enqueue_failed_metric_points` and +`otelcol_exporter_enqueue_failed_log_records` indicate the number of span/metric +points/log records failed to be added to the sending queue. This may be cause by +a queue full of unsettled elements, so you may need to decrease your sending +rate or horizontally scale collectors. + +The queue/retry mechanism also supports logging for monitoring. Check the logs +for messages like `"Dropping data because sending_queue is full"`. 
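
As a rough sketch of how these pieces fit together (the exporter name, endpoint, and values below are placeholders rather than recommendations, and defaults vary by exporter and Collector version), the queue size and retry behavior referenced above are tuned per exporter:

```yaml
exporters:
  otlp:
    endpoint: backend.example.com:4317 # placeholder endpoint
    sending_queue:
      enabled: true
      num_consumers: 10 # workers draining the queue
      queue_size: 5000 # batches held before new data is rejected
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_elapsed_time: 300s
```

If `otelcol_exporter_queue_size` regularly approaches `otelcol_exporter_queue_capacity`, increasing `queue_size` or scaling out Collectors are the usual levers.
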
+ +#### Receive failures + +Sustained rates of `otelcol_receiver_refused_spans` and +`otelcol_receiver_refused_metric_points` indicate too many errors returned to +clients. Depending on the deployment and the client’s resilience this may +indicate data loss at the clients. + +Sustained rates of `otelcol_exporter_send_failed_spans` and +`otelcol_exporter_send_failed_metric_points` indicate that the Collector is not +able to export data as expected. It doesn't imply data loss per se since there +could be retries but a high rate of failures could indicate issues with the +network or backend receiving the data. + +### Data flow + +You can monitor data ingress with the `otelcol_receiver_accepted_spans` and +`otelcol_receiver_accepted_metric_points` metrics and data egress with the +`otecol_exporter_sent_spans` and `otelcol_exporter_sent_metric_points` metrics. From 89b473a6f30168369099b93ac97cd7b386bcfc41 Mon Sep 17 00:00:00 2001 From: tiffany76 <30397949+tiffany76@users.noreply.github.com> Date: Thu, 23 May 2024 12:49:39 -0700 Subject: [PATCH 03/15] Make copy edits to internal-telemetry.md --- .../en/docs/collector/internal-telemetry.md | 80 ++++++++++--------- 1 file changed, 42 insertions(+), 38 deletions(-) diff --git a/content/en/docs/collector/internal-telemetry.md b/content/en/docs/collector/internal-telemetry.md index 37a77404d94e..ae44c5f12039 100644 --- a/content/en/docs/collector/internal-telemetry.md +++ b/content/en/docs/collector/internal-telemetry.md @@ -275,52 +275,56 @@ The Collector logs the following internal events: ## Use internal telemetry to monitor the Collector -This section recommends best practices for alerting and monitoring the Collector -using its own telemetry. +This section recommends best practices for monitoring the Collector using its +own telemetry. ### Critical monitoring #### Data loss -Use rate of `otelcol_processor_dropped_spans > 0` and -`otelcol_processor_dropped_metric_points > 0` to detect data loss, depending on -the requirements set up a minimal time window before alerting, avoiding -notifications for small losses that are not considered outages or within the -desired reliability level. +Use the rate of `otelcol_processor_dropped_spans > 0` and +`otelcol_processor_dropped_metric_points > 0` to detect data loss. Depending on +your project's requirements, set up a minimal time window before alerting begins +to avoid notifications for small losses that are within the desired reliability +range and not considered outages. -#### Low on CPU resources +#### Low CPU resources -This depends on the CPU metrics available on the deployment, eg.: -`kube_pod_container_resource_limits{resource="cpu", unit="core"}` for -Kubernetes. Let's call it `available_cores` below. The idea here is to have an -upper bound of the number of available cores, and the maximum expected ingestion -rate considered safe, let's call it `safe_rate`, per core. This should trigger -increase of resources/ instances (or raise an alert as appropriate) whenever -`(actual_rate/available_cores) < safe_rate`. +To make sure your Collector is using CPU resources safely during data ingestion, +you need to set: -The `safe_rate` depends on the specific configuration being used. // TODO: -Provide reference `safe_rate` for a few selected configurations. +- An upper bound on the number of `available_cores`. The metric that tracks + `available_cores` is dependent on your deployment. 
For example, a Kubernetes + deployment offers the + `kube_pod_container_resource_limits{resource="cpu", unit="core"}` metric. +- The maximum ingestion rate per core that is considered safe (`safe_rate`). The + `safe_rate` depends on the specific configuration you use. + +When `(actual_rate/available_cores) < safe_rate`, an alert should be raised and +an increase in resources or instances should be triggered, as appropriate. ### Secondary monitoring #### Queue length Most exporters offer a -[queue/retry mechanism](../exporter/exporterhelper/README.md) that is -recommended as the retry mechanism for the Collector and as such should be used -in any production deployment. - -The `otelcol_exporter_queue_capacity` indicates the capacity of the retry queue -(in batches). The `otelcol_exporter_queue_size` indicates the current size of -retry queue. So you can use these two metrics to check if the queue capacity is -enough for your workload. - -The `otelcol_exporter_enqueue_failed_spans`, -`otelcol_exporter_enqueue_failed_metric_points` and -`otelcol_exporter_enqueue_failed_log_records` indicate the number of span/metric -points/log records failed to be added to the sending queue. This may be cause by -a queue full of unsettled elements, so you may need to decrease your sending -rate or horizontally scale collectors. +[queue/retry mechanism](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/README.md) +that is recommended for use in any production deployment of the Collector. + +The `otelcol_exporter_queue_capacity` metric indicates the capacity, in batches, +of the retry queue. The `otelcol_exporter_queue_size` metric indicates the +current size of the retry queue. Use these two metrics to check if the queue +capacity can support your workload. + +Using the following three metrics, you can identify the number of spans/metric +points/log records that failed to reach the sending queue: + +- `otelcol_exporter_enqueue_failed_spans` +- `otelcol_exporter_enqueue_failed_metric_points` +- `otelcol_exporter_enqueue_failed_log_records` + +These failures could be caused by a queue filled with unsettled elements. You +might need to decrease your sending rate or horizontally scale Collectors. The queue/retry mechanism also supports logging for monitoring. Check the logs for messages like `"Dropping data because sending_queue is full"`. @@ -328,15 +332,15 @@ for messages like `"Dropping data because sending_queue is full"`. #### Receive failures Sustained rates of `otelcol_receiver_refused_spans` and -`otelcol_receiver_refused_metric_points` indicate too many errors returned to -clients. Depending on the deployment and the client’s resilience this may -indicate data loss at the clients. +`otelcol_receiver_refused_metric_points` indicate that too many errors were +returned to clients. Depending on the deployment and the clients' resilience, +this might indicate clients' data loss. Sustained rates of `otelcol_exporter_send_failed_spans` and `otelcol_exporter_send_failed_metric_points` indicate that the Collector is not -able to export data as expected. It doesn't imply data loss per se since there -could be retries but a high rate of failures could indicate issues with the -network or backend receiving the data. +able to export data as expected. These metrics do not inherently imply data loss +since there could be retries. But a high rate of failures could indicate issues +with the network or backend receiving the data. 
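
As an illustrative sketch only, assuming these internal metrics are scraped by Prometheus (metric names can carry a `_total` suffix depending on how they are exposed, and the thresholds and windows here are placeholders to tune against your reliability targets), sustained failures can be turned into alerts:

```yaml
groups:
  - name: otel-collector-health
    rules:
      - alert: CollectorRefusingSpans
        # Receivers have been refusing spans for 10 minutes.
        expr: rate(otelcol_receiver_refused_spans[5m]) > 0
        for: 10m
      - alert: CollectorExportFailures
        # Exports have been failing for 10 minutes.
        expr: rate(otelcol_exporter_send_failed_spans[5m]) > 0
        for: 10m
```
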
### Data flow From fae47d4e477908a57c3a5faa809286a35997f2bc Mon Sep 17 00:00:00 2001 From: tiffany76 <30397949+tiffany76@users.noreply.github.com> Date: Thu, 23 May 2024 12:57:38 -0700 Subject: [PATCH 04/15] Make small word fixes --- content/en/docs/collector/internal-telemetry.md | 4 ++-- content/en/docs/collector/troubleshooting.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/content/en/docs/collector/internal-telemetry.md b/content/en/docs/collector/internal-telemetry.md index ae44c5f12039..e9e4ba65edec 100644 --- a/content/en/docs/collector/internal-telemetry.md +++ b/content/en/docs/collector/internal-telemetry.md @@ -284,7 +284,7 @@ own telemetry. Use the rate of `otelcol_processor_dropped_spans > 0` and `otelcol_processor_dropped_metric_points > 0` to detect data loss. Depending on -your project's requirements, set up a minimal time window before alerting begins +your project's requirements, select a minimal time window before alerting begins to avoid notifications for small losses that are within the desired reliability range and not considered outages. @@ -327,7 +327,7 @@ These failures could be caused by a queue filled with unsettled elements. You might need to decrease your sending rate or horizontally scale Collectors. The queue/retry mechanism also supports logging for monitoring. Check the logs -for messages like `"Dropping data because sending_queue is full"`. +for messages such as `"Dropping data because sending_queue is full"`. #### Receive failures diff --git a/content/en/docs/collector/troubleshooting.md b/content/en/docs/collector/troubleshooting.md index 5b4bee04bc87..456b07cd0166 100644 --- a/content/en/docs/collector/troubleshooting.md +++ b/content/en/docs/collector/troubleshooting.md @@ -285,7 +285,7 @@ following: ## Common Issues - +This section covers how to identify and resolve common Collector issues. ### Collector exit/restart From 667e0caf1d27bb82611a585417b405157c4ee338 Mon Sep 17 00:00:00 2001 From: tiffany76 <30397949+tiffany76@users.noreply.github.com> Date: Thu, 23 May 2024 13:13:05 -0700 Subject: [PATCH 05/15] Make linter fixes --- content/en/docs/collector/troubleshooting.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/content/en/docs/collector/troubleshooting.md b/content/en/docs/collector/troubleshooting.md index 456b07cd0166..65243c449c8c 100644 --- a/content/en/docs/collector/troubleshooting.md +++ b/content/en/docs/collector/troubleshooting.md @@ -117,7 +117,7 @@ extensions: For certain types of issues, particularly verifying configuration and debugging network issues, it can be helpful to send a small amount of data to a collector -configured to output to local logs. +configured to output to local logs. ### Local exporters @@ -169,13 +169,13 @@ contains: With the Collector running, send this payload to the Collector. 
For example: -```console -$ curl -X POST localhost:9411/api/v2/spans -H'Content-Type: application/json' -d @trace.json +```shell +curl -X POST localhost:9411/api/v2/spans -H'Content-Type: application/json' -d @trace.json ``` You should see a log entry like the following from the Collector: -``` +```shell 2023-09-07T09:57:43.468-0700 info TracesExporter {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 2} ``` @@ -190,7 +190,7 @@ exporters: With the modified configuration if you re-run the test above the log output should look like: -``` +```shell 2023-09-07T09:57:12.820-0700 info TracesExporter {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 2} 2023-09-07T09:57:12.821-0700 info ResourceSpans #0 Resource SchemaURL: https://opentelemetry.io/schemas/1.4.0 From 9ec03f204e02e42bae5360a7817523b09aa2d6b4 Mon Sep 17 00:00:00 2001 From: tiffany76 <30397949+tiffany76@users.noreply.github.com> Date: Thu, 23 May 2024 13:15:47 -0700 Subject: [PATCH 06/15] Add cSpell ignore words --- content/en/docs/collector/troubleshooting.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/content/en/docs/collector/troubleshooting.md b/content/en/docs/collector/troubleshooting.md index 65243c449c8c..6d7baf303bc2 100644 --- a/content/en/docs/collector/troubleshooting.md +++ b/content/en/docs/collector/troubleshooting.md @@ -2,6 +2,8 @@ title: Troubleshooting description: Recommendations for troubleshooting the collector weight: 25 +# prettier-ignore +cSpell:ignore: pprof tracez zpages --- This page describes some options when troubleshooting the health or performance From def14d69db83f19fb1b5020d38012b1b95816f23 Mon Sep 17 00:00:00 2001 From: tiffany76 <30397949+tiffany76@users.noreply.github.com> Date: Thu, 23 May 2024 13:18:46 -0700 Subject: [PATCH 07/15] Make one more prettier fix --- content/en/docs/collector/troubleshooting.md | 1 - 1 file changed, 1 deletion(-) diff --git a/content/en/docs/collector/troubleshooting.md b/content/en/docs/collector/troubleshooting.md index 6d7baf303bc2..2e4b86a5d7e2 100644 --- a/content/en/docs/collector/troubleshooting.md +++ b/content/en/docs/collector/troubleshooting.md @@ -2,7 +2,6 @@ title: Troubleshooting description: Recommendations for troubleshooting the collector weight: 25 -# prettier-ignore cSpell:ignore: pprof tracez zpages --- From 42d9ab46fd6a73f8d8d248a7a21599ba992e2142 Mon Sep 17 00:00:00 2001 From: tiffany76 <30397949+tiffany76@users.noreply.github.com> Date: Wed, 29 May 2024 13:50:55 -0700 Subject: [PATCH 08/15] Revert CPU resources section --- .../en/docs/collector/internal-telemetry.md | 26 +++++++++---------- 1 file changed, 12 insertions(+), 14 deletions(-) diff --git a/content/en/docs/collector/internal-telemetry.md b/content/en/docs/collector/internal-telemetry.md index e9e4ba65edec..1616b9c9ea5a 100644 --- a/content/en/docs/collector/internal-telemetry.md +++ b/content/en/docs/collector/internal-telemetry.md @@ -288,20 +288,18 @@ your project's requirements, select a minimal time window before alerting begins to avoid notifications for small losses that are within the desired reliability range and not considered outages. -#### Low CPU resources - -To make sure your Collector is using CPU resources safely during data ingestion, -you need to set: - -- An upper bound on the number of `available_cores`. The metric that tracks - `available_cores` is dependent on your deployment. 
For example, a Kubernetes - deployment offers the - `kube_pod_container_resource_limits{resource="cpu", unit="core"}` metric. -- The maximum ingestion rate per core that is considered safe (`safe_rate`). The - `safe_rate` depends on the specific configuration you use. - -When `(actual_rate/available_cores) < safe_rate`, an alert should be raised and -an increase in resources or instances should be triggered, as appropriate. +#### Low on CPU resources + +This depends on the CPU metrics available on the deployment, eg.: +`kube_pod_container_resource_limits{resource="cpu", unit="core"}` for +Kubernetes. Let's call it `available_cores` below. The idea here is to have an +upper bound of the number of available cores, and the maximum expected ingestion +rate considered safe, let's call it `safe_rate`, per core. This should trigger +increase of resources/ instances (or raise an alert as appropriate) whenever +`(actual_rate/available_cores) < safe_rate`. + +The `safe_rate` depends on the specific configuration being used. // TODO: +Provide reference `safe_rate` for a few selected configurations. ### Secondary monitoring From 9a9bbcd48a32732859c3f28fd475fe027447bbc9 Mon Sep 17 00:00:00 2001 From: tiffany76 <30397949+tiffany76@users.noreply.github.com> Date: Wed, 29 May 2024 18:29:57 -0700 Subject: [PATCH 09/15] Copyedit the troubleshooting page --- content/en/docs/collector/troubleshooting.md | 402 ++++++++++--------- 1 file changed, 218 insertions(+), 184 deletions(-) diff --git a/content/en/docs/collector/troubleshooting.md b/content/en/docs/collector/troubleshooting.md index 2e4b86a5d7e2..8f243e14c1b8 100644 --- a/content/en/docs/collector/troubleshooting.md +++ b/content/en/docs/collector/troubleshooting.md @@ -1,31 +1,135 @@ --- title: Troubleshooting -description: Recommendations for troubleshooting the collector +description: Recommendations for troubleshooting the Collector weight: 25 cSpell:ignore: pprof tracez zpages --- -This page describes some options when troubleshooting the health or performance -of the OpenTelemetry Collector. The Collector provides a variety of metrics, -logs, and extensions for debugging issues. +On this page, you can learn how to troubleshoot the health and performance of +the OpenTelemetry Collector. -## Internal telemetry +## Troubleshooting tools + +The Collector provides a variety of metrics, logs, and extensions for debugging +issues. + +### Internal telemetry You can configure and use the Collector's own [internal telemetry](/docs/collector/internal-telemetry/) to monitor its performance. -## Check available components in the Collector +### Local exporters + +For certain types of issues, such as configuration verification and network +debugging, you can send a small amount of test data to a Collector configured to +output to local logs. Using a +[local exporter](https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter#general-information), +you can inspect the data being processed by the Collector. + +For live troubleshooting, consider using the +[`debug` exporter](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/debugexporter/README.md), +which can confirm that the Collector is receiving, processing, and exporting +data. For example: + +```yaml +receivers: + zipkin: +exporters: + debug: +service: + pipelines: + traces: + receivers: [zipkin] + processors: [] + exporters: [debug] +``` + +To begin testing, generate a Zipkin payload. 
For example, you can create a file +called `trace.json` that contains: + +```json +[ + { + "traceId": "5982fe77008310cc80f1da5e10147519", + "parentId": "90394f6bcffb5d13", + "id": "67fae42571535f60", + "kind": "SERVER", + "name": "/m/n/2.6.1", + "timestamp": 1516781775726000, + "duration": 26000, + "localEndpoint": { + "serviceName": "api" + }, + "remoteEndpoint": { + "serviceName": "apip" + }, + "tags": { + "data.http_response_code": "201" + } + } +] +``` + +With the Collector running, send this payload to the Collector: + +```shell +curl -X POST localhost:9411/api/v2/spans -H'Content-Type: application/json' -d @trace.json +``` + +You should see a log entry like the following: + +```shell +2023-09-07T09:57:43.468-0700 info TracesExporter {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 2} +``` + +You can also configure the `debug` exporter so the entire payload is printed: + +```yaml +exporters: + debug: + verbosity: detailed +``` + +If you re-run the previous test with the modified configuration, the log output +looks like this: + +```shell +2023-09-07T09:57:12.820-0700 info TracesExporter {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 2} +2023-09-07T09:57:12.821-0700 info ResourceSpans #0 +Resource SchemaURL: https://opentelemetry.io/schemas/1.4.0 +Resource attributes: + -> service.name: Str(telemetrygen) +ScopeSpans #0 +ScopeSpans SchemaURL: +InstrumentationScope telemetrygen +Span #0 + Trace ID : 0c636f29e29816ea76e6a5b8cd6601cf + Parent ID : 1a08eba9395c5243 + ID : 10cebe4b63d47cae + Name : okey-dokey + Kind : Internal + Start time : 2023-09-07 16:57:12.045933 +0000 UTC + End time : 2023-09-07 16:57:12.046058 +0000 UTC + Status code : Unset + Status message : +Attributes: + -> span.kind: Str(server) + -> net.peer.ip: Str(1.2.3.4) + -> peer.service: Str(telemetrygen) +``` + +### Check Collector components Use the following sub-command to list the available components in a Collector distribution, including their stability levels. Please note that the output -format may change across versions. +format might change across versions. -```sh +```shell otelcol components ``` -Sample output +Sample output: ```yaml buildinfo: @@ -114,116 +218,16 @@ extensions: extension: Beta ``` -## Sending test data +### Extensions -For certain types of issues, particularly verifying configuration and debugging -network issues, it can be helpful to send a small amount of data to a collector -configured to output to local logs. +Here is a list of extensions you can enable for debugging the Collector. -### Local exporters - -[Local exporters](https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter#general-information) -can be configured to inspect the data being processed by the Collector. - -For live troubleshooting purposes consider leveraging the `debug` exporter, -which can be used to confirm that data is being received, processed and exported -by the Collector. - -```yaml -receivers: - zipkin: -exporters: - debug: -service: - pipelines: - traces: - receivers: [zipkin] - processors: [] - exporters: [debug] -``` - -Get a Zipkin payload to test. 
For example create a file called `trace.json` that -contains: - -```json -[ - { - "traceId": "5982fe77008310cc80f1da5e10147519", - "parentId": "90394f6bcffb5d13", - "id": "67fae42571535f60", - "kind": "SERVER", - "name": "/m/n/2.6.1", - "timestamp": 1516781775726000, - "duration": 26000, - "localEndpoint": { - "serviceName": "api" - }, - "remoteEndpoint": { - "serviceName": "apip" - }, - "tags": { - "data.http_response_code": "201" - } - } -] -``` - -With the Collector running, send this payload to the Collector. For example: - -```shell -curl -X POST localhost:9411/api/v2/spans -H'Content-Type: application/json' -d @trace.json -``` - -You should see a log entry like the following from the Collector: - -```shell -2023-09-07T09:57:43.468-0700 info TracesExporter {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 2} -``` - -You can also configure the `debug` exporter so the entire payload is printed: - -```yaml -exporters: - debug: - verbosity: detailed -``` - -With the modified configuration if you re-run the test above the log output -should look like: - -```shell -2023-09-07T09:57:12.820-0700 info TracesExporter {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 2} -2023-09-07T09:57:12.821-0700 info ResourceSpans #0 -Resource SchemaURL: https://opentelemetry.io/schemas/1.4.0 -Resource attributes: - -> service.name: Str(telemetrygen) -ScopeSpans #0 -ScopeSpans SchemaURL: -InstrumentationScope telemetrygen -Span #0 - Trace ID : 0c636f29e29816ea76e6a5b8cd6601cf - Parent ID : 1a08eba9395c5243 - ID : 10cebe4b63d47cae - Name : okey-dokey - Kind : Internal - Start time : 2023-09-07 16:57:12.045933 +0000 UTC - End time : 2023-09-07 16:57:12.046058 +0000 UTC - Status code : Unset - Status message : -Attributes: - -> span.kind: Str(server) - -> net.peer.ip: Str(1.2.3.4) - -> peer.service: Str(telemetrygen) -``` - -## Extensions useful for troubleshooting - -### Health Check +#### Health Check The -[health_check](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/extension/healthcheckextension/README.md) -extension, which by default is available on all interfaces on port `13133`, can -be used to ensure the Collector is functioning properly. +[Health Check extension](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/extension/healthcheckextension/README.md), +which by default is available on all interfaces on port `13133`, can be used to +ensure the Collector is functioning properly. For example: ```yaml extensions: @@ -242,25 +246,44 @@ It returns a response like the following: } ``` -### pprof +{{% alert title="Caution" color="warning" %}} + +The optional `health_check` configuration setting, `check_collector_pipeline`, +is not working as expected. Avoid using this feature. Efforts are underway to +create a new version of the Health Check extension that relies on individual +component statuses. The extension's configuration remains unchanged until this +replacement is available. + +{{% /alert %}} + +#### Performance Profiler (pprof) The -[pprof](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/extension/pprofextension/README.md) -extension, which by default is available locally on port `1777`, allows you to -profile the Collector as it runs. This is an advanced use-case that should not -be needed in most circumstances. 
+[pprof extension](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/extension/pprofextension/README.md), +which is available locally on port `1777`, allows you to profile the Collector +as it runs. This is an advanced use-case that should not be needed in most +circumstances. -### zPages +#### zPages The -[zpages](https://github.com/open-telemetry/opentelemetry-collector/tree/main/extension/zpagesextension/README.md) -extension, which if enabled is exposed locally on port `55679`, can be used to -check receivers and exporters trace operations via `/debug/tracez`. `zpages` may -contain error logs that the Collector does not emit. +[zPages extension](https://github.com/open-telemetry/opentelemetry-collector/tree/main/extension/zpagesextension/README.md), +which is exposed locally on port `55679`, can be used to inspect live data from +the Collector's receivers and exporters. + +The TraceZ page, exposed at `/debug/tracez`, is useful for debugging trace +operations, such as: + +- Latency issues. Find the slow parts of an application. +- Deadlocks and instrumentation problems. Identify running spans that don't end. +- Errors. Determine what types of errors are occurring and where they happen. -For containerized environments it may be desirable to expose this port on a -public interface instead of just locally. This can be configured via the -extensions configuration section. For example: +Note that `zpages` might contain error logs that the Collector does not emit +itself. + +For containerized environments, you might want to expose this port on a public +interface instead of just locally. The `endpoint` can be configured using the +`extensions` configuration section: ```yaml extensions: @@ -271,91 +294,102 @@ extensions: ## Checklist for debugging complex pipelines It can be difficult to isolate problems when telemetry flows through multiple -collectors and networks. For each "hop" of telemetry data through a collector or -other component in your telemetry pipeline, it’s important to verify the -following: +Collectors and networks. For each "hop" of telemetry through a Collector or +other component in your pipeline, it’s important to verify the following: -- Are there error messages in the logs of the collector? +- Are there error messages in the logs of the Collector? - How is the telemetry being ingested into this component? -- How is the telemetry being modified (i.e. sampling, redacting) by this - component? +- How is the telemetry being modified (for example, sampling or redacting) by + this component? - How is the telemetry being exported from this component? - What format is the telemetry in? - How is the next hop configured? - Are there any network policies that prevent data from getting in or out? -## Common Issues - -This section covers how to identify and resolve common Collector issues. +## Common Collector issues -### Collector exit/restart +This section covers how to resolve common Collector issues. -The Collector may exit/restart because: +### Collector is experiencing data issues -- Memory pressure due to missing or misconfigured - [memory_limiter](https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/memorylimiterprocessor/README.md) - processor. -- Improperly sized for load. -- Improperly configured (for example, a queue size configured higher than - available memory). -- Infrastructure resource limits (for example Kubernetes). +The Collector and its components might experience data issues. 
-### Data being dropped +#### Collector is dropping data -Data may be dropped for a variety of reasons, but most commonly because of an: +The Collector might drop data for a variety of reasons, but the most common are: -- Improperly sized Collector resulting in Collector being unable to process and +- The Collector is improperly sized, resulting in an inability to process and export the data as fast as it is received. -- Exporter destination unavailable or accepting the data too slowly. +- The exporter destination is unavailable or accepting the data too slowly. -To mitigate drops, it is highly recommended to configure the -[batch](https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/batchprocessor/README.md) -processor. In addition, it may be necessary to configure the +To mitigate drops, configure the +[`batch` processor](https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/batchprocessor/README.md). +In addition, it might be necessary to configure the [queued retry options](https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/exporterhelper#configuration) on enabled exporters. -### Receiving data not working - -If you are unable to receive data then this is likely because either: +#### Collector is not receiving data -- There is a network configuration issue -- The receiver configuration is incorrect -- The receiver is defined in the `receivers` section, but not enabled in any - `pipelines` -- The client configuration is incorrect +The Collector might not receive data for the following reasons: -Check the Collector logs as well as `zpages` for potential issues. +- A network configuration issue. +- An incorrect receiver configuration. +- An incorrect client configuration. +- The receiver is defined in the `receivers` section but not enabled in any + `pipelines`. -### Processing data not working +Check the Collector's +[logs](/docs/collector/internal-telemetry/#configure-internal-logs) as well as +[zPages](https://github.com/open-telemetry/opentelemetry-collector/blob/main/extension/zpagesextension/README.md) +for potential issues. -Most processing issues are a result of either a misunderstanding of how the -processor works or a misconfiguration of the processor. +#### Collector is not processing data -Examples of misunderstanding include: +Most processing issues result from of a misunderstanding of how the processor +works or a misconfiguration of the processor. For example: -- The attributes processors only work for "tags" on spans. Span name is handled - by the span processor. -- Processors for trace data (except tail sampling) work on individual spans. +- The attributes processor works only for "tags" on spans. The span name is + handled by the span processor. +- Processors for trace data (except tail sampling) work only on individual + spans. -### Exporting data not working +#### Collector is not exporting data -If you are unable to export to a destination then this is likely because either: +The Collector might not export data for the following reasons: -- There is a network configuration issue -- The exporter configuration is incorrect -- The destination is unavailable +- A network configuration issue. +- An incorrect exporter configuration. +- The destination is unavailable. -Check the collector logs as well as `zpages` for potential issues. 
+Check the Collector's +[logs](/docs/collector/internal-telemetry/#configure-internal-logs) as well as +[zPages](https://github.com/open-telemetry/opentelemetry-collector/blob/main/extension/zpagesextension/README.md) +for potential issues. -More often than not, exporting data does not work because of a network -configuration issue. This could be due to a firewall, DNS, or proxy issue. Note -that the Collector does have +Exporting data often does not work because of a network configuration issue, +such as a firewall, DNS, or proxy issue. Note that the Collector does have [proxy support](https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter#proxy-support). -### Startup failing in Windows Docker containers (v0.90.1 and earlier) +### Collector is experiencing control issues + +The Collector might experience failed startups or unexpected exits or restarts. + +#### Collector exits or restarts + +The Collector might exit or restart due to: + +- Memory pressure from a missing or misconfigured + [`memory_limiter` processor](https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/memorylimiterprocessor/README.md). +- Improper sizing for load. +- Improper configuration. For example, a queue size configured higher than + available memory. +- Infrastructure resource limits. For example, Kubernetes. + +#### Collector fails to start in Windows Docker containers -The process may fail to start in a Windows Docker container with the following -error: `The service process could not connect to the service controller`. In -this case the `NO_WINDOWS_SERVICE=1` environment variable should be set to force -the collector to be started as if it were running in an interactive terminal, -without attempting to run as a Windows service. +With v0.90.1 and earlier, the Collector might fail to start in a Windows Docker +container, producing the error message +`The service process could not connect to the service controller`. In this case, +the `NO_WINDOWS_SERVICE=1` environment variable must be set to force the +Collector to start as if it were running in an interactive terminal, without +attempting to run as a Windows service. From ee3aa1eac6f4af451860bddca124da89eb461531 Mon Sep 17 00:00:00 2001 From: tiffany76 <30397949+tiffany76@users.noreply.github.com> Date: Wed, 29 May 2024 18:36:43 -0700 Subject: [PATCH 10/15] Make more text edits to internal telemetry page --- content/en/docs/collector/internal-telemetry.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/content/en/docs/collector/internal-telemetry.md b/content/en/docs/collector/internal-telemetry.md index 1616b9c9ea5a..070dc0515223 100644 --- a/content/en/docs/collector/internal-telemetry.md +++ b/content/en/docs/collector/internal-telemetry.md @@ -284,7 +284,7 @@ own telemetry. Use the rate of `otelcol_processor_dropped_spans > 0` and `otelcol_processor_dropped_metric_points > 0` to detect data loss. Depending on -your project's requirements, select a minimal time window before alerting begins +your project's requirements, select a narrow time window before alerting begins to avoid notifications for small losses that are within the desired reliability range and not considered outages. @@ -314,8 +314,8 @@ of the retry queue. The `otelcol_exporter_queue_size` metric indicates the current size of the retry queue. Use these two metrics to check if the queue capacity can support your workload. 
-Using the following three metrics, you can identify the number of spans/metric -points/log records that failed to reach the sending queue: +Using the following three metrics, you can identify the number of spans, metric +points, and log records that failed to reach the sending queue: - `otelcol_exporter_enqueue_failed_spans` - `otelcol_exporter_enqueue_failed_metric_points` @@ -325,7 +325,7 @@ These failures could be caused by a queue filled with unsettled elements. You might need to decrease your sending rate or horizontally scale Collectors. The queue/retry mechanism also supports logging for monitoring. Check the logs -for messages such as `"Dropping data because sending_queue is full"`. +for messages such as `Dropping data because sending_queue is full`. #### Receive failures @@ -340,7 +340,7 @@ able to export data as expected. These metrics do not inherently imply data loss since there could be retries. But a high rate of failures could indicate issues with the network or backend receiving the data. -### Data flow +#### Data flow You can monitor data ingress with the `otelcol_receiver_accepted_spans` and `otelcol_receiver_accepted_metric_points` metrics and data egress with the From 5904d2160e5f4f6588929df44a7b94fa62bf62a6 Mon Sep 17 00:00:00 2001 From: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com> Date: Mon, 3 Jun 2024 14:51:22 -0700 Subject: [PATCH 11/15] Apply suggestions from Fabrizio's review Co-authored-by: Fabrizio Ferri-Benedetti --- content/en/docs/collector/internal-telemetry.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/content/en/docs/collector/internal-telemetry.md b/content/en/docs/collector/internal-telemetry.md index 070dc0515223..b1a95067ba94 100644 --- a/content/en/docs/collector/internal-telemetry.md +++ b/content/en/docs/collector/internal-telemetry.md @@ -292,7 +292,7 @@ range and not considered outages. This depends on the CPU metrics available on the deployment, eg.: `kube_pod_container_resource_limits{resource="cpu", unit="core"}` for -Kubernetes. Let's call it `available_cores` below. The idea here is to have an +Kubernetes. Let's call it `available_cores`. The idea here is to have an upper bound of the number of available cores, and the maximum expected ingestion rate considered safe, let's call it `safe_rate`, per core. This should trigger increase of resources/ instances (or raise an alert as appropriate) whenever @@ -305,8 +305,8 @@ Provide reference `safe_rate` for a few selected configurations. #### Queue length -Most exporters offer a -[queue/retry mechanism](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/README.md) +Most exporters provide a +[queue or retry mechanism](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/README.md) that is recommended for use in any production deployment of the Collector. The `otelcol_exporter_queue_capacity` metric indicates the capacity, in batches, @@ -324,8 +324,8 @@ points, and log records that failed to reach the sending queue: These failures could be caused by a queue filled with unsettled elements. You might need to decrease your sending rate or horizontally scale Collectors. -The queue/retry mechanism also supports logging for monitoring. Check the logs -for messages such as `Dropping data because sending_queue is full`. +The queue or retry mechanism also supports logging for monitoring. Check the +logs for messages such as `Dropping data because sending_queue is full`. 
#### Receive failures From 9c244d938e4a095a3b156103771d5e7af75a8367 Mon Sep 17 00:00:00 2001 From: opentelemetrybot <107717825+opentelemetrybot@users.noreply.github.com> Date: Mon, 3 Jun 2024 21:56:37 +0000 Subject: [PATCH 12/15] Results from /fix:format --- content/en/docs/collector/internal-telemetry.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/en/docs/collector/internal-telemetry.md b/content/en/docs/collector/internal-telemetry.md index ad33cc5d02e1..76d31157b105 100644 --- a/content/en/docs/collector/internal-telemetry.md +++ b/content/en/docs/collector/internal-telemetry.md @@ -292,9 +292,9 @@ range and not considered outages. This depends on the CPU metrics available on the deployment, eg.: `kube_pod_container_resource_limits{resource="cpu", unit="core"}` for -Kubernetes. Let's call it `available_cores`. The idea here is to have an -upper bound of the number of available cores, and the maximum expected ingestion -rate considered safe, let's call it `safe_rate`, per core. This should trigger +Kubernetes. Let's call it `available_cores`. The idea here is to have an upper +bound of the number of available cores, and the maximum expected ingestion rate +considered safe, let's call it `safe_rate`, per core. This should trigger increase of resources/ instances (or raise an alert as appropriate) whenever `(actual_rate/available_cores) < safe_rate`. From aa21f3195338c9fa61a5964ac0ffab1c1edced84 Mon Sep 17 00:00:00 2001 From: tiffany76 <30397949+tiffany76@users.noreply.github.com> Date: Tue, 4 Jun 2024 15:40:54 -0700 Subject: [PATCH 13/15] Make small wording and link fixes --- content/en/docs/collector/internal-telemetry.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/content/en/docs/collector/internal-telemetry.md b/content/en/docs/collector/internal-telemetry.md index 76d31157b105..6ce1dee0a61b 100644 --- a/content/en/docs/collector/internal-telemetry.md +++ b/content/en/docs/collector/internal-telemetry.md @@ -5,10 +5,11 @@ weight: 25 cSpell:ignore: alloc journalctl kube otecol pprof tracez underperforming zpages --- -You can monitor the health of any OpenTelemetry Collector instance by checking +You can inspect the health of any OpenTelemetry Collector instance by checking its own internal telemetry. Read on to learn about this telemetry and how to -configure it to help you [troubleshoot](/docs/collector/troubleshooting/) -Collector issues. +configure it to help you +[monitor](#use-internal-telemetry-to-monitor-the-collector) and +[troubleshoot](/docs/collector/troubleshooting/) the Collector. ## Activate internal telemetry in the Collector @@ -97,9 +98,9 @@ critical analysis. ### Configure internal logs Log output is found in `stderr`. You can configure logs in the config -`service::telemetry::logs`. The [configuration -options](https://github.com/open-telemetry/opentelemetry-collector/blob/v{{% param -vers %}}/service/telemetry/config.go) are: +`service::telemetry::logs`. 
The +[configuration options](https://github.com/open-telemetry/opentelemetry-collector/blob/main/service/telemetry/config.go) +are: | Field name | Default value | Description | | ---------------------- | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | From 27bd47b581fd17b37caf908f0636ce4eb09439ca Mon Sep 17 00:00:00 2001 From: tiffany76 <30397949+tiffany76@users.noreply.github.com> Date: Tue, 11 Jun 2024 12:07:48 -0700 Subject: [PATCH 14/15] Remove Health Check extension section --- content/en/docs/collector/troubleshooting.md | 34 -------------------- 1 file changed, 34 deletions(-) diff --git a/content/en/docs/collector/troubleshooting.md b/content/en/docs/collector/troubleshooting.md index 8f243e14c1b8..e48030b648fb 100644 --- a/content/en/docs/collector/troubleshooting.md +++ b/content/en/docs/collector/troubleshooting.md @@ -222,40 +222,6 @@ extensions: Here is a list of extensions you can enable for debugging the Collector. -#### Health Check - -The -[Health Check extension](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/extension/healthcheckextension/README.md), -which by default is available on all interfaces on port `13133`, can be used to -ensure the Collector is functioning properly. For example: - -```yaml -extensions: - health_check: -service: - extensions: [health_check] -``` - -It returns a response like the following: - -```json -{ - "status": "Server available", - "upSince": "2020-11-11T04:12:31.6847174Z", - "uptime": "49.0132518s" -} -``` - -{{% alert title="Caution" color="warning" %}} - -The optional `health_check` configuration setting, `check_collector_pipeline`, -is not working as expected. Avoid using this feature. Efforts are underway to -create a new version of the Health Check extension that relies on individual -component statuses. The extension's configuration remains unchanged until this -replacement is available. - -{{% /alert %}} - #### Performance Profiler (pprof) The From 878c850a3ca8e845ae3f06f6cc1ccbf29e9cd709 Mon Sep 17 00:00:00 2001 From: tiffany76 <30397949+tiffany76@users.noreply.github.com> Date: Thu, 13 Jun 2024 15:39:48 -0700 Subject: [PATCH 15/15] Remove CPU monitoring section --- content/en/docs/collector/internal-telemetry.md | 13 ------------- 1 file changed, 13 deletions(-) diff --git a/content/en/docs/collector/internal-telemetry.md b/content/en/docs/collector/internal-telemetry.md index 6ce1dee0a61b..b54a555eca02 100644 --- a/content/en/docs/collector/internal-telemetry.md +++ b/content/en/docs/collector/internal-telemetry.md @@ -289,19 +289,6 @@ your project's requirements, select a narrow time window before alerting begins to avoid notifications for small losses that are within the desired reliability range and not considered outages. -#### Low on CPU resources - -This depends on the CPU metrics available on the deployment, eg.: -`kube_pod_container_resource_limits{resource="cpu", unit="core"}` for -Kubernetes. Let's call it `available_cores`. The idea here is to have an upper -bound of the number of available cores, and the maximum expected ingestion rate -considered safe, let's call it `safe_rate`, per core. This should trigger -increase of resources/ instances (or raise an alert as appropriate) whenever -`(actual_rate/available_cores) < safe_rate`. 
- -The `safe_rate` depends on the specific configuration being used. // TODO: -Provide reference `safe_rate` for a few selected configurations. - ### Secondary monitoring #### Queue length