(#236) agent_state_summary: Count nodes without report as unhealthy #238

Open: wants to merge 1 commit into base: main

bastelfreak
Contributor

It's possible that a Puppet Agent was stopped or disabled and all old reports were garbage collected from PuppetDB. The node still exists in PuppetDB, but when checking for a report the timestamp is null:

puppet query nodes[certname,report_timestamp]{}
[
  {
    "certname": "pe.tim.local",
    "report_timestamp": "2024-09-30T13:21:17.042Z"
  },
  {
    "certname": "pe2.tim.local",
    "report_timestamp": null
  }
]

Previously we always assumed that report_timestamp holds a valid timestamp. With this patch we explicitly validate the timestamp and count nodes without a timestamp as unhealthy.
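
To illustrate the idea, here is a minimal sketch of the null check in Python. It is not the plan's actual code (the plan is written in Puppet), and the names `has_valid_report` and `split_by_report` are hypothetical. It only assumes the JSON shape shown in the query output above.

```python
import json
from datetime import datetime

# Sample data copied from the `puppet query` output above.
QUERY_OUTPUT = """
[
  {"certname": "pe.tim.local", "report_timestamp": "2024-09-30T13:21:17.042Z"},
  {"certname": "pe2.tim.local", "report_timestamp": null}
]
"""


def has_valid_report(node):
    """Return True only if report_timestamp is a parseable ISO 8601 string."""
    ts = node.get("report_timestamp")
    if not isinstance(ts, str):
        # null (None) or a missing key: the node has no report
        return False
    try:
        # Older Python versions reject a trailing "Z", so normalise it first.
        datetime.fromisoformat(ts.replace("Z", "+00:00"))
    except ValueError:
        return False
    return True


def split_by_report(nodes):
    """Split certnames into (with_report, no_report) lists."""
    with_report = [n["certname"] for n in nodes if has_valid_report(n)]
    no_report = [n["certname"] for n in nodes if not has_valid_report(n)]
    return with_report, no_report


nodes = json.loads(QUERY_OUTPUT)
with_report, no_report = split_by_report(nodes)
print("no_report:", no_report)  # no_report: ['pe2.tim.local']
```

In this sketch, every certname that lands in `no_report` would then be counted as unhealthy instead of being assumed to have a usable timestamp.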

Now with the fix:

puppet plan run pe_status_check::agent_state_summary --environment peadm log_healthy_nodes=true log_unhealthy_nodes=true
{
    "responsive": [
        "pe.tim.local",
        "pe2.tim.local"
    ],
    "healthy_counter": 0,
    "total_counter": 2,
    "unhealthy_counter": 2,
    "noop": [],
    "unhealthy": [
        "pe2.tim.local",
        "pe.tim.local"
    ],
    "healthy": [],
    "changed": [
        "pe.tim.local"
    ],
    "no_report": [
        "pe.tim.local"
    ],
    "corrective_changes": [],
    "used_cached_catalog": [
        "pe2.tim.local"
    ],
    "unresponsive": [],
    "failed": []
}

Please check off the steps below as you complete each step

  • Put the Jira ticket or GitHub issue number in parentheses in the title, e.g. (SUP-XXXX) Add Super Duper State Check
  • Update the Jira ticket status to Ready for Review if there is one
  • Review any CI failures and fix issues

bastelfreak requested a review from a team as a code owner on September 30, 2024, 14:16
@bastelfreak
Contributor Author

pe2.tim.local is listed here under used_cached_catalog. That is a separate bug, fixed in #237.

@taikaa
Contributor

taikaa commented Oct 28, 2024

@bastelfreak apologies for the delay in reviewing the PR. I tested the PR and no longer get the error. Thanks for adding this.

@taikaa
Contributor

taikaa commented Nov 1, 2024

@MartyEwings hello, is it alright to merge this PR despite these failed tests? Thank you!

@bastelfreak
Contributor Author

Because nobody is reviewing this, I raised support ticket #01302632.
