#301 session/active time #305

yhuelf · 2022-03-03T14:39:45Z

No description provided.

This commit only gives the opportunity to write PG14 specific queries.

rjuju · 2022-03-03T14:56:27Z

Hi,

IMHO it's a good idea to mix instant and cumulated counters in the same service. I'm also unsure of how useful it's, so it's another argument to be a in a dedicated service.

Also, the "global_" prefix is quite misleading, maybe using "instance_" instead?

yhuelf · 2022-03-03T15:20:17Z

Hi Julien,

I also have a patch for a dedicated service, but @ioguix told me it would be better to include these new metrics in a existing one. I'll let the two of you debate this ;-)

And yes, the "_instance" prefix would certainly be better (the code was very much modelled on the commit_ratio service).

rjuju · 2022-03-03T15:26:21Z

Was it a private discussion or did I miss some activity?

In any case, agreed let's wait for him to chime in.

yhuelf · 2022-03-03T15:40:15Z

Yes it was a private discussion (sorry for the confusion). I've been given time to work on check_pgactivity, a few weeks spread into this year, and I asked him what I could do to get a foothold in this project. Other suggestions are welcomed :)

rjuju · 2022-03-03T15:45:40Z

Well, there are already a few bugs opened (https://github.com/OPMDG/check_pgactivity/issues?q=is%3Aopen+is%3Aissue+label%3Abug) or otherwise quite a lot of other enhancements ideas in the bug tracker. Or simply improve the tap tests :)

ioguix · 2022-03-03T16:12:45Z

Hi,

We reviewed some existing issues and spot the #301 that leads to the internal short discussion, and this PR.

When discussing it, I couldn't find any interesting alerting use case based on these metrics. Creating a service only for two little metrics seemed overkill as well.

On the other hand, we usually fetch more metrics than needed in other services for pure metrics. Adding two metrics to an existing service which is already related to the backends numbers was easier, seemed logical, coherent with other services and allow to add them without setting anything on the monitoring side.

Moreover, the metrics have different units and are not supposed to be on the same graph anyway.

Now, I really have no strong feelings and didn't review the patch yet.

rjuju · 2022-03-03T16:18:07Z

Well, as I said in my first answer I don't really see how helpful those metrics are. It becomes even worse on environment where you have connection pooling.

yhuelf · 2022-03-04T09:35:08Z

I guess it could be helpful to decide whether or not you should set up a connection pooling.

And maybe, once it has been set up, how efficient it is. If there's no activity at night, but pgBouncer maintains a pool of server connections, then a graph of these metrics over time could be useful (good active ratio during workday, bad average).

rjuju · 2022-03-04T09:48:21Z

I guess it could be helpful to decide whether or not you should set up a connection pooling.

I'm not sure it's really that helpful. Using a pooler is already a solution for specific problems that can already be identified without those info.

And maybe, once it has been set up, how efficient it is. If there's no activity at night, but pgBouncer maintains a pool of server connections, then a graph of these metrics over time could be useful (good active ratio during workday, bad average).

I'm also doubtful. It depends on your configuration as e.g. your pooler could aggresively drop idle connection or not.

If you really want to integrate it I still thing that having a dedicate check with a bit more thinking on what could be done with the info would be better. For instance, we could detect which how the busy time is distributed among the databases.

ioguix · 2022-03-04T14:23:00Z

I'm not sure it's really that helpful. Using a pooler is already a solution for specific problems that can already be identified without those info.

What are the other means are you thinking about? From check_pga perspective, I don't remember there's much, out of the backend_status check which gather only sampled data and nothing about new sessions.

One of the reason you might want to add a pooler is to lower the number of sessions per second and we have nothing about this currently (neither in current code base or in this PR).

If you really want to integrate it I still thing that having a dedicate check with a bit more thinking on what could be done with the info would be better. For instance, we could detect which how the busy time is distributed among the databases.

Thinking about the other new pg_stat_database fields, I can imagine some of them might actually be useful to monitor with the thresholds:

sessions: useful to compute the number of sessions/s
sessions_killed: useful to detect eg. a backend crash killing all other backends

Moreover, active_time, idle_in_transaction_time and session_time metrics might help to diagnose some issues after some application update if their shape change much more than expected. I can't imagine how a threshold might be useful on them though.

I believe the sessions_abandoned and sessions_fatal fields would rise way too many false positive, but htey might be useful as pure metrics as well.

So, indeed, this might qualify for a dedicate check if you agree with these certainly not exhausted use case.

rjuju · 2022-03-06T05:05:33Z

What are the other means are you thinking about? From check_pga perspective, I don't remember there's much, out of the backend_status check which gather only sampled data and nothing about new sessions.

Nothing in check_pgactivity indeed. But the need for a pooler is something really tied to your architecture, I don't think that's the job of monitoring to tell you that you need a pooler, as it's not its job to tell you that you need to do backups or setup replicas.

Now if we can have a new service that gives some useful information and can also help you notice that a pooler would be a good idea that's great, but it's not something that should really change, unlike other properties discussed here.

sessions_killed: useful to detect eg. a backend crash killing all other backends

Huh? This field shows controlled kill, like pg_terminate_backend() or a postmaster clean stop. If there's a crash that kills everything, you will lose all pg_stat_* data anyway.

Moreover, active_time, idle_in_transaction_time and session_time metrics might help to diagnose some issues after some application update if their shape change much more than expected. I can't imagine how a threshold might be useful on them though.

Agreed, especially idle_in_transaction_time. This one would deserve threshold.

So, indeed, this might qualify for a dedicate check if you agree with these certainly not exhausted use case.

Agreed.

yhuelf · 2022-03-08T17:09:33Z

OK I will submit a new PR during my second week of check_pgactivity dedicated week (mid-april).

ioguix · 2022-05-03T08:07:47Z

PR #308 now exists, so we can close this one and move on.

yhuelf added 2 commits March 3, 2022 15:37

add PG_VERSION_140 global var

2bf05fd

This commit only gives the opportunity to write PG14 specific queries.

Add session/active time to backends' perfdata (v14+)

9904f15

ioguix closed this May 3, 2022

ioguix mentioned this pull request May 30, 2022

(v14) pg_stat_database new fields #301

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#301 session/active time #305

#301 session/active time #305

yhuelf commented Mar 3, 2022

rjuju commented Mar 3, 2022

yhuelf commented Mar 3, 2022

rjuju commented Mar 3, 2022

yhuelf commented Mar 3, 2022

rjuju commented Mar 3, 2022

ioguix commented Mar 3, 2022

rjuju commented Mar 3, 2022

yhuelf commented Mar 4, 2022

rjuju commented Mar 4, 2022

ioguix commented Mar 4, 2022

rjuju commented Mar 6, 2022

yhuelf commented Mar 8, 2022

ioguix commented May 3, 2022

#301 session/active time #305

#301 session/active time #305

Conversation

yhuelf commented Mar 3, 2022

rjuju commented Mar 3, 2022

yhuelf commented Mar 3, 2022

rjuju commented Mar 3, 2022

yhuelf commented Mar 3, 2022

rjuju commented Mar 3, 2022

ioguix commented Mar 3, 2022

rjuju commented Mar 3, 2022

yhuelf commented Mar 4, 2022

rjuju commented Mar 4, 2022

ioguix commented Mar 4, 2022

rjuju commented Mar 6, 2022

yhuelf commented Mar 8, 2022

ioguix commented May 3, 2022