Archiver service bails when archiver is off #358

Open
frost242 opened this issue Sep 28, 2023 · 6 comments

@frost242
Member

Hello,

I didn't spot this until now. The archiver service bails out on an instance that once had WAL archiving active and has since had it turned off.

The following error is displayed:
ERROR: could not stat file "pg_wal/0000000100000001000000DE": No such file or directory

pg_stat_archiver shows:

[postgres]# select * from pg_stat_archiver ;
-[ RECORD 1 ]------+------------------------------
archived_count     | 631
last_archived_wal  | 0000000100000001000000DD
last_archived_time | 2023-06-29 02:37:03.499946+02
failed_count       | 0
last_failed_wal    | [null]
last_failed_time   | [null]
stats_reset        | 2023-01-25 15:06:59.622144+01

The database was loaded a few hours ago, and the current LSN is now:

[postgres]# SELECT pg_current_wal_lsn();
-[ RECORD 1 ]------+------------
pg_current_wal_lsn | 11/BB000000

So we should probably check whether archiving is on before checking the files.
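
To make the mismatch concrete, here is a minimal Python sketch of the WAL file naming arithmetic. It assumes the default 16 MB `wal_segment_size` and timeline 1 for the current LSN; the helper names are hypothetical and are not part of check_pgactivity. It shows that the segment following `last_archived_wal` is the `...DE` file named in the error, while the instance is now writing a much later segment, so the older file has presumably been recycled.

```python
# Assumption: default WAL segment size of 16 MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
# Number of segments per 4 GB "log id" (256 with 16 MB segments).
SEGMENTS_PER_LOGID = 0x100000000 // WAL_SEGMENT_SIZE


def format_wal_name(timeline: int, segno: int) -> str:
    """Build the 24-hex-digit WAL file name for a timeline and segment number."""
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_LOGID,
        segno % SEGMENTS_PER_LOGID,
    )


def next_wal_name(name: str) -> str:
    """Return the name of the segment that follows the given WAL file."""
    timeline = int(name[0:8], 16)
    logid = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    segno = logid * SEGMENTS_PER_LOGID + seg
    return format_wal_name(timeline, segno + 1)


def wal_name_for_lsn(lsn: str, timeline: int = 1) -> str:
    """Return the WAL file holding the byte at the given LSN (e.g. '11/BB000000')."""
    high, low = lsn.split("/")
    position = (int(high, 16) << 32) | int(low, 16)
    return format_wal_name(timeline, position // WAL_SEGMENT_SIZE)


# The file the check tries to stat, derived from last_archived_wal.
print(next_wal_name("0000000100000001000000DD"))
# The segment the instance is currently writing, derived from the current LSN.
print(wal_name_for_lsn("11/BB000000"))
```

The first value is the file from the error message; the second is many segments ahead of it, which is consistent with the old segment no longer being present in pg_wal.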

We'll try to address this issue ASAP.

@frost242 frost242 self-assigned this Sep 28, 2023
@frost242 frost242 added the bug label Sep 28, 2023
@rjuju
Member

rjuju commented Sep 28, 2023

I haven't looked at the code, but couldn't you get the same problem if the archiver were still on but had been failing for some time, and then someone had the great idea to remove the oldest (and unarchived) WALs to avoid disk saturation?

@frost242
Member Author

Of course, that could happen, but then the archiver would still be on, and thus the check must bail.

I don't want to skip this check when the archiver is on, only when it is off. Do you think that's a problem?

@frost242
Member Author

The check fails on the call to pg_stat_file. The issue can be addressed in the query this way:

diff --git a/check_pgactivity b/check_pgactivity
index 6a902f7..f9249b5 100755
--- a/check_pgactivity
+++ b/check_pgactivity
@@ -2063,12 +2063,15 @@ sub check_archiver {
                 ELSE 0
                 END, last_archived_wal, last_failed_wal,
                 /* mod time of the next wal to archive */
+               CASE WHEN current_setting('archive_mode')::bool IS TRUE THEN
                 extract('epoch' from (current_timestamp -
                     (pg_stat_file('pg_wal/'||pg_walfile_name(
                         (to_hex((last_archived_off+1)/4294967296)
                         ||'/'||to_hex((last_archived_off+1)%4294967296))::pg_lsn
                     ))).modification )
-                ) AS oldest
+                )
+               ELSE NULL
+               END AS oldest
         FROM (
             SELECT last_archived_wal, last_archived_time, last_failed_wal,
                 walsegsize,

The value of the archive_mode parameter must also be retrieved, to adapt the output message for this particular case.
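
As a rough illustration of adapting the message, here is a minimal Python sketch. The function names and message wording are hypothetical and do not reflect the actual check_pgactivity code or output. Note that `archive_mode` accepts `off`, `on` and `always`, so this sketch compares strings rather than relying on a boolean interpretation of the setting.

```python
from typing import Optional


def archiving_enabled(archive_mode: str) -> bool:
    """Return True unless archive_mode is 'off'.

    archive_mode is an enum (off, on, always), so it is compared as a
    string instead of being treated as a boolean.
    """
    return archive_mode.strip().lower() != "off"


def describe_oldest(archive_mode: str, oldest_age_seconds: Optional[float]) -> str:
    """Build a hypothetical status message for the WAL age part of the check."""
    if not archiving_enabled(archive_mode):
        # Archiving is off: the query returned NULL for the age, nothing to check.
        return f"archiving is disabled (archive_mode={archive_mode}), WAL age not checked"
    if oldest_age_seconds is None:
        return "no WAL age available"
    return f"oldest WAL waiting to be archived is {int(oldest_age_seconds)}s old"


print(describe_oldest("off", None))
print(describe_oldest("always", 12.7))
```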

@rjuju
Member

rjuju commented Sep 28, 2023

Oh sorry, I misunderstood your first message and thought that the check itself would error out rather than returning a regular ERROR status.

I'm fine with your approach.

@frost242
Member Author

No worries, that makes things clear for everyone. I'll propose a fix soon.

@frost242
Member Author

frost242 commented Jul 1, 2024

Hello,
I totally forgot to provide the fix. Will do ASAP.
