(I'm really a devops/sysadmin and God isn't my area of expertise, so apologies if I'm asking stupid questions here)
We've just had a situation where a missing dependency means that our Resque scheduler won't start on some machines (it throws a fatal exception on startup). However, god status always says resque-scheduler: up, so our monitoring hasn't picked this up and we didn't know there was a problem. It seems God attempts to restart the scheduler every 5 seconds. I won't paste the whole stack trace, but the scheduler says this on startup:
rake aborted!
cannot load such file -- tzinfo/indexes/timezones
...
Tasks: TOP => resque:scheduler
(See full trace by running task with --trace)
...which I assume means the process would have returned a non-zero exit code (if that matters).
Our god config for the scheduler looks like this:
God.watch do |w|
  w.dir = RAILS_ROOT
  w.name = "resque-scheduler"
  w.stop_signal = 'QUIT'
  w.env = {"RAILS_ENV"=>RAILS_ENV}
  w.interval = 5.seconds
  w.start = "rake resque:scheduler"
  w.err_log = "#{RAILS_ROOT}/log/resque-scheduler_error.log"
  w.log = "#{RAILS_ROOT}/log/resque-scheduler.log"
  w.uid = DEFAULT_RUNAS_USER
  w.gid = DEFAULT_RUNAS_GROUP

  w.transition(:up, :restart) do |on|
    on.condition(:memory_usage) do |c|
      c.above = 350.megabytes
      c.times = 2
    end
  end

  w.transition(:init, { true => :up, false => :start }) do |on|
    on.condition(:process_running) do |c|
      c.running = true
    end
  end

  # determine when process has finished starting
  w.transition([:start, :restart], :up) do |on|
    on.condition(:process_running) do |c|
      c.running = true
      c.interval = 5.seconds
    end

    # failsafe
    on.condition(:tries) do |c|
      c.times = 5
      c.transition = :start
      c.interval = 5.seconds
    end
  end

  # start if process is not running
  w.transition(:up, :start) do |on|
    on.condition(:process_running) do |c|
      c.running = false
    end
  end
end
What strategies can we employ to make god status say something other than "up" when this sort of thing happens? Obviously, if the process stops we want to restart it as quickly as possible, but if it's just continuously restarting, we'd like to catch that situation in some way.
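One idea I came across in the God documentation is the :flapping lifecycle condition, which is described as moving a watch to :unmonitored when it keeps bouncing between states. Below is a rough, untested sketch of how I imagine it would fit our config. FLAPPING_GUARD is a name I made up for illustration (it is not part of God), the thresholds are guesses, and I've written the intervals as plain seconds rather than using God's 5.minutes-style helpers. Would something like this be the right direction?

```ruby
# Hypothetical helper (not part of God): registers a :flapping lifecycle
# condition on a watch. All thresholds below are illustrative guesses.
FLAPPING_GUARD = lambda do |w|
  w.lifecycle do |on|
    on.condition(:flapping) do |c|
      c.to_state     = [:start, :restart] # count transitions into these states
      c.times        = 5                  # 5 such transitions...
      c.within       = 300                # ...within 300 s (5 min) count as flapping
      c.transition   = :unmonitored       # stop restarting instead of looping
      c.retry_in     = 600                # resume monitoring after 600 s (10 min)
      c.retry_times  = 5                  # give up for good after 5 retries...
      c.retry_within = 7200               # ...within 7200 s (2 hours)
    end
  end
end

# Intended use, inside the existing watch block:
#   God.watch do |w|
#     ...
#     FLAPPING_GUARD.call(w)
#   end
```

My assumption is that once the watch becomes unmonitored, god status would report that state rather than "up", which our monitoring could then alert on. I haven't confirmed that against 0.13.7.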
Versions in use:
$ god --version
Version 0.13.7
$ ruby --version
ruby 2.1.10p492 (2016-04-01 revision 54464) [x86_64-linux]