
god status says process is "up" when it's continuously restarting #252

Open
coofercat opened this issue Nov 20, 2017 · 1 comment

@coofercat

(I'm a devops/sysadmin rather than a developer, so God isn't really my area of expertise; apologies if I'm asking stupid questions here.)

We've just had a situation where a missing dependency means our Resque scheduler won't start on some machines (it throws a fatal exception on startup). However, god status always reports resque-scheduler: up, so our monitoring didn't pick this up and we didn't know there was a problem. It seems god attempts to restart the scheduler every 5 seconds. I won't paste the whole stack trace, but the scheduler says this on startup:

rake aborted!
cannot load such file -- tzinfo/indexes/timezones
...
Tasks: TOP => resque:scheduler
(See full trace by running task with --trace)

...which I assume means the process would have returned a non-zero exit code (if that matters).
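One way to confirm that assumption is to run the same command by hand (from the app directory, with RAILS_ENV set) and inspect its exit status. The sketch below shows the check in Ruby; the helper name exit_status_of is hypothetical, and a one-liner that calls abort stands in for the failing rake task so the example runs anywhere.

```ruby
require "rbconfig"

# Hypothetical helper: run a command, discard its output, return its Process::Status.
def exit_status_of(*cmd)
  pid = Process.spawn(*cmd, out: File::NULL, err: File::NULL)
  _pid, status = Process.wait2(pid)
  status
end

# Stand-in for the failing task: a Ruby one-liner that aborts on startup.
# To check the real thing, pass e.g. "rake", "resque:scheduler" instead.
status = exit_status_of(RbConfig.ruby, "-e", "abort 'rake aborted!'")
puts status.success?   # prints: false
puts status.exitstatus # prints: 1 (Kernel#abort exits with status 1)
```

Note that the config below only asks god whether the process is running, so the exit code alone does not change what god status reports.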

Our god config for the scheduler looks like this:

God.watch do |w|
  w.dir          = RAILS_ROOT
  w.name         = "resque-scheduler"
  w.stop_signal  = 'QUIT'
  w.env          = {"RAILS_ENV"=>RAILS_ENV}
  w.interval     = 5.seconds
  w.start        = "rake resque:scheduler"
  w.err_log      = "#{RAILS_ROOT}/log/resque-scheduler_error.log"
  w.log          = "#{RAILS_ROOT}/log/resque-scheduler.log"
  w.uid          = DEFAULT_RUNAS_USER
  w.gid          = DEFAULT_RUNAS_GROUP

  w.transition(:up, :restart) do |on|
    on.condition(:memory_usage) do |c|
      c.above = 350.megabytes
      c.times = 2
    end
  end

  w.transition(:init, { true => :up, false => :start }) do |on|
    on.condition(:process_running) do |c|
      c.running = true
    end
  end

  # determine when process has finished starting
  w.transition([:start, :restart], :up) do |on|
    on.condition(:process_running) do |c|
      c.running = true
      c.interval = 5.seconds
    end

    # failsafe
    on.condition(:tries) do |c|
      c.times = 5
      c.transition = :start
      c.interval = 5.seconds
    end
  end

  # start if process is not running
  w.transition(:up, :start) do |on|
    on.condition(:process_running) do |c|
      c.running = false
    end
  end
end
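The underlying idea for catching a crash loop can be sketched independently of god: count how many times the process was started within a sliding time window and flag it once that count reaches a threshold. The FlapDetector class and its max_starts/window parameters are hypothetical names for illustration, not part of god; timestamps are plain seconds so the example runs on its own.

```ruby
# Hypothetical helper, not part of god: flags a process as "flapping" when it
# has been (re)started at least `max_starts` times within `window` seconds.
class FlapDetector
  def initialize(max_starts:, window:)
    @max_starts = max_starts
    @window = window
    @starts = [] # timestamps (seconds) of recent starts, oldest first
  end

  # Record a start at time `now`, then drop starts that fell out of the window.
  def record_start(now)
    @starts << now
    @starts.shift while @starts.any? && @starts.first <= now - @window
    self
  end

  def flapping?
    @starts.size >= @max_starts
  end
end

# A process that dies on startup and is restarted every 5 seconds.
detector = FlapDetector.new(max_starts: 5, window: 300)
[0, 5, 10, 15].each { |t| detector.record_start(t) }
puts detector.flapping? # prints: false (only 4 starts so far)
detector.record_start(20)
puts detector.flapping? # prints: true (5 starts within 300 seconds)
```

With the 5-second restart cadence described above, five starts land well inside a 5-minute window, whereas an occasional restart (for example from the memory_usage condition) would not trip the check.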

What strategies can we employ to make god status say something other than "up" when this sort of thing happens? Obviously, if the process stops we want to restart it as quickly as possible, but if it's just continuously restarting, we'd like to catch that situation in some way.
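One strategy worth trying is god's flapping condition, which is attached via w.lifecycle rather than a state transition. The fragment below follows the example in god's own documentation as I understand it; the thresholds are illustrative, not tuned for this scheduler. If the watch enters :start or :restart 5 times within 5 minutes, god should move it to :unmonitored (so god status no longer says "up"), retry after 10 minutes, and stop retrying if it flaps 5 times within 2 hours.

```ruby
God.watch do |w|
  # ... existing settings and transitions from the config above ...

  # Illustrative thresholds: treat 5 (re)starts within 5 minutes as flapping.
  w.lifecycle do |on|
    on.condition(:flapping) do |c|
      c.to_state     = [:start, :restart] # count entries into these states
      c.times        = 5
      c.within       = 5.minutes
      c.transition   = :unmonitored       # stop restarting; status shows unmonitored
      c.retry_in     = 10.minutes         # then try monitoring again
      c.retry_times  = 5                  # give up if it keeps flapping ...
      c.retry_within = 2.hours            # ... this many times in this period
    end
  end
end
```

Your monitoring could then alert on any watch whose god status is not "up".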

Versions in use:

$ god --version
Version 0.13.7
$ ruby --version
ruby 2.1.10p492 (2016-04-01 revision 54464) [x86_64-linux]