By default, this prevents the supervisor monitoring `restarter` from seeing that the process it is supervises exited more than once per hour by default. In the case of Heroku, this is of importance, because after a few consecutive crashes (e.g. from a recalcitrant server causing apoptosis from a `select()` loop without a timeout). The quantitatively exact Heroku backoff behavior is not documented, so some future tuning will have to take place. When Heroku elects to no longer start the process again, this pages us or degrades `respirate` throughput for a time. There's no reason we can't restart more or less right away. So, this patch does this. There are some secondary/fine considerations: 1. The restarting is randomized to a degree, to prevent too many exactly simultaneous startups. 2. It is desirable to exit sometimes, in case there is a network or computer specific dependency to the failure, to request the process supervisor (e.g. Heroku, Kubernetes) to put the entire process somewhere else...but only once in a while. 3. The log format has some regularity to it: all the `restarter` activity is in the top level key `restarter`. In it, there is always a `start_time`, which is thought to be unique (along with PID, as often conveyed by syslog) to both identify and convey useful information about the `restarter` session. In addition to `start_time` there is always a "going concern" key (`startup`, `subprocess_exit`, `shutdown`) with information relevant to that phase in the program's execution in particular. You can test the basic function of the program like this: $ RESTART_MINIMUM_TIME_ALIVE=10 bin/restarter bash -c 'sleep 1; echo hi; exit 1' {"restarter":{"start_time":"2025-03-21T19:08:28+00:00","startup":{"command":["bash","-c","sleep 1; echo hi; exit 1"],"minimum_time_alive":10}}} hi {"restarter":{"start_time":"2025-03-21T19:08:28+00:00","command_exit":{"command":["bash","-c","sleep 1; echo hi; exit 1"],"exitstatus":1,"pid":960275,"success":false}}} hi {"restarter":{"start_time":"2025-03-21T19:08:28+00:00","command_exit":{"command":["bash","-c","sleep 1; echo hi; exit 1"],"exitstatus":1,"pid":960379,"success":false}}} {"restarter":{"start_time":"2025-03-21T19:08:28+00:00","shutdown":{"command":["bash","-c","sleep 1; echo hi; exit 1"],"minimum_time_alive":10}}} [1]: We decline to use the Ruby Timeout class that injects an exception at any point in child thread execution, because it's more or less impossible for us or any library we use, including the standard library, to write a correct program that can have its stack unwound at any point that way.
4 lines
100 B
Plaintext
4 lines
100 B
Plaintext
web: bundle exec puma -C puma_config.rb
|
|
respirate: bin/restarter bin/respirate
|
|
monitor: bin/monitor
|