ubicloud/Procfile
Daniel Farina e5f9d83002 Add a restarter program and use it for respirate
This prevents the supervisor monitoring `restarter` from seeing the
process it supervises exit more than once per hour by default.

In the case of Heroku, this is important, because after a few
consecutive crashes (e.g. from a recalcitrant server causing apoptosis
from a `select()` loop without a timeout), Heroku backs off from
restarting the process.  The exact Heroku backoff behavior is not
documented, so some future tuning will have to take place.

When Heroku elects to no longer start the process again, this pages us
or degrades `respirate` throughput for a time.

There's no reason we can't restart more or less right away, so this
patch does that.  There are some finer, secondary considerations:

1. The restarting is randomized to a degree, to prevent too many
   exactly simultaneous startups.

2. It is desirable to exit sometimes, in case the failure depends on
   a specific network or computer, so that the process supervisor
   (e.g. Heroku, Kubernetes) is asked to put the entire process
   somewhere else...but only once in a while.

3. The log format has some regularity to it: all the `restarter`
   activity is in the top level key `restarter`.  In it, there is
   always a `start_time`, which is thought to be unique (along with
   PID, as often conveyed by syslog) to both identify and convey
   useful information about the `restarter` session.  In addition to
   `start_time` there is always a "going concern" key (`startup`,
   `command_exit`, `shutdown`) with information relevant to that
   phase in the program's execution in particular.
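
The considerations above can be sketched roughly as follows.  This is
not the code in `bin/restarter`: the names `restarter_log`,
`restart_delay` and `supervise`, the delay values, and the rule "keep
restarting until `minimum_time_alive` seconds have passed, then exit"
are assumptions made for illustration, inferred from the sample output.

```ruby
require "json"

# Hypothetical helper: one JSON line under the top-level "restarter" key,
# always carrying start_time plus exactly one phase key.
def restarter_log(start_time, phase, details, out: $stdout)
  stamp = start_time.strftime("%Y-%m-%dT%H:%M:%S%:z")
  out.puts JSON.generate("restarter" => {"start_time" => stamp, phase => details})
end

# Hypothetical helper: randomized pause so that many restarters do not
# start their commands at exactly the same moment.  Values are made up.
def restart_delay(base: 1.0, jitter: 1.0, rng: Random.new)
  base + rng.rand * jitter
end

# Assumed policy: restart the command whenever it exits, until
# minimum_time_alive seconds have elapsed; then return so the outer
# supervisor (e.g. Heroku, Kubernetes) can place the process elsewhere.
def supervise(command, minimum_time_alive:, out: $stdout, sleeper: ->(s) { sleep(s) })
  start_time = Time.now.utc
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + minimum_time_alive
  session = {"command" => command, "minimum_time_alive" => minimum_time_alive}
  restarter_log(start_time, "startup", session, out: out)

  status = nil
  loop do
    pid = Process.spawn(*command)
    _, status = Process.wait2(pid)
    restarter_log(start_time, "command_exit", {
      "command" => command, "exitstatus" => status.exitstatus,
      "pid" => status.pid, "success" => status.success?
    }, out: out)
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleeper.call(restart_delay)
  end

  restarter_log(start_time, "shutdown", session, out: out)
  status
end
```

A monotonic clock is used for the deadline so that wall-clock
adjustments cannot shorten or lengthen the session.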

You can test the basic function of the program like this:

    $ RESTART_MINIMUM_TIME_ALIVE=10 bin/restarter bash -c 'sleep 1; echo hi; exit 1'
    {"restarter":{"start_time":"2025-03-21T19:08:28+00:00","startup":{"command":["bash","-c","sleep 1; echo hi; exit 1"],"minimum_time_alive":10}}}
    hi
    {"restarter":{"start_time":"2025-03-21T19:08:28+00:00","command_exit":{"command":["bash","-c","sleep 1; echo hi; exit 1"],"exitstatus":1,"pid":960275,"success":false}}}
    hi
    {"restarter":{"start_time":"2025-03-21T19:08:28+00:00","command_exit":{"command":["bash","-c","sleep 1; echo hi; exit 1"],"exitstatus":1,"pid":960379,"success":false}}}
    {"restarter":{"start_time":"2025-03-21T19:08:28+00:00","shutdown":{"command":["bash","-c","sleep 1; echo hi; exit 1"],"minimum_time_alive":10}}}
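
Because every `restarter` line has the same shape, it can be separated
from the command's own output mechanically.  A minimal sketch, assuming
only the format shown above; `parse_restarter_line` is a hypothetical
name, not something shipped with this patch:

```ruby
require "json"

# Returns {start_time:, phase:, details:} for a restarter log line,
# or nil for anything else (e.g. the command's own "hi" output).
def parse_restarter_line(line)
  record = JSON.parse(line)
  return nil unless record.is_a?(Hash)
  body = record["restarter"]
  return nil unless body.is_a?(Hash) && body.key?("start_time")

  # The one key besides start_time names the phase of execution.
  phase = (body.keys - ["start_time"]).first
  {start_time: body["start_time"], phase: phase, details: body[phase]}
rescue JSON::ParserError
  nil
end
```

Grouping the parsed lines by `start_time` then gives one `restarter`
session per group.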

[1]: We decline to use the Ruby `Timeout` module, which injects an
exception at any point in child thread execution, because it's more or
less impossible for us or any library we use, including the standard
library, to write a correct program that can have its stack unwound at
any point that way.
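
One way to bound a wait without `Timeout` is to poll against an
explicit deadline, so no exception is ever injected into running code.
This is only an illustration of the idea; `wait_with_deadline` and its
polling interval are hypothetical and may not match what `restarter`
does:

```ruby
# Wait up to `seconds` for child `pid` to exit.  Returns its
# Process::Status, or nil if the deadline passed first.
def wait_with_deadline(pid, seconds, poll_interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  loop do
    # WNOHANG makes wait2 return nil immediately if the child is still running.
    _, status = Process.wait2(pid, Process::WNOHANG)
    return status if status
    return nil if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep(poll_interval)
  end
end
```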
2025-03-21 12:19:33 -07:00


web: bundle exec puma -C puma_config.rb
respirate: bin/restarter bin/respirate
monitor: bin/monitor