Unfortunately, GitHub doesn't have an API endpoint to get all workflow
jobs for the repository.
Instead, we list all queued workflow runs for the repository, then
fetch the workflow jobs for each run.
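A minimal sketch of this two-step polling, not the actual
implementation. The two callables are hypothetical stand-ins for the
"list queued workflow runs" and "list jobs for a workflow run"
requests, and max_runs is an assumed parameter that mirrors the
per-iteration cap described further down.

```python
from typing import Callable, Iterable, Optional


def queued_jobs(
    list_queued_runs: Callable[[], Iterable[dict]],
    list_jobs_for_run: Callable[[int], Iterable[dict]],
    max_runs: Optional[int] = None,
) -> list:
    """Collect queued jobs by walking queued workflow runs one by one."""
    runs = list(list_queued_runs())
    if max_runs is not None:
        # Bound the work done in a single iteration.
        runs = runs[:max_runs]

    jobs = []
    for run in runs:
        # One request per run: jobs are only exposed per workflow run.
        for job in list_jobs_for_run(run["id"]):
            if job["status"] == "queued":
                jobs.append(job)
    return jobs
```

The cost grows with the number of queued runs, because every run needs
its own jobs request.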
Respirate has a 2-minute limit for each of its own runs. If a run
exceeds this limit, respirate considers it stuck and terminates itself.
We encountered this issue in production when we needed to poll over 200
workflow runs in one iteration, which took more than 2 minutes. As a
result, respirate crashed multiple times.
The tricky part is that, since runners are job/run agnostic, we sum up
all queued labels and compare them with the existing runners for this
repository. If there are fewer runners, we provision extra ones. Since
we limit polling to the first 200 runs per iteration, jobs in later
runs are not counted, so the existing runner count will likely look
sufficient and we won't provision extra ones for them.
However, this is a rare case, and polling is only a fallback that runs
every 5 minutes to cover missed webhooks, so this is acceptable.
The number of queued runs goes down when their jobs are assigned to
runners, so it shouldn't always be high.
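The label comparison described above could be sketched as follows. This
is an illustration only: runners_to_provision is a hypothetical name,
and it assumes each queued job carries exactly one runner label in its
"labels" list.

```python
from collections import Counter
from typing import Iterable


def runners_to_provision(
    queued_jobs: Iterable[dict],
    existing_runner_labels: Iterable[str],
) -> dict:
    """Return how many extra runners are needed per label."""
    # Runners are job/run agnostic, so only per-label totals matter.
    wanted = Counter(
        label for job in queued_jobs for label in job["labels"]
    )
    existing = Counter(existing_runner_labels)
    # Counter subtraction drops labels whose result is zero or negative,
    # i.e. labels that already have enough runners.
    return dict(wanted - existing)
```

If only part of the queued runs is polled, the wanted totals are lower
bounds, so this comparison can report no shortfall even when one
exists.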