The monitor creates 2 threads per resource: one for SSH event loop processing
and one for the actual pulse check. In the previous version, each resource kept
its threads even after the pulse check completed, so the number of resources we
could monitor at the same time was limited by the number of threads we could
create.
This commit changes the behavior so that after the pulse check is completed,
the threads are released. This way, we can monitor significantly more resources
at the same time.
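As a rough illustration of the per-check thread lifecycle (a minimal sketch, not
the monitor's actual code; run_pulse_check and its timeout parameter are
hypothetical names), a check can run in a short-lived thread that goes away once
the check returns:

```ruby
# Hypothetical helper: runs one pulse check in its own short-lived thread.
# The name and the timeout parameter are assumptions made for illustration.
def run_pulse_check(timeout: 5)
  worker = Thread.new { yield }
  # Thread#join returns nil when the thread has not finished within the limit.
  if worker.join(timeout)
    worker.value # the thread has terminated, so it no longer holds resources
  else
    worker.kill  # give up on a stuck check instead of keeping the thread around
    :timeout
  end
end

# Each call creates a thread, waits for the result, and lets the thread end.
results = 3.times.map { |i| run_pulse_check { "resource-#{i}: up" } }
```

Nothing stays allocated between checks here, which is the property that lets
the number of monitored resources grow beyond the thread limit.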
One drawback of the new approach is that we need to re-create the threads for
each check. On my system, creating 1000 threads takes about 0.025 seconds, so
the overhead seems negligible.
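A measurement along these lines can be reproduced with something like the
following. This is only a sketch of one way to time it, not necessarily how the
number above was obtained; it also includes the time to join the threads, and
results will vary by machine and Ruby version:

```ruby
require "benchmark"

THREAD_COUNT = 1000

# Time how long it takes to create (and join) THREAD_COUNT no-op threads.
elapsed = Benchmark.realtime do
  threads = Array.new(THREAD_COUNT) { Thread.new {} }
  threads.each(&:join)
end

puts format("created and joined %d threads in %.3f seconds", THREAD_COUNT, elapsed)
```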
I also added a new helper method, needs_event_loop_for_pulse_check?, to the
models. Most resources don't need an event loop for their pulse check; only
PostgresServer and MinioServer do. Other resources rely on exec! to perform
their pulse check, which doesn't need an event loop. In fact, I observed that
the extra event loop processing actually slows down the exec! calls. By taking
this into account, we reduce the number of threads we create and also improve
the speed of some pulse checks.
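The idea behind the helper can be sketched as below. Only the method name and
the two model names come from this commit; the PulseCheckable module,
GenericResource, threads_needed, and the default-to-false layout are
assumptions made for illustration:

```ruby
# Hypothetical mixin: resources that only use exec! need no event loop thread.
module PulseCheckable
  def needs_event_loop_for_pulse_check?
    false
  end
end

# Hypothetical stand-in for a resource whose pulse check only calls exec!.
class GenericResource
  include PulseCheckable
end

# Stand-ins for the two models that do need the event loop.
class PostgresServer
  include PulseCheckable

  def needs_event_loop_for_pulse_check?
    true
  end
end

class MinioServer
  include PulseCheckable

  def needs_event_loop_for_pulse_check?
    true
  end
end

# Hypothetical helper: how many threads one pulse check would need.
def threads_needed(resource)
  resource.needs_event_loop_for_pulse_check? ? 2 : 1
end
```

With this shape, the monitor can skip the event loop thread for any resource
that answers false.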
Another change is removing monitoring_interval from the model and hardcoding it
in the monitor as 5 seconds. This removes the ability to set different
monitoring intervals for different resources. Supporting this would require
some work, and since it is not used in the current implementation, I decided to
remove it altogether. If we need it in the future, we can add it back with some
effort.
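For illustration, a single fixed interval could be applied roughly like this.
The constant name and the due_for_check? helper are hypothetical; only the
5-second value comes from this commit:

```ruby
# Hypothetical constant name; the 5-second value is the one described above.
MONITORING_INTERVAL = 5

# Hypothetical helper: a resource is due when it has never been checked or
# when at least MONITORING_INTERVAL seconds have passed since the last check.
def due_for_check?(last_checked_at, now = Time.now)
  last_checked_at.nil? || (now - last_checked_at) >= MONITORING_INTERVAL
end
```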