ubicloud/prog
Eren Başak 0b8d249c23 Add Disk IO to VM Host Healthchecks
The health checks in `VmHost#check_pulse` and `HostNexus#available?`
were simple SSH access attempts, which is not sufficient for
detecting disk issues. This change adds a simple disk I/O operation
to the health check commands.

The disk I/O added for the health check reads from /dev/zero and
writes the data to a randomly named file in /tmp. This should be
enough to detect obvious disk failures. Note that when multiple
disks are mounted into the file system (`/`) via RAID, this approach
may not detect a single disk failure as long as the system keeps
working.
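
To illustrate the idea, here is a minimal local sketch in plain Ruby. It is not the command added by this commit: the method name `disk_io_healthy?`, the 4 KiB size, and the file naming scheme are assumptions made for the example, and the real check runs as a shell command over SSH on the host.

```ruby
require "securerandom"
require "tmpdir"

# Hypothetical helper, not part of the Ubicloud codebase.
# Reads `bytes` zero bytes from /dev/zero, writes them to a randomly
# named scratch file in `dir`, and forces them to the device.
def disk_io_healthy?(dir: Dir.tmpdir, bytes: 4096)
  path = File.join(dir, "healthcheck-#{SecureRandom.hex(8)}")
  data = File.binread("/dev/zero", bytes)
  File.open(path, "wb") do |f|
    f.write(data)
    f.fsync # flush to the device so the write is not only cached
  end
  File.size(path) == bytes
rescue SystemCallError
  # Any OS-level I/O error (missing directory, read-only FS, EIO, ...)
  # is treated as an unhealthy result.
  false
ensure
  # Remove the scratch file whether or not the check succeeded.
  File.delete(path) if path && File.exist?(path)
end
```

Calling `disk_io_healthy?` returns `true` when the write round-trips and `false` on an OS-level I/O error. As noted above, a check like this exercises the file system rather than an individual disk, so a degraded RAID array can still pass.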

Note: Throughout the codebase, health checks for the monitor
(`check_pulse`) and progs (`available?`) are duplicated, which is also
the case for the VM host. We can discuss unifying health checking and
availability detection in general as a next step. For now, I did some
deduplication by extracting the shell command used for the health
check into a `VmHost` constant.
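
The deduplication described above might look roughly like the following sketch. Everything here is hypothetical: the class names `VmHostSketch` and `HostNexusSketch`, the constant name `HEALTH_CHECK_COMMAND`, the exact shell command, and the `runner` callable (a stand-in for whatever executes a command on the host and reports success) are assumptions for illustration, not the actual code in this commit.

```ruby
# Hypothetical stand-in for the VmHost model.
class VmHostSketch
  # Assumed command: create a unique file in /tmp, write 4 KiB of
  # zeros into it, then remove it. Uses GNU dd's status=none flag.
  HEALTH_CHECK_COMMAND =
    'f=$(mktemp /tmp/healthcheck.XXXXXX) && ' \
    'dd if=/dev/zero of="$f" bs=4096 count=1 status=none && rm -f "$f"'

  # runner: callable taking a command string, truthy on success.
  def initialize(runner)
    @runner = runner
  end

  # Monitor-side check: reports a reading instead of raising.
  def check_pulse
    up = begin
      @runner.call(HEALTH_CHECK_COMMAND)
    rescue StandardError
      false
    end
    {reading: up ? "up" : "down"}
  end
end

# Hypothetical stand-in for the host prog.
class HostNexusSketch
  def initialize(runner)
    @runner = runner
  end

  # Prog-side check: reuses the same command via the shared constant.
  def available?
    @runner.call(VmHostSketch::HEALTH_CHECK_COMMAND) ? true : false
  rescue StandardError
    false
  end
end
```

Both call sites still carry their own error handling, which matches the note that only the command itself was shared; unifying the two checks is left as the possible next step.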

Future work: While some disk I/O is enough to check for obvious
problems, we can consider monitoring SMART bits, preferably at a
lower frequency and with lower severity.
2024-10-29 11:05:28 +03:00
ai Create billing records for inference endpoints 2024-10-16 13:56:06 +02:00
dns_zone Fix double delete in DNS purge 2024-01-16 12:42:11 +03:00
github Fix the condition check during installation destroy 2024-10-07 15:03:13 +03:00
minio MinioClusterNexus.destroy cleans-up firewalls too 2024-10-02 12:11:30 +02:00
postgres Do not allow provisioning Postgres 17 for Lantern instances 2024-10-25 13:44:27 +03:00
storage Track hugepages used for SPDK installations in the database. 2024-04-24 18:12:46 -07:00
test Fix E2E tests when they select Debian 2024-10-28 21:46:37 -07:00
vm Add Disk IO to VM Host Healthchecks 2024-10-29 11:05:28 +03:00
vnet Add weighted subnet picking and ban some ranges 2024-10-25 14:38:41 +03:00
base.rb Reapply "Increase nap time from 0 to 1 in donate" 2024-04-22 14:19:43 +02:00
bootstrap_rhizome.rb Do not create keypair if there is already one in BootstrapRhizome 2023-09-06 07:12:31 +03:00
check_usage_alerts.rb Add usage alerting based on the total consumption 2024-04-22 16:00:57 +02:00
download_boot_image.rb Remove almalinux-8 2024-10-28 11:19:10 -07:00
download_cloud_hypervisor.rb Add Prog to download Cloud Hypervisor on-demand 2024-06-13 17:11:42 +02:00
download_firmware.rb Add Prog to download firmware from control plane 2024-05-24 15:32:00 +02:00
expire_project_invitations.rb Expire project invitation after 7 days 2024-09-10 19:21:45 +03:00
heartbeat.rb Add missing service logging to heartbeat 2024-03-29 11:15:53 -07:00
install_dnsmasq.rb Update dnsmasq to v2.89 2023-12-15 13:56:09 -08:00
install_rhizome.rb Core classes freezing, try three 2024-02-21 09:23:15 -08:00
learn_arch.rb Learn VmHost architecture 2023-11-07 12:02:50 -08:00
learn_cores.rb Remove total_nodes learning 2023-11-29 11:42:42 -08:00
learn_memory.rb Add an identifier to all Prog label methods 2023-08-23 12:09:57 +03:00
learn_network.rb Increase the IPv6 prefix min length to 112 bits 2024-09-27 11:21:33 +02:00
learn_pci.rb Learn about PCI devices of a host 2024-05-07 16:10:05 +02:00
learn_storage.rb Replace VmHost.available_storage_gib and total_storage_gib columns with funcs 2024-04-22 13:29:34 +02:00
page_nexus.rb Add extra data and severity to pages 2024-07-12 14:36:29 +03:00
redeliver_github_failures.rb Reduce the frequency of failed redeliveries to every 5 minutes 2023-12-22 10:12:52 +03:00
remove_boot_image.rb Program to remove a boot image. 2024-05-16 09:32:05 -07:00
resolve_globally_blocked_dnsnames.rb Add prog to update ip addresses of globally blocked dns names 2024-03-13 10:50:40 +01:00
rotate_ssh_key.rb Add an identifier to all Prog label methods 2023-08-23 12:09:57 +03:00
rotate_storage_kek.rb Adjust storage key tool for multiple devices 2024-02-27 12:00:51 -08:00
setup_hugepages.rb Change the hugepages arithmetic. 2024-02-27 11:34:14 -08:00
setup_nftables.rb Start using nftables at host to block unused ip addresses 2024-02-21 11:40:20 +01:00
setup_sysstat.rb Install Sysstat. 2023-11-10 12:54:43 -08:00
test.rb Assemble Pages with correct Prog 2024-07-08 09:46:50 +02:00