ubicloud/rhizome
Enes Cakir 5e34c5a809 Use htcat to download image from presigned URL
We previously used curl to download images from presigned URLs. This
worked well for our internal MinIO cluster, but with R2 presigned URLs,
curl performs poorly since it downloads with a single connection.

First, I benchmarked aria2 [^1], which supports multiple connections.
However, Daniel recommended using a memory-safe tool instead and
suggested htcat [^2], which his team wrote about 10 years ago. After a
few small improvements, htcat now works well for our use case and
delivers performance comparable to aria2.

If the URL is from R2, we now use htcat to download it with multiple
connections.
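
A minimal sketch of how that selection could look, not the code from this commit. It assumes R2 presigned URLs can be recognized by an `.r2.cloudflarestorage.com` host suffix and that htcat is invoked as `htcat URL`; the `download_command` helper and the `R2_HOST_SUFFIX` constant are hypothetical names introduced for illustration.

```ruby
require "uri"

# Assumption: R2 presigned URLs use a host ending in this suffix.
R2_HOST_SUFFIX = ".r2.cloudflarestorage.com"

# Hypothetical helper: returns the argv of the downloader to run for a URL.
# Both tools write the downloaded bytes to stdout.
def download_command(url)
  host = URI.parse(url).host.to_s
  if host.end_with?(R2_HOST_SUFFIX)
    # Multiple connections via htcat (assumed invocation: `htcat URL`).
    ["htcat", url]
  else
    # Single connection via curl, as before.
    ["curl", "--fail", "--silent", "--show-error", "--location", url]
  end
end
```

Returning an argument array rather than a shell string keeps the presigned URL, which contains `&` and `=` characters, away from shell interpretation.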

Since htcat writes the download to stdout, we calculate the sha256 hash
while downloading, similar to how we did with curl.
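
The idea of hashing while streaming can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in this commit: `download_with_digest` is a hypothetical helper, and the command passed to it (for example `["htcat", url]`) is assumed to write the downloaded bytes to stdout.

```ruby
require "digest"
require "open3"

# Hypothetical helper: runs cmd (an argv array), copies its stdout to
# dest_path in chunks, and returns the SHA-256 hex digest of those bytes.
def download_with_digest(cmd, dest_path, chunk_size: 1024 * 1024)
  digest = Digest::SHA256.new
  Open3.popen2(*cmd) do |stdin, stdout, wait_thr|
    stdin.close
    stdout.binmode
    File.open(dest_path, "wb") do |file|
      # IO#read(n) returns nil at end of stream, which ends the loop.
      while (chunk = stdout.read(chunk_size))
        digest.update(chunk) # hash each chunk as it arrives
        file.write(chunk)    # and write the same chunk to disk
      end
    end
    raise "download command failed: #{cmd.first}" unless wait_thr.value.success?
  end
  digest.hexdigest
end
```

The caller would compare the returned digest with the expected sha256 and discard the file on mismatch. Because each chunk is hashed as it is written, the image never has to be read back from disk for verification.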

I will download the htcat binary on existing hosts.

[^1]: https://aria2.github.io/
[^2]: https://github.com/htcat/htcat
2025-08-27 12:43:23 +03:00
common Move metrics-collector to rhizome/common/bin 2025-05-23 12:42:17 +05:30
host Use htcat to download image from presigned URL 2025-08-27 12:43:23 +03:00
inference_endpoint Fix rhizome code not compatible with Ruby 3.0 2025-05-06 23:18:09 +09:00
kubernetes Install Metrics Server on new Kubernetes Clusters 2025-07-14 14:13:41 +03:00
minio/bin Enable minio.service so that we auto start after a possible reboot 2024-02-13 15:03:56 +01:00
postgres Don't close ubi_replication's connections 2025-07-07 19:11:45 +03:00
victoria_metrics/bin Add -dedup.minScrapeInterval to VictoriaMetrics command line 2025-05-08 16:47:01 +05:30
.rubocop.yml Change rhizome TargetRubyVersion to 3.0 2025-05-06 23:18:09 +09:00
Gemfile Make BUNDLE_GEMFILE=rhizome/Gemfile bundle install work from repository root 2025-05-06 23:18:09 +09:00
Gemfile.lock Add base64 to rhizome/host Gemfile 2025-04-29 16:45:24 -07:00