Inference endpoints are created with three limits:

* `max_requests` limits the number of concurrently running requests
* `max_project_rps` caps how many requests per second a project may make
* `max_project_tps` caps how many tokens per second a project may consume
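
To make the three limits concrete, here is a minimal sketch of how they could be enforced, assuming a token-bucket budget for the per-second limits and a semaphore for concurrency. The class and method names (`EndpointLimits`, `admit`, `release`) are hypothetical, not the actual implementation.

```python
import time
import threading


class TokenBucket:
    """Refills at `rate` units per second, up to `rate` capacity."""

    def __init__(self, rate: float):
        self.rate = rate
        self.tokens = rate
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def try_acquire(self, amount: float = 1.0) -> bool:
        with self.lock:
            now = time.monotonic()
            self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= amount:
                self.tokens -= amount
                return True
            return False


class EndpointLimits:
    """Hypothetical enforcement of the three per-endpoint limits."""

    def __init__(self, max_requests: int, max_project_rps: float, max_project_tps: float):
        self.concurrency = threading.Semaphore(max_requests)  # concurrent requests
        self.rps = TokenBucket(max_project_rps)               # requests per second
        self.tps = TokenBucket(max_project_tps)               # tokens per second

    def admit(self, estimated_tokens: int) -> bool:
        # Reject if either per-second budget is exhausted.
        if not self.rps.try_acquire(1):
            return False
        if not self.tps.try_acquire(estimated_tokens):
            return False
        # Reject if too many requests are already in flight.
        return self.concurrency.acquire(blocking=False)

    def release(self) -> None:
        # Called when a request finishes, freeing a concurrency slot.
        self.concurrency.release()
```

In this sketch a request is admitted only if all three budgets have room; a production limiter would typically queue or return a 429-style error instead of silently rejecting.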