Inference endpoints are created with three limits:

* `max_requests` limits the number of concurrently running requests
* `max_project_rps` caps how many requests per second a project may make
* `max_project_tps` caps how many tokens per second a project may consume
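
To make the three limits concrete, here is a minimal sketch of how they could be enforced, assuming a token-bucket budget for the per-second limits and a semaphore for concurrency. The class and method names (`EndpointLimits`, `admit`, `release`) are hypothetical, not the actual implementation.

```python
import time
import threading


class TokenBucket:
    """Refills at `rate` units per second, up to `rate` capacity."""

    def __init__(self, rate: float):
        self.rate = rate
        self.tokens = rate
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def try_acquire(self, amount: float = 1.0) -> bool:
        with self.lock:
            now = time.monotonic()
            self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= amount:
                self.tokens -= amount
                return True
            return False


class EndpointLimits:
    """Hypothetical enforcement of the three per-endpoint limits."""

    def __init__(self, max_requests: int, max_project_rps: float, max_project_tps: float):
        self.concurrency = threading.Semaphore(max_requests)  # concurrent requests
        self.rps = TokenBucket(max_project_rps)               # requests per second
        self.tps = TokenBucket(max_project_tps)               # tokens per second

    def admit(self, estimated_tokens: int) -> bool:
        # Reject if either per-second budget is exhausted.
        if not self.rps.try_acquire(1):
            return False
        if not self.tps.try_acquire(estimated_tokens):
            return False
        # Reject if too many requests are already in flight.
        return self.concurrency.acquire(blocking=False)

    def release(self) -> None:
        # Called when a request finishes, freeing a concurrency slot.
        self.concurrency.release()
```

In this sketch a request is admitted only if all three budgets have room; a production limiter would typically queue or return a 429-style error instead of silently rejecting.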