Add three columns to inference_endpoints:

* max_requests to limit the number of concurrently running requests
* max_project_rps to define how many requests per second a project may make
* max_project_tps to define how many tokens per second a project may consume
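
A minimal migration sketch for the change above, assuming the project uses Alembic/SQLAlchemy; the Integer column types and the NULL-means-unlimited semantics are assumptions, not confirmed by the source:

```python
"""Add rate-limit columns to inference_endpoints (sketch)."""
from alembic import op
import sqlalchemy as sa


def upgrade() -> None:
    # NULL is treated as "no limit" for each column (assumption).
    op.add_column(
        "inference_endpoints",
        sa.Column("max_requests", sa.Integer(), nullable=True),
    )
    op.add_column(
        "inference_endpoints",
        sa.Column("max_project_rps", sa.Integer(), nullable=True),
    )
    op.add_column(
        "inference_endpoints",
        sa.Column("max_project_tps", sa.Integer(), nullable=True),
    )


def downgrade() -> None:
    # Drop in reverse order of addition.
    op.drop_column("inference_endpoints", "max_project_tps")
    op.drop_column("inference_endpoints", "max_project_rps")
    op.drop_column("inference_endpoints", "max_requests")
```

Making all three columns nullable keeps the migration backward-compatible: existing endpoint rows stay valid, and limits can be rolled out per endpoint.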