A previous commit added max_concurrency to inference endpoints. This commit sets the inference gateway's corresponding setting, IG_MAX_REQUESTS, to that value during replica setup.
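
As a rough illustration of the change described above, here is a minimal Python sketch of how a replica's setup step might pass the endpoint's max_concurrency to the gateway as IG_MAX_REQUESTS. Only the names max_concurrency and IG_MAX_REQUESTS come from this commit. The helper gateway_env, its parameters, the assumption that the setting is passed as an environment variable, and the choice to leave the variable unset when no limit is configured are all hypothetical and not taken from the actual change.

```python
import os
from typing import Dict, Mapping, Optional


def gateway_env(
    max_concurrency: Optional[int],
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build the environment for the gateway process (hypothetical helper).

    Assumes the gateway reads IG_MAX_REQUESTS from its environment.
    """
    # Start from the supplied environment, or the current process environment.
    env = dict(os.environ if base_env is None else base_env)

    if max_concurrency is not None:
        if max_concurrency < 1:
            raise ValueError("max_concurrency must be a positive integer")
        # Environment values are strings, so convert the integer limit.
        env["IG_MAX_REQUESTS"] = str(max_concurrency)

    # Assumption: with no limit configured, the gateway keeps its own default.
    return env
```

A caller would then hand the result to whatever launches the gateway, for example as the env argument of subprocess.Popen.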
879 B
Executable File