In a previous commit we added max_concurrency to inference endpoints. This commit sets the inference gateway's corresponding setting, IG_MAX_REQUESTS, to that value during replica setup.
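
A minimal sketch of the mapping described above, not the actual replica setup code. It assumes that IG_MAX_REQUESTS is passed to the gateway as an environment variable and that it is left unset when an endpoint has no max_concurrency. The names InferenceEndpoint and build_replica_env are hypothetical; only max_concurrency and IG_MAX_REQUESTS come from this commit.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class InferenceEndpoint:
    # Hypothetical stand-in for the endpoint model; only the
    # max_concurrency field is taken from the commit message.
    name: str
    max_concurrency: Optional[int] = None


def build_replica_env(
    endpoint: InferenceEndpoint,
    base_env: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return the environment for a replica's inference gateway.

    Assumption: the gateway reads IG_MAX_REQUESTS from its environment.
    """
    # Copy so the caller's mapping is not mutated.
    env = dict(base_env or {})
    if endpoint.max_concurrency is not None:
        # Environment values are strings, so convert the integer limit.
        env["IG_MAX_REQUESTS"] = str(endpoint.max_concurrency)
    return env
```

With this sketch, an endpoint configured with max_concurrency=8 yields a replica environment containing IG_MAX_REQUESTS="8", while an endpoint without the field leaves the gateway's own default in effect.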