This change introduces inference endpoints and inference endpoint replicas. An inference endpoint describes a model and is backed by a private network, a load balancer, and multiple replicas. A replica is a VM that runs the model. Each replica VM is part of the inference endpoint's private subnet and is attached to the endpoint's load balancer.
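The relationship between an endpoint, its replicas, and its load balancer can be sketched roughly as below. This is only an illustrative model, not the code in this change: the class names, field names, and the `add_replica` helper are all hypothetical, and the subnet is represented as a plain string for simplicity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoadBalancer:
    # Hypothetical: records the names of the replica VMs attached to it.
    attached_vms: List[str] = field(default_factory=list)


@dataclass
class Replica:
    # A replica is a VM that runs the model.
    vm_name: str
    subnet: str


@dataclass
class InferenceEndpoint:
    # Hypothetical fields: the model served, the endpoint's private subnet,
    # its load balancer, and the replicas backing it.
    model: str
    private_subnet: str
    load_balancer: LoadBalancer = field(default_factory=LoadBalancer)
    replicas: List[Replica] = field(default_factory=list)

    def add_replica(self, vm_name: str) -> Replica:
        # Place the replica VM in the endpoint's private subnet
        # and attach it to the endpoint's load balancer.
        replica = Replica(vm_name=vm_name, subnet=self.private_subnet)
        self.replicas.append(replica)
        self.load_balancer.attached_vms.append(vm_name)
        return replica
```

Under these assumptions, calling `add_replica("replica-0")` on an endpoint yields a replica in the endpoint's subnet and registers its VM with the endpoint's load balancer.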