Previously, each inference replica ran as a self-contained VM hosting both the inference gateway and the inference engine. This change introduces a new option: a replica whose VM still runs the inference gateway, but which forwards requests over a secure tunnel to an external inference engine instead of hosting one locally. In this setup, the external inference engine is vLLM running on a RunPod pod.
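A minimal sketch of the gateway-side behavior under this option, assuming the tunnel exposes the remote vLLM server's OpenAI-compatible API on a local port (the port, URL, and model name below are illustrative assumptions, not the actual configuration):

```python
import requests

# Assumption: the secure tunnel (e.g. a port-forward to the RunPod pod) makes the
# remote vLLM server reachable on localhost:8000, vLLM's default serving port.
ENGINE_URL = "http://127.0.0.1:8000/v1/chat/completions"

def forward_to_engine(prompt: str, model: str = "meta-llama/Llama-3.1-8B-Instruct") -> str:
    """Send a chat completion request through the tunnel to the external vLLM engine."""
    resp = requests.post(
        ENGINE_URL,
        json={
            "model": model,  # hypothetical model name; whatever the pod is serving
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 128,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(forward_to_engine("Say hello from the external engine."))
```

From the gateway's point of view the tunneled endpoint looks like a local engine, so no request-handling logic has to change; only the upstream address differs between the two replica types.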