Previously, each inference replica ran as a self-contained VM hosting both the inference gateway and the inference engine. This change introduces a new option: a replica whose VM still runs the inference gateway, but which forwards requests over a secure tunnel to an external inference engine instead of hosting one locally. In this setup, the external inference engine is vLLM running on a RunPod pod.
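A minimal sketch of the gateway-side behavior under this option, assuming the tunnel exposes the remote vLLM server's OpenAI-compatible API on a local port (the port, URL, and model name below are illustrative assumptions, not the actual configuration):

```python
import requests

# Assumption: the secure tunnel (e.g. a port-forward to the RunPod pod) makes the
# remote vLLM server reachable on localhost:8000, vLLM's default serving port.
ENGINE_URL = "http://127.0.0.1:8000/v1/chat/completions"

def forward_to_engine(prompt: str, model: str = "meta-llama/Llama-3.1-8B-Instruct") -> str:
    """Send a chat completion request through the tunnel to the external vLLM engine."""
    resp = requests.post(
        ENGINE_URL,
        json={
            "model": model,  # hypothetical model name; whatever the pod is serving
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 128,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(forward_to_engine("Say hello from the external engine."))
```

From the gateway's point of view the tunneled endpoint looks like a local engine, so no request-handling logic has to change; only the upstream address differs between the two replica types.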