Add a `gpu_count` column to the `inference_endpoint` table. The column specifies how many GPUs are assigned to each VM that runs a replica of the endpoint.
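
As a rough illustration of the schema change, here is a minimal sketch using Python's built-in `sqlite3` module against an in-memory database. The actual migration tooling and database engine are not stated here, so the column type (`INTEGER`), the `NOT NULL` constraint, the default of `1`, the `id` and `name` columns, and the helper names `add_gpu_count_column` and `demo` are all assumptions made for the example.

```python
import sqlite3


def add_gpu_count_column(conn: sqlite3.Connection) -> None:
    """Add gpu_count to inference_endpoint unless it already exists."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    cols = [row[1] for row in conn.execute("PRAGMA table_info(inference_endpoint)")]
    if "gpu_count" not in cols:
        # Assumed type and default: existing rows get 1 GPU per VM.
        conn.execute(
            "ALTER TABLE inference_endpoint "
            "ADD COLUMN gpu_count INTEGER NOT NULL DEFAULT 1"
        )
        conn.commit()


def demo() -> list:
    """Build a toy table, apply the change, and return (name, gpu_count) rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal schema; the real table has other columns.
    conn.execute("CREATE TABLE inference_endpoint (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO inference_endpoint (name) VALUES ('existing-endpoint')")

    add_gpu_count_column(conn)
    add_gpu_count_column(conn)  # second call is a no-op

    conn.execute(
        "INSERT INTO inference_endpoint (name, gpu_count) VALUES ('big-endpoint', 4)"
    )
    rows = conn.execute(
        "SELECT name, gpu_count FROM inference_endpoint ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, gpu_count in demo():
        print(f"{name}: {gpu_count} GPU(s) per VM")
```

Giving the column a default keeps rows that predate the change valid; whether `1`, `0`, or a nullable column is the right choice depends on how existing endpoints should behave.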