Deploying Inference to Cloud¶
You can deploy Roboflow Inference containers to virtual machines in the cloud. These VMs are configured to run CPU- or GPU-based Inference servers under the hood, so you don't have to deal with OS setup, GPU drivers, Docker installation, and so on. The Inference CLI currently supports deploying the Roboflow Inference container images into a virtual machine running on Google Cloud Platform (GCP) or Amazon Web Services (AWS).
The Roboflow Inference CLI assumes the corresponding cloud CLI is already configured for the project you want to deploy the virtual machine into. See the setup instructions for the Google Cloud gcloud CLI or the AWS aws CLI.
Roboflow Inference cloud deploy is powered by the popular SkyPilot project.
Make sure the cloud-deploy extras are installed
To run the commands presented below, you need the cloud-deploy extras installed:
pip install "inference-cli[cloud-deploy]"
Discovering command capabilities
To check the details of the command, run:
inference cloud --help
A help guide is also available for each sub-command:
inference cloud deploy --help
inference cloud deploy¶
We illustrate inference cloud deploy with some examples below.
Deploy GPU or CPU inference to AWS or GCP
# Deploy the Roboflow Inference GPU container into a GPU-enabled VM in AWS
inference cloud deploy --provider aws --compute-type gpu
# Deploy the Roboflow Inference CPU container into a CPU-only VM in GCP
inference cloud deploy --provider gcp --compute-type cpu
Note the "cluster name" printed after the deployment completes. This handle is used in many subsequent commands. The deploy command also prints helpful debug and cost information about your VM.
Deploying Inference into a cloud VM will also print out an endpoint of the form "http://1.2.3.4:9001"; you can now run inference requests against this endpoint.
Note that port 9001 is automatically opened - check with your security admin whether this is acceptable for your cloud/project.
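For example, you could send a test request to the endpoint with curl. This is a minimal sketch: the model ID (your-project/1), the API key placeholder, and the image URL are illustrative assumptions, and the exact route depends on your model type - consult the Inference HTTP API documentation for details.
# Hypothetical example - substitute your own endpoint, model ID, and API key
curl -X POST \
  "http://1.2.3.4:9001/your-project/1?api_key=YOUR_ROBOFLOW_API_KEY&image=https://example.com/image.jpg"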
inference cloud status¶
To check the status of your deployment, run:
inference cloud status
Stop and start deployments¶
You can start and stop your deployment using:
inference cloud start <deployment_handle>
and
# Stop the VM; you only pay for disk storage while the VM is stopped
inference cloud stop <deployment_handle>
inference cloud undeploy¶
To delete (undeploy) your deployment, run:
inference cloud undeploy <deployment_handle>
SSH into the cloud deployment¶
You can SSH into your cloud deployment with the following command:
ssh <deployment_handle>
The required SSH key is automatically added to your ~/.ssh/config; you don't need to configure this manually.
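Once connected, you can inspect the VM directly. As a sketch, assuming Docker is the container runtime running the Inference server on the deployed VM, you could list the running container like so:
# Hypothetical example - assumes Docker runs the Inference container on the VM
ssh <deployment_handle> "sudo docker ps"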
Cloud Deploy Customization¶
Roboflow Inference cloud deploy will create VMs based on internally tested templates.
For advanced use cases, you can customize the template by supplying your own SkyPilot YAML template on the command line, like so:
inference cloud deploy --custom /path/to/sky-template.yaml
If you want, you can dump the standard template bundled with the Roboflow CLI and modify it for your needs; the following command will do that.
# This command will print out the standard gcp/cpu sky template.
inference cloud deploy --dry-run --provider gcp --compute-type cpu
Then you can deploy a custom template based on your changes, as sketched below.
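Putting this together, the customization workflow might look like the following. This is a sketch that assumes the --dry-run output is written to stdout, so it can be redirected to a file:
# Save the standard template to a file (assumes it is printed to stdout)
inference cloud deploy --dry-run --provider gcp --compute-type cpu > sky-template.yaml
# Edit sky-template.yaml as needed, then deploy the customized template
inference cloud deploy --custom ./sky-template.yaml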
As an aside, you can also use the sky CLI to control your deployment(s) and access some more advanced functionality.
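For instance, SkyPilot's own commands can operate on the same cluster; the following are standard sky CLI commands, shown here as a sketch with a placeholder handle:
# List SkyPilot-managed clusters, including your deployment
sky status
# Tear down a cluster by its handle (this removes the VM)
sky down <deployment_handle>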
Roboflow Inference deploy currently supports AWS and GCP. Please open an issue on the Inference GitHub repository if you would like to see other cloud providers supported.