Keypoint Detection

Running a keypoint detection model on Roboflow is very similar to segmentation or detection.

You may run it locally, on our hosted inference servers, or in a Docker container.

💡 Model weights: In all cases, model weights need to be downloaded from Roboflow's servers first. If you have the weights locally, you may upload them to our servers by following the From Local Weights guide. For offline usage, run inference with the Python API once; the weights will be downloaded and cached in a format our inference runtime can parse.


Install dependencies:

pip install inference

Run with Python API:

from inference import get_model

image = ""

model = get_model(model_id="yolov8x-pose-640")
results = model.infer(image)[0]


Inference Setup

In all cases, you'll need the inference package.

pip install inference

By default, it runs on the CPU. Instead, you may install the GPU module with the following command:

pip install inference-gpu

API Keys

You'll need an API key to access fine-tuned models or models on Roboflow Universe. See the Retrieve Your API Key guide for instructions.
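As a minimal sketch of supplying the key from code, the snippet below assumes the key is stored in an environment variable named ROBOFLOW_API_KEY. The helper names load_api_key and load_fine_tuned_model are illustrative and not part of the library; get_model is called with its api_key argument, and the import is deferred so the key check can run on its own.

```python
import os


def load_api_key(env_var: str = "ROBOFLOW_API_KEY") -> str:
    """Read the API key from the environment (hypothetical helper).

    The variable name is an assumption; use whichever name you store the key under.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your Roboflow API key first.")
    return key


def load_fine_tuned_model(model_id: str):
    """Load a model that requires authentication (hypothetical helper)."""
    # Imported lazily so that load_api_key() can be used without the package.
    from inference import get_model

    return get_model(model_id=model_id, api_key=load_api_key())
```

Keeping the key in the environment rather than in source code avoids committing it to version control by accident.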


Available Pretrained Models

You may use keypoint detection models available on Universe. Alternatively, here are a few model IDs that we support out of the box:

  • yolov8x-pose-1280 (largest)
  • yolov8x-pose-640
  • yolov8l-pose-640
  • yolov8m-pose-640
  • yolov8s-pose-640
  • yolov8n-pose-640 (smallest)


Run the model locally, without needing to set up a Docker container. This pulls the model from Roboflow's servers and runs it on your machine. It can take both images and videos as input.


from inference import get_model

# This can be a URL, a np.ndarray or a PIL image.
image = ""

model = get_model(model_id="yolov8x-pose-640")
results = model.infer(image)[0]

Inference Pipeline allows running inference on videos, webcams and RTSP streams. You may define a custom sink to extract pose results.

More details can be found in Predict on a Video, Webcam or RTSP Stream.

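from inference import InferencePipeline
from inference.core.interfaces.camera.entities import VideoFrame

def my_custom_sink(predictions: dict, video_frame: VideoFrame):
    # Called once per frame; `predictions` holds the pose results for that frame.
    print(predictions)

pipeline = InferencePipeline.init(
    model_id="yolov8x-pose-640", # Roboflow model to use
    video_reference=0, # Path to video, device id (int, usually 0 for built in webcams), or RTSP stream url
    on_prediction=my_custom_sink, # Function to run after each prediction
)

pipeline.start()  # Begin processing frames
pipeline.join()   # Block until the stream ends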
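A custom sink can also keep state between frames. The sketch below is one possible approach, not the library's own: make_pose_counting_sink is a hypothetical helper that returns a sink together with a list recording how many confident detections each frame contained. It assumes the predictions dictionary has a "predictions" list whose items carry a "confidence" score.

```python
from typing import Callable, List, Tuple


def make_pose_counting_sink(min_confidence: float = 0.5) -> Tuple[Callable, List[int]]:
    """Build a sink plus the list it appends to (hypothetical helper)."""
    people_per_frame: List[int] = []

    def sink(predictions: dict, video_frame) -> None:
        # Assumed layout: predictions["predictions"] is a list of detections,
        # each with a "confidence" score and a "keypoints" list.
        detections = predictions.get("predictions", [])
        confident = [
            d for d in detections if d.get("confidence", 0.0) >= min_confidence
        ]
        people_per_frame.append(len(confident))

    return sink, people_per_frame
```

The returned sink can be passed as on_prediction to InferencePipeline.init, and the list can be inspected after the pipeline finishes.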

Send an image to our servers and get the detected keypoint response. Only images are supported (URL, np.ndarray, PIL).

import os
from inference_sdk import InferenceHTTPClient

# This can be a URL, a np.ndarray or a PIL image.
image = ""

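client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",  # Roboflow hosted inference endpoint
    api_key=os.environ["ROBOFLOW_API_KEY"],  # assumes the key is stored in this environment variable
)
results = client.infer(image, model_id="yolov8x-pose-640")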
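To illustrate reading the response, here is a hedged sketch that turns a result dictionary into one name-to-coordinate mapping per detected person. It assumes the response has a "predictions" list whose items contain a "keypoints" list with "x", "y", "confidence" and "class" fields; check this against the response you actually receive. The function named_keypoints and the sample data are made up for the example.

```python
from typing import Dict, List, Tuple


def named_keypoints(
    results: dict, min_confidence: float = 0.5
) -> List[Dict[str, Tuple[float, float]]]:
    """Map keypoint names to (x, y) for each detection (hypothetical helper)."""
    people = []
    for detection in results.get("predictions", []):
        points = {}
        for kp in detection.get("keypoints", []):
            # Skip keypoints the model is unsure about.
            if kp.get("confidence", 0.0) >= min_confidence:
                points[kp["class"]] = (kp["x"], kp["y"])
        people.append(points)
    return people


# Illustrative response, hand-written for this example (not real model output).
sample = {
    "predictions": [
        {
            "confidence": 0.93,
            "keypoints": [
                {"x": 120.0, "y": 80.0, "confidence": 0.98, "class": "nose"},
                {"x": 112.0, "y": 72.0, "confidence": 0.30, "class": "left_eye"},
            ],
        }
    ]
}

print(named_keypoints(sample))
```

Only the nose passes the default threshold in this sample, so the low-confidence left eye is dropped from the mapping.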

With this method, you self-host the inference server in a container and query it the same way as the hosted model API. Only images are supported (URL, np.ndarray, PIL).

Note that the model weights still need to be retrieved from our servers at least once. Check out From Local Weights for instructions on how to upload yours.

Start the inference server:

inference server start


import os
from inference_sdk import InferenceHTTPClient

# This can be a URL, a np.ndarray or a PIL image.
image = ""

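client = InferenceHTTPClient(
    api_url="http://localhost:9001",  # default address of a locally started inference server
    api_key=os.environ["ROBOFLOW_API_KEY"],  # assumes the key is stored in this environment variable
)
results = client.infer(image, model_id="yolov8x-pose-640")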


With supervision, you may visualize the results and carry out post-processing. The supervision library standardizes results from various keypoint detection and pose estimation models into a consistent format, using adaptors such as from_inference.

Example usage:

import os
import cv2
from inference import get_model
import supervision as sv

# Model accepts URLs, np.arrays (cv2.imread), and PIL images.
# Annotators accept np.arrays (cv2.imread), and PIL images
image = ""

model = get_model(model_id="yolov8x-pose-640")
results = model.infer(image)[0]

# Any results object would work, regardless of which inference API is used
keypoints = sv.KeyPoints.from_inference(results)

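# Download the image (if needed) and load it as a numpy array for annotation
img_name = "people-walking.jpg"
if not os.path.exists(img_name):
    os.system(f"wget -O {img_name} {image}")
image_np = cv2.imread(img_name)

# Draw skeleton edges on the numpy image, not on the URL string
annotated_image = sv.EdgeAnnotator().annotate(image_np, keypoints)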