
GLM-OCR

GLM-OCR is a vision-language model by Zhipu AI (ZAI) for Optical Character Recognition (OCR).

GLM-OCR uses a modern image-text-to-text architecture for high-quality text recognition. It supports custom prompts to guide recognition for different use cases such as serial numbers, labels, and document text.

To use GLM-OCR with Inference, you need a Roboflow API key. If you don't have a Roboflow account yet, sign up for a free one.

Then, retrieve your API key from the Roboflow dashboard. Learn how to retrieve your API key.

Run the following command to set your API key in your coding environment:

export ROBOFLOW_API_KEY=<your api key>
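Before calling the model, it can help to confirm that the variable is actually visible to your Python process. The following is a minimal sketch; `get_api_key` is a hypothetical helper written for this page, not part of Inference or the Inference SDK.

```python
import os


def get_api_key(env=None):
    """Return the Roboflow API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    This helper is illustrative only and is not part of the Inference SDK.
    """
    env = os.environ if env is None else env
    key = env.get("ROBOFLOW_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ROBOFLOW_API_KEY is not set. "
            "Run `export ROBOFLOW_API_KEY=<your api key>` in this shell first."
        )
    return key
```

Calling `get_api_key()` at the top of a script fails early with a readable message instead of a bare `KeyError` later on.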

Let's try running GLM-OCR on this image:

[Image: serial number]

Note

GLM-OCR requires Inference with inference-models support enabled (USE_INFERENCE_MODELS=true) and a GPU.

To run the example, start an Inference server locally:

pip install inference-cli && inference server start
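Once the server has started, a quick way to check that something is listening on the default address used in this guide (127.0.0.1:9001) is a plain TCP connection attempt. This is a rough sketch using only the Python standard library; `server_is_listening` is a hypothetical helper, and a successful connection shows only that the port is open, not that GLM-OCR is loaded.

```python
import socket


def server_is_listening(host="127.0.0.1", port=9001, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only checks that the port accepts connections. It does not verify
    that the process behind it is an Inference server or that a model is ready.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If this returns `False`, the server may still be starting up, or it may be bound to a different port.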

Using the Inference SDK

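import os
from inference_sdk import InferenceHTTPClient

# Connect to the local Inference server started above.
CLIENT = InferenceHTTPClient(
    api_url="http://127.0.0.1:9001",
    api_key=os.environ["ROBOFLOW_API_KEY"],
)

# Send the image and a prompt to GLM-OCR.
result = CLIENT.infer_lmm(
    inference_input="./serial_number.png",
    prompt="Text Recognition:",
    model_id="glm-ocr",
)

# The recognized text is returned under the "response" key.
print(result["response"])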
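To apply the same prompt to several images, the call above can be wrapped in a small loop. The sketch below assumes only what the example shows: a client object with an `infer_lmm` method that accepts `inference_input`, `prompt`, and `model_id`, and returns a mapping with a `"response"` key. `recognize_text` is a hypothetical helper, not an SDK function.

```python
def recognize_text(client, image_paths, prompt="Text Recognition:", model_id="glm-ocr"):
    """Run one OCR prompt over several images and map each path to its response text.

    `client` is expected to behave like the CLIENT object in the example above.
    This helper is illustrative and is not part of the Inference SDK.
    """
    results = {}
    for path in image_paths:
        result = client.infer_lmm(
            inference_input=path,
            prompt=prompt,
            model_id=model_id,
        )
        results[path] = result["response"]
    return results
```

With the client from the example, `recognize_text(CLIENT, ["./serial_number.png"])` would return a dictionary keyed by image path. Passing a different `prompt` is one way to try the custom-prompt behavior described earlier.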

See Also