GLM-OCR
Class: GLMOCRBlockV1
Source: inference.core.workflows.core_steps.models.foundation.glm_ocr.v1.GLMOCRBlockV1
Recognize text in images using GLM-OCR, a vision language model by Zhipu AI specialized for optical character recognition.
GLM-OCR supports three built-in recognition modes:
- Text Recognition — General-purpose text recognition for serial numbers, labels, scene text, and documents.
- Formula Recognition — Recognizes mathematical formulas and equations.
- Table Recognition — Recognizes table structures and content.
You can also select Custom Prompt to provide your own prompt for specialized recognition tasks, or Structured Output to extract values from the image into a JSON document with a user-defined schema (pair with the JSON Parser block to materialize the keys as workflow outputs).
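To make the Structured Output flow more concrete, the sketch below mimics in plain Python what "materializing the keys" amounts to: the model reply is a JSON document whose keys follow `output_structure`, and each expected key becomes a separate value. The function name `materialize_keys`, the sample schema, and the sample reply are hypothetical; this is only an illustration of the idea, not the JSON Parser block's implementation.

```python
import json
from typing import Any, Dict

# Hypothetical schema in the shape of `output_structure`:
# keys are JSON field names, values describe what the model should extract.
output_structure: Dict[str, str] = {
    "serial_number": "Serial number printed on the label",
    "expiry_date": "Expiry date in YYYY-MM-DD format",
}


def materialize_keys(raw_reply: str, schema: Dict[str, str]) -> Dict[str, Any]:
    """Return one entry per schema key, taken from a JSON reply.

    Keys missing from the reply map to None. If the reply is not a JSON
    object, every key maps to None.
    """
    try:
        parsed = json.loads(raw_reply)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    return {key: parsed.get(key) for key in schema}


# Made-up reply, standing in for the text the model would return.
sample_reply = '{"serial_number": "SN-00123", "expiry_date": "2026-01-31"}'
fields = materialize_keys(sample_reply, output_structure)
print(fields["serial_number"], fields["expiry_date"])
```

Returning `None` for missing keys keeps downstream steps from failing on a partial reply; whether the real parser behaves the same way is not stated on this page.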
This block pairs well with detection models and DynamicCropBlock to isolate regions of interest before running OCR. For example, use an object detection model to find labels or text regions, crop them, then pass the crops to GLM-OCR.
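The detect, crop, then read pattern described above could be laid out roughly as follows. This is a sketch under assumptions: the step types `roboflow_core/roboflow_object_detection_model@v2` and `roboflow_core/dynamic_crop@v1`, their field names, the `crops` output, and the overall definition envelope (`version`, `inputs`, `outputs`) are not given on this page and should be checked against the respective block pages. The model ID is a placeholder, and `task_type` is set to `custom` because that is the only value this page names.

```python
import json

# Sketch of a workflow definition: detect text regions, crop them, run GLM-OCR.
workflow_definition = {
    "version": "1.0",  # assumed envelope field
    "inputs": [{"type": "WorkflowImage", "name": "image"}],
    "steps": [
        {
            # Assumed identifier and fields for the detection block.
            "type": "roboflow_core/roboflow_object_detection_model@v2",
            "name": "detector",
            "images": "$inputs.image",
            "model_id": "<your_detection_model_id>",  # placeholder
        },
        {
            # Assumed identifier and fields for the Dynamic Crop block.
            "type": "roboflow_core/dynamic_crop@v1",
            "name": "crop",
            "images": "$inputs.image",
            "predictions": "$steps.detector.predictions",
        },
        {
            # Identifier and property names taken from this page.
            "type": "roboflow_core/glm_ocr@v1",
            "name": "ocr",
            "images": "$steps.crop.crops",  # assumed output name of the crop step
            "task_type": "custom",
            "prompt": "Read the text on the label.",
            "model_version": "glm-ocr",
        },
    ],
    "outputs": [
        {"type": "JsonField", "name": "text", "selector": "$steps.ocr.parsed_output"}
    ],
}

print(json.dumps(workflow_definition, indent=2))
```

Each step consumes the previous step's output through a selector, so GLM-OCR only sees the cropped regions rather than the full frame.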
Note: GLM-OCR requires a GPU for inference.
Type identifier
Use the following identifier in the step "type" field: `roboflow_core/glm_ocr@v1` to add the block as
a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| task_type | str | Recognition task to perform. Determines the prompt sent to GLM-OCR. | ❌ |
| prompt | str | Custom text prompt for GLM-OCR. Only used when task_type is 'custom'. | ✅ |
| output_structure | Dict[str, str] | Dictionary describing the structure of the expected JSON response. Keys are the JSON field names; values describe what the model should put in each field. | ❌ |
| max_new_tokens | int | Maximum number of tokens to generate. If not set, the model default will be used. | ❌ |
| model_version | str | The GLM-OCR model to be used for inference. | ✅ |
The Refs column indicates whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
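As a rough illustration of what the Refs column means in practice, the sketch below checks a step definition and reports any literal-only property that was given a selector. The helper names are hypothetical, and treating strings that start with `$inputs.` or `$steps.` as selectors is an assumption based on the `$inputs.image` form used in the example definition on this page.

```python
from typing import Any, Dict, List

# Properties marked ✅ in the Refs column: may take a selector or a literal.
REF_ENABLED = {"prompt", "model_version"}
# Properties marked ❌ in the Refs column: literal values only.
LITERAL_ONLY = {"name", "task_type", "output_structure", "max_new_tokens"}


def is_selector(value: Any) -> bool:
    """Assumed selector syntax: a string starting with "$inputs." or "$steps."."""
    return isinstance(value, str) and value.startswith(("$inputs.", "$steps."))


def misplaced_selectors(step: Dict[str, Any]) -> List[str]:
    """Return the literal-only properties that were given a selector."""
    return sorted(key for key in LITERAL_ONLY if is_selector(step.get(key)))


ok_step = {
    "name": "ocr",
    "task_type": "custom",
    "prompt": "$inputs.prompt",  # allowed: prompt is ref-enabled
    "model_version": "glm-ocr",  # a literal is also fine here
}
bad_step = dict(ok_step, task_type="$inputs.task")  # not allowed per the table

print(misplaced_selectors(ok_step))
print(misplaced_selectors(bad_step))
```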
Available Connections
Compatible Blocks
Check what blocks you can connect to GLM-OCR in version v1.
- inputs:
Object Detection Model,Perspective Correction,S3 Sink,Stability AI Inpainting,Image Convert Grayscale,Email Notification,Morphological Transformation,Clip Comparison,VLM As Detector,Qwen-VL,Keypoint Detection Model,QR Code Generator,Twilio SMS/MMS Notification,OpenRouter,Object Detection Model,Model Monitoring Inference Aggregator,OpenAI,Llama 3.2 Vision,MoonshotAI Kimi,Polygon Zone Visualization,Image Threshold,Stitch OCR Detections,Anthropic Claude,OpenAI-Compatible LLM,OpenAI,Dynamic Crop,Heatmap Visualization,Keypoint Visualization,Email Notification,Llama 3.2 Vision,Anthropic Claude,Stability AI Image Generation,Google Vision OCR,Camera Focus,Label Visualization,Instance Segmentation Model,Contrast Enhancement,Bounding Box Visualization,Local File Sink,Depth Estimation,Multi-Label Classification Model,Google Gemini,Keypoint Detection Model,Image Contours,EasyOCR,Relative Static Crop,Multi-Label Classification Model,Polygon Visualization,Google Gemma API,Background Color Visualization,Qwen 3.6 API,Single-Label Classification Model,Instance Segmentation Model,Qwen 3.5 API,Image Blur,Polygon Visualization,Google Gemini,SIFT Comparison,Grid Visualization,Anthropic Claude,Florence-2 Model,Triangle Visualization,Object Detection Model,OCR Model,Roboflow Custom Metadata,OpenAI,Single-Label Classification Model,Slack Notification,VLM As Classifier,Pixelate Visualization,Stitch Images,Single-Label Classification Model,OpenAI,Instance Segmentation Model,Image Slicer,LMM For Classification,Keypoint Detection Model,Image Preprocessing,SIFT,Line Counter Visualization,Roboflow Dataset Upload,Image Slicer,Semantic Segmentation Model,Corner Visualization,Stability AI Outpainting,Halo Visualization,Multi-Label Classification Model,LMM,Roboflow Dataset Upload,Qwen3.5-VL,Color Visualization,Google Gemini,Blur Visualization,Semantic Segmentation Model,Classification Label Visualization,Camera Focus,Camera Calibration,Morphological Transformation,Trace Visualization,Stitch OCR 
Detections,Reference Path Visualization,Halo Visualization,Ellipse Visualization,Model Comparison Visualization,Dot Visualization,Mask Visualization,GLM-OCR,Crop Visualization,Background Subtraction,Circle Visualization,CogVLM,Text Display,Absolute Static Crop,CSV Formatter,Florence-2 Model,Contrast Equalization,Roboflow Vision Events,Webhook Sink,Icon Visualization,Twilio SMS Notification,MoonshotAI Kimi,Google Gemma
- outputs:
Object Detection Model,Perspective Correction,SAM 3,S3 Sink,Stability AI Inpainting,Email Notification,Keypoint Detection Model,Morphological Transformation,Path Deviation,Qwen-VL,Clip Comparison,SAM 3,Line Counter,Twilio SMS/MMS Notification,QR Code Generator,OpenRouter,VLM As Detector,YOLO-World Model,Model Monitoring Inference Aggregator,OpenAI,Llama 3.2 Vision,Line Counter,Time in Zone,MoonshotAI Kimi,Stitch OCR Detections,Polygon Zone Visualization,Image Threshold,Anthropic Claude,OpenAI-Compatible LLM,OpenAI,VLM As Detector,Dynamic Crop,Size Measurement,Heatmap Visualization,Email Notification,Keypoint Visualization,Llama 3.2 Vision,Anthropic Claude,Stability AI Image Generation,Seg Preview,Google Vision OCR,Cache Set,Label Visualization,SAM 3,Instance Segmentation Model,Path Deviation,Bounding Box Visualization,Local File Sink,Depth Estimation,Google Gemini,CLIP Embedding Model,Multi-Label Classification Model,Polygon Visualization,Google Gemma API,Background Color Visualization,Qwen 3.6 API,Instance Segmentation Model,Qwen 3.5 API,Google Gemini,Polygon Visualization,Image Blur,Moondream2,SIFT Comparison,Anthropic Claude,Florence-2 Model,Triangle Visualization,Time in Zone,Single-Label Classification Model,Roboflow Custom Metadata,OpenAI,Slack Notification,VLM As Classifier,OpenAI,Instance Segmentation Model,LMM For Classification,Image Preprocessing,Roboflow Dataset Upload,Line Counter Visualization,Detections Classes Replacement,Segment Anything 2 Model,Stability AI Outpainting,Corner Visualization,Cache Get,Halo Visualization,LMM,Roboflow Dataset Upload,Time in Zone,Semantic Segmentation Model,Color Visualization,Google Gemini,Classification Label Visualization,Perception Encoder Embedding Model,Distance Measurement,Morphological Transformation,Trace Visualization,Detections Stitch,VLM As Classifier,Stitch OCR Detections,Reference Path Visualization,Halo Visualization,Ellipse Visualization,Model Comparison Visualization,Dot Visualization,PTZ Tracking 
(ONVIF),Mask Visualization,Pixel Color Count,JSON Parser,GLM-OCR,Crop Visualization,CogVLM,Circle Visualization,Text Display,Florence-2 Model,Contrast Equalization,Roboflow Vision Events,Webhook Sink,Icon Visualization,Twilio SMS Notification,MoonshotAI Kimi,Google Gemma
Input and Output Bindings
The available connections depend on the block's binding kinds. The binding kinds of
GLM-OCR in version v1 are listed below.
Bindings
- input
    - `images` (`image`): The image to infer on.
    - `prompt` (`string`): Custom text prompt for GLM-OCR. Only used when task_type is 'custom'.
    - `model_version` (`roboflow_model_id`): The GLM-OCR model to be used for inference.
- output
    - `parsed_output` (`Union[string, language_model_output]`): String value if `string` or LLM / VLM output if `language_model_output`.
Example JSON definition of step GLM-OCR in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/glm_ocr@v1",
    "images": "$inputs.image",
    "task_type": "<block_does_not_provide_example>",
    "prompt": "Describe the text in the image.",
    "output_structure": {
        "my_key": "description"
    },
    "max_new_tokens": "<block_does_not_provide_example>",
    "model_version": "glm-ocr"
}
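The generated example above leaves `task_type` and `max_new_tokens` as placeholders. One way to avoid carrying placeholders into a real definition is to build the step programmatically and leave out anything that was not set. The helper `glm_ocr_step` below is hypothetical. It relies on the statement that `max_new_tokens` falls back to the model default when not set; that the other optional properties may also be omitted is an assumption.

```python
import json
from typing import Any, Dict, Optional


def glm_ocr_step(
    name: str,
    images: str = "$inputs.image",
    task_type: Optional[str] = None,
    prompt: Optional[str] = None,
    output_structure: Optional[Dict[str, str]] = None,
    max_new_tokens: Optional[int] = None,
    model_version: str = "glm-ocr",
) -> Dict[str, Any]:
    """Build a GLM-OCR step definition, skipping properties left as None."""
    if max_new_tokens is not None and max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be a positive integer")
    step: Dict[str, Any] = {
        "name": name,
        "type": "roboflow_core/glm_ocr@v1",
        "images": images,
        "model_version": model_version,
    }
    optional = {
        "task_type": task_type,
        "prompt": prompt,
        "output_structure": output_structure,
        "max_new_tokens": max_new_tokens,
    }
    # Only include properties the caller actually provided.
    step.update({key: value for key, value in optional.items() if value is not None})
    return step


custom_step = glm_ocr_step(
    "ocr",
    task_type="custom",
    prompt="Describe the text in the image.",
    max_new_tokens=256,
)
print(json.dumps(custom_step, indent=4))
```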