GLM-OCR
Class: GLMOCRBlockV1
Source: inference.core.workflows.core_steps.models.foundation.glm_ocr.v1.GLMOCRBlockV1
Recognize text in images using GLM-OCR, a vision language model by Zhipu AI specialized for optical character recognition.
GLM-OCR supports three built-in recognition modes:
- Text Recognition — General-purpose text recognition for serial numbers, labels, scene text, and documents.
- Formula Recognition — Recognizes mathematical formulas and equations.
- Table Recognition — Recognizes table structures and content.
You can also select Custom Prompt to provide your own prompt for specialized recognition tasks.
This block pairs well with detection models and the Dynamic Crop block, which isolate regions of interest before OCR runs. For example, use an object detection model to find labels or text regions, crop them, then pass the crops to GLM-OCR.
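The detect, crop, then OCR pattern can be sketched as a workflow specification built in Python. This is a minimal sketch under several assumptions: the object detection block type `roboflow_core/roboflow_object_detection_model@v2`, the Dynamic Crop block type `roboflow_core/dynamic_crop@v1`, their field names (`model_id`, `predictions`, `crops`), and the placeholder model ID `your-project/1` are not taken from this page, so check each block's own documentation. Only the GLM-OCR step fields and the `'custom'` task type come from this page; the prompt text is illustrative.

```python
import json

# Hypothetical placeholder; replace with a real Roboflow model ID.
DETECTION_MODEL_ID = "your-project/1"


def build_label_ocr_workflow(model_id: str = DETECTION_MODEL_ID) -> dict:
    """Detect text regions, crop them, then run GLM-OCR on each crop."""
    return {
        "version": "1.0",
        "inputs": [{"type": "WorkflowImage", "name": "image"}],
        "steps": [
            {
                # Assumed block type and field names for the detection step.
                "type": "roboflow_core/roboflow_object_detection_model@v2",
                "name": "detector",
                "images": "$inputs.image",
                "model_id": model_id,
            },
            {
                # Assumed block type and field names for the crop step.
                "type": "roboflow_core/dynamic_crop@v1",
                "name": "crop",
                "images": "$inputs.image",
                "predictions": "$steps.detector.predictions",
            },
            {
                # GLM-OCR step: identifier and fields as documented on this page.
                "type": "roboflow_core/glm_ocr@v1",
                "name": "ocr",
                "images": "$steps.crop.crops",
                "task_type": "custom",
                "prompt": "Read the text on this label.",
                "model_version": "glm-ocr",
            },
        ],
        "outputs": [
            {
                # Expose the recognized text of every crop as a workflow output.
                "type": "JsonField",
                "name": "text",
                "selector": "$steps.ocr.parsed_output",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_label_ocr_workflow(), indent=2))
```

Feeding `$steps.crop.crops` rather than `$inputs.image` into GLM-OCR is what restricts recognition to the detected regions.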
Note: GLM-OCR requires a GPU for inference.
Type identifier
Use the identifier roboflow_core/glm_ocr@v1 in the step's "type" field to add the block as a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| task_type | str | Recognition task to perform. Determines the prompt sent to GLM-OCR. | ❌ |
| prompt | str | Custom text prompt for GLM-OCR. Only used when task_type is 'custom'. | ✅ |
| model_version | str | The GLM-OCR model to be used for inference. | ✅ |
The Refs column indicates whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
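To make the Refs column concrete: a property marked ✅ may hold either a literal or a selector string (for example `$inputs.ocr_prompt`) that is resolved at runtime, while a property marked ❌ must be a literal. The sketch below is a hypothetical client-side check; the helper name `check_glm_ocr_step` and the rule "selectors are strings starting with `$`" are assumptions for illustration, not part of the Workflows API. It also flags a prompt that will be ignored because task_type is not `'custom'`.

```python
# Properties from the table above and whether they accept runtime references (Refs column).
DYNAMIC_ALLOWED = {"name": False, "task_type": False, "prompt": True, "model_version": True}


def is_selector(value) -> bool:
    # Assumption: Workflows selectors are strings beginning with "$".
    return isinstance(value, str) and value.startswith("$")


def check_glm_ocr_step(step: dict) -> list[str]:
    """Return issues found in a GLM-OCR step definition (hypothetical helper)."""
    issues = []
    if step.get("type") != "roboflow_core/glm_ocr@v1":
        issues.append("type must be roboflow_core/glm_ocr@v1")
    # Properties marked with a cross in the Refs column must be literals.
    for prop, dynamic_ok in DYNAMIC_ALLOWED.items():
        if prop in step and is_selector(step[prop]) and not dynamic_ok:
            issues.append(f"{prop} cannot reference a runtime value")
    # The prompt only takes effect in custom mode.
    if "prompt" in step and step.get("task_type") != "custom":
        issues.append("prompt is ignored unless task_type is 'custom'")
    return issues
```

For instance, a step with `"task_type": "custom"` and `"prompt": "$inputs.ocr_prompt"` passes with no issues, because prompt accepts runtime values.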
Available Connections
Compatible Blocks
Check what blocks you can connect to GLM-OCR in version v1.
- inputs:
Stitch Images,Image Threshold,Email Notification,Corner Visualization,Image Blur,Ellipse Visualization,OpenAI,Roboflow Dataset Upload,Object Detection Model,Depth Estimation,Stitch OCR Detections,EasyOCR,Absolute Static Crop,Multi-Label Classification Model,CogVLM,Google Gemini,Stability AI Image Generation,Grid Visualization,Dynamic Crop,Image Slicer,Image Preprocessing,Relative Static Crop,SIFT,Morphological Transformation,Instance Segmentation Model,Line Counter Visualization,Trace Visualization,LMM For Classification,Halo Visualization,Dot Visualization,GLM-OCR,Model Monitoring Inference Aggregator,Roboflow Custom Metadata,Keypoint Detection Model,Pixelate Visualization,Circle Visualization,Image Convert Grayscale,Icon Visualization,QR Code Generator,S3 Sink,Semantic Segmentation Model,Keypoint Detection Model,Twilio SMS Notification,Halo Visualization,Camera Focus,Anthropic Claude,OCR Model,Polygon Visualization,Text Display,Reference Path Visualization,Instance Segmentation Model,Llama 3.2 Vision,CSV Formatter,Crop Visualization,Roboflow Dataset Upload,Mask Visualization,Heatmap Visualization,Webhook Sink,Label Visualization,Classification Label Visualization,Google Vision OCR,Florence-2 Model,Florence-2 Model,VLM As Detector,Polygon Zone Visualization,Stability AI Inpainting,Google Gemini,Perspective Correction,Camera Calibration,Anthropic Claude,OpenAI,OpenAI,Qwen3.5-VL,Background Color Visualization,Anthropic Claude,Email Notification,Background Subtraction,Contrast Equalization,SIFT Comparison,Multi-Label Classification Model,Keypoint Visualization,Stitch OCR Detections,LMM,Single-Label Classification Model,Color Visualization,Single-Label Classification Model,OpenAI,Object Detection Model,Roboflow Vision Events,Local File Sink,VLM As Classifier,Twilio SMS/MMS Notification,Triangle Visualization,Clip Comparison,Blur Visualization,Bounding Box Visualization,Camera Focus,Polygon Visualization,Google Gemini,Image Slicer,Image Contours,Model Comparison Visualization,Stability AI Outpainting,Slack Notification
- outputs:
Image Threshold,Email Notification,Corner Visualization,Image Blur,Ellipse Visualization,OpenAI,Roboflow Dataset Upload,Time in Zone,Stitch OCR Detections,Depth Estimation,CogVLM,Google Gemini,Stability AI Image Generation,Time in Zone,Dynamic Crop,Instance Segmentation Model,Image Preprocessing,Morphological Transformation,Line Counter Visualization,Trace Visualization,LMM For Classification,Halo Visualization,Dot Visualization,GLM-OCR,Polygon Visualization,Model Monitoring Inference Aggregator,Cache Get,Roboflow Custom Metadata,Pixel Color Count,Circle Visualization,Icon Visualization,QR Code Generator,S3 Sink,Detections Classes Replacement,Twilio SMS Notification,Halo Visualization,Anthropic Claude,SAM 3,Polygon Visualization,Text Display,Reference Path Visualization,Instance Segmentation Model,Llama 3.2 Vision,Roboflow Dataset Upload,Crop Visualization,Mask Visualization,CLIP Embedding Model,Heatmap Visualization,Webhook Sink,Cache Set,Google Vision OCR,Label Visualization,Classification Label Visualization,Florence-2 Model,Segment Anything 2 Model,Florence-2 Model,Polygon Zone Visualization,Stability AI Inpainting,Google Gemini,SAM 3,Perspective Correction,Anthropic Claude,OpenAI,OpenAI,PTZ Tracking (ONVIF),Background Color Visualization,Anthropic Claude,Size Measurement,Email Notification,SIFT Comparison,Contrast Equalization,Keypoint Visualization,Time in Zone,Line Counter,Path Deviation,Stitch OCR Detections,LMM,Perception Encoder Embedding Model,Line Counter,SAM 3,Seg Preview,Color Visualization,OpenAI,Roboflow Vision Events,Local File Sink,Detections Stitch,Twilio SMS/MMS Notification,YOLO-World Model,Triangle Visualization,Clip Comparison,Bounding Box Visualization,Distance Measurement,Google Gemini,Path Deviation,Moondream2,Model Comparison Visualization,Stability AI Outpainting,Slack Notification
Input and Output Bindings
The available connections depend on the block's binding kinds. Check which binding kinds
GLM-OCR in version v1 has.
Bindings
- input
    - images (image): The image to infer on.
    - prompt (string): Custom text prompt for GLM-OCR. Only used when task_type is 'custom'.
    - model_version (roboflow_model_id): The GLM-OCR model to be used for inference.
- output
    - parsed_output (string): String value.
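Downstream steps and workflow outputs consume the block through its parsed_output binding. A minimal sketch, assuming a step named `ocr`, a hypothetical output name `ocr_text`, and the list-of-dicts shape that workflow runs typically return (one dict per input image, keyed by output name). The helper `collect_texts` is hypothetical; it accepts either a single string or a list of strings per image, the latter covering the case where GLM-OCR ran on several crops of one image.

```python
# Workflow "outputs" entry exposing parsed_output under a hypothetical name.
OUTPUTS = [
    {"type": "JsonField", "name": "ocr_text", "selector": "$steps.ocr.parsed_output"},
]


def collect_texts(results: list[dict], output_name: str = "ocr_text") -> list[str]:
    """Flatten recognized strings from workflow results.

    Assumes one dict per input image whose value is a string, or a list of
    strings when GLM-OCR ran on multiple crops of that image.
    """
    texts = []
    for result in results:
        value = result.get(output_name)
        if isinstance(value, str):
            texts.append(value)
        elif isinstance(value, list):
            # Keep only string entries from per-crop outputs.
            texts.extend(v for v in value if isinstance(v, str))
    return texts
```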
Example JSON definition of step GLM-OCR in version v1
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/glm_ocr@v1",
  "images": "$inputs.image",
  "task_type": "custom",
  "prompt": "Describe the text in the image.",
  "model_version": "glm-ocr"
}
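One way to execute a step like this is to wrap it in a minimal workflow and send it to an inference server with the inference_sdk HTTP client. This is a sketch under assumptions: a GPU-backed server at the placeholder URL `http://localhost:9001`, an API key in the `ROBOFLOW_API_KEY` environment variable, and the `specification=` and `images=` parameters of `InferenceHTTPClient.run_workflow`; verify these against your installed inference-sdk version. The output name `text` and the helper names are hypothetical.

```python
import os


def build_spec(step_name: str = "ocr") -> dict:
    """Wrap the example GLM-OCR step in a minimal workflow specification."""
    step = {
        "name": step_name,
        "type": "roboflow_core/glm_ocr@v1",
        "images": "$inputs.image",
        "task_type": "custom",
        "prompt": "Describe the text in the image.",
        "model_version": "glm-ocr",
    }
    return {
        "version": "1.0",
        "inputs": [{"type": "WorkflowImage", "name": "image"}],
        "steps": [step],
        "outputs": [
            {"type": "JsonField", "name": "text", "selector": f"$steps.{step_name}.parsed_output"}
        ],
    }


def run_glm_ocr(image_path: str, api_url: str = "http://localhost:9001") -> list:
    """Send the workflow to an inference server (needs inference-sdk and a GPU server)."""
    # Imported lazily so build_spec() works even without inference-sdk installed.
    from inference_sdk import InferenceHTTPClient

    client = InferenceHTTPClient(api_url=api_url, api_key=os.environ.get("ROBOFLOW_API_KEY"))
    return client.run_workflow(specification=build_spec(), images={"image": image_path})
```

With a server running, `run_glm_ocr("label.jpg")[0]["text"]` would hold the recognized text for that image.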