GLM-OCR¶
Class: GLMOCRBlockV1
Source: inference.core.workflows.core_steps.models.foundation.glm_ocr.v1.GLMOCRBlockV1
Recognize text in images using GLM-OCR, a vision language model by Zhipu AI specialized for optical character recognition.
GLM-OCR supports three built-in recognition modes:
- Text Recognition — General-purpose text recognition for serial numbers, labels, scene text, and documents.
- Formula Recognition — Recognizes mathematical formulas and equations.
- Table Recognition — Recognizes table structures and content.
You can also select Custom Prompt to provide your own prompt for specialized recognition tasks, or Structured Output to extract values from the image into a JSON document with a user-defined schema (pair with the JSON Parser block to materialize the keys as workflow outputs).
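As a sketch, a Structured Output step paired with the JSON Parser block might look like the following. The `task_type` value, step names, and field names here are illustrative (this page does not list the exact task-type identifiers; see the example JSON definition below, which uses placeholders):

```json
[
  {
    "name": "label_reader",
    "type": "roboflow_core/glm_ocr@v1",
    "images": "$inputs.image",
    "task_type": "structured",
    "output_structure": {
      "serial_number": "the serial number printed on the label",
      "expiry_date": "the expiration date, if present"
    }
  },
  {
    "name": "parser",
    "type": "roboflow_core/json_parser@v1",
    "raw_json": "$steps.label_reader.parsed_output",
    "expected_fields": ["serial_number", "expiry_date"]
  }
]
```

The JSON Parser step consumes the `parsed_output` binding and exposes each expected field as a separate workflow output.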
This block pairs well with detection models and DynamicCropBlock to isolate regions of interest before running OCR. For example, use an object detection model to find labels or text regions, crop them, then pass the crops to GLM-OCR.
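A minimal sketch of that detect-crop-read pipeline is shown below. The detector and crop step type identifiers, model ID, and output selectors are assumptions based on common Roboflow Workflows conventions; check the documentation for those blocks for the exact names:

```json
[
  {
    "name": "detector",
    "type": "roboflow_core/roboflow_object_detection_model@v1",
    "images": "$inputs.image",
    "model_id": "my-label-detector/1"
  },
  {
    "name": "crops",
    "type": "roboflow_core/dynamic_crop@v1",
    "images": "$inputs.image",
    "predictions": "$steps.detector.predictions"
  },
  {
    "name": "ocr",
    "type": "roboflow_core/glm_ocr@v1",
    "images": "$steps.crops.crops",
    "task_type": "<task_type_here>"
  }
]
```

Each crop produced by the Dynamic Crop step is passed to GLM-OCR individually, so one OCR result is returned per detected region.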
Note: GLM-OCR requires a GPU for inference.
Type identifier¶
Use the following identifier in the step "type" field to add the block as a step in your workflow: roboflow_core/glm_ocr@v1
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `task_type` | `str` | Recognition task to perform. Determines the prompt sent to GLM-OCR. | ❌ |
| `prompt` | `str` | Custom text prompt for GLM-OCR. Only used when `task_type` is `custom`. | ✅ |
| `output_structure` | `Dict[str, str]` | Dictionary describing the structure of the expected JSON response. Keys are the JSON field names; values describe what the model should put in each field. | ❌ |
| `max_new_tokens` | `int` | Maximum number of tokens to generate. If not set, the model default is used. | ❌ |
| `model_version` | `str` | The GLM-OCR model to be used for inference. | ✅ |
The Refs column marks whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to GLM-OCR in version v1.
- inputs:
Roboflow Dataset Upload,Line Counter Visualization,Stability AI Outpainting,Object Detection Model,Email Notification,Google Gemma API,Image Slicer,OCR Model,Google Vision OCR,Image Preprocessing,Google Gemini,Instance Segmentation Model,EasyOCR,Color Visualization,Object Detection Model,Multi-Label Classification Model,OpenAI,Ellipse Visualization,Polygon Visualization,Anthropic Claude,Single-Label Classification Model,Relative Static Crop,Webhook Sink,Model Comparison Visualization,Trace Visualization,Stitch OCR Detections,Camera Focus,Roboflow Custom Metadata,Qwen 3.5 API,Object Detection Model,OpenAI,Instance Segmentation Model,Single-Label Classification Model,Semantic Segmentation Model,VLM As Classifier,Image Threshold,Stitch Images,Heatmap Visualization,Qwen 3.6 API,SIFT Comparison,Morphological Transformation,Florence-2 Model,Halo Visualization,Instance Segmentation Model,CogVLM,Crop Visualization,Camera Calibration,Florence-2 Model,Multi-Label Classification Model,GLM-OCR,Dot Visualization,S3 Sink,Semantic Segmentation Model,Twilio SMS Notification,Icon Visualization,Model Monitoring Inference Aggregator,Local File Sink,Google Gemini,Roboflow Dataset Upload,Image Contours,Pixelate Visualization,Keypoint Detection Model,Twilio SMS/MMS Notification,Polygon Zone Visualization,Reference Path Visualization,Blur Visualization,Anthropic Claude,Background Subtraction,Text Display,Clip Comparison,CSV Formatter,VLM As Detector,LMM,Stability AI Image Generation,Perspective Correction,Anthropic Claude,Bounding Box Visualization,Depth Estimation,Classification Label Visualization,Image Slicer,Absolute Static Crop,Image Blur,Stability AI Inpainting,Multi-Label Classification Model,Polygon Visualization,Image Convert Grayscale,SIFT,Single-Label Classification Model,Roboflow Vision Events,OpenAI,Google Gemini,Label Visualization,Corner Visualization,Grid Visualization,Dynamic Crop,Contrast Equalization,Keypoint Visualization,Triangle Visualization,Qwen3.5-VL,Keypoint 
Detection Model,QR Code Generator,Halo Visualization,Circle Visualization,Camera Focus,Mask Visualization,LMM For Classification,Morphological Transformation,OpenAI,Contrast Enhancement,Keypoint Detection Model,MoonshotAI Kimi,Llama 3.2 Vision,Background Color Visualization,Email Notification,Slack Notification,Stitch OCR Detections
- outputs:
CLIP Embedding Model,Detections Stitch,Roboflow Dataset Upload,Line Counter Visualization,Email Notification,Google Gemma API,Stability AI Outpainting,Google Vision OCR,Distance Measurement,Instance Segmentation Model,Google Gemini,Image Preprocessing,Color Visualization,Object Detection Model,Multi-Label Classification Model,OpenAI,Ellipse Visualization,Polygon Visualization,Anthropic Claude,Single-Label Classification Model,Time in Zone,Detections Classes Replacement,Cache Set,Webhook Sink,Model Comparison Visualization,Trace Visualization,Stitch OCR Detections,Qwen 3.5 API,Roboflow Custom Metadata,YOLO-World Model,OpenAI,SAM 3,Perception Encoder Embedding Model,Instance Segmentation Model,VLM As Classifier,Size Measurement,Image Threshold,Heatmap Visualization,Qwen 3.6 API,SIFT Comparison,Morphological Transformation,Florence-2 Model,Halo Visualization,Instance Segmentation Model,CogVLM,Crop Visualization,Florence-2 Model,Path Deviation,Time in Zone,GLM-OCR,Dot Visualization,S3 Sink,Path Deviation,SAM 3,Semantic Segmentation Model,Twilio SMS Notification,Seg Preview,Model Monitoring Inference Aggregator,Local File Sink,Google Gemini,Roboflow Dataset Upload,Icon Visualization,VLM As Classifier,JSON Parser,Line Counter,Twilio SMS/MMS Notification,Time in Zone,Polygon Zone Visualization,Reference Path Visualization,Anthropic Claude,Text Display,Clip Comparison,VLM As Detector,LMM,Stability AI Image Generation,Perspective Correction,Anthropic Claude,Line Counter,Bounding Box Visualization,Pixel Color Count,Depth Estimation,Classification Label Visualization,Image Blur,Stability AI Inpainting,Polygon Visualization,SAM 3,Roboflow Vision Events,OpenAI,Google Gemini,Label Visualization,VLM As Detector,Corner Visualization,Keypoint Detection Model,Contrast Equalization,Dynamic Crop,Triangle Visualization,Keypoint Visualization,Moondream2,QR Code Generator,Halo Visualization,Circle Visualization,Segment Anything 2 Model,Mask Visualization,LMM For 
Classification,Morphological Transformation,OpenAI,MoonshotAI Kimi,Llama 3.2 Vision,Background Color Visualization,Email Notification,PTZ Tracking (ONVIF),Slack Notification,Stitch OCR Detections,Cache Get
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds
GLM-OCR in version v1 has.
Bindings
- input
    - `images` (`image`): The image to infer on.
    - `prompt` (`string`): Custom text prompt for GLM-OCR. Only used when `task_type` is `custom`.
    - `model_version` (`roboflow_model_id`): The GLM-OCR model to be used for inference.
- output
    - `parsed_output` (`Union[string, language_model_output]`): String value if `string`, or LLM / VLM output if `language_model_output`.
Example JSON definition of step GLM-OCR in version v1
{
"name": "<your_step_name_here>",
"type": "roboflow_core/glm_ocr@v1",
"images": "$inputs.image",
"task_type": "<block_does_not_provide_example>",
"prompt": "Describe the text in the image.",
"output_structure": {
"my_key": "description"
},
"max_new_tokens": "<block_does_not_provide_example>",
"model_version": "glm-ocr"
}