CogVLM (Multimodal Language Model)
CogVLM is a Large Multimodal Model (LMM) that is available for use in Inference.
You can ask CogVLM questions about the contents of an image and retrieve a text response.
You can run CogVLM through Roboflow Inference with three degrees of quantization. Quantization allows you to make a model smaller, but there is an accuracy trade-off. The three degrees of quantization are:
- No quantization: Run the full model. For this, you will need 80 GB of RAM. You could run the model on an 80 GB NVIDIA A100.
- 8-bit quantization: Run the model with less accuracy than no quantization. You will need 32 GB of RAM. You could run this model on an A100 with sufficient GPU memory (VRAM).
- 4-bit quantization: Run the model with less accuracy than 8-bit quantization. You will need 16 GB of RAM. You could run this model on an NVIDIA T4.
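The memory figures above can be turned into a small lookup. The following is a minimal sketch and not part of Inference: `pick_quantization` is a hypothetical helper that uses the thresholds listed above (80, 32, and 16 GB) to suggest the least aggressive quantization level that fits a given amount of memory.

```python
from typing import Optional

# Memory requirements (in GB) taken from the list above, ordered from
# most accurate (no quantization) to least accurate (4-bit).
QUANTIZATION_REQUIREMENTS_GB = [
    ("none", 80),
    ("8-bit", 32),
    ("4-bit", 16),
]


def pick_quantization(available_gb: float) -> Optional[str]:
    """Return the most accurate quantization level that fits, or None."""
    for level, required_gb in QUANTIZATION_REQUIREMENTS_GB:
        if available_gb >= required_gb:
            return level
    return None  # less than 16 GB: none of the listed options fit


print(pick_quantization(16))  # 4-bit
```

The helper only encodes the numbers from this page; it does not measure memory or configure Inference.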
Use CogVLM with Inference
To use CogVLM with Inference, you will need a Roboflow API key. If you don't already have a Roboflow account, sign up for a free Roboflow account.
Then, retrieve your API key from the Roboflow dashboard. Learn how to retrieve your API key.
Run the following command to set your API key in your coding environment:
export ROBOFLOW_API_KEY=<your api key>
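In Python, the key set above can be read back from the environment. This is a minimal sketch that assumes the variable name `ROBOFLOW_API_KEY` from the command above; `load_api_key` is a hypothetical helper, not part of the Inference SDK.

```python
import os


def load_api_key(env_var: str = "ROBOFLOW_API_KEY") -> str:
    """Read the API key from the environment, failing early if it is unset."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"{env_var} is not set; export it before running.")
    return api_key
```

Failing early with a clear message is a design choice here: it avoids sending requests with an empty key and debugging the resulting error later.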
We recommend pairing CogVLM with the Inference HTTP API configured for a GPU environment. It is easy to set up with the inference-cli tool. Run the following commands to set up the environment and start the API at http://localhost:9001:
pip install inference inference-cli inference-sdk
inference server start  # run this on a machine with a GPU; otherwise CogVLM will not be available
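Before prompting the model, it can help to confirm that the server is accepting connections. The sketch below only checks that something is listening on a TCP port; it does not verify that the listener is Inference or that CogVLM is loaded. `is_port_open` is a hypothetical helper, and the defaults (localhost, port 9001) are assumed from the client configuration used on this page.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 9001, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing is accepting connections on the target port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open()` shortly after starting the server may return `False` while the server is still starting up, so a short retry loop around it may be useful.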
Let's ask a question about the following image:
Use inference-sdk to prompt the model:
import os
from inference_sdk import InferenceHTTPClient

CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",  # only local hosting supported
    api_key=os.environ["ROBOFLOW_API_KEY"],
)

result = CLIENT.prompt_cogvlm(
    visual_prompt="./forklift.jpeg",
    text_prompt="Is there a forklift close to a conveyor belt?",
)
print(result)
Above, replace forklift.jpeg with the path to the image you want to ask about.
Let's use the prompt "Is there a forklift close to a conveyor belt?"
The results of CogVLM will appear in your terminal:
'response': 'yes, there is a forklift close to a conveyor belt, and it appears to be transporting a stack of items onto it.',
CogVLM successfully answered our question, noting there is a forklift close to the conveyor belt in the image.
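For yes/no questions like this one, the free-text answer can be reduced to a boolean. This is a rough sketch that assumes the result is a dictionary with a `response` key, as in the output above. `answer_is_yes` is a hypothetical helper, and the first-word check is a heuristic that returns `None` for any answer that does not start with "yes" or "no".

```python
from typing import Optional


def answer_is_yes(result: dict) -> Optional[bool]:
    """Map a CogVLM-style result to True/False, or None if it is unclear."""
    text = str(result.get("response", "")).strip().lower()
    # Take the first word and drop trailing punctuation such as "yes,".
    first_word = text.split(maxsplit=1)[0].strip(".,!:;") if text else ""
    if first_word == "yes":
        return True
    if first_word == "no":
        return False
    return None


# Illustrative input shaped like the output above.
sample = {"response": "yes, there is a forklift close to a conveyor belt."}
print(answer_is_yes(sample))  # True
```

Returning `None` rather than guessing keeps ambiguous answers visible to the caller, which can then fall back to reading the full text.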