SmolVLM2
SmolVLM2 is a multimodal model developed by Hugging Face.
You can use SmolVLM2 for a range of multimodal tasks, including VQA, document OCR, document VQA, and object counting.
You can deploy SmolVLM2 with Inference.
Installation¶
To install Inference with the extra dependencies necessary to run SmolVLM2, run:
pip install inference[transformers]
or, if you are on a GPU-enabled machine:
pip install inference-gpu[transformers]
How to Use SmolVLM2¶
Create a new Python file called app.py and add the following code:
from PIL import Image

from inference.models.smolvlm.smolvlm import SmolVLM

# Load the model (replace API_KEY with your Roboflow API key)
model = SmolVLM(api_key="API_KEY")

# Load an image and ask the model a question about it
image = Image.open("dog.jpeg")
prompt = "How many dogs are in this image?"

result = model.predict(image, prompt)
print(result)
In this code, we load SmolVLM2, run it on an image, and print the model's response.
Above, replace:

- prompt with the prompt for the model.
- dog.jpeg with the path to the image that you want to run inference on.
To use SmolVLM2 with Inference, you will need a Roboflow API key. If you don't already have one, sign up for a free Roboflow account. Then, replace API_KEY in the code above with your API key.
Then, run the Python script you have created:
python app.py
The result from your model will be printed to the console.
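SmolVLM2 returns free-form text, so for counting tasks like the prompt above you may want to pull a number out of the answer before using it programmatically. The parser below is our own illustration, not part of Inference, and assumes answers spell counts as digits or simple number words:

```python
import re
from typing import Optional

# Hypothetical helper for post-processing a counting answer;
# not part of the Inference library.
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def extract_count(answer: str) -> Optional[int]:
    """Return the first count found in a model answer, or None."""
    # Prefer an explicit digit, e.g. "There are 3 dogs."
    match = re.search(r"\d+", answer)
    if match:
        return int(match.group())
    # Fall back to spelled-out number words, e.g. "two dogs".
    for token in re.findall(r"[a-z]+", answer.lower()):
        if token in NUMBER_WORDS:
            return NUMBER_WORDS[token]
    return None


print(extract_count("There are 3 dogs in the image."))  # -> 3
print(extract_count("I can see two dogs."))  # -> 2
```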