Start chatting with the AI by typing a prompt in the input field below and pressing Enter.
For models that support images, you can upload an image by clicking the paperclip
icon.
You
smolvlm-256m-instruct
smolvlm-256m-instruct
SmolVLM-256M is the smallest multimodal model in the world. It accepts arbitrary sequences of image and text inputs to produce text outputs. It's designed for efficiency. SmolVLM can answer questions about images, describe visual content, or transcribe text. Its lightweight architecture makes it suitable for on-device applications while maintaining strong performance on multimodal tasks. It can run inference on one image with under 1GB of GPU RAM.