Multimodal Input
Some models accept images, audio, and video alongside text. ModelMax follows the OpenAI multipart content format: instead of a plain string, pass an array of content parts as the message content.
Check the Supported models page to see which input modalities each model supports.
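As a minimal sketch of the two shapes, the snippet below builds a text-only user turn with a plain string, then a multimodal turn with an array of content parts. The helpers text_part and image_part are hypothetical conveniences for this example and are not part of the ModelMax API or any SDK. The dictionaries they return follow the part format used in the examples on this page.

```python
# Hypothetical helpers (not part of any SDK) that build OpenAI-style content parts.
def text_part(text: str) -> dict:
    """Return a text content part."""
    return {"type": "text", "text": text}


def image_part(url: str) -> dict:
    """Return an image_url content part for a hosted URL or a data URL."""
    return {"type": "image_url", "image_url": {"url": url}}


# Text-only message: content is a plain string.
text_only = {"role": "user", "content": "Summarize the plot of Hamlet."}

# Multimodal message: content is a list of parts.
multimodal = {
    "role": "user",
    "content": [
        text_part("What do you see in this image?"),
        image_part("https://example.com/photo.jpg"),
    ],
}
```

Either dictionary can go in the messages list of a chat completions request.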
Image input
Send images as base64 data URLs or hosted URLs in image_url content parts.
```bash
curl -X POST https://api.modelmax.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MODELMAX_API_KEY" \
  -d '{
    "model": "gemini-3-flash-preview",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "What do you see in this image?" },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/photo.jpg"
            }
          }
        ]
      }
    ]
  }'
```
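```python
import os

from openai import OpenAI

# Point the OpenAI SDK at the ModelMax endpoint used in the curl example.
client = OpenAI(
    base_url="https://api.modelmax.io/v1",
    api_key=os.environ["MODELMAX_API_KEY"],
)

response = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What do you see in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```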
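```javascript
import OpenAI from "openai";

// Point the OpenAI SDK at the ModelMax endpoint used in the curl example.
const client = new OpenAI({
  baseURL: "https://api.modelmax.io/v1",
  apiKey: process.env.MODELMAX_API_KEY,
});

const response = await client.chat.completions.create({
  model: "gemini-3-flash-preview",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What do you see in this image?" },
        {
          type: "image_url",
          image_url: { url: "https://example.com/photo.jpg" },
        },
      ],
    },
  ],
});
console.log(response.choices[0].message.content);
```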
Base64 images
For local images, encode them as base64 data URLs:
```python
import base64

# Read the local file and base64-encode its bytes.
with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    # The MIME type in the data URL should match the file (image/jpeg for .jpg).
                    "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                },
            ],
        }
    ],
)
```
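The example above hardcodes the image/jpeg MIME type. The following minimal sketch accepts either a hosted URL or a local path and guesses the MIME type from the file extension with Python's standard mimetypes module. image_part_from is a hypothetical helper, not part of the ModelMax API. Because the guess relies on the extension alone and never inspects file contents, it is an assumption that the extension is accurate.

```python
import base64
import mimetypes


def image_part_from(source: str) -> dict:
    """Build an image_url content part from a hosted URL or a local file path.

    Hypothetical helper for illustration; not part of the ModelMax API.
    """
    if source.startswith(("http://", "https://", "data:")):
        # Hosted URLs and existing data URLs are passed through unchanged.
        url = source
    else:
        # Guess the MIME type from the extension only (e.g. .png -> image/png).
        mime, _ = mimetypes.guess_type(source)
        if mime is None or not mime.startswith("image/"):
            raise ValueError(f"cannot determine an image MIME type for {source!r}")
        with open(source, "rb") as f:
            b64 = base64.b64encode(f.read()).decode("ascii")
        url = f"data:{mime};base64,{b64}"
    return {"type": "image_url", "image_url": {"url": url}}
```

The returned part can replace the hand-written image_url dictionary in any example on this page.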
Multiple images
You can include up to 3 images in a single message:
```python
response = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two photos."},
                {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}},
                {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}},
            ],
        }
    ],
)
```
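If you assemble messages programmatically, you can enforce the documented limit of three images per message on the client before any request is sent. The sketch below uses a hypothetical helper, user_message_with_images, which is not part of the ModelMax API. The MAX_IMAGES_PER_MESSAGE constant mirrors the limit stated above. Whether that limit varies by model is an open question, so treat the value as an assumption and adjust it if needed.

```python
MAX_IMAGES_PER_MESSAGE = 3  # Mirrors the per-message limit stated on this page.


def user_message_with_images(prompt: str, image_urls: list[str]) -> dict:
    """Build one user message: a text prompt followed by one part per image.

    Hypothetical helper; raises ValueError locally instead of waiting for an API error.
    """
    if not image_urls:
        raise ValueError("at least one image URL is required")
    if len(image_urls) > MAX_IMAGES_PER_MESSAGE:
        raise ValueError(
            f"{len(image_urls)} images given; at most "
            f"{MAX_IMAGES_PER_MESSAGE} are allowed per message"
        )
    content = [{"type": "text", "text": prompt}]
    content += [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]
    return {"role": "user", "content": content}
```

For example, messages=[user_message_with_images("Compare these two photos.", [url_a, url_b])] produces the same structure as the request above.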
Audio input
Send audio as content parts of type input_audio, with the base64-encoded audio in data and its format in format.
```bash
curl -X POST https://api.modelmax.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MODELMAX_API_KEY" \
  -d '{
    "model": "gemini-3-flash-preview",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Transcribe this audio." },
          {
            "type": "input_audio",
            "input_audio": {
              "data": "<base64-encoded-audio>",
              "format": "webm"
            }
          }
        ]
      }
    ]
  }'
```
Supported audio formats depend on the provider. Common formats include webm, mp3, wav, and ogg.
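The following Python sketch builds the input_audio part used in the curl example from a local file. It takes the format value from the file extension and checks it against the common formats listed above. audio_part_from_file is a hypothetical helper, not part of the ModelMax API. Because actual support depends on the provider, a file that passes this check may still be rejected.

```python
import base64
from pathlib import Path

# Common formats listed on this page; actual support depends on the provider.
COMMON_AUDIO_FORMATS = {"webm", "mp3", "wav", "ogg"}


def audio_part_from_file(path: str) -> dict:
    """Build an input_audio content part from a local audio file (hypothetical helper)."""
    # Derive the format from the extension, e.g. "clip.WAV" -> "wav".
    fmt = Path(path).suffix.lstrip(".").lower()
    if fmt not in COMMON_AUDIO_FORMATS:
        raise ValueError(
            f"unexpected audio format {fmt!r}; expected one of {sorted(COMMON_AUDIO_FORMATS)}"
        )
    data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"type": "input_audio", "input_audio": {"data": data, "format": fmt}}
```

Place the returned part after a text part in the message content, in the same position it occupies in the curl request.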
Models with vision
Not all models support image input. The following table shows which input types each vision-capable model accepts:
| Model | Images | Audio | Video |
|---|---|---|---|
| gemini-3.1-pro-preview | Yes | Yes | Yes |
| gemini-3-pro-preview | Yes | Yes | Yes |
| gemini-3-flash-preview | Yes | Yes | Yes |
| gemini-3.1-flash-lite-preview | Yes | Yes | Yes |
| qwen3-vl-235b-a22b | Yes | No | No |
Text-only models (e.g. deepseek-v3.2) will return an error if you send image or audio content.
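To catch unsupported input before a request is sent, you could mirror the table above in a lookup and check each message's content parts against it. This sketch is hypothetical. MODEL_MODALITIES is copied by hand from this page and will drift as models change. Video appears in the lookup but is never detected, because this page does not show a video content part type.

```python
# Copied by hand from the table above; keep it in sync with the Supported models page.
MODEL_MODALITIES = {
    "gemini-3.1-pro-preview": {"text", "image", "audio", "video"},
    "gemini-3-pro-preview": {"text", "image", "audio", "video"},
    "gemini-3-flash-preview": {"text", "image", "audio", "video"},
    "gemini-3.1-flash-lite-preview": {"text", "image", "audio", "video"},
    "qwen3-vl-235b-a22b": {"text", "image"},
    "deepseek-v3.2": {"text"},  # Text-only, per the note above.
}

# Maps the content part types used on this page to the modality each carries.
PART_MODALITY = {"text": "text", "image_url": "image", "input_audio": "audio"}


def unsupported_modalities(model: str, messages: list[dict]) -> set[str]:
    """Return the modalities used in messages that the model does not accept."""
    allowed = MODEL_MODALITIES.get(model, {"text"})  # Assume text-only if unknown.
    used = set()
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            used.add("text")
            continue
        for part in content:
            # Unknown part types are reported under their own type name.
            used.add(PART_MODALITY.get(part["type"], part["type"]))
    return used - allowed
```

An empty set means the request matches the table. Otherwise, switch to a model that lists the reported modality.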
