Multimodal Input

Some models accept images, audio, and video alongside text. ModelMax follows the OpenAI multipart content format — pass an array of content parts instead of a plain string.

Check the Supported models page to see which input modalities each model supports.

Image input

Send images as base64 data URLs or hosted URLs in image_url content parts.

curl -X POST https://api.modelmax.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MODELMAX_API_KEY" \
  -d '{
    "model": "gemini-3-flash-preview",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "What do you see in this image?" },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/photo.jpg"
            }
          }
        ]
      }
    ]
  }'

response = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What do you see in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)

const response = await client.chat.completions.create({
  model: "gemini-3-flash-preview",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What do you see in this image?" },
        {
          type: "image_url",
          image_url: { url: "https://example.com/photo.jpg" },
        },
      ],
    },
  ],
});

Base64 images

For local images, encode them as base64 data URLs:

import base64

with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                },
            ],
        }
    ],
)

Multiple images

You can include up to 3 images in a single message:

response = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two photos."},
                {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}},
                {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}},
            ],
        }
    ],
)

Audio input

Send audio as base64-encoded content parts with input_audio type.

curl -X POST https://api.modelmax.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MODELMAX_API_KEY" \
  -d '{
    "model": "gemini-3-flash-preview",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Transcribe this audio." },
          {
            "type": "input_audio",
            "input_audio": {
              "data": "<base64-encoded-audio>",
              "format": "webm"
            }
          }
        ]
      }
    ]
  }'

Supported audio formats depend on the provider. Common formats: webm, mp3, wav, ogg.

Models with vision

Not all models support image input. Models with vision capabilities:

Model	Images	Audio	Video
`gemini-3.1-pro-preview`	Yes	Yes	Yes
`gemini-3-pro-preview`	Yes	Yes	Yes
`gemini-3-flash-preview`	Yes	Yes	Yes
`gemini-3.1-flash-lite-preview`	Yes	Yes	Yes
`qwen3-vl-235b-a22b`	Yes	No	No

Text-only models (e.g. deepseek-v3.2) will return an error if you send image or audio content.