TechEarl

Cut Your Gemini Nano Banana Bill in Half with the Batch API (2026)

Generating images one at a time is slow and full price. How to halve your bill with the Gemini Batch API and Nano Banana, with the per-image numbers and a Node example from a production cover pipeline.

Ishan Karunaratne · 10 min read

If you generate images in bulk (product shots, marketing variations, blog covers, thumbnails, dataset images) and you are calling the model one image at a time, you are paying full price for work that has a half-price lane. Google's Batch API runs Nano Banana image generation at 50% off, the same discount it gives text. Gemini 2.5 Flash Image (the original Nano Banana) drops from $0.039 to $0.0195 per image; the larger Nano Banana models follow the same halving. On a 5,000-image run, that is the difference between $195 and about $98.

This is the image-generation companion to the LLM batch API guide. It matters here specifically because Claude has no image generation, so batch image work goes to Gemini or OpenAI. I run my own blog's cover images through exactly this pipeline, so the numbers below are what I actually pay.

Per-image prices and model names are current as of June 2026 and Google iterates the Nano Banana line often. Check the Gemini pricing page in Sources before budgeting a large run, and note that the newer models price by output resolution.

When to batch image generation

The same test as text: nobody is waiting for any single image. If a person clicks "generate" and stares at a spinner, that is a synchronous job. But a great deal of image work is not that:

  • Product imagery at scale. A catalog of thousands of items, each needing a generated or edited hero image.
  • Variations. Five backgrounds, three crops, two color treatments across a set of base assets.
  • Editorial and marketing assets. A content wave's worth of covers, social cards, or banners produced ahead of publishing.
  • Thumbnails. Per-video or per-article thumbnails generated from a template.
  • Training and eval datasets. Generating image sets to train or benchmark another model.

All of those produce images that land in a bucket or a CDN, not in front of a waiting user, which is exactly the latency-tolerant shape the batch lane is for.

The per-image numbers

Nano Banana is Google's family of Gemini image-generation models: gemini-2.5-flash-image (Nano Banana), gemini-3.1-flash-image (Nano Banana 2), and gemini-3-pro-image (Nano Banana Pro). It is priced per image, and the Batch API halves that. The newer models price by output resolution, so the range reflects 0.5K up to 4K outputs.

| Model | Standard | Batch (50% off) |
| --- | --- | --- |
| Gemini 2.5 Flash Image (Nano Banana) | $0.039 / image | $0.0195 / image |
| Gemini 3.1 Flash Image (Nano Banana 2) | $0.045 to $0.151 | $0.022 to $0.076 |
| Gemini 3 Pro Image (Nano Banana Pro) | $0.134 to $0.24 | $0.067 to $0.12 |

A concrete run: 5,000 images on 2.5 Flash Image is $195 synchronous, $97.50 (about $98) batched. On the Pro model at 4K it is $1,200 synchronous, $600 batched. The discount is a flat 50%, so the lever is always worth pulling for non-urgent work; the model and resolution choice is what sets the absolute number, the same way model right-sizing dominates text costs.
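The arithmetic behind that run is easy to script before committing to a large job. This is a minimal sketch, not a pricing tool: `estimateRun` and `roundCents` are hypothetical helper names, the 50% discount and the per-image prices are the figures quoted in this article, and current prices should be passed in as inputs rather than trusted as constants.

```javascript
// Flat batch discount as quoted in this article (an input, not a guarantee).
const BATCH_DISCOUNT = 0.5;

// Round a dollar amount to whole cents.
function roundCents(amount) {
  return Math.round(amount * 100) / 100;
}

// Estimate the synchronous and batched cost of a run of `imageCount` images.
function estimateRun(imageCount, standardPricePerImage) {
  const synchronous = imageCount * standardPricePerImage;
  const batched = synchronous * (1 - BATCH_DISCOUNT);
  return {
    synchronous: roundCents(synchronous),
    batched: roundCents(batched),
    saved: roundCents(synchronous - batched),
  };
}

// 5,000 images on 2.5 Flash Image at $0.039 standard.
console.log(estimateRun(5000, 0.039)); // { synchronous: 195, batched: 97.5, saved: 97.5 }

// 5,000 images on the Pro model at the 4K price of $0.24 standard.
console.log(estimateRun(5000, 0.24)); // { synchronous: 1200, batched: 600, saved: 600 }
```

For the resolution-priced models, call it once per output size you are considering; the per-image price is the only thing that changes.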

Still a true batch, not a loop

The principle carries over unchanged from the text batch guide: a true batch is one async job carrying many image requests, billed at half price, returned within 24 hours (usually much sooner; Gemini expires any job not finished within 48 hours). It is not a loop that calls the image endpoint 5,000 times at full price. You assemble the prompts into one job, submit, and collect the rendered images later. With inline requests the responses come back in request order, so you map them by index; with the JSONL file path each result carries a key you assigned, so you match by key. Do not rely on the file output being ordered.

A Node example

The Gemini Batch API takes either inline requests (under 20 MB) or a JSONL file via the Files API for larger jobs. The shape is create, poll, collect, exactly like the text batch. Here is the inline path with the Google GenAI SDK; the exact image-config fields are in the Gemini docs linked in Sources.

```javascript
import { GoogleGenAI } from "@google/genai";

// With no apiKey passed, the SDK falls back to the environment (GEMINI_API_KEY).
const ai = new GoogleGenAI({});

// Your prompts, in order. In a real run these come from a queue or a DB.
// loadPrompts() and saveImage() are your own helpers, not part of the SDK.
const prompts = await loadPrompts(); // [{ key, text }, ...]

// Inline requests: an array of GenerateContentRequest objects.
const requests = prompts.map((p) => ({
  contents: [{ parts: [{ text: p.text }] }],
  config: { responseModalities: ["IMAGE"] },
}));

// 1. Submit ONE batch job carrying every prompt (not a loop of calls).
let job = await ai.batches.create({ model: "gemini-2.5-flash-image", src: requests });
console.log(`Submitted ${requests.length} prompts as ${job.name}`);

// 2. Poll until a TERMINAL state. Handle all four, or a cancelled or expired
//    job loops forever.
const DONE = new Set([
  "JOB_STATE_SUCCEEDED",
  "JOB_STATE_FAILED",
  "JOB_STATE_CANCELLED",
  "JOB_STATE_EXPIRED",
]);
while (!DONE.has(job.state)) {
  await new Promise((r) => setTimeout(r, 30_000));
  job = await ai.batches.get({ name: job.name });
  console.log(`  state: ${job.state}`);
}

// Only a succeeded job has images to collect.
if (job.state !== "JOB_STATE_SUCCEEDED") {
  throw new Error(`Batch ${job.name} ended in ${job.state}`);
}

// 3. Inline responses come back in REQUEST ORDER, so map each one to its
//    prompt by index and save the image.
const responses = job.dest?.inlinedResponses ?? [];
responses.forEach((r, i) => {
  const part = r.response?.candidates?.[0]?.content?.parts?.find((p) => p.inlineData);
  if (!part) {
    // Refused or failed request: log it and keep collecting the rest.
    console.warn(`No image returned for ${prompts[i].key}`);
    return;
  }
  saveImage(prompts[i].key, Buffer.from(part.inlineData.data, "base64"));
});
```

For runs over the 20 MB inline limit, write the requests to a JSONL file, one { "key": ..., "request": {...} } object per line, upload it with the Files API, and pass the file to batches.create instead of an inline array. There each result line carries its key, so you match by key instead of by index. The poll-and-collect half is otherwise the same.
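The file path can be sketched without touching the SDK. The helpers below are hypothetical names, not part of any library: `toJsonl` builds the one-object-per-line payload described above, `fitsInline` is a rough size check that assumes the 20 MB limit means 20 * 1024 * 1024 bytes of serialized requests, and `indexResultsByKey` assumes each result line is a JSON object with a `key` field. The image-config fields are left out of the request on purpose; take those from the Gemini docs.

```javascript
// Assumption: the inline limit is 20 MiB of serialized request JSON.
const INLINE_LIMIT_BYTES = 20 * 1024 * 1024;

// Build the JSONL payload: one { key, request } object per line.
function toJsonl(prompts) {
  return prompts
    .map((p) =>
      JSON.stringify({
        key: p.key,
        request: { contents: [{ parts: [{ text: p.text }] }] },
      })
    )
    .join("\n");
}

// Rough check for whether a job is small enough to send inline.
function fitsInline(prompts) {
  return Buffer.byteLength(toJsonl(prompts), "utf8") < INLINE_LIMIT_BYTES;
}

// Parse a results file and index every line by the key you assigned,
// so the order of lines in the file never matters.
function indexResultsByKey(jsonlText) {
  const byKey = new Map();
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    const row = JSON.parse(line);
    byKey.set(row.key, row);
  }
  return byKey;
}
```

In use, write the string from `toJsonl` to a `.jsonl` file, upload it, and once the job finishes look up each prompt with `indexResultsByKey(text).get(prompt.key)` rather than by position.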

How I use this for blog covers

This site's covers are generated with Nano Banana, and I batch them. When a content wave needs five or ten new covers, I do not sit and generate them one at a time at full price. I write a small JSONL queue of { slug, prompt } lines, submit it as one batch, and download the rendered PNGs when the job finishes, each landing in its slug's folder by key. The wrapper script handles the resumable upload, the poll, and the download; the paid step is the only thing batched, and everything after it (composing the card layout, rendering responsive variants) is free local work.

To put a real number on it: the four covers for this very cluster went out as one batch on gemini-3-pro-image and came back in about three minutes, for roughly $0.27 batched against about $0.54 synchronous. Small absolute dollars at four images, but it is a flat 50% every time, and it scales linearly with image count: a 5,000-image run on 2.5 Flash Image is the $98-versus-$195 gap from the pricing section above.

The honest trade-off: I give up seeing each cover the instant I ask for it. For a single cover I want to tweak and ship in one sitting, I still use the synchronous interactive path and pay full price, because the half-price saving is not worth a wait that can stretch to hours on one image. The batch lane is for the known set, generated ahead of when I need them.

OpenAI as the alternative

OpenAI's Batch API also discounts image generation by 50%, through the same POST /v1/batches flow used for text, pointed at the image endpoint. Per-image pricing depends on the GPT Image model and the output size, and the lineup shifts (the original gpt-image-1 is on a deprecation path), so price the specific model you choose against OpenAI's current rate card before a large run. The batch mechanics are identical: one async job, results keyed by custom_id, 50% off.
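For comparison, here is a rough sketch of the input file for that flow. OpenAI batch input is JSONL with one `custom_id`, `method`, `url`, and `body` per line. `toOpenAiBatchJsonl` is a hypothetical helper, and pointing `url` at `/v1/images/generations` is an assumption based on the description above. The model and size are deliberately parameters: check OpenAI's current docs for valid values before using them.

```javascript
// Hypothetical helper: build the JSONL input for an OpenAI image batch.
// Assumption: the batch is pointed at the image generation endpoint.
function toOpenAiBatchJsonl(prompts, { model, size }) {
  return prompts
    .map((p) =>
      JSON.stringify({
        custom_id: p.key, // results are matched back by this id
        method: "POST",
        url: "/v1/images/generations",
        body: { model, prompt: p.text, size },
      })
    )
    .join("\n");
}

// Placeholder model name: substitute the GPT Image model you have priced.
const jsonl = toOpenAiBatchJsonl(
  [{ key: "cover-batch-api", text: "A flat illustration of a stack of invoices" }],
  { model: "YOUR_GPT_IMAGE_MODEL", size: "1024x1024" }
);
console.log(jsonl.split("\n").length); // 1
```

The design mirrors the Gemini file path: assign a stable id per request up front, and never depend on result order.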

The one provider that does not enter this conversation is Anthropic: Claude is text only and has no image generation, batched or otherwise. For text batch work, see the LLM batch API guide; the full set of cost levers is in the guide on how to cut your AI API bill.

Caveats

  • Inline responses keep request order; the file path uses keys. With inline requests you can map results to prompts by index. The JSONL file path returns a key per result, so match by key there, and do not assume that path preserves order.
  • It is asynchronous. Plan for up to 24 hours even though most jobs finish much faster, and note Gemini expires any job not finished within 48 hours. This lane is for images you need ahead of time, not on demand.
  • Resolution drives the bill. On the Nano Banana 2 and Pro models, price scales with output size. Generate at the size you will actually use rather than defaulting to 4K.
  • Safety filtering still applies. Some prompts will be refused or return no image; handle the empty-response case per request so one rejection does not stall the collect step.
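The empty-response caveat is the one most worth coding for. The sketch below assumes inline responses have the same shape as in the Node example (`response.candidates[0].content.parts[].inlineData.data`, base64-encoded). `extractImage` and `collect` are hypothetical helpers: they separate the images that came back from the keys that need a retry or a reworded prompt.

```javascript
// Return the first image in one inline response entry as a Buffer, or null
// if the request was refused, failed, or returned no image part.
function extractImage(entry) {
  const parts = entry?.response?.candidates?.[0]?.content?.parts ?? [];
  const part = parts.find((p) => p.inlineData?.data);
  return part ? Buffer.from(part.inlineData.data, "base64") : null;
}

// Walk prompts and inline responses by index, collecting images to save
// and the keys of requests that produced nothing.
function collect(prompts, responses) {
  const images = [];
  const missing = [];
  prompts.forEach((p, i) => {
    const image = extractImage(responses[i]);
    if (image) images.push({ key: p.key, image });
    else missing.push(p.key);
  });
  return { images, missing };
}
```

Save everything in `images`, then feed `missing` back into the next batch or review those prompts by hand; one rejection never blocks the rest of the run.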

FAQ

Does the Gemini Batch API discount image generation?

Yes. The Batch API applies the same 50% discount to image generation that it applies to text. Gemini 2.5 Flash Image (Nano Banana) is $0.039 per image synchronously and $0.0195 batched, an exact halving. The larger Nano Banana 2 and Pro models follow the same 50% reduction, priced by output resolution.

Can Claude generate images in batch?

No. Claude is text only and has no image generation. Batch image work goes to Gemini (Nano Banana) or OpenAI (GPT Image). Claude's Batch API is excellent for text jobs, covered in the LLM batch API guide, but it cannot produce images.

How long does a batch image job take?

The same 24-hour target as text batches, and like text it usually finishes well inside that, though Gemini expires any job not finished within 48 hours. The point of batching is that you are not waiting, so use it for images you are producing ahead of when you need them, and keep the synchronous endpoint for the one-off image you want to see and tweak immediately.

Which Nano Banana model should I start with?

Start with 2.5 Flash Image at $0.0195 per image batched; it handles most product, thumbnail, and editorial work. Move up to Nano Banana 2 (3.1 Flash) or Pro (3 Pro Image) only when you need the higher fidelity or larger output sizes, since those price by resolution and cost several times more per image. As with text models, pick the cheapest tier that passes, then batch.

Sources

Authoritative references this article was fact-checked against.


