Rate limits

Limits apply at several layers: request rate per key, status-poll frequency per creation, concurrent generations per account, a system-wide capacity valve, and optional per-key spend caps. Every 429 response carries a Retry-After header, and honoring it is the only back-off logic you need. Spend-cap rejections return 402 rather than 429 and are not a back-off case.

Limits at a glance

Layer	Default limit	Scope	On exceed
Request rate	120 requests / minute (fixed 60-second window)	Per API key; per-key overrides available for volume accounts	429 ERR_RATE_LIMIT
Concurrent generations	100 in flight (queued or processing)	Per account; higher limits per contract	429 ERR_QUOTA_EXCEEDED (scope: account)
Capacity valve	System-wide daily capacity	All accounts, image models	429 ERR_QUOTA_EXCEEDED (reason: pool_capacity_exceeded)
Poll interval	1 poll / 5 seconds	Per API key per creation	429 ERR_RATE_LIMIT
Monthly spend cap	Optional USD amount, set per key	Per API key	402 ERR_INSUFFICIENT_BALANCE (reason: monthly_budget_exceeded)

Per-key request rate

Each API key may make 120 requests per minute by default, measured over a fixed 60-second window. Exceeding the limit returns 429 ERR_RATE_LIMIT, with details carrying limit, window, and retry_after (the number of seconds until the window resets). Higher per-key limits are available for volume accounts.
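
To avoid hitting the per-key limit in the first place, a client can pace itself. The following is a minimal sketch, not part of the API: the class name SlidingWindowThrottle and its parameters are hypothetical. It allows at most 120 calls in any 60-second span, which is stricter than the server's fixed window and therefore stays under it wherever the window boundary falls.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Client-side guard: allow at most `limit` calls in any `window` seconds.

    Hypothetical helper, not part of the API. Not thread-safe as written.
    """

    def __init__(self, limit=120, window=60.0, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of recent calls, oldest first

    def acquire(self):
        """Block until one more request fits in the window, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have left the window.
            while self._sent and now - self._sent[0] >= self.window:
                self._sent.popleft()
            if len(self._sent) < self.limit:
                self._sent.append(now)
                return
            # Window is full: wait until the oldest recorded call expires.
            self._sleep(self.window - (now - self._sent[0]))
```

Call acquire() immediately before each request made with a given key. The server can still return a 429 (for example when several processes share one key), so this complements Retry-After handling rather than replacing it.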

Concurrent generations

An account may have up to 100 generations in flight (queued or processing) at once by default. Exceeding it returns 429 ERR_QUOTA_EXCEEDED with details: {"limit": N, "scope": "account"}. Higher concurrency is available per contract.
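
A client that submits many generations can cap its own in-flight count locally. The sketch below is one possible approach under stated assumptions: InFlightSlots is a hypothetical helper, and it only sees generations started by the current process, whereas the documented limit is per account.

```python
import threading
from contextlib import contextmanager


class InFlightSlots:
    """Hypothetical local cap on generations that are queued or processing."""

    def __init__(self, limit=100):
        self._slots = threading.BoundedSemaphore(limit)

    @contextmanager
    def slot(self, timeout=None):
        """Hold one slot for the whole lifetime of a generation."""
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no free generation slot")
        try:
            yield
        finally:
            # Release only once the generation has reached a final state.
            self._slots.release()
```

Because the limit counts queued as well as processing generations, the with block should wrap both the submit call and the wait for completion, not the submit call alone.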

Capacity valve

A system-wide daily capacity valve protects the image models under peak load. When it trips, requests return 429 ERR_QUOTA_EXCEEDED with details.reason = "pool_capacity_exceeded" and a short Retry-After. These rejections are transient and safe to retry after the indicated delay.

Monthly spend cap

Keys can carry an optional monthly spend cap — a USD amount (e.g. a $50.00 monthly budget). A request that would exceed it returns 402 ERR_INSUFFICIENT_BALANCE with details.reason = "monthly_budget_exceeded" and the cap value in USD.

Polling interval

Status polling has its own limit: at most one poll every 5 seconds per creation per key. Polling faster returns 429 ERR_RATE_LIMIT. See Job status.
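
A polling loop that respects the 5-second interval might look like the sketch below. The status endpoint and its response shape are not described in this section, so they are left to the caller: fetch_status and is_done are hypothetical callables, and max_polls is an illustrative default.

```python
import time

POLL_INTERVAL = 5.0  # seconds; the documented minimum gap between polls per creation


def poll_until_done(fetch_status, is_done, max_polls=60,
                    clock=time.monotonic, sleep=time.sleep):
    """Call fetch_status() no more often than once every POLL_INTERVAL seconds.

    fetch_status: caller-supplied function that performs one status request.
    is_done: caller-supplied predicate that recognizes a final status.
    """
    last_poll = None
    for _ in range(max_polls):
        if last_poll is not None:
            # Sleep only for whatever part of the interval has not yet elapsed.
            remaining = POLL_INTERVAL - (clock() - last_poll)
            if remaining > 0:
                sleep(remaining)
        last_poll = clock()
        status = fetch_status()
        if is_done(status):
            return status
    raise TimeoutError("creation did not finish within max_polls polls")
```

The limit is per creation, so each creation being tracked needs its own loop and its own last-poll timestamp.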

The 429 responses

All 429s use the standard error envelope and carry a Retry-After header (seconds). The details object identifies which limit tripped:

# Per-key request rate
HTTP/1.1 429 Too Many Requests
Retry-After: 17

{"error": {"code": "ERR_RATE_LIMIT", "message": "…",
  "details": {"limit": 120, "window": "1m", "retry_after": 17}}}

# Account concurrency
HTTP/1.1 429 Too Many Requests
Retry-After: 2

{"error": {"code": "ERR_QUOTA_EXCEEDED", "message": "…",
  "details": {"limit": 100, "scope": "account"}}}

# Capacity valve
HTTP/1.1 429 Too Many Requests
Retry-After: 5

{"error": {"code": "ERR_QUOTA_EXCEEDED", "message": "…",
  "details": {"reason": "pool_capacity_exceeded", "pool": "…", "retry_after": 5}}}
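
For logging or metrics it can help to turn the three response shapes above into a single label. The following is a small sketch based only on the fields shown in those examples. The function name classify_429 and the returned labels are hypothetical. Since the poll-interval limit also returns ERR_RATE_LIMIT, the "rate_limit" label covers both the per-key request rate and over-frequent polling.

```python
def classify_429(body: dict) -> str:
    """Map a parsed 429 error body to a short label for the limit that tripped."""
    error = body.get("error") or {}
    code = error.get("code")
    details = error.get("details") or {}

    if code == "ERR_RATE_LIMIT":
        return "rate_limit"  # per-key request rate or poll interval
    if code == "ERR_QUOTA_EXCEEDED":
        if details.get("reason") == "pool_capacity_exceeded":
            return "capacity_valve"
        if details.get("scope") == "account":
            return "account_concurrency"
    return "unknown"
```

The label is informational only. The retry delay still comes from the Retry-After header in every case.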

Handling 429s

Read the Retry-After header, wait at least that many seconds (plus a little jitter so parallel workers don't retry in lockstep), and retry:

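import random
import time

import requests

BASE = "https://api.rendergrid.io/api/public/v1"
HEADERS = {"Authorization": "Bearer rg_live_xxx"}


def post_with_backoff(path: str, payload: dict, max_attempts: int = 5):
    """POST, retrying on 429 up to max_attempts times; returns the last response."""
    for attempt in range(max_attempts):
        # The 30-second timeout is an illustrative value; tune it for your workload.
        resp = requests.post(f"{BASE}{path}", headers=HEADERS, json=payload, timeout=30)
        # Return anything that is not a 429, and do not sleep after the final attempt.
        if resp.status_code != 429 or attempt == max_attempts - 1:
            return resp
        try:
            retry_after = int(resp.headers.get("Retry-After", "2"))
        except ValueError:
            retry_after = 2  # header was not an integer; fall back to the default
        time.sleep(retry_after + random.uniform(0, 1))  # jitter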

Combine retries with an idempotency key so retried generation requests are never double-charged.
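
The key has to be generated once per logical request and reused on every retry, not regenerated per attempt. The sketch below illustrates that pattern only. The header name Idempotency-Key is an assumption, as is the helper name idempotent_headers. Check the API reference for how idempotency keys are actually passed.

```python
import uuid
from typing import Optional

# Assumption: the real header name may differ; consult the API reference.
IDEMPOTENCY_HEADER = "Idempotency-Key"


def idempotent_headers(base_headers: dict, key: Optional[str] = None) -> dict:
    """Return a copy of base_headers with one idempotency key attached.

    Pass `key` to reuse an existing key; otherwise a random UUID is generated.
    """
    merged = dict(base_headers)  # leave the caller's dict untouched
    merged[IDEMPOTENCY_HEADER] = key or str(uuid.uuid4())
    return merged
```

Build the headers once before entering the retry loop and send the same dict on every attempt, so that all retries of one generation request share a key.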