Retry Logic with Exponential Backoff

Retry logic with exponential backoff is a resilience pattern that automatically re-attempts failed API requests, waiting progressively longer between attempts. It is the fundamental strategy for handling transient failures in AI API integrations, where rate limits (429), server errors (500-503), and network timeouts are common, expected failure modes that call for graceful recovery rather than immediate hard failure.

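What Is Retry Logic with Exponential Backoff?

Instead of failing on the first error, the client catches a retryable failure, waits, and tries again, doubling the wait after each consecutive failure until the request succeeds or a maximum number of retries is reached.

Why Retry Logic Matters for AI APIs

AI APIs enforce rate limits and run on shared, heavily loaded infrastructure, so 429 responses, 5xx errors, and timeouts are routine at scale. Because these failures are usually transient, waiting and retrying tends to succeed, whereas failing immediately turns a momentary hiccup into a user-visible error. Backing off exponentially, rather than retrying at once, also gives an overloaded or rate-limited server time to recover instead of adding to its load.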

Exponential Backoff Algorithm

Core algorithm:

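wait_time = base_delay × (2 ^ retry_count) + random_jitter

where retry_count starts at 0 for the first retry, base_delay is 1 second in the examples below, and random_jitter is a small random offset (here up to ±0.5 seconds) so that clients that failed together do not retry in lockstep: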

Retry 1: 1 × 2^0 + jitter = 1.0 ± 0.5 seconds
Retry 2: 1 × 2^1 + jitter = 2.0 ± 0.5 seconds
Retry 3: 1 × 2^2 + jitter = 4.0 ± 0.5 seconds
Retry 4: 1 × 2^3 + jitter = 8.0 ± 0.5 seconds
Retry 5: 1 × 2^4 + jitter = 16.0 ± 0.5 seconds (if this retry also fails, give up)
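The schedule above can be reproduced with a short helper. This is a minimal sketch, assuming uniform jitter in [-0.5, +0.5] seconds and a 60-second cap that mirrors the max=60 setting used with tenacity; backoff_delay and max_delay are illustrative names, not part of any library.

```python
import random


def backoff_delay(retry_count: int, base_delay: float = 1.0,
                  jitter: float = 0.5, max_delay: float = 60.0) -> float:
    """Delay before retry number retry_count + 1 (retry_count starts at 0).

    Computes base_delay * 2**retry_count, capped at max_delay (an assumed
    cap), then adds a uniform random offset in [-jitter, +jitter],
    floored at 0 so the delay is never negative.
    """
    delay = min(base_delay * (2 ** retry_count), max_delay)
    return max(0.0, delay + random.uniform(-jitter, jitter))


# With jitter disabled, the schedule is the exact powers of two from the list
schedule = [backoff_delay(n, jitter=0.0) for n in range(5)]
print(schedule)       # [1.0, 2.0, 4.0, 8.0, 16.0]
print(sum(schedule))  # 31.0 seconds of waiting (before jitter) across 5 retries
```

Applying the cap before adding jitter keeps the delay bounded once 2^retry_count exceeds it; the floor at zero only matters if jitter is configured larger than base_delay.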

Which Errors to Retry

HTTP Status | Error Type            | Retry?        | Reason
----------- | --------------------- | ------------- | --------------------------------------
429         | Rate limit exceeded   | Yes           | Wait and retry
500         | Internal server error | Yes (limited) | May be transient
502         | Bad gateway           | Yes           | Infrastructure issue
503         | Service unavailable   | Yes           | Server overloaded
504         | Gateway timeout       | Yes           | Timeout; a retry may succeed
400         | Bad request           | No            | Request is malformed; retry won't help
401         | Unauthorized          | No            | Wrong API key; retry won't help
403         | Forbidden             | No            | Permission issue; retry won't help
404         | Not found             | No            | Wrong endpoint; retry won't help
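The table translates directly into a retry predicate. A minimal sketch, assuming the HTTP status code is available from the failed response: should_retry and MAX_RETRIES_FOR_500 are hypothetical names, the limit of 2 retries for 500 is an illustrative stand-in for "Yes (limited)", and any status the table does not list is conservatively treated as non-retryable.

```python
# Status codes the table marks as retryable
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}

# Hypothetical tighter cap for 500, which the table marks "Yes (limited)";
# the value 2 is an assumption, not taken from any API's documentation
MAX_RETRIES_FOR_500 = 2


def should_retry(status: int, retries_so_far: int, max_retries: int = 5) -> bool:
    """Decide whether a failed request with this HTTP status deserves another try."""
    if status not in RETRYABLE_STATUSES:
        # 400, 401, 403, 404 (and unlisted codes): the request itself must change
        return False
    limit = MAX_RETRIES_FOR_500 if status == 500 else max_retries
    return retries_so_far < limit
```

Keeping the classification in one function means the retry loop never has to know which codes are transient; it only asks whether another attempt is worthwhile.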

Implementation Examples

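Python with the tenacity library (recommended):

import openai
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential, wait_random

# max_retries=0 disables the SDK's built-in retries so tenacity is the only retry layer
client = openai.OpenAI(max_retries=0)  # Reads OPENAI_API_KEY from the environment

@retry(
    stop=stop_after_attempt(6),  # 1 initial attempt + 5 retries
    # ~1, 2, 4, 8, 16 s (capped at 60 s) plus 0-1 s of random jitter
    wait=wait_exponential(multiplier=1, min=1, max=60) + wait_random(0, 1),
    # Only transient failures: 429, 5xx, connection errors and timeouts.
    # Not APIStatusError: it also covers 400/401/403/404, which should not be retried.
    retry=retry_if_exception_type(
        (openai.RateLimitError, openai.InternalServerError, openai.APIConnectionError)
    ),
    reraise=True,  # Re-raise the final error instead of wrapping it in RetryError
)
def call_llm(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content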

Manual Implementation with Jitter:

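import random
import time
from typing import Callable


class RateLimitError(Exception):
    """Stand-in for your client's 429 exception."""


class ServerError(Exception):
    """Stand-in for your client's 5xx exception."""


def call_with_backoff(generate: Callable[[str], str], prompt: str,
                      max_retries: int = 5, base_delay: float = 1.0) -> str:
    for attempt in range(max_retries + 1):  # initial try + max_retries retries
        try:
            return generate(prompt)
        except (RateLimitError, ServerError):
            if attempt == max_retries:
                raise  # Retries exhausted: propagate the error
            # Exponential delay plus ±0.5 s jitter: ~1, 2, 4, 8, 16 s
            time.sleep(base_delay * (2 ** attempt) + random.uniform(-0.5, 0.5))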

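Rate Limit Header Handling (Advanced): OpenAI returns response headers indicating when the rate limit resets: x-ratelimit-reset-requests for the request limit and x-ratelimit-reset-tokens for the token limit. Their values are duration strings (for example "1s" or "6m0s"), not plain numbers, so they must be parsed before use. Inside the except clause of a retry loop:

except openai.RateLimitError as e:
    # The header holds a duration string such as "1s" or "6m0s", so float() would fail.
    # parse_reset_seconds() is a hypothetical helper that converts it to seconds.
    reset = e.response.headers.get("x-ratelimit-reset-requests")
    # Wait until the limit resets when the server says so; otherwise fall back to backoff
    wait = parse_reset_seconds(reset) if reset else 2 ** attempt
    time.sleep(max(wait, 1.0))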
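OpenAI documents its reset headers with duration strings such as "1s" and "6m0s". A minimal sketch of converting such a value to seconds, assuming it consists only of number+unit pairs using h, m, s, or ms (parse_reset_seconds is a hypothetical helper name, not part of the openai SDK):

```python
import re

# Seconds per unit; assumes only h, m, s and ms appear in the header value
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
# "ms" is listed before "m" so "20ms" is not read as 20 minutes plus a stray "s"
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def parse_reset_seconds(value: str) -> float:
    """Convert a duration string like "6m0s" into seconds."""
    value = value.strip()
    tokens = _TOKEN.findall(value)
    # Reject anything not made up entirely of number+unit pairs
    if not tokens or "".join(n + u for n, u in tokens) != value:
        raise ValueError(f"unrecognized duration: {value!r}")
    return sum(float(n) * _UNIT_SECONDS[u] for n, u in tokens)


print(parse_reset_seconds("6m0s"))  # 360.0
```

Raising on unrecognized input, rather than guessing, lets the caller fall back to ordinary exponential backoff when the header format is not what the sketch expects.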

Production Considerations

Retry logic with exponential backoff is the foundational resilience pattern that separates brittle AI prototypes from production-grade AI applications. By automatically recovering from the transient failures that are inevitable when calling AI APIs at scale, retries with jitter turn occasional API hiccups into short, transparent delays instead of user-visible errors, preserving application reliability and user trust.
