Prompt Caching

Reduce costs on repeated prompt prefixes with automatic prompt caching. When you send the same prompt prefix multiple times, the cached portion is billed at 90% off the base price.

How It Works

Prompt Caching automatically detects repeated prompt prefixes (e.g., system instructions, few-shot examples, large context documents). The first request containing a prefix writes it to the cache at 1.25× the base price. Later requests that repeat the same prefix are cache hits, and the cached portion is billed at 0.10× the base price. Cache entries have a 5-minute TTL, which is reset automatically on each matching request.
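Because each matching request resets the TTL, the 5-minute window slides with your traffic instead of expiring 5 minutes after the first write. The sketch below is a hypothetical local model of that rule, not the service's implementation. `simulate_cache` and `TTL_SECONDS` are illustrative names, the arguments are request timestamps in seconds for requests sharing one prefix, and it assumes a request arriving exactly 300 s after the previous one counts as a miss.

```bash
#!/usr/bin/env bash
# Hypothetical model of the sliding 5-minute TTL (not the service's code).
# Arguments: ascending request timestamps, in seconds, for one shared prefix.
TTL_SECONDS=300

simulate_cache() {
  local last="" t
  for t in "$@"; do
    # Assumption: a request strictly within TTL_SECONDS of the previous
    # matching request is a hit; the first request, or any later one,
    # writes a fresh cache entry.
    if [[ -n "$last" ]] && (( t - last < TTL_SECONDS )); then
      echo "t=${t}s hit"
    else
      echo "t=${t}s write"
    fi
    # Hits and writes both restart the TTL clock.
    last=$t
  done
}

# The request at 840 s arrives 360 s after the one at 480 s,
# so the entry has expired and the prefix is written again.
simulate_cache 0 240 480 840
# t=0s write
# t=240s hit
# t=480s hit
# t=840s write
```

Without the reset, the request at 480 s would already miss, since it arrives 8 minutes after the initial write.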

Pricing

| Cache tier | TTL | Cache write | Cache hit |
|---|---|---|---|
| Standard Cache | 5 min | 1.25× base price | 0.10× base price |
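As a worked example, suppose a prompt prefix costs $P$ at the base price and is sent $n$ times, each request arriving within 5 minutes of the previous one, so every request after the first is a cache hit. Using the multipliers in the table, and leaving out the uncached remainder of each prompt (billed the same either way), the prefix cost is approximately:

$$
\begin{aligned}
C_{\text{cached}}(n) &= 1.25P + 0.10P\,(n-1) \\
C_{\text{uncached}}(n) &= nP
\end{aligned}
$$

For $n = 10$, $C_{\text{cached}} = 1.25P + 0.90P = 2.15P$ versus $10P$, a 78.5% reduction. Under this model, a prefix sent once and never reused costs $1.25P$ (25% more than uncached), caching already pays off at $n = 2$ ($1.35P$ versus $2P$), and the savings approach 90% as $n$ grows.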

Extended Cache (60-min TTL) is on our roadmap. It will use a 1.50× write multiplier with the same 0.10× cache hit multiplier. Contact us if this is critical for your use case.

Code Example

```bash
# Standard request: no cache-control parameters are needed.
# -i prints the response headers so the cache status is visible.
curl -i https://api.pandaworld.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-api-key" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant..."},
      {"role": "user", "content": "Hello!"}
    ]
  }'

# The system prompt prefix is cached automatically.
# Cache hits are shown in the response headers as: x-cache-hit: true
```
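To check the cache status from a script, one option is to dump only the response headers and look for the documented `x-cache-hit: true` header. The helper below is a hypothetical sketch: `cache_status` is an illustrative name, and it assumes the header appears exactly as documented above (any other value, or no header, is reported as a miss). The `curl` flags used (`-s`, `-D -`, `-o /dev/null`) are standard curl options.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads HTTP response headers on stdin and prints "hit"
# when the documented "x-cache-hit: true" header is present, otherwise "miss".
cache_status() {
  # Strip the CR from CRLF line endings, then match the header name and
  # value case-insensitively (HTTP header names are case-insensitive).
  if tr -d '\r' | grep -qiE '^x-cache-hit:[[:space:]]*true[[:space:]]*$'; then
    echo hit
  else
    echo miss
  fi
}

# Usage (sends a real request, so it is left commented out here).
# -D - writes the response headers to stdout; -o /dev/null discards the body.
#
# curl -s -D - -o /dev/null https://api.pandaworld.space/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -H "Authorization: Bearer sk-your-api-key" \
#   -d '{"model": "deepseek-v4-flash", "messages": [
#         {"role": "system", "content": "You are a helpful assistant..."},
#         {"role": "user", "content": "Hello!"}]}' \
#   | cache_status
```

If caching behaves as described, sending the same request twice within 5 minutes would be expected to print `miss` and then `hit`.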

Roadmap

Extended Prompt Caching with a 60-minute TTL is in development. This will allow longer cache persistence for agent loops and recurring automated tasks, at a 1.50× write multiplier. Subscribe to our status page for updates.
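To illustrate why the longer TTL matters for recurring tasks, consider a hypothetical job that resends the same prefix (base cost $P$) every 15 minutes, $n$ times. This assumes that a request arriving more than 5 minutes after the previous one misses the Standard cache and pays the 1.25× write again, that the Extended TTL is reset on each matching request like the Standard one, and that the announced Extended multipliers (not yet available) apply:

$$
\begin{aligned}
C_{\text{standard}}(n) &= 1.25P\,n \\
C_{\text{extended}}(n) &= 1.50P + 0.10P\,(n-1) \\
C_{\text{uncached}}(n) &= nP
\end{aligned}
$$

With $n = 4$ (runs at 0, 15, 30, and 45 minutes), this gives $5.00P$ on the Standard cache, $1.80P$ on the Extended cache, and $4.00P$ uncached. In this pattern the Standard cache never produces a hit, so every run pays the write multiplier.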