Prompt Caching
Reduce costs on repeated prompt prefixes with automatic prompt caching. When you send the same prompt prefix multiple times, the cached portion is billed at 0.10× the base price (90% off).
How It Works
Prompt Caching automatically detects repeated prompt prefixes (e.g., system instructions, few-shot examples, large context). When a cache hit occurs, the cached prefix is served at 0.10× the base price. Cache entries have a 5-minute TTL and are refreshed automatically on each matching request.
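The refresh-on-hit behavior described above can be illustrated with a toy model. This is only a sketch and not the service's implementation: the class `PrefixCacheModel`, its method names, and the choice to treat an entry as expired exactly at its expiry time are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

TTL_SECONDS = 5 * 60  # 5-minute TTL, as stated in the text above


@dataclass
class PrefixCacheModel:
    """Hypothetical toy model of a TTL cache keyed by prompt prefix."""

    ttl: float = TTL_SECONDS
    expires_at: dict = field(default_factory=dict)

    def request(self, prefix: str, now: float) -> bool:
        """Return True on a cache hit; write or refresh the entry either way."""
        hit = self.expires_at.get(prefix, float("-inf")) > now
        # A miss writes the entry; a hit refreshes its TTL.
        self.expires_at[prefix] = now + self.ttl
        return hit


cache = PrefixCacheModel()
timeline = [0, 200, 450, 800]  # seconds at which the same prefix is sent
results = [cache.request("system-prompt", t) for t in timeline]
print(results)  # [False, True, True, False]
```

In this model the request at 450 s is still a hit even though more than five minutes have passed since the first write, because the hit at 200 s refreshed the entry. The request at 800 s misses because the gap since the previous request exceeds the TTL.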
Pricing
| Cache Tier | TTL | Write | Cache Hit |
|---|---|---|---|
| Standard Cache | 5 min | 1.25× base price | 0.10× base price |
Extended Cache (60-min TTL) is on our roadmap. It will offer a 1.50× write multiplier with the same 0.10× cache hit multiplier. Contact us if this is critical for your use case.
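To see when caching pays off, the multipliers above can be turned into a small worked calculation. This is a rough sketch under stated assumptions: the first request is billed at the write multiplier for the prefix, every later request arrives before the TTL expires and is billed as a cache hit, and only the prefix portion of the prompt is counted. The function `prefix_cost`, the token count, and the base price are hypothetical values chosen for illustration.

```python
WRITE_MULTIPLIER = 1.25  # Standard Cache write, from the pricing table
HIT_MULTIPLIER = 0.10    # cache hit, from the pricing table


def prefix_cost(requests: int, prefix_tokens: int, base_price_per_token: float,
                write_multiplier: float = WRITE_MULTIPLIER) -> tuple[float, float]:
    """Return (cached_cost, uncached_cost) for the prefix portion only.

    Assumes the first request writes the cache and every later request
    arrives within the TTL, so it is billed as a cache hit.
    """
    if requests < 1:
        raise ValueError("requests must be at least 1")
    base = prefix_tokens * base_price_per_token
    cached = base * (write_multiplier + HIT_MULTIPLIER * (requests - 1))
    uncached = base * requests
    return cached, uncached


# Hypothetical numbers: a 10,000-token prefix at a base price of 1e-6 per token.
for n in (1, 2, 10):
    cached, uncached = prefix_cost(n, 10_000, 1e-6)
    print(f"{n:>2} requests: cached={cached:.4f} uncached={uncached:.4f}")
```

Under these assumptions a single request costs slightly more with caching (1.25× versus 1.00×), and caching is already cheaper from the second request onward (1.35× versus 2.00× the base prefix cost). Passing `write_multiplier=1.50` gives the same comparison for the planned extended tier.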
Code Example
# Request with a reusable system prompt (caching is automatic; -i prints the response headers)
curl -i https://api.pandaworld.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-api-key" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant..."},
      {"role": "user", "content": "Hello!"}
    ]
  }'
# The system prompt prefix is automatically cached
# Cache hits are reported in the response headers: x-cache-hit: true
Roadmap
Extended Prompt Caching with a 60-minute TTL is in development. This will allow longer cache persistence for agent loops and recurring automated tasks, at a 1.50× write multiplier. Subscribe to our status page for updates.