Rate limits
Per-IP and per-key windows, 429 handling, anomaly alerts.
What this is
COHESION applies two rate-limit layers and a monthly per-org quota. Understanding them prevents surprise 429s.
Two layers
Layer 1: per-IP, pre-auth
- Cloudflare Workers Rate Limiting API.
- 60 requests per 60 seconds per IP.
- Fails closed, before any key lookup.
- Emits AUTH_FAIL_RATE_LIMITED_IP to audit_log.
Layer 2: per-key, post-auth
- D1 sliding window.
- 1000 requests per 60 seconds per key prefix.
- Keyed on the 8-char prefix, never the plaintext.
- Emits AUTH_FAIL_RATE_LIMITED_KEY to audit_log.
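To make the per-key window concrete, here is a minimal in-memory sketch of a sliding-window limiter keyed on the 8-char prefix. It is an illustration only, not COHESION's D1 implementation: the class name, the check method, and the injectable clock are all hypothetical, and the way the wait time is derived is an assumption.

```python
import math
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Illustrative in-memory sliding window (not the D1-backed implementation)."""

    def __init__(self, limit=1000, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # key prefix -> timestamps of accepted requests, oldest first
        self._hits = defaultdict(deque)

    def check(self, key_prefix):
        """Return (allowed, retry_after_seconds) for one request."""
        now = self.clock()
        hits = self._hits[key_prefix]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return True, 0
        # The oldest hit leaves the window at hits[0] + window.
        # Round up and never report less than 1 second.
        wait = hits[0] + self.window - now
        return False, max(1, math.ceil(wait))
```

The caller passes only the prefix, so the limiter never sees the plaintext key. A rejected request is not recorded, which means hammering a limited key does not extend its own lockout in this sketch.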
429 response
HTTP/1.1 429 Too Many Requests
Retry-After: 27
Retry-After is always an integer number of seconds >= 1 (the delay-seconds form defined in RFC 7231), never an HTTP-date. Wait at least that long before retrying.
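A client that handles 429s itself needs to read that header defensively. The helper below is a small sketch: the function name and the fallback default are hypothetical, and it assumes the headers arrive as a plain mapping of strings.

```python
def retry_after_seconds(headers, default=1):
    """Return the wait, in whole seconds, from a 429 response's headers.

    Falls back to `default` if the header is missing or not an integer,
    and never returns less than 1.
    """
    value = None
    for name, raw in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "retry-after":
            value = raw
            break
    try:
        seconds = int(str(value).strip())
    except (TypeError, ValueError):
        return default
    return max(1, seconds)
```

For the response shown above, retry_after_seconds({"Retry-After": "27"}) gives 27.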
Per-org monthly quota
| Tier | Monthly requests |
|---|---|
| Starter | 10,000 |
| Standard | 100,000 |
| Enterprise | 1,000,000 |
Exceeding the monthly quota does not throttle requests; the two rate-limit layers handle throttling. It triggers a HIGH anomaly alert when the 24-hour request rate exceeds 10x the 30-day rolling p95.
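The alert condition can be sketched as a simple comparison. This example rests on two assumptions that the text above does not state: that the p95 is taken over daily request counts for the last 30 days, and that it uses the nearest-rank method. The function names are hypothetical.

```python
def nearest_rank_p95(values):
    """95th percentile by the nearest-rank method (an assumed definition)."""
    ordered = sorted(values)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid float error.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def is_high_anomaly(last_24h_requests, daily_counts_30d, factor=10):
    """True when the last 24 hours exceed `factor` times the rolling p95."""
    if not daily_counts_30d:
        return False  # no baseline yet, so nothing to compare against
    return last_24h_requests > factor * nearest_rank_p95(daily_counts_30d)
```

With a baseline p95 of 2,000 requests per day, a day of 20,000 requests would not alert under this sketch, while 20,001 would.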
SDK behavior
Both SDKs retry automatically with exponential backoff, up to maxRetries attempts (default 3). When retries are exhausted, they throw CohesionRateLimitError (the same name in the TypeScript and Python SDKs), which includes retryAfterSeconds.
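For callers not using an SDK, the retry behavior described above can be approximated as follows. This is a sketch under stated assumptions, not the SDKs' actual code: RateLimitError is a hypothetical stand-in for CohesionRateLimitError, and send, base_delay, and the injectable sleep are illustrative names.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the SDK's CohesionRateLimitError."""

    def __init__(self, retry_after_seconds):
        super().__init__(f"rate limited; retry after {retry_after_seconds}s")
        self.retry_after_seconds = retry_after_seconds


def call_with_backoff(send, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it is not rate limited or retries run out.

    send() must return (status, retry_after_seconds, body).
    """
    attempt = 0
    while True:
        status, retry_after, body = send()
        if status != 429:
            return body
        if attempt >= max_retries:
            raise RateLimitError(retry_after)
        # Exponential backoff: base_delay, 2x, 4x, ...
        backoff = base_delay * (2 ** attempt)
        # Never wait less than the server's Retry-After value.
        sleep(max(backoff, retry_after))
        attempt += 1
```

With max_retries=3 the function makes at most four calls: the initial attempt plus three retries. Production code would usually add random jitter to the backoff so that many clients do not retry in lockstep.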
Best practices
- Batch where you can (POST /v1/score/batch).
- Stagger backfill jobs across minute boundaries.
- Do not retry a 429 sooner than the Retry-After value.
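The first two practices can be sketched with two small helpers. Both function names are hypothetical, and the batch size is an assumption: this page does not state how many items POST /v1/score/batch accepts per request.

```python
def chunked(items, batch_size):
    """Split items into lists of at most batch_size (the size limit is an assumption)."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def stagger_offsets(job_count, window_seconds=60):
    """Start offsets, in seconds, that spread jobs evenly across one window."""
    return [i * window_seconds / job_count for i in range(job_count)]
```

For example, four backfill jobs would start at offsets of 0, 15, 30, and 45 seconds instead of all firing at the top of the minute.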
Next steps
- Error catalog
- Performance for latency expectations.