Rate limits

Per-IP and per-key windows, 429 handling, anomaly alerts.

What this is

COHESION applies two rate-limit layers and a monthly per-org quota. Understanding them prevents surprise 429s.

Two layers

Layer 1: per-IP, pre-auth

Cloudflare Workers Rate Limiting API.
60 requests per 60 seconds per IP.
Fails closed, before any key lookup.
Emits AUTH_FAIL_RATE_LIMITED_IP to audit_log.

Layer 2: per-key, post-auth

D1 sliding window.
1000 requests per 60 seconds per key prefix.
Keyed on the 8-char prefix, never the plaintext.
Emits AUTH_FAIL_RATE_LIMITED_KEY to audit_log.
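
To make the per-key window concrete, here is a minimal in-memory sketch of a sliding-window limiter. It is not the D1-backed implementation; the class name SlidingWindowLimiter, its allow method, and the choice to round the wait up to a whole second are all assumptions made for illustration.

```python
import math
import time
from collections import defaultdict, deque
from typing import Optional, Tuple


class SlidingWindowLimiter:
    """Illustrative sliding-window counter (hypothetical; not the D1 code)."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        # key prefix -> timestamps of accepted requests, oldest first
        self._hits = defaultdict(deque)

    def allow(self, key_prefix: str, now: Optional[float] = None) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds); retry_after is 0 when allowed."""
        now = time.monotonic() if now is None else now
        hits = self._hits[key_prefix]
        # Evict timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return True, 0
        # The oldest hit leaves the window at hits[0] + window.
        wait = hits[0] + self.window - now
        return False, max(1, math.ceil(wait))


# Layer 2 numbers from this page: 1000 requests per 60 seconds per key prefix.
per_key = SlidingWindowLimiter(limit=1000, window_seconds=60)
allowed, retry_after = per_key.allow("abcd1234")
```

The limiter only ever sees the 8-char prefix, which mirrors the rule above that the plaintext key is never used as the window key.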

429 response

HTTP/1.1 429 Too Many Requests
Retry-After: 27

Retry-After is always an integer number of seconds, >= 1, in the delay-seconds form defined by RFC 7231. Wait at least that long before retrying.
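
A client that handles 429s itself might read the header along these lines. This is a sketch, not SDK code: the function name retry_delay_seconds and the fallback of 1 second for a missing or unparseable header are assumptions.

```python
from typing import Mapping


def retry_delay_seconds(headers: Mapping[str, str], default: int = 1) -> int:
    """Seconds to wait before retrying a 429; never less than 1."""
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after", "")
    try:
        seconds = int(raw.strip())
    except ValueError:
        # Missing or non-integer header: fall back to the assumed default.
        return default
    return max(1, seconds)
```

For the response shown above, retry_delay_seconds({"Retry-After": "27"}) returns 27.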

Per-org monthly quota

Tier	Monthly requests
Starter	10,000
Standard	100,000
Enterprise	1,000,000

A quota breach does not throttle requests; the two rate-limit layers do that. Instead, it raises a HIGH anomaly alert when the 24-hour request rate exceeds 10x the 30-day rolling p95.
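
The alert threshold can be sketched as follows. This page does not say how the p95 is computed, so the sketch assumes it is the nearest-rank 95th percentile of daily request counts over the previous 30 days, and that the 24-hour rate is a plain request count. The function names are hypothetical.

```python
from typing import Sequence


def p95(values: Sequence[int]) -> int:
    """Nearest-rank 95th percentile of a non-empty sequence (assumed method)."""
    ordered = sorted(values)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid float error.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def is_high_anomaly(last_24h: int, daily_counts_30d: Sequence[int], factor: int = 10) -> bool:
    """True when the last 24 hours exceed factor times the 30-day p95."""
    return last_24h > factor * p95(daily_counts_30d)
```

With 28 days at 100 requests, one at 200, and one at 400, the assumed p95 is 200, so the alert would fire only above 2,000 requests in 24 hours.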

SDK behavior

Both SDKs auto-retry with exponential backoff, up to maxRetries attempts (default 3). Once retries are exhausted, they throw (TS) or raise (Py) CohesionRateLimitError, which includes retryAfterSeconds.
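
The retry behavior can be approximated with the loop below. It is a sketch under stated assumptions, not the SDK source: the 1-second base delay, the doubling schedule, the rule of never waiting less than Retry-After, the reading of maxRetries as retries after the first attempt, and the snake_case attribute retry_after_seconds are all illustrative. The error class here is a local stand-in for the SDK's CohesionRateLimitError.

```python
import time
from typing import Any, Callable, Tuple


class CohesionRateLimitError(Exception):
    """Local stand-in for the SDK error; the real class may differ."""

    def __init__(self, retry_after_seconds: int):
        super().__init__(f"rate limited; retry after {retry_after_seconds}s")
        self.retry_after_seconds = retry_after_seconds


def call_with_retries(
    send: Callable[[], Tuple[int, int, Any]],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until it stops returning 429 or retries run out.

    send() is a hypothetical callable returning (status, retry_after, body).
    """
    attempt = 0
    while True:
        status, retry_after, body = send()
        if status != 429:
            return body
        if attempt >= max_retries:
            raise CohesionRateLimitError(retry_after)
        # Exponential backoff, but never shorter than the server's Retry-After.
        delay = max(float(retry_after), base_delay * (2 ** attempt))
        sleep(delay)
        attempt += 1
```

Passing sleep as a parameter keeps the loop testable without real waits.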

Best practices

Batch where you can (POST /v1/score/batch).
Stagger backfill jobs across minute boundaries.
Do not retry a 429 sooner than the Retry-After value allows.
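
One way to combine the first two practices is to split a backfill into batches and give each batch its own start offset. The sketch below is illustrative only: plan_backfill, batch_size, and batches_per_minute are hypothetical names, and this page does not state a size limit for POST /v1/score/batch, so pick batch_size from the endpoint's own documentation.

```python
from typing import Any, List, Sequence, Tuple


def plan_backfill(
    items: Sequence[Any], batch_size: int, batches_per_minute: int
) -> List[Tuple[float, List[Any]]]:
    """Return (start_offset_seconds, batch) pairs, evenly spaced over time."""
    if batch_size < 1 or batches_per_minute < 1:
        raise ValueError("batch_size and batches_per_minute must be >= 1")
    # Split the work into fixed-size batches; the last one may be shorter.
    batches = [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]
    # Spread batch starts so no minute carries more than batches_per_minute.
    spacing = 60.0 / batches_per_minute
    return [(index * spacing, batch) for index, batch in enumerate(batches)]
```

For example, ten items with batch_size=4 and batches_per_minute=2 yield three batches starting at 0, 30, and 60 seconds.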

Next step

Error catalog
Performance for latency expectations.