Every shipping feature, mapped to what we use in production.
Free Groq → Cerebras → Gemini Flash → CF AI Workers → paid GPT-4o / Claude / etc. Picks cheapest meeting latency+quality SLA.
Set base_url to our gateway. Existing OpenAI SDK code keeps working.
Configure per-route policies: latency-first / cost-first / quality-first.
Every request signed, queryable, exportable for SOC2 path.
Free 1k req/mo, Indie 100k, Pro 1M, Enterprise SLA.
Enterprise: ship the gateway in your VPC. Same code.