9Router — One Gateway to 60+ AI Providers with Smart Fallback
How I set up 9Router as a local AI gateway with 3-tier fallback, RTK token compression, and seamless integration with Claude Code, Codex, and Hermes Agent.
9Router — One Gateway to 60+ AI Providers with Smart Fallback
Managing multiple AI subscriptions udah jadi masalah sendiri. Tiap bulan gw ada Claude, OpenAI, Gemini, plus some free-tier accounts. Tiap CLI tool (Claude Code, Codex, Cursor) butuh config berbeda. Too many API keys, too many endpoints.
Enter 9Router — one OpenAI-compatible endpoint di localhost:20128 yang nge-route semua requests ke 60+ AI providers dengan 3-tier smart fallback.
What is 9Router?
9Router itu AI gateway yang jalan locally di mesin lu. Dia bikin satu endpoint (OpenAI-compatible) yang bisa dipake oleh semua tools:
┌─────────────────┐
│ Claude Code │
│ Codex CLI │
│ Hermes Agent │
│ Cursor / Cline │
└────────┬────────┘
│
▼
┌─────────────────┐
│ 9Router │ ← localhost:20128
│ (Smart Router)│
└────────┬────────┘
│
┌────┼────┬────┬────┐
▼ ▼ ▼ ▼ ▼
┌────┐┌────┐┌────┐┌────┐┌────┐
│Tier1││Tier2││Tier3││Free││OAuth│
└────┘└────┘└────┘└────┘└────┘
3-Tier Smart Fallback
Ini yang bikin 9Router powerful — automatic fallback pas quota habis:
Tier 1 — Subscription (Priority)
- Claude Code (Anthropic)
- OpenAI Codex
- GitHub Copilot
- Gemini (Google AI Studio)
Tier 2 — Cheap & Fast
- GLM Coding ($0.60/1M tokens)
- MiniMax ($0.20/1M)
- Kimi ($9/month unlimited)
Tier 3 — FREE (Unlimited)
- iFlow
- Qwen (Alibaba)
- OpenCode
- Kiro AI
- OpenRouter free tier
Gimana cara kerjanya?
Lu coding pake Claude Code. Pas quota Claude abis, 9Router otomatis switch ke Tier 2 (GLM/Kimi), terus kalau perlu ke Tier 3 (FREE). Lu nggak pernah berhenti coding.
Built-in Token Savers
Dua fitur yang bikin 9Router hemat token (dan uang):
1. RTK (Rust Token Killer) — INPUT Compression
RTK auto-compress tool results (git diff, grep, ls, find) sebelum dikirim ke LLM.
- Lossless compression — sama informasi, cuma lebih ringkas
- Default ON — Jalan otomatis
- Savings: 20% — 65% tergantung command
# Tanpa RTK: 47K tokens
git diff --staged
# Dengan RTK: 28K tokens (same info)
2. Caveman Mode — OUTPUT Compression
Inject terse-style system prompt biar LLM jawab lebih ringkas.
- 5 intensity levels
- "verbose paragraph" → "telegraphic why use many token when few do trick"
My Setup on VPS
Gw install 9Router di VPS (Ubuntu 22.04) biar jalan 24/7 dan bisa diakses dari mana aja.
Installation
bashLoading...Loading syntax highlighting...
Running as System Service
Gw pake PM2 buat jagain 9Router supaya auto-start:
bashLoading...Loading syntax highlighting...
Cloudflare Tunnel (Access from Anywhere)
Karena 9Router jalan di localhost:20128, gw pake Cloudflare Tunnel biar bisa diakses dari luar:
bashLoading...Loading syntax highlighting...
Sekarang gw bisa pake 9Router dari laptop manapun, termasuk Hermes Agent di macOS.
Connecting Tools
Setiap tool tinggal point ke localhost:20128 (atau tunnel URL):
Claude Code:
bashLoading...Loading syntax highlighting...
OpenAI Codex:
bashLoading...Loading syntax highlighting...
Hermes Agent (config.yaml):
yamlLoading...Loading syntax highlighting...
60+ Providers Supported
9Router supports 10 service kinds — bukan cuma chat/LLM:
| Service Kind | Examples |
|---|---|
| Chat/LLM | Claude, GPT-4, Gemini, Qwen, GLM, Kimi |
| Embeddings | Voyage, Jina, OpenAI, Cohere |
| Text-to-Speech | ElevenLabs, Deepgram, Edge TTS |
| Speech-to-Text | Deepgram, AssemblyAI, Whisper |
| Image Gen | Fal, Stability, BFL Flux, Recraft |
| Vision/Image-to-Text | OpenAI, Gemini, Anthropic, Groq |
| Video Gen | Runway ML, Topaz |
| Web Search | Tavily, Brave, Serper, Perplexity |
| Web Fetch | Tavily, Exa, Firecrawl, Jina Reader |
| OAuth Subscription | Claude Code, Copilot, Cursor, Kiro |
Real-time Dashboard
9Router punya visual dashboard di http://localhost:20128 (atau tunnel URL):
- Live token tracking per provider
- Quota reset countdown
- Cost estimation
- Provider health status
- 3-tier fallback logs
Gw bisa liat exact usage dan sisa quota sebelum ke-limit.
Why I Love It
- Zero downtime coding — Pas quota habis, auto-switch ke provider lain
- Maximize subscriptions — Claude Code, Copilot, Cursor tetep kepake maksimal
- FREE tier backup — iFlow, Qwen, OpenCode siap cadangan
- Token savings — RTK + Caveman hemat 20-65% tokens
- One endpoint — Nggak perlu gonta-ganti API keys di setiap tool
- Cloud sync — Tunnel + Cloudflare, akses dari mana aja
What's Next
- Setup Smart Combos — chain multiple providers jadi one virtual provider
- Configure multi-account per provider (round-robin load balancing)
- Integrate proxy pools buat region-restricted providers
- MITM Bridge — intercept Antigravity/Copilot/Kiro IDE traffic
Building this setup ngajarin gw gimana manage multiple AI subscriptions tanpa pusing. Sekarang gw cuma perlu inget satu endpoint: localhost:20128. Sisanya diserahin ke 9Router. 😊
Stack: 9Router • PM2 • Cloudflare Tunnel • Claude Code • OpenAI Codex • Hermes Agent • RTK • Caveman Mode
Cost: $0/month (pake free tiers + subscriptions lu) — FREE forever buat 8 providers, atau pake subscription lu untuk 6 providers OAuth.
Related Posts
Setting Up Hermes Agent — My AI-Powered Dev Assistant
How I set up Hermes Agent with 40+ custom skills, RTK token optimization, and automated workflows for finance, development, and productivity.
Read moreStop Paying for Notes — Obsidian + Syncthing Is All You Need
A free, private, cross-platform note-taking system using Obsidian and Syncthing — with setup guides for macOS, Windows, and Android. No subscriptions, no cloud, no compromises.
Read moreFrontend Development in 2026 — What's Changed
A look at how frontend development has evolved — from React Server Components to AI-assisted coding.
Read more