OmniRoute Auto-Combo Engine
v3.8.1 · Last updated: 2026-05-13
NEW: No combo creation required. Use the `auto` prefix directly in any client.
How to use:
# Any IDE or CLI tool that supports OpenAI format
Base URL: http://localhost:20128/v1
API Key: <your-endpoint-key>
# In your code/config, set model to:
model: "auto"          # balanced default
model: "auto/coding"   # best for coding tasks
model: "auto/fast"     # fastest available
model: "auto/cheap"    # cheapest per token
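For clients written in TypeScript, the settings above can be turned into an OpenAI-format request along these lines. This is a minimal sketch: `buildAutoChatRequest` and `ChatRequestSpec` are hypothetical names introduced for illustration, and the base URL and key placeholder are the ones shown above.

```typescript
// Hypothetical helper (not part of OmniRoute): builds an OpenAI-format chat
// request that targets the auto prefix instead of a concrete model.
type AutoModel = "auto" | `auto/${string}`;

interface ChatRequestSpec {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildAutoChatRequest(
  baseUrl: string,
  apiKey: string,
  model: AutoModel,
  prompt: string,
): ChatRequestSpec {
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  };
}

const spec = buildAutoChatRequest(
  "http://localhost:20128/v1",
  "<your-endpoint-key>",
  "auto/coding",
  "Hello",
);
// To send it: await fetch(spec.url, { method: "POST", headers: spec.headers, body: spec.body });
console.log(spec.url);
```

Keeping request construction separate from the network call makes the model ID easy to swap between `auto`, `auto/coding`, `auto/fast`, and `auto/cheap`.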
What happens:
- Reads active provider connections from the database
- Resolves a model for each connection (… or provider's first model)
- Builds a virtual combo in-memory (not stored in DB)
Key properties:
Behind the scenes:
Request: { model: "auto/coding" }
↓
src/sse/handlers/chat.ts detects prefix
↓
createVirtualAutoCombo('coding') → candidatePool from active connections
↓
handleComboChat (same engine as persisted combos)
↓
Auto-scoring selects best provider/model per request
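The prefix-detection step at the top of this flow can be sketched as follows. This is an illustration, not the actual code in `src/sse/handlers/chat.ts`: `parseAutoPrefix` and `describeRoute` are hypothetical names, and the sketch does not validate the variant against the real variant list.

```typescript
// Hypothetical sketch of auto-prefix detection.
interface AutoPrefixMatch {
  // null means bare "auto", i.e. the balanced default.
  variant: string | null;
}

function parseAutoPrefix(model: string): AutoPrefixMatch | null {
  if (model === "auto") return { variant: null };
  const prefix = "auto/";
  if (model.startsWith(prefix) && model.length > prefix.length) {
    return { variant: model.slice(prefix.length) };
  }
  // Not an auto model: the request falls through to normal routing.
  return null;
}

function describeRoute(model: string): string {
  const match = parseAutoPrefix(model);
  if (match === null) return "regular";
  return match.variant === null ? "virtual-auto:default" : `virtual-auto:${match.variant}`;
}

console.log(describeRoute("auto/coding"));
```

Matching on the exact string `auto` or the `auto/` prefix (rather than any model name that merely starts with "auto") keeps unrelated model IDs out of the short-circuit path.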
Implementation files:
| ) | |
| objects | |
| system entry |
The scoring function combines 9 weighted factors. All weights sum to 1.0.
diagrams/auto-combo-9factor.mmd
| Factor | Default Weight | Description |
|---|---|---|
| health | 0.22 | Health score from circuit breaker (CLOSED=1.0, HALF_OPEN=0.5, OPEN=0.0) |
| quota | 0.17 | Remaining quota / rate-limit headroom [0..1] |
| costInv | 0.17 | Inverse blended cost (60% input + 40% output token price, normalized); cheaper = higher score |
| latencyInv | 0.13 | Inverse p95 latency normalized to pool; faster = higher score |
| taskFit | 0.08 | Task-type fitness (coding, review, planning, analysis, debugging, docs) |
| … | 0.08 | Match between request specificity (manifest hint) and model tier |
| stability | 0.05 | Variance-based stability (low latency stdDev / error rate) |
| tierPriority | 0.05 | Account-tier priority: Ultra=1.0, Pro=0.67, Standard=0.33, Free=0.0 |
| … | 0.05 | Affinity between the candidate's tier and the manifest-recommended tier |
Sum: 0.22 + 0.17 + 0.17 + 0.13 + 0.08 + 0.08 + 0.05 + 0.05 + 0.05 = 1.00.
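As a rough illustration of how the default weights combine, the score can be read as a weighted sum of factor values that are each normalized to [0, 1]. The sketch below assumes exactly that; it is not the project's scoring code. The names `specificityMatch` and `tierAffinity` are placeholders for the two factors whose identifiers are not shown in the table, and `score` and `weightSum` are hypothetical helpers.

```typescript
// Assumption: every factor value is already normalized to [0, 1].
// `specificityMatch` and `tierAffinity` are placeholder names.
type Factor =
  | "health" | "quota" | "costInv" | "latencyInv" | "taskFit"
  | "specificityMatch" | "stability" | "tierPriority" | "tierAffinity";

const DEFAULT_WEIGHTS: Record<Factor, number> = {
  health: 0.22, quota: 0.17, costInv: 0.17, latencyInv: 0.13, taskFit: 0.08,
  specificityMatch: 0.08, stability: 0.05, tierPriority: 0.05, tierAffinity: 0.05,
};

// Mappings taken from the table above.
const BREAKER_HEALTH = { CLOSED: 1.0, HALF_OPEN: 0.5, OPEN: 0.0 } as const;
const TIER_PRIORITY = { Ultra: 1.0, Pro: 0.67, Standard: 0.33, Free: 0.0 } as const;

function weightSum(weights: Record<Factor, number>): number {
  return Object.values(weights).reduce((a, b) => a + b, 0);
}

// Weighted sum over all factors.
function score(
  factors: Record<Factor, number>,
  weights: Record<Factor, number> = DEFAULT_WEIGHTS,
): number {
  let total = 0;
  for (const key of Object.keys(weights) as Factor[]) {
    total += weights[key] * factors[key];
  }
  return total;
}

// Example candidate: half-open breaker, Pro account tier.
const candidate: Record<Factor, number> = {
  health: BREAKER_HEALTH.HALF_OPEN, quota: 0.8, costInv: 0.6, latencyInv: 0.9,
  taskFit: 0.7, specificityMatch: 0.5, stability: 0.9,
  tierPriority: TIER_PRIORITY.Pro, tierAffinity: 1.0,
};

console.log(weightSum(DEFAULT_WEIGHTS).toFixed(2), score(candidate).toFixed(4));
```

Because the weights sum to 1.0 and each factor lies in [0, 1], the resulting score also lies in [0, 1], which makes candidates directly comparable.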
Four mode packs are available: ship-fast, cost-saver, quality-first, and offline-friendly. Each pack overrides the default weights to bias selection toward a specific goal. The table below gives the full weights per pack (each column sums to 1.0).
| Factor | ship-fast | cost-saver | quality-first | offline-friendly |
|---|---|---|---|---|
| quota | 0.15 | 0.15 | 0.10 | 0.40 |
| health | 0.30 | 0.20 | 0.20 | 0.30 |
| costInv | 0.05 | 0.40 | 0.05 | 0.10 |
| latencyInv | 0.35 | 0.05 | 0.05 | 0.05 |
| taskFit | 0.10 | 0.10 | 0.40 | 0.00 |
| stability | 0.00 | 0.05 | 0.15 | 0.10 |
| tierPriority | 0.05 | 0.05 | 0.05 | 0.05 |
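The two remaining factors (request-specificity match and tier affinity) are not set in mode packs; when they are absent, the scoring function falls back to a default value for them.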
- ship-fast: latencyInv 0.35 + health 0.30 (low-latency, healthy connections)
- cost-saver: costInv 0.40 (cheapest tokens win)
- quality-first: taskFit 0.40 + stability 0.15 (best model for the task, consistent)
- offline-friendly: quota 0.40 + health 0.30 (max headroom regardless of speed/cost)
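To see how the packs bias selection, the sketch below scores the same four made-up candidates under each pack. The weights are copied from the table; the candidates, their factor values, and the `pickBest` helper are hypothetical. The two factors that packs do not set are left out, which amounts to assuming they contribute nothing here.

```typescript
type PackFactor =
  | "quota" | "health" | "costInv" | "latencyInv" | "taskFit" | "stability" | "tierPriority";
type Weights = Record<PackFactor, number>;

// Weights copied from the mode pack table above.
const MODE_PACKS: Record<string, Weights> = {
  "ship-fast": { quota: 0.15, health: 0.30, costInv: 0.05, latencyInv: 0.35, taskFit: 0.10, stability: 0.00, tierPriority: 0.05 },
  "cost-saver": { quota: 0.15, health: 0.20, costInv: 0.40, latencyInv: 0.05, taskFit: 0.10, stability: 0.05, tierPriority: 0.05 },
  "quality-first": { quota: 0.10, health: 0.20, costInv: 0.05, latencyInv: 0.05, taskFit: 0.40, stability: 0.15, tierPriority: 0.05 },
  "offline-friendly": { quota: 0.40, health: 0.30, costInv: 0.10, latencyInv: 0.05, taskFit: 0.00, stability: 0.10, tierPriority: 0.05 },
};

interface Candidate {
  id: string;
  factors: Weights; // factor values, assumed normalized to [0, 1]
}

// Made-up candidates for illustration only.
const candidates: Candidate[] = [
  { id: "fast-pricey", factors: { quota: 0.5, health: 1, costInv: 0.1, latencyInv: 1.0, taskFit: 0.5, stability: 0.5, tierPriority: 0.33 } },
  { id: "slow-cheap", factors: { quota: 0.5, health: 1, costInv: 1.0, latencyInv: 0.2, taskFit: 0.5, stability: 0.5, tierPriority: 0.33 } },
  { id: "specialist", factors: { quota: 0.5, health: 1, costInv: 0.3, latencyInv: 0.3, taskFit: 1.0, stability: 0.9, tierPriority: 0.33 } },
  { id: "roomy", factors: { quota: 1.0, health: 1, costInv: 0.2, latencyInv: 0.2, taskFit: 0.3, stability: 0.5, tierPriority: 0.33 } },
];

function pickBest(pack: string, pool: Candidate[]): string {
  const weights = MODE_PACKS[pack];
  if (!weights) throw new Error(`unknown pack: ${pack}`);
  let bestId = "";
  let bestScore = -Infinity;
  for (const c of pool) {
    let s = 0;
    for (const f of Object.keys(weights) as PackFactor[]) s += weights[f] * c.factors[f];
    if (s > bestScore) {
      bestScore = s;
      bestId = c.id;
    }
  }
  return bestId;
}

for (const pack of Object.keys(MODE_PACKS)) {
  console.log(pack, "->", pickBest(pack, candidates));
}
```

With these inputs, each pack selects a different candidate, which is the intended effect of shifting weight between latency, cost, task fit, and quota.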
There are 14 routing strategies. The Auto Combo engine itself is exposed under the `auto` strategy; the others are available for persisted combos.
| Strategy | Description |
|---|---|
| … | First-target ordered list with explicit priority |
| … | Weighted random by per-target weight |
| … | Cycle through targets in order |
| … | Hand off context across targets (long conversations) |
| … | Fill each target's quota before moving to next |
| … | Power-of-2-choices random load balancing |
| … | Uniform random selection |
| … | Pick target with lowest current load |
| … | Minimize cost ($) per request given catalog pricing |
| … | Prioritize by quota reset time; short reset windows rank higher |
| … | Random without deduplication of repeats |
| `auto` | Use Auto Combo scoring (9-factor); recommended |
| … | Last-Known-Good Path (sticky route to last successful target) |
| … | Pick target with best fit for current context size |
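Most strategies in this table are simple selection rules. As one example, power-of-2-choices is commonly implemented by sampling two distinct targets at random and keeping the less loaded one. The sketch below shows that general technique. It is not OmniRoute's implementation: the `Target` shape, the meaning of `load`, and the `pickPowerOfTwo` name are assumptions.

```typescript
interface Target {
  id: string;
  load: number; // assumed meaning: current in-flight load; lower is better
}

// rng returns a float in [0, 1); it is injectable so the choice can be tested.
function pickPowerOfTwo(targets: Target[], rng: () => number = Math.random): Target {
  if (targets.length === 0) throw new Error("no targets");
  if (targets.length === 1) return targets[0];
  const i = Math.floor(rng() * targets.length);
  // Draw the second index from the remaining n-1 slots so the two samples differ.
  let j = Math.floor(rng() * (targets.length - 1));
  if (j >= i) j += 1;
  const a = targets[i];
  const b = targets[j];
  // Keep the less loaded of the two sampled targets.
  return a.load <= b.load ? a : b;
}

const pool: Target[] = [
  { id: "a", load: 5 },
  { id: "b", load: 2 },
  { id: "c", load: 9 },
];
console.log(pickPowerOfTwo(pool).id);
```

Compared with uniform random selection, comparing two samples avoids piling requests onto an already busy target while still needing only two load lookups per request.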
The virtual factory builds candidates on the fly:
- The virtual combo is never persisted to the DB.
- Adding a new provider connection that is enabled automatically expands the candidate pool; no manual combo editing is needed.

The virtual combo is rebuilt per request, so newly added or newly healthy connections are picked up immediately.
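A minimal sketch of that per-request build step is shown below. It assumes a simplified connection record; the `ProviderConnection` and `VirtualCombo` shapes, their field names, and the `buildVirtualAutoCombo` function are hypothetical and do not reflect the real `createVirtualAutoCombo` signature.

```typescript
// Hypothetical, simplified shapes for illustration.
interface ProviderConnection {
  provider: string;
  enabled: boolean;
  defaultModel?: string;
  models: string[];
}

interface VirtualCombo {
  strategy: "auto";
  variant: string | null;
  candidatePool: { provider: string; model: string }[];
}

// Builds the combo in memory from whatever connections are enabled right now.
function buildVirtualAutoCombo(
  variant: string | null,
  connections: ProviderConnection[],
): VirtualCombo {
  const candidatePool: { provider: string; model: string }[] = [];
  for (const conn of connections) {
    if (!conn.enabled) continue;
    // Assumption: fall back to the provider's first model when none is configured.
    const model = conn.defaultModel ?? conn.models[0];
    if (model === undefined) continue; // nothing routable on this connection
    candidatePool.push({ provider: conn.provider, model });
  }
  return { strategy: "auto", variant, candidatePool };
}

const connections: ProviderConnection[] = [
  { provider: "anthropic", enabled: true, defaultModel: "model-a", models: ["model-a", "model-b"] },
  { provider: "openai", enabled: false, models: ["model-c"] },
];

console.log(buildVirtualAutoCombo("coding", connections).candidatePool.length);

// Enabling a connection is enough: the next request rebuilds the pool.
connections[1].enabled = true;
console.log(buildVirtualAutoCombo("coding", connections).candidatePool.length);
```

Since nothing is cached between calls in this sketch, a change to the connection list is reflected in the very next build, mirroring the rebuild-per-request behavior described above.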
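There is no dedicated endpoint. Auto-Combo is consumed in two ways:
- Zero-config prefix: set the model to `auto` or `auto/<variant>`. The virtual factory builds the combo per request; no persistence and no API calls are needed.
- Persisted combo with `strategy: "auto"`: create a regular combo via `POST /api/combos` and set `strategy: "auto"` plus `config.auto.candidatePool` / `config.auto.weights`. The same scoring engine is used; the combo is stored in the database and is reusable by ID.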
# Zero-config usage (no combo creation)
curl -X POST http://localhost:20128/v1/chat/completions \
-H "Authorization: Bearer <key>" \
-H "Content-Type: application/json" \
-d '{"model":"auto/coding","messages":[{"role":"user","content":"Hello"}]}'
# Persisted auto combo via the regular combos endpoint
curl -X POST http://localhost:20128/api/combos \
-H "Content-Type: application/json" \
-d '{"id":"my-auto","name":"Auto Coder","strategy":"auto","config":{"auto":{"candidatePool":["anthropic","google","openai"],"weights":{"quota":0.15,"health":0.3,"costInv":0.05,"latencyInv":0.35,"taskFit":0.1,"stability":0,"tierPriority":0.05}}}}'
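Task fitness is looked up per model and task type (coding, review, planning, analysis, debugging, docs). The lookup supports wildcard patterns (for example, a pattern that maps a model family to a high coding score).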
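Counting bare `auto` (default) plus the 6 variant values, there are 7 invokable model IDs: `auto`, `auto/coding`, `auto/fast`, `auto/cheap`, and three further `auto/<variant>` IDs. (The variant enumeration itself has 6 values; the 7th option is "no variant", that is, bare `auto`, which is handled as the balanced default.)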
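The scoring function treats tier membership as one signal via the `tierPriority` weight (default 0.05). On cost, the lower blended price wins (60% input + 40% output ratio).

Auto-Combo does not force Tier 1 first: if Tier 1 latency is bad or its cost-vs-quality trade-off is suboptimal, Tier 2 wins. To force tier ordering, use the ordered, explicit-priority combo strategy and arrange providers by tier.

To make tier membership count for more, raise the `tierPriority` weight: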
{
"strategy": "auto",
"config": { "auto": { "weights": { "tierPriority": 0.3, "costInv": 0.05 } } }
}
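The effect of such an override can be checked with a small calculation. The sketch below compares a made-up Tier 1 and Tier 2 candidate under the default weights and under the override above. It assumes that overrides replace individual default weights without renormalizing, which this document does not confirm; the candidates and helper names are hypothetical, and the two factors without listed names are omitted because they would be identical for both candidates.

```typescript
type WeightMap = Record<string, number>;

// Default weights for the named factors from the default weight table.
const DEFAULTS: WeightMap = {
  health: 0.22, quota: 0.17, costInv: 0.17, latencyInv: 0.13,
  taskFit: 0.08, stability: 0.05, tierPriority: 0.05,
};

function scoreWith(weights: WeightMap, factors: WeightMap): number {
  return Object.entries(weights).reduce(
    (sum, [name, w]) => sum + w * (factors[name] ?? 0),
    0,
  );
}

// Assumption: overrides replace individual defaults; no renormalization.
function applyOverrides(overrides: WeightMap): WeightMap {
  return { ...DEFAULTS, ...overrides };
}

// Made-up candidates that differ only in cost, latency, and account tier.
const shared = { health: 1, quota: 0.8, taskFit: 0.7, stability: 0.9 };
const tier1: WeightMap = { ...shared, costInv: 0.2, latencyInv: 0.4, tierPriority: 1.0 };
const tier2: WeightMap = { ...shared, costInv: 0.9, latencyInv: 0.8, tierPriority: 0.33 };

function winner(weights: WeightMap): string {
  return scoreWith(weights, tier1) >= scoreWith(weights, tier2) ? "tier1" : "tier2";
}

console.log("default:", winner(DEFAULTS));
console.log("override:", winner(applyOverrides({ tierPriority: 0.3, costInv: 0.05 })));
```

Under the defaults, the cheaper and faster Tier 2 candidate scores higher; with `tierPriority` raised to 0.3 and `costInv` lowered to 0.05, the Tier 1 candidate comes out ahead.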
See the tier documentation for tier definitions and provider classification.
| File | Purpose |
|---|---|
| … | 9-factor scoring function, pool normalization |
| … | Model × task fitness lookup |
| … | Selection logic, bandit, budget cap |
| … | Exclusion, probes, incident mode |
| … | 4 weight profiles (ship-fast, cost-saver, quality-first, offline-friendly) |
| … | `auto/` prefix parser + 6 variants |
| … | Builds the virtual combo in memory from live connections |
| … | Test hook for mocking provider registry |
| … | Routing strategy declarations (14 strategies) |
| `src/sse/handlers/chat.ts` | Integration: auto-prefix short-circuit |