# 0009 — Multi-LLM gateway with provider abstraction
- Status: accepted
- Date: 2026-04-26
- Deciders: @kackey621, Willen Federation contributors
## Context and Problem Statement
The M6 scope expansion requires AI-driven features — image-based item registration (vision models infer title / category / specs), keyword and similarity search (embeddings + chat-style ranking), and ISBN/barcode enrichment (LLM resolves ambiguous codes against catalogue text). Operators expect to use OpenAI, Google Gemini, and Anthropic Claude interchangeably — sometimes for compliance reasons (data residency), sometimes for cost, sometimes because a particular model is best at a given task.
We need a gateway that:
- lets the application call one interface for chat, vision, and embeddings — call sites shouldn't branch on which vendor is configured;
- allows per-feature provider selection (e.g. "use Gemini for vision but Claude for chat");
- handles failover when a primary provider is rate-limited or down;
- keeps API keys out of git (encrypted in `system_setting` or `.env`); and
- ships a null provider so contributors can run the suite without any cloud key.
## Decision Drivers
- Self-hosted reality — operators deploy SASO on their own infrastructure and choose their own AI vendor. The application must not assume a single provider.
- Compliance — Japanese enterprise operators frequently need data-residency guarantees that force one vendor (Gemini in JP, Azure-hosted OpenAI, …).
- Cost — embeddings are cheap on OpenAI's smaller models; vision is cheaper on Gemini Flash; reasoning is currently best on Claude. Per-feature routing matters.
- Compliance with ADR 0001 — the gateway lives in `Infrastructure/`; domain code depends on a domain-level interface (`AiAssistant`).
## Considered Options
### Option A — Pick one vendor (e.g. OpenAI only)
- (+) Simplest scope.
- (−) Operators on Gemini-only or Azure-only stacks cannot use the AI features.
- (−) No failover.
### Option B — Direct SDK calls scattered through the codebase
- (+) Zero abstraction overhead.
- (−) Every new feature reinvents prompt formatting, retry policy, error mapping. Switching vendors becomes a multi-PR refactor.
### Option C — Provider-abstracted gateway: AiAssistant interface + per-vendor adapters + feature-keyed provider selection
A single `Saso\Domain\Ai\AiAssistant` interface exposes `chatComplete()`, `extractStructured()`, `embed()`, and `describeImage()`. Concrete implementations: `OpenAiAssistant`, `GeminiAssistant`, `ClaudeAssistant`, plus `NullAssistant` for tests / no-key environments. A `Saso\Infrastructure\Ai\AssistantRouter` reads `system_setting` rows like `ai.provider.vision = gemini`, `ai.provider.embedding = openai`, `ai.provider.chat = claude` and dispatches per call.
API keys live in `system_setting` rows of type `secret`, encrypted via `SecretEncryptor` (M3-E). The default value is null; AI features short-circuit to an "AI unavailable" response when no key is configured for the feature's provider.
- (+) Operators pick per-feature provider from the admin UI.
- (+) Failover is a wrapper decorator — `FallbackChainAssistant(primary, secondary)`.
- (+) Tests inject `NullAssistant` (deterministic responses) without touching external services.
- (−) Three SDK adapters to maintain. We mitigate by using the official PHP/HTTP libraries for each (`openai-php/client`, the Gemini REST API, `anthropic-ai/anthropic-php`).
## Decision Outcome
Chosen option: C — AiAssistant interface with provider-abstracted adapters and feature-keyed routing.
### Domain shape
```php
namespace Saso\Domain\Ai;

interface AiAssistant
{
    public function chatComplete(ChatRequest $req): ChatResponse;
    public function extractStructured(StructuredExtractionRequest $req): StructuredExtractionResponse;
    public function embed(EmbeddingRequest $req): EmbeddingResponse;
    public function describeImage(ImageRequest $req): ImageDescriptionResponse;
}
```
`ChatRequest` carries a `messages` array, `temperature`, `max_tokens`, and an optional `response_format` (`text` / `json` / `json_schema`). `EmbeddingRequest` takes a list of strings or image bytes plus a task hint (`retrieval.query` / `retrieval.passage` / `clustering`) — providers that support it use it.
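As a concrete sketch of the shapes described above, the two request objects could be modelled as immutable value objects. Everything beyond the fields named in this ADR (defaults, the enum name) is illustrative, not a committed design:

```php
// Hypothetical DTO sketch for the request objects named in this ADR.
// Default values and the ResponseFormat enum name are assumptions.

enum ResponseFormat: string
{
    case Text = 'text';
    case Json = 'json';
    case JsonSchema = 'json_schema';
}

final class ChatRequest
{
    /** @param list<array{role: string, content: string}> $messages */
    public function __construct(
        public readonly array $messages,
        public readonly float $temperature = 0.2,
        public readonly int $maxTokens = 1024,
        public readonly ResponseFormat $responseFormat = ResponseFormat::Text,
    ) {}
}

final class EmbeddingRequest
{
    /** @param list<string> $inputs @param ?string $task e.g. 'retrieval.query' */
    public function __construct(
        public readonly array $inputs,
        public readonly ?string $task = null,
    ) {}
}
```

Readonly promoted properties keep the objects immutable, so adapters can cache or retry a request without defensive copying.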
### Provider selection
`system_setting` rows under the `ai.*` prefix:

| Key | Type | Purpose |
|---|---|---|
| `ai.provider.chat` | string | `openai` \| `gemini` \| `claude` \| `null` |
| `ai.provider.embedding` | string | as above |
| `ai.provider.vision` | string | as above |
| `ai.openai.api_key` | secret | OpenAI API key |
| `ai.openai.model.chat` | string | e.g. `gpt-4.1-mini` |
| `ai.openai.model.embedding` | string | e.g. `text-embedding-3-large` |
| `ai.gemini.api_key` | secret | Google AI API key |
| `ai.gemini.model.*` | string | per-task model id |
| `ai.claude.api_key` | secret | Anthropic API key |
| `ai.claude.model.*` | string | per-task model id |
The `AssistantRouter` reads these on first use and caches them for the remainder of the request.
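A minimal sketch of that routing-plus-caching behaviour, assuming a `SettingsReader` stand-in for the real `system_setting` access layer and provider factories registered by name:

```php
// Illustrative AssistantRouter sketch. SettingsReader and the factory map
// are assumptions; only the ai.provider.* key shape comes from this ADR.

interface SettingsReader
{
    public function get(string $key): ?string;
}

final class AssistantRouter
{
    /** @var array<string, object> per-request cache, keyed by task */
    private array $cache = [];

    /** @param array<string, callable(): object> $factories keyed by provider name */
    public function __construct(
        private readonly SettingsReader $settings,
        private readonly array $factories,
    ) {}

    public function forTask(string $task): object
    {
        // e.g. $task = 'vision' reads ai.provider.vision
        if (!isset($this->cache[$task])) {
            $provider = $this->settings->get("ai.provider.$task") ?? 'null';
            // Unknown or unset provider falls back to the null assistant
            $factory = $this->factories[$provider] ?? $this->factories['null'];
            $this->cache[$task] = $factory();
        }
        return $this->cache[$task];
    }
}
```

Because the cache lives on the router instance, vendor flips in the admin UI take effect on the next request without a redeploy.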
### Failover
`Saso\Infrastructure\Ai\FallbackChainAssistant` wraps an ordered list of `AiAssistant`s. If the primary throws a transient `AiUpstreamException` (5xx, 429, network), the chain tries the next. Configuration is declarative in `system_setting`: `ai.fallback.chat = claude,openai` means "primary Claude, fall back to OpenAI".
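The chain's core loop can be sketched in a few lines. The exception name is the ADR's; the loose `mixed` signatures are a simplification of the real `AiAssistant` contract:

```php
// Failover sketch: walk the ordered chain, retrying only on transient
// upstream failures. Signatures are simplified for illustration.

final class AiUpstreamException extends \RuntimeException {}

final class FallbackChainAssistant
{
    /** @param list<object> $chain ordered: primary first */
    public function __construct(private readonly array $chain) {}

    public function chatComplete(mixed $req): mixed
    {
        $last = null;
        foreach ($this->chain as $assistant) {
            try {
                return $assistant->chatComplete($req);
            } catch (AiUpstreamException $e) {
                $last = $e; // transient (5xx, 429, network): try the next provider
            }
        }
        // Every provider failed; surface the last transient error
        throw $last ?? new AiUpstreamException('empty fallback chain');
    }
}
```

Note that non-transient errors (e.g. a content-policy refusal) are deliberately not caught: falling over to a second vendor would not help there.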
### Error codes
New range:

| Code | Meaning |
|---|---|
| SASO-AI-8001 | Provider not configured (no API key) |
| SASO-AI-8002 | Provider rate-limited (429) |
| SASO-AI-8003 | Provider response malformed (could not parse expected JSON) |
| SASO-AI-8004 | Provider context window exceeded |
| SASO-AI-8005 | Content policy violation (provider refused) |
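One way these codes could surface in the adapters is as a small typed exception hierarchy. The class names here are hypothetical; only the `SASO-AI-80xx` codes are fixed by this ADR:

```php
// Hypothetical exception hierarchy carrying the SASO-AI-80xx codes.
// Class names are illustrative, not part of the decision.

abstract class AiException extends \RuntimeException
{
    abstract public function sasoCode(): string;
}

final class ProviderNotConfigured extends AiException
{
    public function sasoCode(): string { return 'SASO-AI-8001'; }
}

final class ProviderRateLimited extends AiException
{
    public function sasoCode(): string { return 'SASO-AI-8002'; }
}

final class MalformedProviderResponse extends AiException
{
    public function sasoCode(): string { return 'SASO-AI-8003'; }
}
```

Typed exceptions let call sites catch `AiException` generically while logging a stable code, instead of string-matching vendor error messages.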
### Lockout safety
`SAFE_MODE=true` in `.env` (already used by the Pluggable IdP per ADR 0003) also disables every AI feature and falls back to `NullAssistant` regardless of `system_setting` state. A misconfigured AI key cannot break the inventory UI.
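The guard boils down to a short-circuit in whatever factory wires up the assistant. A sketch, assuming a `resolveAssistant` helper that does not exist in the codebase:

```php
// Lockout-safety sketch: SAFE_MODE wins over every system_setting row.
// resolveAssistant and this NullAssistant stub are illustrative.

final class NullAssistant
{
    public function chatComplete(mixed $req): string
    {
        return ''; // deterministic empty response: AI features degrade gracefully
    }
}

function resolveAssistant(bool $safeMode, callable $configuredFactory): object
{
    // SAFE_MODE short-circuits before any provider settings or keys are read,
    // so a broken AI configuration can never take the inventory UI down.
    return $safeMode ? new NullAssistant() : $configuredFactory();
}
```

The important property is ordering: the `.env` flag is checked before `system_setting` is consulted, so recovery never depends on the database being in a good state.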
## Consequences
- All AI features go through `AiAssistant`. Call sites stay vendor-agnostic.
- Operators flip vendor via the admin UI; no redeploys.
- Three SDK dependencies arrive in M6 (`openai-php/client`, an HTTP client for Gemini, `anthropic-ai/anthropic-php`).
- The contract surface is small enough that we accept the maintenance cost of three adapters in exchange for vendor independence.
- `NullAssistant` keeps the test suite hermetic — no cloud calls in CI.
- This ADR does not mandate a vector DB; embeddings produced here flow into whatever vector tier ADR 0010 picks.