# 0009 — Multi-LLM gateway with provider abstraction
- Status: accepted
- Date: 2026-04-26
- Deciders: @kackey621, Willen Federation contributors
## Context and Problem Statement
The M6 scope expansion requires AI-driven features — image-based item registration (vision models infer title / category / specs), keyword and similarity search (embeddings + chat-style ranking), and ISBN/barcode enrichment (LLM resolves ambiguous codes against catalogue text). Operators expect to use OpenAI, Google Gemini, and Anthropic Claude interchangeably — sometimes for compliance reasons (data residency), sometimes for cost, sometimes because a particular model is best at a given task.
We need a gateway that:
- lets the application call one interface for chat, vision, and embeddings — call sites shouldn't branch on which vendor is configured;
- allows per-feature provider selection (e.g. "use Gemini for vision but Claude for chat");
- handles failover when a primary provider is rate-limited or down;
- keeps API keys out of git (encrypted in `system_setting` or `.env`); and
- ships a null provider so contributors can run the suite without any cloud key.
## Decision Drivers
- Self-hosted reality — operators deploy SASO on their own infrastructure and choose their own AI vendor. The application must not assume a single provider.
- Compliance — Japanese enterprise operators frequently need data-residency guarantees that force one vendor (Gemini in JP, Azure-hosted OpenAI, …).
- Cost — embeddings are cheap on OpenAI's smaller models; vision is cheaper on Gemini Flash; reasoning is currently best on Claude. Per-feature routing matters.
- Compliance with ADR 0001 — the gateway lives in `Infrastructure/`; domain code depends on a domain-level interface (`AiAssistant`).
## Considered Options
### Option A — Pick one vendor (e.g. OpenAI only)
- (+) Simplest scope.
- (−) Operators on Gemini-only or Azure-only stacks cannot use the AI features.
- (−) No failover.
### Option B — Direct SDK calls scattered through the codebase
- (+) Zero abstraction overhead.
- (−) Every new feature reinvents prompt formatting, retry policy, error mapping. Switching vendors becomes a multi-PR refactor.
### Option C — Provider-abstracted gateway: AiAssistant interface + per-vendor adapters + feature-keyed provider selection
A single `Saso\Domain\Ai\AiAssistant` interface exposes `chatComplete()`, `extractStructured()`, `embed()`, and `describeImage()`. Concrete implementations: `OpenAiAssistant`, `GeminiAssistant`, `ClaudeAssistant`, plus `NullAssistant` for tests / no-key environments. A `Saso\Infrastructure\Ai\AssistantRouter` reads `system_setting` rows like `ai.provider.vision = gemini`, `ai.provider.embedding = openai`, `ai.provider.chat = claude` and dispatches per call.
API keys live in `system_setting` rows of type `secret`, encrypted via `SecretEncryptor` (M3-E). The default value is null; AI features short-circuit to an "AI unavailable" response when no key is configured for the feature's provider.
- (+) Operators pick per-feature provider from the admin UI.
- (+) Failover is a wrapper decorator — `FallbackChainAssistant(primary, secondary)`.
- (+) Tests inject `NullAssistant` (deterministic responses) without touching external services.
- (−) Three SDK adapters to maintain. We mitigate by using the official PHP/HTTP libraries for each (`openai-php/client`, the Gemini REST API, `anthropic-ai/anthropic-php`).
## Decision Outcome
Chosen option: C — AiAssistant interface with provider-abstracted adapters and feature-keyed routing.
### Domain shape
```php
namespace Saso\Domain\Ai;

interface AiAssistant
{
    public function chatComplete(ChatRequest $req): ChatResponse;
    public function extractStructured(StructuredExtractionRequest $req): StructuredExtractionResponse;
    public function embed(EmbeddingRequest $req): EmbeddingResponse;
    public function describeImage(ImageRequest $req): ImageDescriptionResponse;
}
```
`ChatRequest` carries a `messages` array, `temperature`, `max_tokens`, and an optional `response_format` (`text` / `json` / `json_schema`). `EmbeddingRequest` takes a list of strings or image bytes plus a task hint (`retrieval.query` / `retrieval.passage` / `clustering`) — providers that support it use it.
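As a concrete sketch of the shapes described above, the two request objects could be modelled as immutable value objects. Everything beyond the fields named in this ADR (defaults, the enum name) is illustrative, not a committed design:

```php
// Hypothetical DTO sketch for the request objects named in this ADR.
// Default values and the ResponseFormat enum name are assumptions.

enum ResponseFormat: string
{
    case Text = 'text';
    case Json = 'json';
    case JsonSchema = 'json_schema';
}

final class ChatRequest
{
    /** @param list<array{role: string, content: string}> $messages */
    public function __construct(
        public readonly array $messages,
        public readonly float $temperature = 0.2,
        public readonly int $maxTokens = 1024,
        public readonly ResponseFormat $responseFormat = ResponseFormat::Text,
    ) {}
}

final class EmbeddingRequest
{
    /** @param list<string> $inputs @param ?string $task e.g. 'retrieval.query' */
    public function __construct(
        public readonly array $inputs,
        public readonly ?string $task = null,
    ) {}
}
```

Readonly promoted properties keep the objects immutable, so adapters can cache or retry a request without defensive copying.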
### Provider selection
`system_setting` rows under the `ai.*` prefix:

| Key | Type | Purpose |
|---|---|---|
| `ai.provider.chat` | string | `openai` \| `gemini` \| `claude` \| `null` |
| `ai.provider.embedding` | string | as above |
| `ai.provider.vision` | string | as above |
| `ai.openai.api_key` | secret | OpenAI API key |
| `ai.openai.model.chat` | string | e.g. `gpt-4.1-mini` |
| `ai.openai.model.embedding` | string | e.g. `text-embedding-3-large` |
| `ai.gemini.api_key` | secret | Google AI API key |
| `ai.gemini.model.*` | string | per-task model id |
| `ai.claude.api_key` | secret | Anthropic API key |
| `ai.claude.model.*` | string | per-task model id |
The `AssistantRouter` reads these on first use and caches them for the remainder of the request.
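A minimal sketch of that routing-plus-caching behaviour, assuming a `SettingsReader` stand-in for the real `system_setting` access layer and provider factories registered by name:

```php
// Illustrative AssistantRouter sketch. SettingsReader and the factory map
// are assumptions; only the ai.provider.* key shape comes from this ADR.

interface SettingsReader
{
    public function get(string $key): ?string;
}

final class AssistantRouter
{
    /** @var array<string, object> per-request cache, keyed by task */
    private array $cache = [];

    /** @param array<string, callable(): object> $factories keyed by provider name */
    public function __construct(
        private readonly SettingsReader $settings,
        private readonly array $factories,
    ) {}

    public function forTask(string $task): object
    {
        // e.g. $task = 'vision' reads ai.provider.vision
        if (!isset($this->cache[$task])) {
            $provider = $this->settings->get("ai.provider.$task") ?? 'null';
            // Unknown or unset provider falls back to the null assistant
            $factory = $this->factories[$provider] ?? $this->factories['null'];
            $this->cache[$task] = $factory();
        }
        return $this->cache[$task];
    }
}
```

Because the cache lives on the router instance, vendor flips in the admin UI take effect on the next request without a redeploy.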
### Failover
`Saso\Infrastructure\Ai\FallbackChainAssistant` wraps an ordered list of `AiAssistant`s. If the primary throws a transient `AiUpstreamException` (5xx, 429, network), the chain tries the next. Configuration is declarative in `system_setting`: `ai.fallback.chat = claude,openai` means "primary Claude, fall back to OpenAI".
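The chain's core loop can be sketched in a few lines. The exception name is the ADR's; the loose `mixed` signatures are a simplification of the real `AiAssistant` contract:

```php
// Failover sketch: walk the ordered chain, retrying only on transient
// upstream failures. Signatures are simplified for illustration.

final class AiUpstreamException extends \RuntimeException {}

final class FallbackChainAssistant
{
    /** @param list<object> $chain ordered: primary first */
    public function __construct(private readonly array $chain) {}

    public function chatComplete(mixed $req): mixed
    {
        $last = null;
        foreach ($this->chain as $assistant) {
            try {
                return $assistant->chatComplete($req);
            } catch (AiUpstreamException $e) {
                $last = $e; // transient (5xx, 429, network): try the next provider
            }
        }
        // Every provider failed; surface the last transient error
        throw $last ?? new AiUpstreamException('empty fallback chain');
    }
}
```

Note that non-transient errors (e.g. a content-policy refusal) are deliberately not caught: falling over to a second vendor would not help there.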
### Error codes
New range:

| Code | Meaning |
|---|---|
| SASO-AI-8001 | Provider not configured (no API key) |
| SASO-AI-8002 | Provider rate-limited (429) |
| SASO-AI-8003 | Provider response malformed (could not parse expected JSON) |
| SASO-AI-8004 | Provider context window exceeded |
| SASO-AI-8005 | Content policy violation (provider refused) |
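One way these codes could surface in the adapters is as a small typed exception hierarchy. The class names here are hypothetical; only the `SASO-AI-80xx` codes are fixed by this ADR:

```php
// Hypothetical exception hierarchy carrying the SASO-AI-80xx codes.
// Class names are illustrative, not part of the decision.

abstract class AiException extends \RuntimeException
{
    abstract public function sasoCode(): string;
}

final class ProviderNotConfigured extends AiException
{
    public function sasoCode(): string { return 'SASO-AI-8001'; }
}

final class ProviderRateLimited extends AiException
{
    public function sasoCode(): string { return 'SASO-AI-8002'; }
}

final class MalformedProviderResponse extends AiException
{
    public function sasoCode(): string { return 'SASO-AI-8003'; }
}
```

Typed exceptions let call sites catch `AiException` generically while logging a stable code, instead of string-matching vendor error messages.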
### Lockout safety
`SAFE_MODE=true` in `.env` (already used by the Pluggable IdP per ADR 0003) also disables every AI feature and falls back to `NullAssistant` regardless of `system_setting` state. A misconfigured AI key cannot break the inventory UI.
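The guard boils down to a short-circuit in whatever factory wires up the assistant. A sketch, assuming a `resolveAssistant` helper that does not exist in the codebase:

```php
// Lockout-safety sketch: SAFE_MODE wins over every system_setting row.
// resolveAssistant and this NullAssistant stub are illustrative.

final class NullAssistant
{
    public function chatComplete(mixed $req): string
    {
        return ''; // deterministic empty response: AI features degrade gracefully
    }
}

function resolveAssistant(bool $safeMode, callable $configuredFactory): object
{
    // SAFE_MODE short-circuits before any provider settings or keys are read,
    // so a broken AI configuration can never take the inventory UI down.
    return $safeMode ? new NullAssistant() : $configuredFactory();
}
```

The important property is ordering: the `.env` flag is checked before `system_setting` is consulted, so recovery never depends on the database being in a good state.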
## Consequences
- All AI features go through `AiAssistant`. Call sites stay vendor-agnostic.
- Operators flip vendor via the admin UI; no redeploys.
- Three SDK dependencies arrive in M6 (`openai-php/client`, an HTTP client for Gemini, `anthropic-ai/anthropic-php`).
- The contract surface is small enough that we accept the maintenance cost of three adapters in exchange for vendor independence.
- `NullAssistant` keeps the test suite hermetic — no cloud calls in CI.
- This ADR does not mandate a vector DB; embeddings produced here flow into whatever vector tier ADR 0010 picks.