Simplify n8n LLM Routing Workflows With A Provider Abstraction Layer

Some links in this article are affiliate links to products I actually use. If you sign up through them, I may earn a small commission — at no extra cost to you.

Copy-pasted API plumbing is a common form of technical debt in AI automation stacks. The first time you wire an Anthropic or OpenAI node directly into an n8n workflow, it feels like the fastest way to a working prototype – and it is. But by the time you have several automations running, each one probably contains its own copy of the same wiring, response parsing, and error handling.

This becomes painful when you want to A/B test providers, switch to another LLM provider, or even move to a newer model version from the same provider: each change means opening every affected workflow and rewriting every node that touches an LLM.

One solution is to use a gateway service like OpenRouter (more on this later). Another is to introduce an abstraction layer that gives you a consistent interface regardless of the LLM being called – simply put, a sub-workflow that accepts standardised parameters, isolates provider-specific translation, and returns a predictable response.

n8n LLM routing workflow

This n8n LLM routing workflow decouples prompt logic from API execution. Your main workflows pass a prompt and a few optional parameters; this single reusable sub-workflow handles the rest, dispatching to any of four providers – Anthropic Claude, Google Gemini, Mistral, or OpenAI – and returning a normalised response.

For the technically minded, it’s an implementation of the Adapter / Normaliser pattern, composed with a Facade.

Adapter: each provider has a different API shape (different request bodies, response structures, token field names, and quirks such as max_completion_tokens vs max_tokens), and the workflow adapts each one to a single common interface.

Facade: hides the complexity of credentials, provider routing, token budget validation, cost tracking, and response processing behind a standard interface.
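To make the Adapter half concrete, here is a minimal JavaScript sketch (the language of n8n's Code node) of normalising provider responses. It is not the template's actual code: the names responseAdapters and normaliseResponse are hypothetical, the response field paths follow each provider's public chat API, Mistral is assumed to mirror the OpenAI response shape, and measuring length in characters is also an assumption.

```javascript
// Hypothetical adapters: one per provider, each mapping that provider's
// response shape onto the same { text, inputTokens, outputTokens } object.
const responseAdapters = {
  anthropic: (r) => ({
    text: r.content?.[0]?.text ?? "",
    inputTokens: r.usage?.input_tokens ?? null,
    outputTokens: r.usage?.output_tokens ?? null,
  }),
  google: (r) => ({
    text: r.candidates?.[0]?.content?.parts?.[0]?.text ?? "",
    inputTokens: r.usageMetadata?.promptTokenCount ?? null,
    outputTokens: r.usageMetadata?.candidatesTokenCount ?? null,
  }),
  openai: (r) => ({
    text: r.choices?.[0]?.message?.content ?? "",
    inputTokens: r.usage?.prompt_tokens ?? null,
    outputTokens: r.usage?.completion_tokens ?? null,
  }),
};
// Assumption: Mistral's chat completion response mirrors the OpenAI shape.
responseAdapters.mistral = responseAdapters.openai;

// Facade-style entry point: callers never see provider-specific shapes.
function normaliseResponse(provider, rawResponse) {
  const adapt = responseAdapters[provider];
  if (!adapt) throw new Error(`Unsupported provider: ${provider}`);
  const { text, inputTokens, outputTokens } = adapt(rawResponse);
  return {
    llm_response: text,
    provider_used: provider,
    llm_response_length: text.length, // assumption: length in characters
    llm_response_empty: text.trim() === "",
    _input_tokens: inputTokens,
    _output_tokens: outputTokens,
  };
}
```

Adding a fifth provider in a design like this means writing one more adapter function; nothing that calls normaliseResponse has to change.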

Installing and Configuring the Template

Import the workflow from the n8n template library.

If you want some background on how these templates are packaged, I have written about my process for exporting n8n workflows for public distribution.

Open the node at the start of the workflow labelled CONFIG to adjust the default settings. The parameters are:

DEFAULT_PROVIDER — set to anthropic, google, mistral, or openai.
DEFAULT_*_MODEL — a separate default model per provider (e.g. DEFAULT_ANTHROPIC_MODEL, DEFAULT_OPENAI_MODEL), so each provider falls back to its own sensible default rather than a single shared value.
DEFAULT_TEMPERATURE — the workflow ships with 0.5 as the default; adjust to suit your use case.
DEFAULT_MAX_TOKENS — the default response length limit.
DEFAULT_RESPONSE_FORMAT — text or json.

Then link your API credentials in the corresponding provider nodes. The workflow supports Anthropic, Google Gemini, Mistral, and OpenAI. You can leave providers that you do not plan to use unset — the workflow only evaluates the active provider branch.

After importing and configuring the workflow, you must publish it, as n8n will not allow other workflows to call an inactive sub-workflow.

Calling It from Another Workflow

Add an Execute Workflow node to your main (parent) workflow, set the Source to Database, and select the imported routing workflow. Your parameters are passed in as a JSON payload. The minimum required to get a response is a single key containing your prompt; the sub-workflow applies your CONFIG defaults for provider, model, and all other parameters automatically.

{
  "userPrompt": "Write a 100-word summary of the Dutch housing market" 
}

To override the defaults for a specific task – a different model, a lower temperature, structured output – pass the relevant fields along with the prompt:

{
  "userPrompt":      "...",           // required
  "systemPrompt":    "...",           // optional
  "llm_provider":    "google",        // anthropic | google | mistral | openai
  "google_model":    "gemini-3.5-flash",
  "anthropic_model": "claude-haiku-4-5-20251001",
  "mistral_model":   "mistral-medium-latest",
  "openai_model":    "gpt-5.4-mini",
  "temperature":     0.5,              // 0.0 - 1.0 (omitted for o-series reasoning models)
  "max_tokens":      5000,             // integer or numeric string - both accepted
  "response_format": "json"            // json | text
}

The parent workflow sends the payload via the Execute Workflow node and waits for standardised output.
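The interplay between the CONFIG defaults and a per-call override can be pictured with a short sketch. This is an illustration rather than the template's code: resolveSettings is a hypothetical name, the model values are borrowed from the example payload above, and the DEFAULT_MAX_TOKENS value is a placeholder.

```javascript
// Hypothetical CONFIG; key names follow the CONFIG node parameters,
// values are placeholders for illustration only.
const CONFIG = {
  DEFAULT_PROVIDER: "anthropic",
  DEFAULT_ANTHROPIC_MODEL: "claude-haiku-4-5-20251001",
  DEFAULT_GOOGLE_MODEL: "gemini-3.5-flash",
  DEFAULT_MISTRAL_MODEL: "mistral-medium-latest",
  DEFAULT_OPENAI_MODEL: "gpt-5.4-mini",
  DEFAULT_TEMPERATURE: 0.5,
  DEFAULT_MAX_TOKENS: 1000, // placeholder value
  DEFAULT_RESPONSE_FORMAT: "text",
};

const PROVIDERS = ["anthropic", "google", "mistral", "openai"];

// Any field present in the payload wins; everything else falls back to CONFIG.
function resolveSettings(input, config = CONFIG) {
  const provider = input.llm_provider ?? config.DEFAULT_PROVIDER;
  if (!PROVIDERS.includes(provider)) {
    throw new Error(`Unsupported provider: ${provider}`);
  }
  // Each provider falls back to its own default model.
  const model =
    input[`${provider}_model`] ??
    config[`DEFAULT_${provider.toUpperCase()}_MODEL`];
  return {
    provider,
    model,
    temperature: input.temperature ?? config.DEFAULT_TEMPERATURE,
    // Number() accepts both 5000 and "5000".
    maxTokens: Number(input.max_tokens ?? config.DEFAULT_MAX_TOKENS),
    responseFormat: input.response_format ?? config.DEFAULT_RESPONSE_FORMAT,
  };
}
```

With only userPrompt supplied, every setting comes from CONFIG; with llm_provider set to google and no google_model, the model falls back to DEFAULT_GOOGLE_MODEL instead of a shared default.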

What Comes Back

The sub-workflow returns the original input payload merged with the model’s response in the fields below. Although each provider returns its response in its own structure, these are normalised into one consistent format, so your workflows can handle responses identically regardless of the underlying provider.

Field	Type	Description
`llm_response`	String	The raw text or JSON string returned by the model.
`model_used`	String	The exact model identifier that processed the request.
`provider_used`	String	The provider that handled the execution.
`llm_response_length`	Number	Length of the LLM response.
`llm_response_empty`	Boolean	True if response is empty or an API error occurred.
`_input_tokens`	Number	Actual prompt tokens if available, otherwise estimated.
`_output_tokens`	Number	Actual completion tokens if available, otherwise estimated.
`_estimated_cost_usd`	Number / null	Cost estimate from the internal lookup table; `null` for unrecognised models.
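Because the fields above are the same for every provider, the parent workflow can handle the result in one place. The helper below is a hypothetical sketch of what a Code node after the Execute Workflow node might do; readLlmResult is not part of the template.

```javascript
// Hypothetical helper for a Code node in the parent workflow.
// `result` is the object returned by the routing sub-workflow.
function readLlmResult(result, expectJson = false) {
  // llm_response_empty covers both empty output and API errors.
  if (result.llm_response_empty) {
    throw new Error(
      `Empty or failed response from ${result.provider_used} (${result.model_used})`
    );
  }
  if (!expectJson) return result.llm_response;
  // llm_response is a string even in JSON mode, so parse it explicitly.
  try {
    return JSON.parse(result.llm_response);
  } catch (err) {
    throw new Error(`Model returned invalid JSON: ${err.message}`);
  }
}
```

Checking llm_response_empty first keeps an API failure from surfacing later as a confusing JSON parse error.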

The cost estimate combines the token counts with recently published rates, which are hardcoded in a static lookup table inside the workflow. The rates will drift as providers adjust their pricing, so update the table when needed. If you use a model not defined in the table, _estimated_cost_usd will return null instead of displaying a guessed figure. Treat this estimate as a trend indicator for monitoring, not as billing data. For anything financial, check the provider’s official pricing page.
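A lookup-table estimate of this kind can be sketched in a few lines. The model names and rates below are invented placeholders, not real provider pricing, and estimateCostUsd is a hypothetical name; the per-million-token unit is also an assumption about how such a table might be organised.

```javascript
// Placeholder rates in USD per million tokens. These model names and
// numbers are invented for illustration and are NOT real pricing.
const RATES_PER_MILLION = new Map([
  ["example-small-model", { input: 1.0, output: 4.0 }],
  ["example-large-model", { input: 3.0, output: 15.0 }],
]);

function estimateCostUsd(model, inputTokens, outputTokens) {
  const rate = RATES_PER_MILLION.get(model);
  // Unknown model: return null rather than guessing a figure.
  if (!rate) return null;
  const cost =
    (inputTokens * rate.input + outputTokens * rate.output) / 1_000_000;
  // Round to six decimals to keep floating-point noise out of logs.
  return Number(cost.toFixed(6));
}
```

For example, 1,000 input tokens and 500 output tokens on the placeholder small model come to (1000 × 1.0 + 500 × 4.0) / 1,000,000 = 0.003 USD.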

A Few Internals Worth Knowing

Before dispatching to any provider, the sub-workflow runs a token budget check. It estimates the input token count with a rough provider-specific heuristic (word count × 1.3 for OpenAI and Anthropic, characters ÷ 3.5 for Google, characters ÷ 4 for Mistral), and if that exceeds 70% of the target model’s context window, it throws an early error. This is a practical guardrail to avoid unnecessary API spend, not a precise calculation.
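The heuristic described above fits in a couple of small functions. This is a sketch under stated assumptions, not the template's code: the function names are hypothetical, rounding up is my choice, and the context window is passed in by the caller because the real per-model limits are not listed in this article.

```javascript
// Rough token estimate per provider, using the ratios described above.
function estimateInputTokens(provider, text) {
  switch (provider) {
    case "openai":
    case "anthropic": {
      const words = text.trim().split(/\s+/).filter(Boolean).length;
      return Math.ceil(words * 1.3); // assumption: round up
    }
    case "google":
      return Math.ceil(text.length / 3.5);
    case "mistral":
      return Math.ceil(text.length / 4);
    default:
      throw new Error(`Unsupported provider: ${provider}`);
  }
}

// Throws before any API call if the estimate exceeds the budget.
// contextWindow is supplied by the caller (hypothetical parameter).
function assertWithinBudget(provider, text, contextWindow, threshold = 0.7) {
  const estimated = estimateInputTokens(provider, text);
  if (estimated > contextWindow * threshold) {
    throw new Error(
      `Estimated ${estimated} input tokens exceeds ` +
        `${threshold * 100}% of the ${contextWindow}-token context window`
    );
  }
  return estimated;
}
```

Failing before the HTTP request is made means an oversized prompt costs nothing, at the price of occasionally rejecting a prompt that would have fitted.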

The workflow also handles several provider-specific quirks:

OpenAI o-series reasoning models reject the temperature parameter entirely. If you select one of these models, the workflow strips temperature from the request automatically.
OpenAI gpt-5.x models require the token limit to be passed as max_completion_tokens rather than the legacy max_tokens. The workflow maps this based on the model identifier.
Google Gemini JSON mode is enforced via responseMimeType: application/json when you set response_format to json. This is considerably more reliable than asking the model to format its output in the prompt text.
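The three quirks above can be pictured as small request-building helpers. This is a hedged sketch, not the template's implementation: the function names are hypothetical, and the model-name patterns (a leading "o" plus digit for o-series, a "gpt-5" prefix for gpt-5.x) are my assumptions about how the identifiers might be matched. Sending max_completion_tokens to o-series models as well is my addition, based on OpenAI's API behaviour rather than on this article.

```javascript
// Assumption: o-series models have names like "o1", "o3-mini", "o4-mini".
const isOSeries = (model) => /^o\d/.test(model);
// Assumption: gpt-5.x models share the "gpt-5" prefix. o-series is
// included here too (my addition, not stated in the article).
const usesCompletionTokens = (model) =>
  /^gpt-5/.test(model) || isOSeries(model);

// Builds an OpenAI chat completions request body.
function buildOpenAiBody({ model, temperature, maxTokens }, messages) {
  const body = { model, messages };
  // o-series models reject temperature, so leave it out entirely.
  if (!isOSeries(model)) body.temperature = temperature;
  const limitKey = usesCompletionTokens(model)
    ? "max_completion_tokens"
    : "max_tokens";
  body[limitKey] = maxTokens;
  return body;
}

// Builds the generationConfig object for a Gemini generateContent call.
function buildGeminiGenerationConfig({ temperature, maxTokens, responseFormat }) {
  const generationConfig = { temperature, maxOutputTokens: maxTokens };
  // JSON mode is requested via the MIME type, not via the prompt text.
  if (responseFormat === "json") {
    generationConfig.responseMimeType = "application/json";
  }
  return generationConfig;
}
```

Keeping these rules in one helper per provider is what lets callers pass the same temperature and max_tokens fields to every model.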

How This Compares to OpenRouter

The obvious question is: why not just use OpenRouter? It provides a unified API that proxies to hundreds of models across multiple providers, handles routing and failover, and consolidates billing. For many projects it is the most sensible path, especially if you want the simplest possible integration or access to a wide variety of models without managing multiple accounts.

This n8n LLM routing workflow makes sense when you are already running n8n and care about a specific set of trade-offs. There is also something satisfying about having your routing logic visible and modifiable on your workflow canvas.

For example, if you have direct contracts with providers or prefer to use your own OpenAI, Anthropic, or Google credentials rather than routing through a third-party proxy, this keeps the data flow self-contained. It also allows custom provider logic: different retry patterns, provider-specific headers, or preprocessing that would be awkward to bolt onto an external proxy. Having the routing logic in your own workflow means you can just open it and change it.

Try It Yourself

API integrations are never truly “set and forget.” The APIs change, models get deprecated, pricing shifts, and you still have to handle the maintenance. By routing all your prompts through a single sub-workflow, you do not eliminate that work, but you do confine it to one place.

This means that when a provider updates its API, you fix it once and every workflow that depends on it keeps running. Similarly, when you want to try a different provider, all that’s needed is a parameter or config change.

Get it from the n8n template library
