Free · open source · zero telemetry

n8n workflows that actually run. Proven against n8n-mcp — the 21k-star community validator.

A Claude Code plugin that bundles a curated n8n knowledge base and a known-bug catalog, benchmarked head-to-head against n8n-mcp on 128 real prompts.

Works in production: 80% (vs 59% for n8n-mcp)

Passes the n8n-mcp validator: 94% (vs 72% for n8n-mcp)

Install the plugin · Explore the benchmark →

128-prompt battery · newest run per prompt · data generated 2026-06-24

Install

Running in under a minute

Inside Claude Code, paste these three commands. Prerequisite: a recent Claude Code — check with claude --version.

1. Add the marketplace

/plugin marketplace add https://github.com/dbenn8/n8n-knowledge

2. Install the plugin

/plugin install n8n-knowledge@n8n-knowledge

3. Activate it (no restart needed)

/reload-plugins

See the plugin at work

Most of what the plugin does is invisible — it injects n8n docs, node specs, and known-bug warnings straight into the model's turn, where only Claude sees them. To watch the actual context it pulls in, open a second terminal and tail its log:

tail -f ~/.cache/n8n-knowledge/debug.log

Every n8n-related question shows exactly which docs and gotchas got injected — the plugin stops being a black box. The log is owner-only and lives under your cache dir (never world-readable /tmp); set the debugRecall plugin option to full for the complete context.

Source, issues, and the benchmark harness: github.com/dbenn8/n8n-knowledge · v0.3.10

The proof

Measured, not asserted

Same 128 prompts, same model (Claude Sonnet 4.6), same scoring — the only variable is the tool. Each generated workflow is judged by the n8n-mcp validation engine, then a blinded Claude Opus judge scores intent & known-bug avoidance.

Metric: n8n-knowledge vs n8n-mcp

Works in production (valid · correct · designs around known bugs): 80% vs 59%

Matches intent (correct; blinded Opus judge): 93% vs 70%

Passes the n8n-mcp validator (would import): 94% vs 72%

128-prompt battery · real n8n-mcp validator + blinded Opus judge · reproduce it on GitHub.

Explore every model & condition →

Methodology

How we know it's better

"Better" is a funnel — each stage a stricter bar than the last. The headline metric is the last one.

Valid (valid%): Passes the n8n-mcp schema validator. It would import.

→

Correct (correct%): Valid AND does what the prompt asked, as scored by a blinded Claude Opus judge.

→

Works (works%): Correct AND designs around the relevant known n8n bug, so it won't silently fail in production.
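
To make the nesting of the three stages concrete, here is a minimal Python sketch of how such a funnel could be tallied. The `Result` fields and the `funnel` function are hypothetical names chosen for illustration; they are not taken from the benchmark harness, which may compute these figures differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Result:
    """One scored workflow. Field names are illustrative assumptions."""
    valid: bool           # passed the schema validator
    matches_intent: bool  # judge: does what the prompt asked
    avoids_bug: bool      # judge: designs around the relevant known bug


def funnel(results: List[Result]) -> Dict[str, float]:
    """Return valid%, correct%, works% where each stage implies the previous one."""
    n = len(results)
    if n == 0:
        return {"valid%": 0.0, "correct%": 0.0, "works%": 0.0}
    valid = sum(r.valid for r in results)
    correct = sum(r.valid and r.matches_intent for r in results)
    works = sum(r.valid and r.matches_intent and r.avoids_bug for r in results)
    return {
        "valid%": 100.0 * valid / n,
        "correct%": 100.0 * correct / n,
        "works%": 100.0 * works / n,
    }


if __name__ == "__main__":
    sample = [
        Result(True, True, True),    # works
        Result(True, True, False),   # correct, but trips the known bug
        Result(True, False, True),   # valid only
        Result(False, True, True),   # fails validation, so it counts nowhere
    ]
    print(funnel(sample))  # {'valid%': 75.0, 'correct%': 50.0, 'works%': 25.0}
```

Because every stage is a conjunction with the one before it, the percentages can only stay flat or fall from left to right, which is why the last figure is the strictest.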

A real engine as judge

Every workflow is validated by the n8n-mcp validation engine — an independent open-source project, not a heuristic or an LLM's opinion. Valid means it would import.

A blinded second judge

A Claude Opus judge scores intent-fidelity and known-bug avoidance with no knowledge of which tool produced which workflow. Verdicts are cached.
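
One way blinding and verdict caching can fit together is sketched below in Python. This is an assumption-laden illustration, not the harness's code: `verdict_key`, `judge_blinded`, and the `judge` callable are hypothetical names, and the real judge is an LLM call rather than a local function.

```python
import hashlib
import json
from typing import Callable, Dict

Verdict = Dict[str, bool]


def verdict_key(prompt: str, workflow: dict) -> str:
    """Cache key built only from what the judge sees: the prompt and the workflow."""
    payload = json.dumps({"prompt": prompt, "workflow": workflow}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def judge_blinded(
    prompt: str,
    workflow: dict,
    judge: Callable[[str, dict], Verdict],
    cache: Dict[str, Verdict],
) -> Verdict:
    """Score a workflow without ever passing the producing tool's name to the judge.

    The caller keeps the tool label on its side and joins it back to the
    verdict afterwards, so the judge cannot condition on it.
    """
    key = verdict_key(prompt, workflow)
    if key not in cache:
        cache[key] = judge(prompt, workflow)  # hypothetical judge call
    return cache[key]
```

Keying the cache on the prompt and workflow content means an identical workflow is never re-scored, so re-running the report cannot change an earlier verdict.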

Let the data govern

Several promising changes were reverted because the numbers went the wrong way — a biased timeout, a worse repair mode. The benchmark overrules intuition.

Reproducible

Same prompts, same models, isolated per-run config. The harness and prompt set are public — run it yourself.

Honest scope & limitations — what this does not prove

Validator ≠ live import. "Valid" means it passes the n8n-mcp validator (which bundles the same node definitions n8n ships), not that it ran on a live n8n instance. A deliberate, disclosed trade-off for reproducibility.
Known-bug provenance. Some bug-prompts derive from the same catalog the plugin recalls from, so that metric flatters the tool that draws from it. Reported as a directional signal.
The judge is an LLM. The Opus judge is blinded and cached, but it's still a model scoring models — a second opinion alongside the deterministic validator, never ground truth alone.
Snapshot, not variance study. The figures take the newest run per prompt across the 128-prompt battery on a fixed date — coverage, not multi-sample variance.
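
The "newest run per prompt" rule in the last point can be illustrated with a short Python sketch. The record fields (`prompt_id`, `finished_at`, `works`) are assumed for illustration and may not match the harness's actual data format; timestamps are assumed to be ISO 8601 strings in one consistent format, so they sort correctly as text.

```python
from typing import Dict, Iterable


def newest_per_prompt(runs: Iterable[dict]) -> Dict[str, dict]:
    """Keep only the most recent run for each prompt (hypothetical record shape)."""
    latest: Dict[str, dict] = {}
    for run in runs:
        current = latest.get(run["prompt_id"])
        if current is None or run["finished_at"] > current["finished_at"]:
            latest[run["prompt_id"]] = run
    return latest


if __name__ == "__main__":
    runs = [
        {"prompt_id": "p1", "finished_at": "2026-06-01T10:00:00Z", "works": False},
        {"prompt_id": "p1", "finished_at": "2026-06-20T09:00:00Z", "works": True},
        {"prompt_id": "p2", "finished_at": "2026-06-10T12:00:00Z", "works": True},
    ]
    snapshot = newest_per_prompt(runs)
    print(sorted(snapshot))         # ['p1', 'p2']
    print(snapshot["p1"]["works"])  # True
```

Each prompt contributes exactly one data point, which is what makes the figures a coverage snapshot: earlier runs of the same prompt are discarded rather than averaged.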

Credit where it's due

Built on n8n-mcp

The ground-truth validator behind every number here is n8n-mcp — the independent, open-source MCP server by Romuald Członkowski (21k+ ⭐ on GitHub). It bundles the same node definitions n8n ships, which is exactly what makes it a credible oracle.

It's also the tool we benchmark against — so the competitor's own engine judges both tools, the fairest scoring we could ask for. Integrating it rather than rebuilding a validator is what let us move fast and keep the bar honest. This tool wouldn't exist in its current form without it.

n8n-mcp on GitHub · 21k+ ⭐ — go give it a star

Free & private

It doesn't watch you back

0 telemetry events · 0 trackers · 0 accounts required

Free and open source. No telemetry, no tracking, no sign-up. The plugin recalls from the n8n knowledge base to do its job — that's the one call it makes — and it keeps nothing about you.

Don't trust us? Good. Read the source.

Under the hood

Powered by Hindsight

The plugin's n8n knowledge lives in Hindsight — an open-source (MIT) agentic-memory engine by Vectorize. Its recall is what surfaces the right docs and the right known-bug warning at the moment Claude needs them.

TEMPR retrieval

Temporal Entity Memory Priming Retrieval: four strategies — semantic, BM25 keyword, graph, and temporal — run in parallel, fused with Reciprocal Rank Fusion, then reranked by a neural cross-encoder.
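
For readers unfamiliar with the fusion step, here is a minimal Python sketch of Reciprocal Rank Fusion over several ranked lists. It shows the general technique only: the function name is hypothetical, the smoothing constant `k=60` is the value commonly used in the RRF literature rather than a confirmed Hindsight setting, and the cross-encoder reranking stage is omitted.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: each item scores sum(1 / (k + rank)) over the lists it appears in.

    Ranks start at 1. Ties are broken by item id so the output is deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=lambda item: (-scores[item], item))


if __name__ == "__main__":
    # Stand-ins for the outputs of separate retrieval strategies.
    semantic = ["a", "b", "c"]
    keyword = ["b", "a", "d"]
    graph = ["b", "c", "a"]
    print(reciprocal_rank_fusion([semantic, keyword, graph]))  # ['b', 'a', 'c', 'd']
```

RRF uses only rank positions, never raw scores, so strategies with incomparable scoring scales (vector similarity vs BM25, for instance) can be combined without normalization.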

A real knowledge graph

Hindsight extracts structured facts, resolves entities, and tracks temporal & causal links — so recall returns relevant memory, not just nearest-vector soup.

Retain · Recall · Reflect

Three operations: store durable signals, retrieve the right context on demand, and reflect to synthesize across memories.

Hindsight's approach is described in the technical report "Hindsight is 20/20" (arXiv:2512.12818, Dec 2025), co-authored by Vectorize, Virginia Tech, and The Washington Post, and covered independently by VentureBeat.

91.4% on LongMemEval (Gemini-3)

83.6% on a 20B open model vs 60.2% for full-context GPT-4o

Build n8n workflows that run.

Free, open source, and benchmarked in the open. Drop it into Claude Code and see the difference on your next workflow.

Install the plugin · View on GitHub