Free · open source · zero telemetry
n8n workflows that actually run. Proven against n8n-mcp — the 21k-star community validator.
A Claude Code plugin that bundles a curated n8n knowledge base and known-bug catalog, benchmarked head-to-head with n8n-mcp on 128 real prompts.
Works in production
80%
vs 59% for n8n-mcp
Passes the n8n-mcp validator
94%
vs 72% for n8n-mcp
128-prompt battery · newest run per prompt · data generated 2026-06-24
Install
Running in under a minute
Inside Claude Code, paste these three commands. Prerequisite: a recent Claude Code — check with claude --version.
- 1 Add the marketplace
  /plugin marketplace add https://github.com/dbenn8/n8n-knowledge
- 2 Install the plugin
  /plugin install n8n-knowledge@n8n-knowledge
- 3 Activate it (no restart needed)
  /reload-plugins
See the plugin at work
Most of what the plugin does is invisible — it injects n8n docs, node specs, and known-bug warnings straight into the model's turn, where only Claude sees them. To watch the actual context it pulls in, open a second terminal and tail its log:
tail -f ~/.cache/n8n-knowledge/debug.log
Every n8n-related question shows exactly which docs and gotchas got injected, so the plugin stops being a black box. The log is owner-only and lives under your cache dir (never world-readable /tmp); set the debugRecall plugin option to full for the complete context.
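If you want to confirm the owner-only claim on your own machine, a small check like the one below is enough. It is a minimal sketch for POSIX systems: the helper name is_owner_only is illustrative and not part of the plugin, and the only detail taken from this page is the log path.

```python
import os
import stat
from pathlib import Path


def is_owner_only(path):
    """Return True if neither group nor others have any permission on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # 0o077 masks the group and other permission bits.
    return (mode & 0o077) == 0


if __name__ == "__main__":
    # Path as given on this page; the file exists only after the plugin has logged.
    log_path = Path.home() / ".cache" / "n8n-knowledge" / "debug.log"
    if log_path.exists():
        print(log_path, "owner-only:", is_owner_only(log_path))
    else:
        print(log_path, "not found yet")
```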
Source, issues, and the benchmark harness: github.com/dbenn8/n8n-knowledge · v0.3.10
The proof
Measured, not asserted
Same 128 prompts, same model (Claude Sonnet 4.6), same scoring — the only variable is the tool. Each generated workflow is judged by the n8n-mcp validation engine, then a blinded Claude Opus judge scores intent & known-bug avoidance.
128-prompt battery · real n8n-mcp validator + blinded Opus judge · reproduce it on GitHub.
Methodology
How we know it's better
"Better" is a funnel — each stage a stricter bar than the last. The headline metric is the last one.
Valid
Passes the n8n-mcp schema validator. It would import.
Correct
Valid AND does what the prompt asked — scored by a blinded Claude Opus judge.
Works
Correct AND designs around the relevant known n8n bug, so it won't silently fail in production.
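The funnel can be written down in a few lines. This is a sketch of the arithmetic only, not the harness itself: the Result record and its field names are assumptions made for illustration, and each stage counts only the workflows that also cleared the stage before it.

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical per-prompt record; field names are illustrative.
    prompt_id: str
    valid: bool       # passed the validator
    correct: bool     # judge: does what the prompt asked
    avoids_bug: bool  # judge: designs around the relevant known bug


def funnel_rates(results):
    """Return the share of prompts that reach each stage of the funnel."""
    total = len(results)
    if total == 0:
        return {"valid": 0.0, "correct": 0.0, "works": 0.0}
    valid = sum(1 for r in results if r.valid)
    correct = sum(1 for r in results if r.valid and r.correct)
    works = sum(1 for r in results if r.valid and r.correct and r.avoids_bug)
    return {
        "valid": valid / total,
        "correct": correct / total,
        "works": works / total,
    }
```

Because every stage is a subset of the one before, the three rates can only stay level or fall from Valid to Works.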
A real engine as judge
Every workflow is validated by the n8n-mcp validation engine — an independent open-source project, not a heuristic or an LLM's opinion. Valid means it would import.
A blinded second judge
A Claude Opus judge scores intent-fidelity and known-bug avoidance with no knowledge of which tool produced which workflow. Verdicts are cached.
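One way to get both properties, blinding and caching, is sketched below. It is an illustration under stated assumptions rather than the harness code: judge stands in for whatever function calls the Opus model, the entry fields (tool, prompt, workflow) are hypothetical, and the cache is a plain dict keyed on a hash of the prompt and workflow.

```python
import hashlib
import json
import random


def cache_key(prompt, workflow):
    """Hash only what the judge sees, so the tool name can never affect the key."""
    payload = json.dumps({"prompt": prompt, "workflow": workflow}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def judge_blinded(entries, judge, cache, seed=0):
    """Score entries in shuffled order, hiding the tool label from the judge."""
    order = list(range(len(entries)))
    # Shuffle so that submission order carries no hint about the tool.
    random.Random(seed).shuffle(order)
    verdicts = {}
    for i in order:
        entry = entries[i]
        key = cache_key(entry["prompt"], entry["workflow"])
        if key not in cache:
            # The "tool" field is deliberately not passed to the judge.
            cache[key] = judge(entry["prompt"], entry["workflow"])
        verdicts[i] = cache[key]
    # Return verdicts in the original entry order.
    return [verdicts[i] for i in range(len(entries))]
```

Keying the cache on content means two tools that emit the same workflow for the same prompt receive the same verdict by construction.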
Let the data govern
Several promising changes were reverted because the numbers went the wrong way — a biased timeout, a worse repair mode. The benchmark overrules intuition.
Reproducible
Same prompts, same models, isolated per-run config. The harness and prompt set are public — run it yourself.
Honest scope & limitations — what this does not prove
- Validator ≠ live import. "Valid" means it passes the n8n-mcp validator (which bundles the same node definitions n8n ships), not that it ran on a live n8n instance. A deliberate, disclosed trade-off for reproducibility.
- Known-bug provenance. Some bug-prompts derive from the same catalog the plugin recalls from, so that metric flatters the tool that draws from it. Reported as a directional signal.
- The judge is an LLM. The Opus judge is blinded and cached, but it's still a model scoring models — a second opinion alongside the deterministic validator, never ground truth alone.
- Snapshot, not variance study. The figures take the newest run per prompt across the 128-prompt battery on a fixed date — coverage, not multi-sample variance.
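The "newest run per prompt" rule from the last bullet can be sketched as a single pass over the run records. The record shape here (prompt_id plus an ISO 8601 timestamp string) is an assumption for illustration; the real harness may store runs differently.

```python
from datetime import datetime


def newest_per_prompt(runs):
    """Keep only the most recent run for each prompt_id."""
    latest = {}
    for run in runs:
        pid = run["prompt_id"]
        ts = datetime.fromisoformat(run["timestamp"])
        if pid not in latest or ts > datetime.fromisoformat(latest[pid]["timestamp"]):
            latest[pid] = run
    return latest
```

Older runs are dropped rather than averaged in, which is why the figures describe coverage on a given date and say nothing about run-to-run variance.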
Credit where it's due
Built on n8n-mcp
The ground-truth validator behind every number here is n8n-mcp — the independent, open-source MCP server by Romuald Członkowski (21k+ ⭐ on GitHub). It bundles the same node definitions n8n ships, which is exactly what makes it a credible oracle.
It's also the tool we benchmark against — so the competitor's own engine judges both tools, the fairest scoring we could ask for. Integrating it rather than rebuilding a validator is what let us move fast and keep the bar honest. This tool wouldn't exist in its current form without it.
n8n-mcp on GitHub · 21k+ ⭐ · go give it a star
Free & private
It doesn't watch you back
0
telemetry events
0
trackers
0
accounts required
Free and open source. No telemetry, no tracking, no sign-up. The plugin recalls from the n8n knowledge base to do its job — that's the one call it makes — and it keeps nothing about you.
Don't trust us? Good. Read the source.
Under the hood
Powered by Hindsight
The plugin's n8n knowledge lives in Hindsight — an open-source (MIT) agentic-memory engine by Vectorize. Its recall is what surfaces the right docs and the right known-bug warning at the moment Claude needs them.
TEMPR retrieval
Temporal Entity Memory Priming Retrieval: four strategies — semantic, BM25 keyword, graph, and temporal — run in parallel, fused with Reciprocal Rank Fusion, then reranked by a neural cross-encoder.
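Reciprocal Rank Fusion itself is simple enough to show in full. The sketch below is the textbook formula, where each list contributes 1 / (k + rank) to a document's score, and is not Hindsight's actual code. The constant k = 60 is the value commonly used in the literature, assumed here rather than taken from Hindsight, and the cross-encoder rerank step is left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranked list.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant; 60 is a common default (an assumption here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Illustrative input: one ranked list per strategy named above.
semantic = ["a", "b", "c"]
bm25 = ["b", "a", "d"]
graph = ["b", "c", "a"]
temporal = ["d", "b", "a"]
fused = reciprocal_rank_fusion([semantic, bm25, graph, temporal])
```

RRF uses only rank positions, never raw scores, so strategies with incomparable scoring scales (cosine similarity against BM25, for instance) can be combined without normalization.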
A real knowledge graph
Hindsight extracts structured facts, resolves entities, and tracks temporal & causal links — so recall returns relevant memory, not just nearest-vector soup.
Retain · Recall · Reflect
Three operations: store durable signals, retrieve the right context on demand, and reflect to synthesize across memories.
Hindsight's approach is described in the technical report "Hindsight is 20/20" (arXiv:2512.12818, Dec 2025), co-authored by Vectorize, Virginia Tech, and The Washington Post, and covered independently by VentureBeat.
91.4%
LongMemEval
(Gemini-3)
83.6%
on a 20B open model
vs 60.2% full-context GPT-4o
Live · the knowledge base behind the plugin
[Live panel: counters for memories, documents, and graph links; charts for memory composition and link types]
Build n8n workflows that run.
Free, open source, and benchmarked in the open. Drop it into Claude Code and see the difference on your next workflow.