AI in knowledge management

Manuals, SOPs, contracts and decision notes in one system that understands natural language and knows how they relate. Three setup tiers, from a hosted tool to a fully self-hosted pipeline, with an honest call on effort, upkeep and data protection.

Company knowledge is scattered across emails, Confluence pages, notebooks, Slack threads and, above all, people's heads. When someone leaves the company, their knowledge goes with them. When someone searches for information, they ask three colleagues, search five folders, and maybe find it. Or not.

A modern knowledge system combines three search logics: full text for exact terms, vector for meaning, optionally graph structures for relations between topics, people and projects. The query “how did we solve the SSL problem back then” finds the right note even without the exact keyword — and shows the related configuration, the involved project and the decision right alongside.

Prerequisite in all tiers: someone maintains the sources. A knowledge base without an editor is a note dump with better search, not a working store. So the build is first and foremost cleanup work; the technology comes after.

Three setup tiers

Which tier fits depends on three factors: sensitivity of the content, available upkeep capacity and need for relation structure (not just full text).

Tier 1

Hosted knowledge base with semantic search

Tool mix

  • Hosted tool with RAG function (Notion AI, Glean Trial, Mem, Coda AI, Microsoft Copilot for Business) — knowledge base and search in one
  • Single source of truth: all manuals, SOPs, onboarding texts and decision notes in one place, replacing scattered folders
  • Search frontend with semantic hits instead of pure full-text search — “how did we solve X back then” finds the right note even without the exact keyword
  • Permissions and visibility model at team level: sensitive areas stay visible to the right people without everyone reading everything
  • Integration into Slack, Teams or email — knowledge gets retrieved where the work happens, not in a separate wiki tab

Fit

SMBs at the start of structured knowledge work that need a fast entry without self-hosting. Works particularly well for teams already on a SaaS stack.

Effort & cost

Setup 3–8 days. Running cost approx. €8–25 per user per month. Scales with team size, not content volume.

Trade-off

Content and the search index built from it sit with the SaaS provider, usually with a GDPR DPA, but rarely exclusively in the EU. Acceptable for general operational knowledge, the wrong tier for strategic notes, mandate data or confidential contracts.

Tier 2

Self-hosted RAG with frontier LLM

Tool mix

  • Your own RAG pipeline: document ingest, chunking, embeddings, vector store (pgvector, Qdrant) on your own infrastructure
  • Knowledge base in open format (Markdown, Notion export, Confluence export, Sharepoint sync) — deliberately not tied to a single vendor
  • Frontier LLM (Claude, GPT, Gemini) for answers and summaries — with DPA, but API calls leave the EU
  • Web frontend for search, optionally a Slack/Teams bot or mail request as input
  • Optional: knowledge graph for relations between notes, projects, people — when semantic search isn't enough because structure matters

Fit

SMBs with an established knowledge base, multiple teams and a need for data sovereignty over the sources, plus a person responsible for upkeep and index hygiene.

Effort & cost

Setup 12–22 days. Running cost approx. €80–250/month (server, LLM API usage, index hosting). Scales with data and request volume.
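A back-of-the-envelope comparison by team size helps to read the running-cost figures of tiers 1 and 2 side by side. The sketch below uses only the ranges quoted for the two tiers (€8–25 per user per month, €80–250 per month). Treating the tier 2 figure as flat is a simplifying assumption, since it actually scales with data and request volume, and setup effort is left out entirely. All function names are illustrative.

```python
# Running-cost ranges from the tier descriptions (EUR per month).
TIER1_PER_USER_EUR = (8, 25)   # per user
TIER2_FLAT_EUR = (80, 250)     # assumption: treated as independent of head count


def tier1_monthly_cost(users: int) -> tuple[int, int]:
    """Return the (low, high) monthly running cost of tier 1 for a team size."""
    low, high = TIER1_PER_USER_EUR
    return users * low, users * high


def cheaper_on_running_cost(users: int) -> str:
    """Conservative comparison: a tier only 'wins' if its worst case
    is still below the other tier's best case."""
    tier1_low, tier1_high = tier1_monthly_cost(users)
    tier2_low, tier2_high = TIER2_FLAT_EUR
    if tier1_high < tier2_low:
        return "tier 1"
    if tier1_low > tier2_high:
        return "tier 2"
    return "overlap"


if __name__ == "__main__":
    for team_size in (3, 10, 40):
        print(team_size, tier1_monthly_cost(team_size), cheaper_on_running_cost(team_size))
```

At around ten users the two ranges coincide, so below that size running cost alone rarely justifies tier 2; sensitivity of the content and upkeep capacity remain the deciding factors.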

Trade-off

A knowledge base decays fast if no one maintains it. Old SOPs, contradictory versions, orphan notes: anyone allowing that ends up with a system that confidently cites wrong things. Upkeep responsibility is a prerequisite, not a bonus.

Tier 3

Fully self-hosted with local model and graph

Tool mix

  • Tier 2 in full scope, AI components local
  • Local language model (Llama 3, Qwen 2.5, Mistral) for answers — no API call leaves the house
  • Local embedding model and vector store; optionally knowledge graph (Neo4j, Memgraph or property graph in PostgreSQL)
  • Hybrid search: vector (for meaning), full text (for exact terms), graph (for relations) — together better than any method alone
  • Audit trail: who queried what when and which sources were cited — for regulated industries and internal security

Fit

Law firms, tax advisors, healthcare providers, public sector, R&D departments — areas where knowledge content shouldn't leave the house or structural relations between content matter.

Effort & cost

Setup 22–45 days, plus server from €200/month or a one-off hardware investment. Answer quality of local models is very good for structured research questions and weaker than frontier models on free-form phrasing, which is rarely a problem for pure knowledge lookups.

Trade-off

Highest control, highest complexity, and highest upkeep effort. Graph structures and multi-stage search are powerful, but without a clear requirement (“we need structure, not just full text”), tier 2 is the more results-oriented choice.

What your team should understand

A knowledge base is more than a tool — it's a discipline. Six competency areas that have to be anchored in every setup:

Knowledge base as single source of truth

Which content belongs in the knowledge base, which goes to the archive, and which must be regularly updated. Who is responsible and who approves. An SOP no one writes isn't knowledge; it's a gap with a label.

Understand RAG architecture

Ingest, chunking, embeddings, retrieval, re-ranking, generation. Why answer quality depends directly on source upkeep, not on the model.
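To make the stages tangible, here is a minimal sketch of chunking, embedding, retrieval and prompt assembly in plain Python. It is a toy under stated assumptions: the bag-of-words `embed` function stands in for a real embedding model, re-ranking is omitted, and the final LLM call is replaced by building the prompt that would be sent. All names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text: str) -> Counter:
    """Toy embedding: term counts (assumption, stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Ingest: chunk every document and store (source, chunk, vector)."""
    return [(src, c, embed(c)) for src, text in docs.items() for c in chunk(text)]


def retrieve(index, query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k chunks most similar to the query, with their source."""
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[2]), reverse=True)
    return [(src, text) for src, text, _ in ranked[:k]]


def build_prompt(query: str, hits: list[tuple[str, str]]) -> str:
    """Generation input: numbered sources so the answer can cite them."""
    context = "\n".join(
        f"[{i}] ({src}) {text}" for i, (src, text) in enumerate(hits, start=1)
    )
    return f"Answer using only the numbered sources.\n{context}\nQuestion: {query}"


if __name__ == "__main__":
    docs = {
        "ssl.md": "We fixed the SSL certificate problem by renewing the certificate on the proxy",
        "onboarding.md": "New employees get a laptop and an account on day one",
    }
    index = build_index(docs)
    print(build_prompt("SSL certificate problem", retrieve(index, "SSL certificate problem", k=1)))
```

The sketch also shows why upkeep caps quality: whatever sits in `docs` is what the model gets to cite, stale or not.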

Hybrid retrieval (vector + full text + graph)

When semantic search wins, when full text, when relations matter. How to combine the three without hit lists becoming unmanageable.
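One common way to merge several hit lists into a single short list is reciprocal rank fusion (RRF). It is named here as an illustrative technique, not as the method this setup necessarily uses. The sketch assumes each search logic returns an ordered list of document IDs; the constant `k = 60` is the conventional default, and `top_n` keeps the merged list manageable.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60, top_n: int = 5
) -> list[str]:
    """Merge ranked lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for stable output.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:top_n]


if __name__ == "__main__":
    full_text_hits = ["a", "b", "c"]   # exact terms
    vector_hits = ["b", "d", "a"]      # meaning
    graph_hits = ["b"]                 # relations
    print(reciprocal_rank_fusion([full_text_hits, vector_hits, graph_hits], top_n=3))
```

Because RRF works on ranks rather than raw scores, the three methods don't need comparable score scales, which is the usual reason to prefer it over weighted score sums.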

Upkeep discipline

Who feeds in new content, who removes stale content, how contradictory versions get resolved. A knowledge base is editorial work, not file storage.

Permissions, visibility and data protection

Which content is visible to whom, what happens with personnel changes, how deletion duties get implemented. Anyone enabling answers on internal notes also gives up a piece of confidentiality, and that should be documented.
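In a retrieval pipeline, visibility is typically enforced by filtering hits before they reach the language model, so that an answer can never quote a source the asker may not read. A minimal sketch, assuming each hit carries the set of teams allowed to see its source (the `Hit` type and field names are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str
    text: str
    allowed_teams: frozenset[str]


def visible_hits(hits: list[Hit], user_teams: set[str]) -> list[Hit]:
    """Keep only hits shared with at least one of the user's teams."""
    return [hit for hit in hits if hit.allowed_teams & user_teams]


if __name__ == "__main__":
    hits = [
        Hit("sop-returns.md", "Returns are processed within 14 days", frozenset({"sales", "support"})),
        Hit("hr-salaries.md", "Salary bands for 2024", frozenset({"hr"})),
    ]
    for hit in visible_hits(hits, {"sales"}):
        print(hit.source)
```

Filtering after retrieval and before generation is a design choice; filtering inside the vector store query is an alternative when hit lists are large.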

Analysis of search queries

Which questions are asked often, which stay unanswered, where topic clusters emerge without matching sources. The search log produces the next iteration of the knowledge base.

What gets automated

Eight steps the pipeline takes over in running operations — at different depths depending on tier:

Document ingest

New notes, Markdown files, Confluence exports or Sharepoint content are automatically detected, chunked and added to the index — no manual upload needed.
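Automatic detection usually comes down to comparing the current sources against what was indexed last time. The sketch below assumes the pipeline stores a content hash per indexed path and derives an ingest plan from it; the function names and the plan structure are illustrative, and the actual chunking and indexing steps are left out.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_ingest(sources: dict[str, str], indexed: dict[str, str]) -> dict[str, list[str]]:
    """Compare current sources (path -> text) with stored hashes (path -> hash)."""
    plan: dict[str, list[str]] = {"add": [], "update": [], "remove": []}
    for path, text in sources.items():
        if path not in indexed:
            plan["add"].append(path)        # new file, never indexed
        elif indexed[path] != content_hash(text):
            plan["update"].append(path)     # content changed since last run
    # Anything indexed but no longer present should leave the index too.
    plan["remove"] = [path for path in indexed if path not in sources]
    return plan


if __name__ == "__main__":
    sources = {"a.md": "one", "b.md": "two changed"}
    indexed = {"b.md": content_hash("two"), "c.md": content_hash("x")}
    print(plan_ingest(sources, indexed))
```

The `remove` list matters as much as `add`: a deleted SOP that stays in the index keeps getting cited.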

Semantic search with source attribution

Every answer visibly references the source it was assembled from — anyone wanting more goes directly into the note.

Relation queries

Links between people, projects, tools and decisions. These can't be answered via vector search alone, but they can via a graph or structured index.
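A relation query is essentially a graph traversal. The sketch below stores relations as (subject, relation, object) triples in memory and walks them breadth-first. The sample data and names are invented for illustration; a production setup would run the equivalent query in a graph database or a relational schema.

```python
from collections import defaultdict, deque

# Hypothetical sample relations between notes, projects, people and tools.
EDGES = [
    ("ssl-fix-note", "belongs_to", "project-webshop"),
    ("ssl-fix-note", "decided_by", "alice"),
    ("proxy-config", "belongs_to", "project-webshop"),
    ("project-webshop", "uses", "nginx"),
]


def build_adjacency(edges: list[tuple[str, str, str]]) -> dict[str, set[str]]:
    """Treat relations as undirected links for 'what is connected to X' queries."""
    adjacency: dict[str, set[str]] = defaultdict(set)
    for subject, _relation, obj in edges:
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
    return adjacency


def related(adjacency: dict[str, set[str]], start: str, max_hops: int = 2) -> set[str]:
    """Breadth-first search: everything reachable within max_hops links."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return seen - {start}


if __name__ == "__main__":
    adjacency = build_adjacency(EDGES)
    print(sorted(related(adjacency, "ssl-fix-note", max_hops=2)))
```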

Onboarding search

New employees ask typical entry questions and are automatically presented with the right SOPs, examples and contracts — no more “where do I find X again?” in Slack.

Daily review of gaps

Queries without sufficiently good hits get written into a topic backlog — a pointer to the editorial team where the knowledge base should be updated.
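Such a backlog can be derived directly from the search log. A minimal sketch, assuming each log entry is a pair of query text and the score of its best hit, and that a score below `min_score` counts as "not good enough" (both the log format and the threshold are assumptions to be calibrated per system):

```python
from collections import Counter


def gap_backlog(
    search_log: list[tuple[str, float]], min_score: float = 0.5
) -> list[tuple[str, int]]:
    """Return poorly answered queries, most frequent first."""
    misses = Counter(
        query.strip().lower()
        for query, best_score in search_log
        if best_score < min_score
    )
    return misses.most_common()


if __name__ == "__main__":
    log = [
        ("VPN setup", 0.2),
        ("vpn setup ", 0.3),
        ("holiday policy", 0.9),
        ("expense limits", 0.1),
    ]
    for query, count in gap_backlog(log):
        print(count, query)
```

Normalising case and whitespace is deliberately crude; grouping near-duplicate questions by meaning would be the next refinement.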

Version detection

When two notes treat the same matter contradictorily, that's flagged before the system happily cites both.
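Detecting a real contradiction needs a human or a model, but finding candidates can be cheap. The sketch below flags pairs of notes that overlap heavily in wording without being identical, using Jaccard similarity on word sets. This is a lexical proxy chosen for illustration, not a contradiction detector, and the threshold is an assumption.

```python
import re
from itertools import combinations


def tokens(text: str) -> set[str]:
    """Lowercased word and number tokens of a note."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of tokens the two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def conflict_candidates(notes: dict[str, str], threshold: float = 0.6) -> list[tuple[str, str]]:
    """Pairs that cover nearly the same ground but differ somewhere."""
    token_sets = {name: tokens(text) for name, text in notes.items()}
    flagged = []
    for left, right in combinations(sorted(notes), 2):
        similarity = jaccard(token_sets[left], token_sets[right])
        if threshold <= similarity < 1.0:   # identical token sets are duplicates, not conflicts
            flagged.append((left, right))
    return flagged


if __name__ == "__main__":
    notes = {
        "backup-v1.md": "Backups run every night at 02:00 on the NAS",
        "backup-v2.md": "Backups run every night at 04:00 on the NAS",
        "travel.md": "Travel expenses are filed within 30 days",
    }
    print(conflict_candidates(notes))
```

Which of the two flagged versions applies remains a human decision.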

Conversation summaries

Longer research dialogues get summarised into compact notes that can be filed in the knowledge base — learning from the search history, not from gut feel.

Audit trail of queries

Who searched what when and which answer they got, with which sources — mandatory for regulated industries, a very useful security layer for everyone else.
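An audit trail of this kind can start as an append-only log with one JSON record per query. The sketch below assumes a flat record with user, query, answer, cited sources and a UTC timestamp; the field names and the JSON Lines file format are illustrative choices, and tamper protection or retention rules are out of scope.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(
    user: str,
    query: str,
    answer: str,
    sources: list[str],
    now: Optional[datetime] = None,
) -> str:
    """Serialise one query event as a single JSON line."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return json.dumps(
        {
            "user": user,
            "query": query,
            "answer": answer,
            "sources": sources,
            "timestamp": timestamp,
        },
        sort_keys=True,
    )


def append_audit(path: str, record: str) -> None:
    """Append-only write: existing entries are never rewritten."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(record + "\n")


if __name__ == "__main__":
    print(audit_record("alice", "ssl fix", "Renew the proxy certificate.", ["ssl.md"]))
```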

What stays manual on purpose

Knowledge management is editorial work. These six points don't belong in a pipeline:

Knowledge editing

Which content is binding, which is a draft, which gets deleted: that's editorial work, not an automation task.

Structure, don't just collect

A growing pile of notes isn't yet a knowledge base. Clear hierarchies, tags and references don't emerge on their own; they get curated.

Sensitive content

Strategy papers, HR files, mandate master data — the decision what sits where and who may get which answer is owner responsibility.

Resolve contradictory sources

When two SOPs claim different truths, the system can flag — a human has to decide which version applies.

Upkeep on personnel changes

What happens to notes of employees who leave? Which content stays visible, which gets archived? These transitions belong in a clear process.

Review cycles

Even a knowledge base with a perfect pipeline ages. A quarterly review of what still applies is the difference between a living knowledge base and a well-indexed past.

How the build runs

From the knowledge inventory to full self-operation usually takes 10–18 weeks, depending on tier, source maturity and cleanup need:

1

Knowledge inventory

Which sources exist, where they sit, who maintains them. What's current, what's orphaned, what's contradictory. Without this inventory you shovel garbage into the index.

2

Use-case cut

What should the system answer: onboarding, internal research, mandate info, customer support? The use case shapes the data model, not the other way round.

3

Choose the setup tier

Hosted, frontier or fully self-hosted, depending on content sensitivity, volume and industry requirements. We give a reasoned recommendation; you decide.

4

Curate sources

Clean up binding sources, clarify versioning rules, define tags and hierarchy. This is the most tedious phase, but it caps answer quality long-term.

5

Build the pipeline

Ingest, chunking, embeddings, search, answer frontend, possibly graph connection. Configuration tuned to industry and use case.

6

Training & hands-on handover

4–6-hour workshop: maintain the knowledge base, spot gaps from the search log, resolve version conflicts, read the audit trail.

7

Guided pilot month

Four weeks with weekly sparring: review real queries, fill source gaps, calibrate hybrid search, observe search behaviour.

8

Self-operation with upkeep rhythm

Clear responsibilities for editing and review cycles — otherwise the system ages unnoticed. Optional: quarterly refresher for larger topic shifts.

Effort and investment depend on the chosen tier and the upkeep state of the sources — a concrete estimate comes after the knowledge inventory and as part of the pricing overview.
