Build AI agents
Configured per role and task — model, , tools, permissions. Optionally as an open-source frontend for entire teams.
agents are more than . They understand context, access your own data, take actions and work around the clock. But not every workplace needs the same model, the same tools or the same permissions.
A sales agent needs to read CRM data and write creative copy. An accounting agent needs to answer deterministically and traceably and access the accounting system — but never customer data. A knowledge agent at management level should analyse long documents and reason at the highest level.
The answer isn't a single universal bot, but a thoughtful setup per role. Model choice, , tools, skills and permissions get combined so every agent is configured exactly right for its task — and none has more data or rights than it needs.
& requirements
Task, user group, data sources, success metric, escalation — as a one-page agent brief.
Agent design
Model choice, , tools, , permissions — set individually per role.
Build the prototype
2–3 weeks with real data and a small pilot user group — no PowerPoint mock-up.
Test & optimise
Happy path, edge cases, hallucinations, , acceptance, cost.
Deployment & operations
Multi-user frontend with , audit logs, monitoring, escalation, training.
& requirements
Task, user group, data sources, success metric, escalation — as a one-page agent brief.
Agent design
Model choice, , tools, , permissions — set individually per role.
Build the prototype
2–3 weeks with real data and a small pilot user group — no PowerPoint mock-up.
Test & optimise
Happy path, edge cases, hallucinations, , acceptance, cost.
Deployment & operations
Multi-user frontend with , audit logs, monitoring, escalation, training.
How a productive AI agent comes together
Build the prototype
2–3 weeksReal dataDesign becomes software. In two to three weeks a working prototype emerges, with real data and a small pilot user group. No PowerPoint mock-up, no “we could build it like this”.
What goes into the prototype:
- Model and API connection (Anthropic API, OpenAI API or a local Ollama instance)
- System prompt with the defined rules and tone
- First tool connections (1–3 tools that are critical for the main flow)
- If needed, a RAG index over the most important documents
- A simple frontend — your own web app, a Slack/Teams integration or Open WebUI
- Logging of every interaction for later analysis
The prototype runs on your infrastructure (or one we provide in the meantime), with real anonymised data — not the sample data from a tutorial. Only then do the actual problems show up.
Test, optimise, edge cases
Hallucination checkAcceptanceAn AI agent that works for the first ten requests isn't yet production-ready. It's about request number 387, where something unusual happens — and whether it then behaves cleanly.
What we systematically test:
- Happy path — typical requests that come up daily. Must be 100% on the mark
- Edge cases — unusual but legitimate requests. We learn these from pilot logs
- Hallucinations — where does the agent invent something instead of saying “I don't know”? Reduce with RAG source verification and lower temperature
- Prompt injection — can someone make the agent ignore instructions or reveal confidential data? Especially important for external user groups
- Acceptance in the team — is the agent used? Do users understand when to escalate? Do they trust the answers?
- Performance & cost — how many tokens per request? Does that stay within budget as usage scales?
The tests lead to iterative adjustments — to the prompt, to tool permissions, to model choice, to the escalation mechanism. Only when all three stakeholder groups (users, business unit, IT) give the green light do we move on.
Deployment & operations
Multi-userAuditPrototype becomes production software. This is where the setup that separates a hobby bot from an enterprise-grade agent kicks in — especially with multiple users or in regulated industries.
Multi-user frontend (when several employees use the agent):
- Open WebUI as the frontend — multi-user login, no per-seat pricing like commercial chat tools
- RBAC — who may use which agents/models follows from the role (sales, accounting, management)
- Shared prompt templates — the sales team can use prepared system prompts without maintaining them themselves
- Audit logs — who sent which prompt when, which tools were called, which data was retrieved? Mandatory for compliance
Deployment options:
- Cloud — when there are no compliance hurdles, the fastest path to production
- Self-hosted on your own server — for regulated industries (law firms, practices, agencies, banks). Fully inside your own infrastructure
- Hybrid — frontier model externally for non-critical tasks, local model internally for sensitive routines. Often the best practical solution
What operations include:
- Monitoring of answer quality — automatic samples with human review
- Cost dashboard — token consumption per role, per use case, per month
- Update path for new model versions — controlled, not automatic
- Escalation workflow — when the agent is unsure, it hands over to a human with the full conversation history
- User training — what the agent does well, what it doesn't, when to push back?
Sounds interesting?
Let's talk it through in a free intro call and see how this would work for you.