AI model practice — which models we use for what
An honest list of the models we use daily at KaaTai-Beratung. With a concrete distribution, matrix and the routing pattern that pragmatically combines cloud and local inference.
No single model for everything
There is no 'best model'. There are models that are more pragmatic for certain tasks than others. We use four cloud vendors daily (Claude, Perplexity, Gemini, OpenAI) plus local open-weight models via for client- or patient-related content. The distribution below is our reality — as of 2026-06-02. Shifts as soon as new models serve the respective better.
Our cloud distribution
These four vendors cover the vast majority of our cloud work. Percentages are an honest approximation — as of 2026-06-02, can rearrange with every release.
Claude
Anthropic
Standard workhorse
Claude (esp. Opus 4.7 with 1M context) is our main model for text, code, long documents, strategy outlines, briefs. Very good German quality, consistent answers, very strong . EU endpoint at Anthropic available.
- Modelle
- Opus 4.7 (1M ctx) · Sonnet 4.6 · Haiku
- Lizenz
- SaaS, EU-Endpunkt verfügbar
Perplexity
Perplexity AI
Web research with sources
Perplexity is RAG-first — web search combined with answer and source citations. Mandatory for research tasks: current subsidy logic, competitor analysis, industry updates. Comet browser agent for autonomous research.
- Modelle
- Pro Search · Deep Research · Comet
- Lizenz
- SaaS, RAG-First-Plattform
Gemini
Google DeepMind
Very long contexts + multimodal
Gemini (Pro/Ultra) brings 1M+ token context and very strong multimodal (images, video, audio). We use it for tasks where very large document collections have to be processed at once or image understanding is in the foreground.
- Modelle
- Gemini 3 Pro · Gemini 3 Ultra · Flash
- Lizenz
- SaaS, Google Cloud + Vertex AI
OpenAI
OpenAI
Special tasks + open-weight
OpenAI (GPT-5/o3) for tasks where function-calling depth is strong (e.g. code interpreter, DALL-E images). Plus: GPT-oss as an open-weight variant for local inference. Share is small because other vendors usually have the edge for our .
- Modelle
- GPT-5 · GPT-5.5 · o3-pro · GPT-oss
- Lizenz
- SaaS + Open-Weight (gpt-oss)
Use-case matrix — which model for what
Eight typical from our consulting practice. Per the concrete model recommendation — cloud and local separated, because sensitive content must not go into a cloud.
Text, translation, briefs
Code, refactoring, code review
Web research with sources
RAG (own documents)
Reasoning, logic, mathematics
Multimodal (images, whiteboards, PDFs)
Long documents (>100 pages)
Agent workflows (autonomous)
The routing pattern
We do not combine cloud and local inference at random — but along a clear rule that every staff member can apply without thinking.
Rule 1 — client/patient data = always local
Rule 2 — own code base = local, open-source code = cloud OK
Rule 3 — general research + text = cloud OK
Rule 4 — multi-endpoint frontend for staff
Open WebUI multi-endpoint configuration
# Open WebUI Connections (Settings → Admin → Connections)
# Local Ollama instance (default)
OLLAMA_BASE_URL=http://ollama:11434
# Anthropic (Claude — standard 80%)
ANTHROPIC_API_BASE=https://api.anthropic.com
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODELS=claude-opus-4-7,claude-sonnet-4-6,claude-haiku-4-5
# Perplexity (research — 10%)
PERPLEXITY_API_BASE=https://api.perplexity.ai
PERPLEXITY_API_KEY=pplx-...
PERPLEXITY_MODELS=sonar-pro,sonar-reasoning-pro
# Google Gemini (multimodal + long contexts — 5%)
GOOGLE_API_BASE=https://generativelanguage.googleapis.com
GOOGLE_API_KEY=AIza...
GOOGLE_MODELS=gemini-3-pro,gemini-3-ultra,gemini-flash
# OpenAI (special tasks + GPT-oss — 5%)
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_API_KEY=sk-...
OPENAI_MODELS=gpt-5,gpt-5.5,o3-pro
# Workspaces with a pre-selected model per use case:
# - Workspace 'Clients' → default: llama3.3:70b (local)
# - Workspace 'Marketing' → default: claude-opus-4-7 (cloud)
# - Workspace 'Research' → default: sonar-pro (Perplexity)
# - Workspace 'Vision' → default: gemini-3-pro (cloud)Honest notes
This distribution is ours — as of 2026-06-02. It is not a promise of salvation but the result of our own practice. Other consultancies route differently: many OpenAI-first (because of the ChatGPT brand), some Gemini-first (because of Google Workspace). We landed at Claude because Anthropic has consistently delivered the qualitatively best models for our over the past 18 months — that may change tomorrow.
Also honest: local models are not quite at frontier cloud quality in absolute top-tier yet. Llama 3.3 70B is very good, but Claude Opus 4.7 is measurably better on complex tasks. The trade-off is data sovereignty — and for sensitive content that is non-negotiable.
Related topics
Models need infrastructure and a routing frontend
The model zoo shows the model universe, is the local inference server, is the multi-endpoint frontend, the GDPR server blueprint is the platform underneath:
Ready for the next step?
Free intro call, no strings attached. In 30 minutes you'll know whether and how AI can help your business.