Starter — VPS, CPU-only
Tool mix
- Mid-sized VPS at a German provider (e.g. netcup, Hetzner), 16–32 GB RAM, NVMe SSD, no GPU
- Ollama with smaller models (Gemma 4 in 4B–9B, Phi-4, Llama-3 8B) — CPU inference, slower but functional
- Open WebUI as multi-user frontend, local embeddings via Ollama, ChromaDB as vector store
- n8n for workflows, Authentik or Keycloak for login
- Optionally a simple RAG index on your own Markdown documents
Best fit
Small teams (2–10 users), few parallel requests, use cases with German standard correspondence and classic RAG queries. First steps into self-hosting before investing in a GPU.
Effort & cost
Setup 3–6 days. Running costs around €30–80 / month (VPS rental + backups). Models and software are open source.
Tradeoff
CPU inference is significantly slower than GPU — answering a short question often takes 10–30 seconds instead of 1–3. Too slow for real-time use cases like live chat, very usable for asynchronous workflows (receipt classification, batch processing).