Seven containers that together form a complete stack: collect metrics, aggregate logs, build dashboards, dispatch alerts. A concrete alternative to Datadog and New Relic for hosting providers and IT teams that want to keep their own eyes on their infrastructure.
Compose for the central observability host
services:
  prometheus:
    image: prom/prometheus:v3.5.0
    container_name: prometheus
    restart: unless-stopped
    ports: ["9090:9090"]
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.retention.time=30d
    networks: [observability]

  grafana:
    image: grafana/grafana:11.5.0
    container_name: grafana
    restart: unless-stopped
    ports: ["3000:3000"]
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SERVER_ROOT_URL=https://obs.hoster.com
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASS}
    networks: [observability]

  loki:
    image: grafana/loki:3.4.2
    container_name: loki
    restart: unless-stopped
    ports: ["3100:3100"]
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - loki_data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    networks: [observability]

  alertmanager:
    image: prom/alertmanager:v0.28.1
    container_name: alertmanager
    restart: unless-stopped
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
    networks: [observability]

volumes:
  prometheus_data:
  grafana_data:
  loki_data:

networks:
  observability:

Each component does one thing well. Together they form a stack that runs fully on your own server with no SaaS dependency. All seven are genuinely open source.
Prometheus
64k ★
Time-series database
Scrapes metrics every 15 seconds, stores them in its own TSDB, answers queries via PromQL. The heart of the stack.
Grafana
74k ★
Dashboards + alerting
Front end for Prometheus and Loki. Dashboards for hosts, containers, customers. Alertmanager integration for Slack/email.
Loki
28k ★
Log aggregation
Like Prometheus for logs: no full-text index, just labels. Very resource-efficient. Queries via LogQL.
Promtail
28k ★
Log shipper on every host
Collects logs from journald, containers and file paths and sends them to Loki. Installed on every monitored host.
cAdvisor
19k ★
Container metrics
Reads CPU, RAM and network metrics per container. Exports them in Prometheus format. One instance runs on every container host.
node_exporter
13k ★
Host metrics
Measures CPU load, RAM, disk IO, network counters and system load on the host itself. Runs on every monitored host as a service or container.
Netdata
79k ★
Realtime monitoring (optional)
All-in-one realtime monitor with its own UI. Complementary to Prometheus + Grafana, for one-second reaction to acute issues.
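The three per-host components from the list above (node_exporter, cAdvisor, Promtail) could be started with a Compose file along the following lines. This is a minimal sketch, not the configuration used in the case study: the image tags, the mounted paths and the file name `promtail-config.yaml` are assumptions to adapt and pin to versions you have tested.

```yaml
# docker-compose.yml on each monitored host (sketch under the assumptions above)
services:
  node_exporter:
    image: prom/node-exporter:v1.8.2        # assumption: any current release
    container_name: node_exporter
    restart: unless-stopped
    network_mode: host                      # serves host metrics on port 9100
    pid: host
    volumes:
      - /:/host:ro,rslave                   # read-only view of the host filesystem
    command:
      - --path.rootfs=/host

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.49.1 # assumption: any current release
    container_name: cadvisor
    restart: unless-stopped
    ports: ["8080:8080"]                    # per-container metrics endpoint
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro

  promtail:
    image: grafana/promtail:3.4.2           # assumption: matched to the Loki version
    container_name: promtail
    restart: unless-stopped
    volumes:
      - ./promtail-config.yaml:/etc/promtail/config.yml:ro   # hypothetical file name
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    command: -config.file=/etc/promtail/config.yml
```

Ports 9100 and 8080 expose internal metrics, so in practice they would be firewalled to accept connections only from the central observability host.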
The seven components work in a clear data flow: node_exporter and cAdvisor collect metrics on each monitored host and serve them on ports 9100 and 8080. Prometheus scrapes those endpoints every 15 seconds and stores the samples in its time-series database. Promtail collects logs from containers and journald and sends them to Loki for aggregation. Grafana shows both, metrics and logs, in dashboards with alerting.
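The scrape side of this data flow can be sketched as a `prometheus.yml` like the one below. The host names are hypothetical placeholders, and the `rule_files` path assumes that an alerts directory is additionally mounted into the Prometheus container, which the Compose file for the central host does not do as written.

```yaml
# prometheus.yml (sketch; host names are hypothetical placeholders)
global:
  scrape_interval: 15s          # the 15-second scrape cycle described above
  evaluation_interval: 15s      # how often alert rules are evaluated

rule_files:
  - /etc/prometheus/alerts/*.yml   # assumption: directory mounted separately

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]   # service name on the Compose network

scrape_configs:
  # Host metrics from node_exporter
  - job_name: node
    static_configs:
      - targets:
          - vps-01.example.internal:9100
          - vps-02.example.internal:9100

  # Per-container metrics from cAdvisor
  - job_name: cadvisor
    static_configs:
      - targets:
          - vps-01.example.internal:8080
          - vps-02.example.internal:8080
```

Static target lists are the simplest option for a few dozen hosts; adding a host means adding one line per job and reloading Prometheus.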
The result: one UI for every host, every container, every service. Anyone who wants to know whether a customer's webshop is still up, whether DB load is high, or whether a specific error appears in the log looks at Grafana, not at 18 separate SSH sessions. Datadog delivers the same but costs around €5,000/year at 18 hosts; the stack is a one-off setup plus one VPS.
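For Grafana to act as that single UI, it needs both Prometheus and Loki registered as data sources. One way to do this without clicking through the UI is a provisioning file, sketched here. The file path is an assumption (it would have to be mounted under `/etc/grafana/provisioning/datasources/` in the Grafana container), and the URLs rely on the service names from the Compose network.

```yaml
# provisioning/datasources/datasources.yml (sketch; mount path is an assumption)
apiVersion: 1

datasources:
  # Metrics: queried with PromQL
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true

  # Logs: queried with LogQL
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
```

With both sources in place, a dashboard panel can show a host's metrics next to the log lines from the same time range.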
A small hosting provider with 15–25 client VPSs on its own hardware has two problems without central monitoring. First, every piece of information comes from the customer: 'my shop is down', 'the server reacts slowly', 'my mail does not come through'. That is purely reactive. Second, investigating on your own takes 18 SSH sessions with htop and journalctl, which does not scale.
An observability stack flips both: you see in Grafana that the disk on VPS-12 is 91 % full before the customer files a ticket. Alerts go via Slack. Logs are searchable across every host at once. Datadog delivers the same, but for a 6-person hoster €5,000/year in licence costs makes a big difference.
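The Slack delivery mentioned above is configured in `alertmanager.yml`. The following is a minimal sketch, not the case-study configuration: the receiver name, the grouping intervals and the webhook file path are assumptions, and the webhook file would need to be mounted into the Alertmanager container.

```yaml
# alertmanager.yml (sketch; names, intervals and paths are assumptions)
route:
  receiver: slack-hosting
  group_by: [alertname, instance]   # one Slack message per alert and host
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h               # re-notify while an alert keeps firing

receivers:
  - name: slack-hosting
    slack_configs:
      - api_url_file: /etc/alertmanager/slack_webhook_url   # hypothetical path
        channel: "#hosting-alerts"
        send_resolved: true
        title: "{{ .CommonAnnotations.summary }}"
        text: "{{ .CommonAnnotations.description }}"
```

Reading the webhook URL from a file keeps the secret out of the configuration that is checked into version control.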
Client case study
Small managed-services provider in Lower Saxony, 6 people, 18 client VPSs on their own Hetzner servers. Migrated 8 months ago from 'everyone SSHs individually' to a central stack. Today: one dashboard for everything, Slack alerts for every disk/memory/service issue, 30 days of metrics history for post-mortems.
Concrete setups Schmidt-Werlich has used daily for 8 months. Each pattern replaces either a reactive activity or a gap that would have stayed invisible without central monitoring.
Six stack-level capabilities — properties that only emerge from the interplay of the components.
Example Prometheus alert rule
# /etc/prometheus/alerts/disk.yml
groups:
  - name: disk
    interval: 30s
    rules:
      - alert: DiskUsageHigh
        # Used space as a percentage of the filesystem size
        expr: |
          (
            (node_filesystem_size_bytes - node_filesystem_avail_bytes)
            / node_filesystem_size_bytes
          ) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Disk on {{ $labels.instance }} > 85% full"
          # Plain quotes inside the block scalar; escaped quotes would break the template
          description: |
            Mountpoint {{ $labels.mountpoint }} on host
            {{ $labels.instance }} is {{ $value | printf "%.1f" }}%
            full. Maintenance required.
# → Alertmanager → Slack #hosting-alerts

Honest alternatives
Three alternatives with different strengths. The stack is the pragmatic default for SMBs and small hosting providers.
Datadog
SaaS market leader
Datadog Inc., USA
New Relic
SaaS with free tier
New Relic Inc., USA
Zabbix
Classic self-hosted
Zabbix LLC, AGPL-3.0 since version 7.0 (GPL-2.0 before)
Rule of thumb: with 5–100 hosts, technically minded staff and consulting support, the stack is the pragmatic choice. Datadog and New Relic scale only via the wallet: at 50+ hosts they cost noticeably more than an extra mini PC. Zabbix remains an option for pure infrastructure monitoring without container depth.
Pricing
License
All 7 components are open source: Prometheus, cAdvisor and node_exporter under Apache-2.0; Grafana, Loki and Promtail under AGPL-3.0; Netdata under GPL-3.0. Internal use without redistribution carries no licence obligations.
Running costs
One additional observability host: VPS with 4–8 GB RAM, 100 GB storage (Hetzner CPX31 from €15/month). Plus minimal overhead on every monitored host (node_exporter + cAdvisor + Promtail = around 50 MB RAM/host). At 18 hosts: 0.9 GB RAM extra in total.
Effort
Initial setup with all 7 components + first 5 hosts: 2–3 days. Roll-out to additional hosts: 30 minutes per host. Dashboard build for a hosting setup (multi-tenant, customer dashboards, alerts): 2 consulting days.
Datadog for 18 hosts: around €5,000/year. New Relic free tier covers about 10 hosts. Observability stack: one-off setup (5–8 consulting days) + €15/month for the VPS. Break-even against Datadog for hosting providers typically after 2–4 months.