Tool in production

Paperless-ngx

The document archive for trade businesses and SMBs: runs on your own server, with full-text search and automatic recognition of correspondent and document type. A concrete alternative to DocuWare and ELO for firms with 5–50 staff.

Project profile

Paperless-ngx

Document management system with OCR and full-text search

As of: June 1, 2026

GitHub stars: 42k
Forks: 2.8k
Open issues: 7
License: GPL-3.0
Latest version: v2.20.15
Language: Python
First release: February 12, 2022
Last commit: June 1, 2026

Third-party source · Wikidata (CC0)

Wikidata profile

Paperless-ngx (Q134589265)

License: GNU General Public License, version 3.0

What is Paperless-ngx?

Paperless-ngx is a DMS (Document Management System): documents arrive via email, mobile scan or upload, are made searchable through OCR (Tesseract), and are automatically classified by correspondent (supplier, authority) and document type (invoice, notice, contract).

The software is licensed under GPL-3.0 and is fully open source. It is the active community fork of Paperless-ng (which itself was a fork of the original 'paperless'). The original projects went dormant, whereas Paperless-ngx has a very active maintainer community and a production-grade setup for SMBs.

Why a trade business uses Paperless-ngx

A typical HVAC business (heating, ventilation, plumbing) receives 200–400 delivery notes per month, 50–80 incoming invoices, 30–60 maintenance contracts, plus subsidy notices, F-gas certificates, leak-test protocols. Over the years 5,000–15,000 PDFs accumulate.

Without a DMS that means a wall of binders, and everyone hunts for 20 minutes when the tax inspector asks for an invoice from 2023. With Paperless-ngx: full-text search across every PDF, hits in under a second, GoBD-compliant retention. The binder wall becomes an accent wall.

Client case study

Schäfer Haustechnik

Family business in Lower Saxony, 12 people: master craftsmen Schäfer (Sr. + Jr.), 8 journeymen, 2 apprentices, 1 office. Around 280 delivery notes per month, 70 incoming invoices, 5 heat-pump applications per quarter. They have been running Paperless-ngx on their own server for 18 months, and the 7,400 documents from that period are in the archive.

Full-text search across all documents

Tax inspector asks: 'Where is the invoice from the Viessmann delivery in March 2023?' Answer in 30 seconds instead of 30 minutes — every delivery note is OCR-captured and searchable.

GoBD-compliant archiving

Invoices, notices and contracts must be kept tamper-evident for 6–10 years. Paperless-ngx satisfies the German GoBD requirements with an audit log, versioning and immutable originals.

Auto-tagging by correspondent

A delivery note from a plumbing wholesaler is automatically assigned the right correspondent, and 'Delivery note' is filled in as the document type. Staff have to teach the system once; afterwards it keeps learning by itself.

Data sovereignty (competition + audits)

Incoming invoices reveal purchase prices. Maintenance contracts name clients. None of this belongs to a US cloud vendor that may use the data for training or hand it over on subpoena.

Mobile scan on the building site

Delivery-note handover on site: the journeyman scans with their phone (Paperless Mobile or Scanbot), the PDF lands in the consume folder, and OCR + tagging run automatically. The original stays with the supplier.

10-year retention automatic

Tax retention periods are configured per document type. After expiry the system reminds you to delete (or keeps permanently, depending on policy). No more Excel sheet of 'this can go now'.
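How a per-document-type retention rule could be evaluated is sketched below. This is an illustration only: the `RETENTION_YEARS` table, its values and both function names are assumptions made for the example, not Paperless-ngx settings and not legal advice. The sketch also assumes that the period starts at the end of the calendar year in which the document was created.

```python
from datetime import date

# Hypothetical retention periods in years per document type (example values).
RETENTION_YEARS = {"invoice": 10, "delivery note": 6, "contract": 10}


def retention_end(document_date: date, document_type: str) -> date:
    """Return the last day on which the document must still be kept.

    Assumption: the period starts at the end of the calendar year
    in which the document was created.
    """
    years = RETENTION_YEARS[document_type]
    return date(document_date.year + years, 12, 31)


def may_be_deleted(document_date: date, document_type: str, today: date) -> bool:
    """True once the retention period has fully expired."""
    return today > retention_end(document_date, document_type)
```

With these example values, an invoice dated March 2023 would come up for review only after December 31, 2033.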

What the business actually does with it

Eight productive usage patterns from 18 months of Paperless-ngx in the HVAC firm's everyday work. Each replaces an activity that used to take hours or was not possible at all.

Inbound mailbox consumed automatically

docs@schaefer-haustechnik.de is polled every minute by Paperless-ngx. Delivery notes, invoices and notices often arrive as PDF attachments; they land in the system automatically, get OCR'd and are classified by correspondent. Manual sorting disappears completely.

Mobile scan via app

A journeyman receives a paper delivery note on site. After app capture (Paperless Mobile via browser, or Scanbot), the PDF lands in the consume folder. Within 30 seconds the document is searchable.

Full-text search in under 1 second

Question: 'Where is the maintenance certificate for client Müller, plant no. 2024-007?' Search field → 'Müller 2024-007' → three hits with preview. Before: 20 minutes flipping through binders.

Tags + correspondent + document type

Three orthogonal classification axes: tag (freely chosen, e.g. 'Maintenance', 'Warranty'), correspondent (supplier, client, authority), document type (invoice, delivery note, notice). Filter combinations return exactly the right hits.
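Why three independent axes make filtering precise can be shown with a small in-memory model. This is a sketch under assumptions: the `Document` class, the `filter_documents` helper and the sample records are invented for illustration and do not reflect Paperless-ngx internals.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    title: str
    correspondent: str
    document_type: str
    tags: frozenset = field(default_factory=frozenset)


def filter_documents(docs, correspondent=None, document_type=None, tags=()):
    """Keep the documents that match every axis that was specified."""
    wanted = set(tags)
    return [
        d for d in docs
        if (correspondent is None or d.correspondent == correspondent)
        and (document_type is None or d.document_type == document_type)
        and wanted <= d.tags  # all requested tags must be present
    ]


# Hypothetical sample archive
ARCHIVE = [
    Document("Boiler service 2024", "Client Müller", "contract",
             frozenset({"Maintenance"})),
    Document("Heat pump invoice", "Viessmann", "invoice",
             frozenset({"Warranty"})),
    Document("Pipe fittings", "Plumbing wholesaler", "delivery note"),
]

hits = filter_documents(ARCHIVE, document_type="contract", tags=["Maintenance"])
```

Each axis narrows the result independently, so leaving one out simply widens the search along that axis.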

Automatic classification learns

After 30–50 example documents per correspondent, the system recognises new delivery notes from the same supplier on its own. Same for document types — incoming invoices get classified correctly without intervention.

Custom fields for HVAC specifics

Invoice amount, client number, plant ID as custom fields on the document. Reports: 'All invoices above €5,000 in Q1 2026' — set the filter, CSV export for accounting, done.
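The report described here boils down to a filter over custom-field values plus a CSV export. A minimal sketch, assuming the documents have already been loaded as plain dictionaries; the keys `title`, `created`, `amount` and `document_type` as well as the sample data are hypothetical.

```python
import csv
import io
from datetime import date
from decimal import Decimal


def invoices_above(documents, minimum, start, end):
    """Select invoices with an amount above `minimum` dated within [start, end]."""
    return [
        d for d in documents
        if d["document_type"] == "invoice"
        and d["amount"] > minimum
        and start <= d["created"] <= end
    ]


def to_csv(documents):
    """Render the selected documents as CSV text for accounting."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["title", "created", "amount"], extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(documents)
    return buffer.getvalue()


# Hypothetical sample data
DOCUMENTS = [
    {"title": "Heat pump unit", "created": date(2026, 2, 10),
     "amount": Decimal("7250.00"), "document_type": "invoice"},
    {"title": "Small parts", "created": date(2026, 1, 20),
     "amount": Decimal("180.40"), "document_type": "invoice"},
    {"title": "Boiler", "created": date(2025, 11, 3),
     "amount": Decimal("9100.00"), "document_type": "invoice"},
]

report = invoices_above(
    DOCUMENTS, Decimal("5000"), date(2026, 1, 1), date(2026, 3, 31)
)
```

Amounts are kept as `Decimal` so that monetary comparisons do not suffer from floating-point rounding.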

n8n workflow: invoice to DATEV

New incoming invoice in the DMS → n8n workflow → PDF + custom-field data to DATEV cloud. The accountant finds the invoice in the DATEV inbox, the original stays in Paperless. Two systems, one truth.

Backup routine with restore test

Weekly volume backup of the three containers, restore test once per quarter. GoBD requires recoverability — Paperless makes it easy because all data lives in two clearly scoped volumes (media + db).
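A restore test can be as simple as reading every file back out of the backup archive and comparing checksums with the source. The sketch below shows the idea for a file volume such as media; the function names are hypothetical, and the archive is assumed to be written outside the source directory. A running database volume should not be copied this way; stop the container or use a database dump first.

```python
import hashlib
import tarfile
from pathlib import Path


def source_checksums(root: Path) -> dict:
    """Map each file's relative path to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def backup(source: Path, archive: Path) -> None:
    """Write all files below `source` into a gzip-compressed tar archive."""
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                tar.add(path, arcname=path.relative_to(source).as_posix())


def archive_checksums(archive: Path) -> dict:
    """Read every file back from the archive and hash its content."""
    result = {}
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                data = tar.extractfile(member).read()
                result[member.name] = hashlib.sha256(data).hexdigest()
    return result


def restore_test(source: Path, archive: Path) -> bool:
    """True if the archive contains exactly the files and contents of `source`."""
    return archive_checksums(archive) == source_checksums(source)
```

Run `backup` weekly and `restore_test` right afterwards; a `False` result means the archive would not reproduce the volume.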

Core capabilities of Paperless-ngx

What Paperless-ngx delivers technically — and which capabilities really carry an SMB setup.

OCR with Tesseract (local)

Tesseract runs in the container, no cloud service involved. German and English are recognised simultaneously. Handwritten notes are at least partially recognised, which matters for delivery notes with journeyman annotations.

Full-text search (PostgreSQL FTS)

Full-text search across the text of every document. Hits in under a second even at 10,000 documents. Search operators: AND, OR, NOT, fuzzy matching, phrase search.

Auto-tagging via machine learning

The classifier learns from the first 30–50 examples per class. After a short training phase new documents are classified automatically by correspondent, document type and even tags.
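The principle of learning from labelled examples can be illustrated with a tiny word-count classifier. This is explicitly not the classifier Paperless-ngx ships; it is a stdlib-only naive Bayes sketch with a hypothetical class name and made-up training texts, meant only to show why a handful of examples per correspondent is enough to recognise recurring vocabulary.

```python
import math
from collections import Counter, defaultdict


class TinyClassifier:
    """Multinomial naive Bayes over word counts (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of examples
        self.vocabulary = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            # Laplace smoothing so unseen words do not zero out a label
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


classifier = TinyClassifier()
classifier.train("delivery note copper pipe fittings wholesale", "Plumbing wholesaler")
classifier.train("invoice heat pump viessmann boiler", "Viessmann")
guess = classifier.predict("delivery note pipe fittings")
```

More examples per class sharpen the word statistics, which is the intuition behind the 30–50 documents mentioned above.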

REST API for workflows

Full REST API for every operation: upload, search, tagging, custom-field updates. Integration into n8n workflows, DATEV bridges and custom code without trouble.
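A full-text search over the API could look roughly like this. The sketch assumes token authentication and the `query` parameter on the documents endpoint as described in the Paperless-ngx API documentation; the host name and token are placeholders, and the helper names are invented for this example. Check the endpoint against the API docs of your installed version.

```python
import json
import urllib.parse
import urllib.request


def search_request(base_url: str, token: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a full-text search request."""
    params = urllib.parse.urlencode({"query": query})
    url = f"{base_url.rstrip('/')}/api/documents/?{params}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Token {token}", "Accept": "application/json"},
    )


def search(base_url: str, token: str, query: str):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(search_request(base_url, token, query)) as response:
        return json.load(response)


# Placeholder host and token; nothing is sent until search() is called.
request = search_request("https://paperless.hvac-firm.com", "<api-token>", "Müller 2024-007")
```

Separating request construction from sending keeps the URL and headers easy to inspect before anything touches the server.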

Mobile apps (third party)

Paperless Mobile (free PWA), Scanbot (commercial, very good auto-crop and multi-page scan), Genius Scan. No official app from the Paperless team, but working ecosystem solutions.

GPL-3.0 — copyleft open source

Full GPL-3.0 licence: source code public, modifications must be published under the same licence when redistributed. Unproblematic for in-house use by an SMB, and the data stays the property of the business.

Honest alternatives

If Paperless-ngx is not a fit — what else?

Three alternatives with different strengths. The DMS market is broad — we show the comparisons that come up most often in real consulting calls.

DACH SaaS market leader

DocuWare

DocuWare GmbH, proprietary

  • + Very mature product, DACH market leader
  • + Deep DATEV and SAP integration
  • − From around €50/user/month, cumulative
  • − Cloud component, or on-premise with high effort

Enterprise DMS

ELO ECM

ELO Digital Office, proprietary

  • + Very powerful, full ECM range
  • + Office integration, workflows, reporting
  • − Four-figure annual licence costs
  • − Oversized for SMBs under 50 people

Analog

Wall of binders

Leitz, Edding, etc.

  • + Works without power and without configuration
  • + Nobody can hack the system
  • − Not searchable, no full-text
  • − Not GoBD-compliant for digital documents

Rule of thumb: for an HVAC or trade business with 5–30 people, Paperless-ngx is the most pragmatic choice (GoBD-compatible, local, GPL-3.0). Anyone already on a Microsoft stack may consider SharePoint as a DMS. DocuWare is the commercial reference but expensive. A wall of binders is also a solution, just not a searchable one.

Pricing

GPL-3.0. GoBD-compatible. Local.

License

GPL-3.0: a classic copyleft licence approved by the OSI. Source code public, modifications stay under GPL when redistributed. Running it in-house as an SMB creates no obligations whatsoever. No per-user licence, no volume limit.

Running costs

Four containers on your own server: Paperless app, Redis, PostgreSQL, optionally Tika for Office files. RAM footprint around 500 MB for 5,000 documents. Storage: around 1 GB per 1,000 documents (PDFs with OCR layer). No external costs.
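The storage rule of thumb above translates into a quick capacity estimate. A sketch with hypothetical function names; the default of 1 GB per 1,000 documents is the figure quoted above, and real sizes depend on scan resolution and page count.

```python
def documents_after(years: float, per_month: int, existing: int = 0) -> int:
    """Project the archive size in documents after a number of years."""
    return existing + round(per_month * 12 * years)


def storage_gb(documents: int, gb_per_1000: float = 1.0) -> float:
    """Estimate disk usage from the documents-per-gigabyte rule of thumb."""
    return documents * gb_per_1000 / 1000


# Example: 280 delivery notes + 70 invoices per month over ten years
ten_year_documents = documents_after(10, per_month=350)
ten_year_storage = storage_gb(ten_year_documents)
```

Even a ten-year horizon at this intake stays within the capacity of a single small disk, before backups are counted.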

Effort

Docker Compose setup: 30 minutes. Initial configuration (mail account, first correspondents, document types): 1–2 hours. Complete trade-business setup with training batch, n8n-DATEV bridge and staff training: 2–4 consulting days.

Unlike Vaultwarden (AGPL-3.0) or n8n (fair-code), Paperless-ngx has no commercial vendor in the background: no Business Edition, no Enterprise variant. What it has is a very active maintainer community with regular releases and professional documentation.

Mail inbox consumed automatically

# Paperless mail account (created in the web UI)
host: mail.hvac-firm.com
port: 993
username: docs@hvac-firm.com
password: ${MAIL_PASS}

# Mail rule
folder: INBOX
filter_subject: "Invoice|Delivery|Notice"
filter_body: ""
filter_from: ""
maximum_age: 7  # days

# Actions
action: move
action_parameter: "INBOX/Processed"
consumption_scope: attachments_only
assign_correspondent_from: from
assign_document_type_from: subject
assign_tags: ["Email-inbound", "Auto-import"]
The IMAP mailbox docs@hvac-firm.com is polled every minute. Attachments land in the consume folder and are processed automatically — OCR, classification, tagging. Source: docs.paperless-ngx.com.

Paperless-ngx stack as Docker Compose

services:
  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:2.20.15
    restart: unless-stopped
    depends_on: [db, redis]
    environment:
      - PAPERLESS_URL=https://paperless.hvac-firm.com
      - PAPERLESS_DBHOST=db
      # Must match POSTGRES_PASSWORD of the db service below
      - PAPERLESS_DBPASS=${PAPERLESS_DB_PASS}
      - PAPERLESS_REDIS=redis://redis:6379
      - PAPERLESS_OCR_LANGUAGE=deu+eng
      - PAPERLESS_TIME_ZONE=Europe/Berlin
      - PAPERLESS_CONSUMPTION_DIR=/usr/src/paperless/consume
    volumes:
      - ./data:/usr/src/paperless/data
      - ./media:/usr/src/paperless/media
      - ./consume:/usr/src/paperless/consume
    networks: [frontend, paperless-net]

  db:
    image: postgres:16
    restart: unless-stopped
    environment:
      - POSTGRES_DB=paperless
      - POSTGRES_USER=paperless
      - POSTGRES_PASSWORD=${PAPERLESS_DB_PASS}
    volumes:
      - ./db:/var/lib/postgresql/data
    networks: [paperless-net]

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    networks: [paperless-net]

networks:
  frontend:
    external: true
  paperless-net:
Three containers for a productive DMS as shown: app (with Tesseract OCR built in) + Redis queue + PostgreSQL; Tika for Office files would be an optional fourth. All volumes local, no cloud component. Source: docs.paperless-ngx.com, GPL-3.0.

Related topics

Paperless-ngx needs a platform and workflows

A container host as the platform, Caddy as the HTTPS layer in front, n8n for the DATEV bridge or mail trigger.

Ready for the next step?

Free intro call, no strings attached. In 30 minutes you'll know whether and how AI can help your business.
