Is Paperless-ngx Down? Real-Time Status & Outage Checker

Paperless-ngx is an open-source document management system with over 23,000 GitHub stars. A community-maintained fork of Paperless-ng (itself a fork of the original Paperless project), it ingests scanned and digital documents, runs OCR on them, tags them, and archives them in a searchable library. Key features include OCR via Tesseract (100+ languages), automatic tagging and correspondent detection, full-text search, a REST API, and a clean web interface. It is self-hosted, typically with Docker, and uses PostgreSQL or SQLite for document metadata, Redis as the broker for the Celery task queue, and a consumption directory that is watched for new files. Individuals and small businesses use it to digitize receipts, invoices, contracts, medical records, and tax documents without handing their data to a cloud service.

Paperless-ngx relies on a multi-process architecture: the Django web server, the Celery background workers, the Redis task broker, and the document consumer (which may run as a separate process) must all be healthy at the same time. A failure in any one layer can stop document processing while the rest of the system still looks healthy from the outside.
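To make the "healthy from the outside" problem concrete, here is a minimal sketch that collapses per-layer health flags into one overall state. The layer keys and the three state names are this example's own, not anything Paperless-ngx defines; how each flag gets populated is left to whichever checks you run.

```python
from typing import Dict

# Layer keys are hypothetical names chosen for this sketch.
WEB, WORKER, BROKER = "web", "celery_worker", "redis"


def summarize(layers: Dict[str, bool]) -> str:
    """Collapse per-layer health flags into one overall state."""
    if not layers.get(WEB, False):
        return "down"      # UI and API unreachable
    if not layers.get(BROKER, False) or not layers.get(WORKER, False):
        return "degraded"  # UI loads, but new documents are not processed
    return "healthy"


if __name__ == "__main__":
    # Web server answers, but the worker is gone: the silent failure case.
    print(summarize({WEB: True, WORKER: False, BROKER: True}))
```

A probe that only requests the web UI can never report anything but "healthy" or "down", which is why the worker and broker need checks of their own.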

Quick Status Check

#!/bin/bash
# Paperless-ngx health check
# Usage: bash check-paperless.sh [host] [port]

HOST="${1:-localhost}"
PORT="${2:-8000}"
BASE_URL="http://${HOST}:${PORT}"

echo "=== Paperless-ngx Health Check ==="
echo "Target: ${BASE_URL}"
echo ""

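# 1. Check API root endpoint
# Any HTTP response means the web server is up. Depending on configuration,
# /api/ may require authentication, so 401/403 still counts as "responding".
echo "[1/5] Checking API endpoint..."
HTTP_CODE=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "${BASE_URL}/api/" 2>/dev/null)
case "${HTTP_CODE}" in
  2??|3??)  echo "  OK  /api/ responded (HTTP ${HTTP_CODE})" ;;
  401|403)  echo "  OK  /api/ responded (HTTP ${HTTP_CODE}, authentication required)" ;;
  ""|000)   echo "  FAIL  /api/ unreachable: Paperless-ngx web server may be down" ;;
  *)        echo "  WARN  /api/ returned HTTP ${HTTP_CODE}" ;;
esac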

# 2. Check Celery worker processes
echo "[2/5] Checking Celery worker processes..."
if docker ps --format '{{.Names}}' 2>/dev/null | grep -qi "paperless.*worker\|celery"; then
  echo "  OK  Celery worker container detected"
elif pgrep -f "celery" > /dev/null 2>&1; then
  echo "  OK  Celery worker process is running"
else
  echo "  WARN  No Celery worker detected — document processing may be stalled"
fi

# 3. Check Redis
echo "[3/5] Checking Redis task queue..."
REDIS_HOST="${REDIS_HOST:-localhost}"
if nc -z -w3 "${REDIS_HOST}" 6379 2>/dev/null; then
  echo "  OK  Redis reachable at ${REDIS_HOST}:6379"
else
  echo "  FAIL  Redis not reachable — task queue broken, document consumption will stop"
fi

# 4. Check PostgreSQL or SQLite
echo "[4/5] Checking database..."
DB_HOST="${DB_HOST:-localhost}"
if nc -z -w3 "${DB_HOST}" 5432 2>/dev/null; then
  echo "  OK  PostgreSQL port 5432 open at ${DB_HOST}"
else
  echo "  INFO  PostgreSQL not detected on ${DB_HOST}:5432 (may be SQLite)"
fi

# 5. Check consumption directory is accessible
echo "[5/5] Checking consumption directory..."
CONSUME_DIR="${CONSUME_DIR:-/consume}"
if [ -d "${CONSUME_DIR}" ] && [ -r "${CONSUME_DIR}" ]; then
  PENDING=$(ls "${CONSUME_DIR}" 2>/dev/null | wc -l | tr -d ' ')
  echo "  OK  Consumption directory accessible: ${PENDING} file(s) pending"
else
  echo "  WARN  Consumption directory '${CONSUME_DIR}' not accessible — new scans may be ignored"
fi

echo ""
echo "=== Check complete ==="

Python Health Check

#!/usr/bin/env python3
"""
Paperless-ngx health check
Verifies web server, document count, tag count, task queue health,
storage statistics, and detection of stuck Celery tasks.
"""

import json
import os
import sys
import urllib.error
import urllib.request
from datetime import datetime, timezone

BASE_URL = "http://localhost:8000"
# Token authentication only: fill in here or set the PAPERLESS_TOKEN env var
API_TOKEN = ""
TIMEOUT = 10
TASK_STUCK_MINUTES = 60  # warn if a task has been PENDING longer than this


def fetch(url, token=""):
    """GET a JSON endpoint; return parsed JSON, or a dict with an "_error" key."""
    headers = {"Accept": "application/json"}
    if token:
        headers["Authorization"] = f"Token {token}"
    try:
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req, timeout=TIMEOUT) as resp:
            data = json.loads(resp.read().decode())
            # The server version is normally sent in the X-Version response
            # header on authenticated requests, not in the JSON body.
            if isinstance(data, dict) and resp.headers.get("X-Version"):
                data.setdefault("_version", resp.headers["X-Version"])
            return data
    except urllib.error.HTTPError as e:
        body = e.read().decode(errors="ignore") if e.fp else ""
        return {"_error": f"HTTP {e.code}", "_body": body[:200]}
    except Exception as e:
        return {"_error": str(e)}


token = API_TOKEN or os.environ.get("PAPERLESS_TOKEN", "")

results = []
print("=== Paperless-ngx Health Check ===")
print(f"Target: {BASE_URL}\n")

# 1. API root: version and basic availability
print("[1/5] API availability & version...")
r = fetch(f"{BASE_URL}/api/", token)
err = r.get("_error", "") if isinstance(r, dict) else ""
if not token and err in ("HTTP 401", "HTTP 403"):
    # The server answered but wants credentials: the web layer is up
    print(f"  [OK  ] API reachable ({err}, authentication required)")
    results.append(True)
elif err:
    print(f"  [FAIL] /api/: {err}")
    results.append(False)
else:
    version = r.get("_version", "unknown") if isinstance(r, dict) else "unknown"
    print(f"  [OK  ] API available | version: {version}")
    results.append(True)

# 2. Document count (requires auth)
print("[2/5] Document library...")
if token:
    r = fetch(f"{BASE_URL}/api/documents/?page_size=1", token)
    if "_error" in r:
        print(f"  [FAIL] /api/documents/: {r['_error']}")
        results.append(False)
    else:
        doc_count = r.get("count", 0)
        print(f"  [OK  ] Document count: {doc_count:,}")
        results.append(True)

    # 3. Tag count
    print("[3/5] Tags & correspondents...")
    r_tags = fetch(f"{BASE_URL}/api/tags/?page_size=1", token)
    r_corr = fetch(f"{BASE_URL}/api/correspondents/?page_size=1", token)
    tag_count = r_tags.get("count", 0) if "_error" not in r_tags else "?"
    corr_count = r_corr.get("count", 0) if "_error" not in r_corr else "?"
    ok = "_error" not in r_tags and "_error" not in r_corr
    level = "OK  " if ok else "FAIL"
    print(f"  [{level}] Tags: {tag_count} | Correspondents: {corr_count}")
    results.append(ok)

    # 4. Task queue — check for stuck PENDING tasks
    print("[4/5] Celery task queue...")
    r = fetch(f"{BASE_URL}/api/tasks/?page_size=50", token)
    if "_error" in r:
        print(f"  [WARN] /api/tasks/ not available: {r['_error']}")
        results.append(True)  # non-fatal if endpoint not present
    else:
        tasks = r.get("results", r) if isinstance(r, dict) else r
        if not isinstance(tasks, list):
            tasks = []
        now = datetime.now(timezone.utc)
        stuck = []
        for t in tasks:
            status = t.get("status", "")
            created = t.get("date_created", "")
            if status == "PENDING" and created:
                try:
                    created_dt = datetime.fromisoformat(created.replace("Z", "+00:00"))
                    age_min = (now - created_dt).total_seconds() / 60
                    if age_min > TASK_STUCK_MINUTES:
                        stuck.append((t.get("task_file_name", "unknown"), int(age_min)))
                except Exception:
                    pass
        if stuck:
            print(f"  [WARN] {len(stuck)} task(s) stuck PENDING > {TASK_STUCK_MINUTES} min:")
            for name, age in stuck[:3]:
                print(f"       '{name}' stuck for {age} min")
            results.append(False)
        else:
            total_tasks = len(tasks)
            print(f"  [OK  ] {total_tasks} recent task(s) — no stuck tasks detected")
            results.append(True)

    # 5. Storage statistics
    print("[5/5] Storage statistics...")
    r = fetch(f"{BASE_URL}/api/statistics/", token)
    if "_error" in r:
        print(f"  [WARN] /api/statistics/ unavailable: {r['_error']}")
        results.append(True)
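    else:
        doc_count_stat = r.get("documents_total", 0)
        # Field names vary by release: the inbox count is typically exposed
        # as "documents_inbox", and "total_file_size" may be absent entirely.
        inbox = r.get("documents_inbox", r.get("inbox_count", 0))
        total_size = r.get("total_file_size")
        storage = f"{total_size / (1024 ** 3):.2f} GB" if total_size else "n/a"
        print(f"  [OK  ] Total documents: {doc_count_stat:,} | "
              f"Storage: {storage} | Inbox: {inbox}")
        results.append(True)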
else:
    # Unauthenticated: just check that API and UI respond
    print("[2/5] Document library (no token — skipping)...")
    print("  [INFO] Set PAPERLESS_TOKEN to enable authenticated checks")
    print("[3/5] Tags (skipped — no token)...")
    print("[4/5] Task queue (skipped — no token)...")
    print("[5/5] Statistics (skipped — no token)...")
    results.extend([True, True, True, True])

# Summary
passed = sum(results)
total = len(results)
print(f"\n=== Summary: {passed}/{total} checks passed ===")
if passed < total:
    print("Action required: review FAIL/WARN items above.")
    sys.exit(1)
else:
    print("Paperless-ngx appears healthy.")
    sys.exit(0)

Common Paperless-ngx Outage Causes

| Symptom | Likely Cause | Resolution |
| --- | --- | --- |
| New files in the consumption directory never appear in the library | Celery worker not running; document processing tasks are not being consumed from the queue | Check docker compose ps and the container logs for the Celery worker; restart it; verify Redis connectivity from the worker |
| Task queue fills up; documents stuck in PENDING indefinitely | Redis unavailable: broker down, OOM killed, or a network issue between containers | Restart the Redis container; verify that the PAPERLESS_REDIS env var points to the correct Redis host |
| Documents import successfully but have no searchable text; OCR field blank | Tesseract not installed in the container, or the language data for the configured PAPERLESS_OCR_LANGUAGE is missing | Verify Tesseract is installed (tesseract --version in the container); install missing language packs; re-run OCR on affected documents |
| Web UI loads but shows an empty library; "0 documents" despite prior imports | PostgreSQL connection lost: database container stopped or credentials changed | Verify the PostgreSQL container is running; check PAPERLESS_DBHOST and credentials; review Django logs for DB connection errors |
| Scanner uploads or watched-folder drops create new files, but they are never processed | Wrong permissions on the consumption directory; the Paperless process cannot read new files | Verify directory ownership matches the USERMAP_UID/USERMAP_GID env vars; fix with chown -R |
| Search returns no results or incorrect results for known document content | Full-text search index corrupt or out of sync after a migration | Run the document_index reindex management command in the web container; allow time for reindexing to complete |
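The permissions row above can be checked before reaching for chown. The following is a rough sketch, assuming a POSIX host: it compares the directory's owner and mode bits against the UID/GID the container is configured to run as. The function name, the fallback of 1000 for both IDs, and the /consume path are assumptions of this example; ACLs and supplementary groups are ignored, so treat a negative result as a hint rather than proof.

```python
import os
import stat


def can_read_dir(path: str, uid: int, gid: int) -> bool:
    """Rough check: could a process running as uid/gid list and enter `path`?"""
    try:
        st = os.stat(path)
    except OSError:
        return False  # missing path, or we cannot even stat it
    if not stat.S_ISDIR(st.st_mode):
        return False
    if uid == 0:
        return True  # root bypasses mode bits
    if st.st_uid == uid:
        need = stat.S_IRUSR | stat.S_IXUSR
    elif st.st_gid == gid:
        need = stat.S_IRGRP | stat.S_IXGRP
    else:
        need = stat.S_IROTH | stat.S_IXOTH
    return (st.st_mode & need) == need


if __name__ == "__main__":
    # Assumed fallbacks; use the values from your own compose/env file.
    uid = int(os.environ.get("USERMAP_UID", "1000"))
    gid = int(os.environ.get("USERMAP_GID", "1000"))
    path = os.environ.get("CONSUME_DIR", "/consume")
    verdict = "readable" if can_read_dir(path, uid, gid) else "NOT readable"
    print(f"{path}: {verdict} for uid={uid} gid={gid}")
```

Run it on the host against the bind-mounted directory, since that is where ownership usually drifts after an OS update or a volume change.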

Architecture Overview

| Component | Function | Failure Impact |
| --- | --- | --- |
| Django web server (gunicorn) | REST API, web UI, document search and retrieval | Complete loss of UI and API access; documents safe but inaccessible |
| Celery workers | Background document processing: OCR, tagging, thumbnail generation, consumption | New documents not processed; existing library intact but no new ingestion |
| Redis | Task queue broker; Celery workers pull jobs from Redis | All background tasks stop; documents queue in filesystem but never process |
| PostgreSQL / SQLite | Document metadata, tags, correspondents, user accounts, search index | UI shows empty library; all metadata and search unavailable |
| Tesseract OCR | Extracts text from scanned PDFs and images for full-text indexing | Imported documents have no text; search cannot find document contents |
| Consumption directory | Watched filesystem path; new files dropped here trigger Celery processing jobs | Scanner or watched folder uploads silently ignored; no import errors shown |
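An open port 6379 does not prove the broker in the Redis row is actually answering. As a hedged sketch using only the standard library, the snippet below sends a Redis protocol PING and interprets the reply. The function names and return labels are this example's own; it assumes plain TCP without TLS, and a password-protected Redis will answer with an error reply, which the sketch reports as alive but refusing.

```python
import socket


def interpret_reply(reply: bytes) -> str:
    """Map the first bytes of a Redis reply to a coarse status label."""
    if reply.startswith(b"+PONG"):
        return "ok"
    if reply.startswith(b"-"):
        return "alive_but_refused"  # e.g. -NOAUTH when a password is set
    return "unexpected"


def redis_ping(host: str = "localhost", port: int = 6379,
               timeout: float = 3.0) -> str:
    """Send PING in RESP array form and classify the answer."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"*1\r\n$4\r\nPING\r\n")
            return interpret_reply(s.recv(64))
    except OSError:  # refused, timed out, DNS failure, ...
        return "unreachable"


if __name__ == "__main__":
    print("Redis:", redis_ping())
```

Run it from inside the Paperless container or network namespace if possible, since that is the path the Celery workers use to reach the broker.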

Uptime History

| Date | Incident Type | Duration | Impact |
| --- | --- | --- | --- |
| Jan 2026 | Redis OOM killed by host system; Celery queue broken | 2–6 hrs (until Redis restarted and queue drained) | No new documents processed; consumption directory backlog accumulated |
| Oct 2025 | Docker volume permissions changed after host OS update; consumption dir unreadable | 1–5 hrs (until permissions fixed) | Scanner uploads silently dropped; no error shown to user; documents lost from intake |
| Aug 2025 | PostgreSQL container ran out of disk space; DB writes failed | 30 min–3 hrs | New document metadata not saved; UI showed errors on document open; required DB recovery |
| Jul 2025 | Tesseract language pack missing after container image update | Variable (until discovered and fixed) | All newly imported documents had no OCR text; full-text search returned no results for new imports |
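Incidents like the Aug 2025 entry (database volume out of disk space) can usually be caught early with a free-space threshold. The sketch below is one way to do that with the standard library. The 10% threshold and the example paths are assumptions of this sketch; point it at whichever host paths back your database and media volumes.

```python
import shutil


def free_fraction(path: str) -> float:
    """Fraction of the filesystem containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # pseudo-filesystems can report zero size
    return usage.free / usage.total


def disk_state(path: str, warn_below: float = 0.10) -> str:
    """Return "low" when free space drops under the threshold, else "ok"."""
    return "low" if free_fraction(path) < warn_below else "ok"


if __name__ == "__main__":
    # Hypothetical paths; replace with your volume mount points.
    for p in ("/", "."):
        print(f"{p}: {free_fraction(p):.1%} free, state={disk_state(p)}")
```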

Monitor Paperless-ngx Automatically

Paperless-ngx failures are easy to miss: the web UI may appear healthy while the Celery workers have silently stopped processing documents, and a permissions problem in the consumption directory can cause new scans to be dropped without any error message. ezmon.com monitors your Paperless-ngx endpoints from multiple external probes and alerts your team via Slack, PagerDuty, or SMS the moment the API stops responding or your task queue shows signs of stalling.
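If you also run your own probe, a common way to avoid alerting on a single dropped request is to require several consecutive failures first. The following is a minimal sketch of that idea; the class name, the default threshold of 3, and the "alert"/"recovered" labels are this example's own and do not describe how any particular monitoring service works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailureTracker:
    """Fire one alert after `threshold` consecutive failed checks."""
    threshold: int = 3
    consecutive_failures: int = 0
    alerting: bool = False

    def record(self, ok: bool) -> Optional[str]:
        """Feed one check result; return "alert", "recovered", or None."""
        if ok:
            self.consecutive_failures = 0
            if self.alerting:
                self.alerting = False
                return "recovered"
            return None
        self.consecutive_failures += 1
        if not self.alerting and self.consecutive_failures >= self.threshold:
            self.alerting = True
            return "alert"
        return None


if __name__ == "__main__":
    tracker = FailureTracker(threshold=2)
    for result in (True, False, False, False, True):
        print(result, "->", tracker.record(result))
```

Feed it the outcome of each scheduled health check; only the transitions produce a notification, so a long outage yields one alert and one recovery message.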

Set up Paperless-ngx monitoring free at ezmon.com →
