Workers
The workers section configures the async job engine that runs challenge validation and certificate issuance in the background.
Configuration
workers:
  worker-id: "worker-1"
  workers: 4
  lease: 30s
  idle-min: 50ms
  idle-max: 200ms
  base-backoff: 1s
  max-backoff: 2m
  queue-size: 4
  drain-timeout: 30s
Fields
| Field | Default | Description |
|---|---|---|
| worker-id | worker | Unique identifier for this worker instance. Useful in multi-node deployments. |
| workers | 4 | Number of concurrent worker goroutines. |
| lease | 30s | How long a worker holds a job lock. If processing takes longer, the lease is renewed automatically. |
| idle-min | 50ms | Minimum polling interval when the queue is empty. |
| idle-max | 200ms | Maximum polling interval when the queue is empty. Caps the empty-queue exponential backoff so the first job that arrives after a long quiet period is picked up within this delay. |
| base-backoff | 1s | Initial backoff on job failure. |
| max-backoff | 2m | Maximum backoff after repeated failures. |
| queue-size | value of workers | In-memory job queue buffer size. |
| drain-timeout | 30s | Maximum graceful-stop wait time for in-flight jobs before forced worker cancellation. Must be ≥ server.shutdown-timeout. See Graceful shutdown. |
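
The relationship between idle-min and idle-max can be illustrated with a small model. This is a sketch under assumptions, not Certeasy's implementation: the source only states that the empty-queue backoff is exponential and capped at idle-max, so the doubling factor and the function name idle_intervals are hypothetical.

```python
def idle_intervals(idle_min_ms: int, idle_max_ms: int, empty_polls: int) -> list:
    """Return the sleep (in ms) taken before each consecutive empty poll.

    Assumption: the interval doubles after every empty poll and is
    capped at idle_max_ms. The real growth factor is not documented.
    """
    intervals = []
    current = idle_min_ms
    for _ in range(empty_polls):
        intervals.append(current)
        # Grow the interval, but never beyond the configured cap.
        current = min(current * 2, idle_max_ms)
    return intervals


print(idle_intervals(50, 200, 5))  # [50, 100, 200, 200, 200]
```

Under this model, with the defaults an idle instance settles at one poll every 200ms, which is also the longest a newly arrived job would wait before being picked up.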
How the Job Engine Works
All background work in Certeasy (DNS challenge validation, ADCS polling) is handled by the job engine:
- An ACME handler enqueues a job in the database
- A worker picks up the job and acquires a lease
- The worker executes the job handler (validate DNS, poll ADCS…)
- On success, the job is marked complete
- On transient failure, the job is rescheduled with exponential backoff
- On fatal failure, the job is failed and the associated order is invalidated
Jobs are persistent — if Certeasy restarts mid-processing, workers resume from the database.
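
The outcome handling described above can be sketched as a small state model. Everything here is illustrative: the Job fields, the status strings, and the doubling retry schedule are assumptions made for the example, and the defaults simply mirror base-backoff (1s) and max-backoff (2m).

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    TRANSIENT = "transient"
    FATAL = "fatal"


@dataclass
class Job:
    # Hypothetical fields; the real job schema is not shown in this document.
    status: str = "pending"   # pending | complete | failed
    attempts: int = 0
    retry_in_s: float = 0.0
    order_valid: bool = True


def apply_outcome(job: Job, outcome: Outcome,
                  base_s: float = 1.0, max_s: float = 120.0) -> Job:
    """Update a job after one handler run."""
    job.attempts += 1
    if outcome is Outcome.SUCCESS:
        job.status = "complete"
    elif outcome is Outcome.TRANSIENT:
        # Reschedule with capped exponential backoff (doubling is assumed).
        job.status = "pending"
        job.retry_in_s = min(base_s * 2 ** (job.attempts - 1), max_s)
    else:
        # Fatal: fail the job and invalidate the associated order.
        job.status = "failed"
        job.order_valid = False
    return job


job = Job()
for _ in range(3):
    apply_outcome(job, Outcome.TRANSIENT)
print(job.status, job.retry_in_s)  # pending 4.0
```

With these assumed defaults, the retry delay reaches the 2m cap on the eighth consecutive transient failure (1s doubled seven times is 128s, which is capped to 120s).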
Shutdown and Recovery
- On graceful stop (SIGTERM), the dispatcher stops claiming new jobs, then workers drain in-flight jobs for up to drain-timeout.
- If drain-timeout is exceeded, in-flight handlers are cancelled and process shutdown continues.
- On force kill (SIGKILL / kill -9), no graceful cleanup runs. In-flight jobs remain locked until their lease expires, then are picked again by workers after restart.
- In practice, worst-case recovery delay after force kill is approximately the lease duration.
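
The force-kill case can be made concrete with a minimal sketch of a lease check. The locked_until timestamp and the is_claimable helper are hypothetical names chosen for illustration; the source does not describe how leases are stored.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_claimable(locked_until: Optional[datetime], now: datetime) -> bool:
    """A job can be claimed if it was never locked or its lease has expired."""
    return locked_until is None or locked_until <= now


lease = timedelta(seconds=30)
claimed_at = datetime(2024, 1, 1, 12, 0, 0)
locked_until = claimed_at + lease  # lock set when the job was claimed

# The process is killed right after claiming, so the lock is never released.
print(is_claimable(locked_until, claimed_at + timedelta(seconds=10)))  # False
print(is_claimable(locked_until, claimed_at + timedelta(seconds=30)))  # True
```

In this model a job claimed just before the kill stays locked for the full lease, which is why the worst-case recovery delay is roughly one lease duration.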
Tuning
The default settings (4 workers, 1s–2m backoff) work well for most deployments. Consider adjusting if:
- High certificate volume: increase workers and queue-size
- Slow ADCS: increase max-backoff and lease to tolerate longer processing times
- Multi-node: set a unique worker-id per instance to distinguish workers in logs
- Many idle instances against a shared database (HA): raise idle-max to 1s–2s to reduce the steady-state read load on the shared database. The defaults are tuned for a single-instance deployment, where the per-poll cost is negligible and tight polling keeps certificate-issuance latency low.
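
A rough estimate shows why idle-max matters in HA setups. The calculation below assumes that each idle instance issues one poll query per idle-max interval once its backoff has reached the cap; that is a simplification for illustration, and the function name is hypothetical.

```python
def idle_polls_per_second(instances: int, idle_max_ms: int) -> float:
    """Approximate steady-state poll rate on the shared database.

    Assumption: one poll per instance per idle-max interval.
    """
    return instances * 1000 / idle_max_ms


print(idle_polls_per_second(10, 200))   # 50.0
print(idle_polls_per_second(10, 2000))  # 5.0
```

Under this assumption, raising idle-max from 200ms to 2s cuts the idle read load of ten instances by a factor of ten, at the cost of up to 2s of added pickup latency after a quiet period.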
Tuning Relationships
- Set drain-timeout to cover normal in-flight processing time during maintenance restarts.
- Keep lease long enough to avoid premature reclaim during transient slowdowns, while still allowing acceptable post-crash recovery time.
- In orchestrators, configure termination grace period to be greater than both server.shutdown-timeout and workers.drain-timeout (plus margin).
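
These relationships can be checked mechanically before a rollout. The helper below is a hypothetical sanity check, not part of Certeasy; the 5s margin is an arbitrary illustrative value, and all durations are given in seconds.

```python
def check_timeouts(shutdown_timeout_s: float, drain_timeout_s: float,
                   grace_period_s: float, margin_s: float = 5.0) -> list:
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    # The fields table states drain-timeout must be >= server.shutdown-timeout.
    if drain_timeout_s < shutdown_timeout_s:
        problems.append(
            "workers.drain-timeout should be >= server.shutdown-timeout")
    # The orchestrator grace period must exceed both timeouts, plus margin.
    needed = max(shutdown_timeout_s, drain_timeout_s) + margin_s
    if grace_period_s < needed:
        problems.append(
            f"termination grace period should be at least {needed:g}s")
    return problems


print(check_timeouts(30, 30, 45))  # []
print(check_timeouts(30, 30, 30))
# ['termination grace period should be at least 35s']
```

A check like this fits naturally into a deployment pipeline, where the grace period lives in the orchestrator manifest and the two timeouts live in the Certeasy configuration.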
Multi-node Deployments
Running multiple Certeasy instances against the same database is supported (PostgreSQL, SQL Server). Each instance competes for job leases — only one instance processes each job. Set worker-id to a unique value per instance:
# Node 1
workers:
  worker-id: "worker-node1"
# Node 2
workers:
  worker-id: "worker-node2"