Problem 30

Build a Background Job Processor with Retries

HARDBUILD

Queues / Background Jobs+4

Brief

You're building the worker that drains a job queue in a production backend. Jobs sometimes succeed, sometimes fail transiently (a flaky API, a timeout), and sometimes fail permanently (bad payload, expired credential).

Your job is to write the state machine that handles all three cases correctly: successes finalize, transient failures retry with a delay, permanent failures move to a dead-letter state immediately. Transient failures eventually exhaust if they hit the retry cap.

You're given two skeleton files: queue_store.py (in-memory job storage) and processor.py (the worker loop). The tests drive processor.process_next(now) to simulate time — no real sleeping required.

Requirements

QueueStore.enqueue(job_type, payload) creates a pending job with attempts=0 and next_attempt_at=0, and returns its id.
QueueStore.next_runnable(now) returns the oldest pending job whose next_attempt_at <= now, or None. It must not return jobs in succeeded or dead_letter state.
When a handler returns normally, the job transitions to succeeded, attempts is incremented once, and the handler is not invoked again on later process_next calls.
When a handler raises PermanentError, the job transitions to dead_letter immediately — no matter how many retries remain — and last_error records the exception message.
When a handler raises any other exception, attempts increments and the job is rescheduled at now + retry_delay_seconds. After max_retries attempts, the next failure puts the job in dead_letter.
processor.process_next(now) returns True if a job was processed (success, retry-scheduled, or dead-lettered) and False if no job was runnable at now.

Examples

Example 1

Input

store.enqueue('send_email', {'to': 'a@x.com'}); processor.register('send_email', lambda p: None); processor.process_next(now=0)

Output

Returns True. Job's state == 'succeeded', attempts == 1.

Example 2

Input

Handler raises TransientError on first call, succeeds on second. max_retries=3, retry_delay_seconds=5.
process_next(now=0) → fails, schedules retry at t=5
process_next(now=4) → returns False (not yet runnable)
process_next(now=5) → succeeds

Output

Final state == 'succeeded', attempts == 2.

Example 3

Input

Handler raises PermanentError on first call.
process_next(now=0)

Output

Returns True. State == 'dead_letter', attempts == 1, last_error contains the exception message.

Constraints

Do not use real time.sleep or wall-clock time inside the processor. The caller passes now explicitly so tests stay fast.
PermanentError must skip the retry cap entirely. It is not a transient exception with a flag — it's a different control path.
A job that is currently in dead_letter or succeeded must never be re-run by process_next.

Judge checks

python environment
python-pytest runner
8000ms judge budget

Follow-up

How would you add exponential backoff so retry #2 waits longer than retry #1? Where would the formula live so it's testable without time-based flakes?

Hints

Related Practice

Track

Backend Basics

Keep moving through related backend basics problems and build a stronger search-friendly practice loop around this topic.

View track →

Problem 23

Build a Basic Health Check Route

easy

Problem 3

Password Strength Tagger

easy

Problem 20

Refactor Duplicate Validation

medium

Console output appears after the preview logs something.