Build a Background Job Processor with Retries
You're building the worker that drains a job queue in a production backend. Jobs sometimes succeed, sometimes fail transiently (a flaky API, a timeout), and sometimes fail permanently (bad payload, expired credential).
Your job is to write the state machine that handles all three cases correctly: successes finalize, transient failures retry with a delay, permanent failures move to a dead-letter state immediately. Transient failures eventually exhaust if they hit the retry cap.
You're given two skeleton files: queue_store.py (in-memory job storage) and processor.py (the worker loop). The tests drive processor.process_next(now) to simulate time — no real sleeping required.
QueueStore.enqueue(job_type, payload)creates apendingjob withattempts=0andnext_attempt_at=0, and returns its id.QueueStore.next_runnable(now)returns the oldestpendingjob whosenext_attempt_at <= now, orNone. It must not return jobs insucceededordead_letterstate.- When a handler returns normally, the job transitions to
succeeded, attempts is incremented once, and the handler is not invoked again on laterprocess_nextcalls. - When a handler raises
PermanentError, the job transitions todead_letterimmediately — no matter how many retries remain — andlast_errorrecords the exception message. - When a handler raises any other exception, attempts increments and the job is rescheduled at
now + retry_delay_seconds. Aftermax_retriesattempts, the next failure puts the job indead_letter. processor.process_next(now)returnsTrueif a job was processed (success, retry-scheduled, or dead-lettered) andFalseif no job was runnable atnow.
store.enqueue('send_email', {'to': 'a@x.com'}); processor.register('send_email', lambda p: None); processor.process_next(now=0)Returns True. Job's state == 'succeeded', attempts == 1.
Handler raises TransientError on first call, succeeds on second. max_retries=3, retry_delay_seconds=5. process_next(now=0) → fails, schedules retry at t=5 process_next(now=4) → returns False (not yet runnable) process_next(now=5) → succeeds
Final state == 'succeeded', attempts == 2.
Handler raises PermanentError on first call. process_next(now=0)
Returns True. State == 'dead_letter', attempts == 1, last_error contains the exception message.
- Do not use real
time.sleepor wall-clock time inside the processor. The caller passesnowexplicitly so tests stay fast. PermanentErrormust skip the retry cap entirely. It is not a transient exception with a flag — it's a different control path.- A job that is currently in
dead_letterorsucceededmust never be re-run byprocess_next.
How would you add exponential backoff so retry #2 waits longer than retry #1? Where would the formula live so it's testable without time-based flakes?
Keep moving through related backend basics problems and build a stronger search-friendly practice loop around this topic.
View track →