Medium · Algorithms & Data Structures · Build

Group Duplicate Support Tickets

Your support platform receives tickets from users describing bugs and issues. The same problem often gets submitted multiple times — sometimes word for word, sometimes with minor whitespace or capitalization differences...

What you will practice

  • Python
  • Hash Maps & Sets
  • Data Transformation
  • Grouping

Requirements

  • Implement `group_duplicate_tickets(tickets: list[dict]) -> list[list[str]]` in `main.py`.
  • Two tickets are duplicates if their normalized `title` and normalized `body` are both equal. Normalization: lowercase, strip leading/trailing whitespace, collapse internal whitespace runs to a single space.
  • Return a list of groups. Each group is a list of ticket IDs in their original order from the input.
  • Only include groups of 2 or more tickets. Tickets with no duplicates must not appear in the output.
  • The `category` and `user_id` fields must not influence duplicate detection.
  • An empty input returns an empty list.

Starter files

  • `main.py` (editable starter)

What the judge checks

  • Runs in the python environment with the python-pytest runner.
  • Uses a 5000 ms judge budget.
  • Behavior rules include: Groups Tickets With Matching Content, Case Insensitive Comparison, Whitespace Normalization, Singletons Excluded.

Constraints

  • Each ticket dict has keys: `id` (str), `title` (str), `category` (str), `user_id` (str), `body` (str).
  • Ticket IDs are unique strings.
  • Input list may be empty.

Example behavior

Input
tickets = [
  {"id": "t1", "title": "Login broken", "category": "auth", "user_id": "u1", "body": "Cannot log in."},
  {"id": "t2", "title": "App crashes",  "category": "app",  "user_id": "u2", "body": "Crashes on launch."},
  {"id": "t3", "title": "Login broken", "category": "billing", "user_id": "u3", "body": "Cannot log in."},
]
Output
[["t1", "t3"]]

t1 and t3 share the same normalized title and body. t2 is unique — it forms no group.

Input
tickets = [
  {"id": "t1", "title": "BUG: crash", "category": "app", "user_id": "u1", "body": "App  crashes."},
  {"id": "t2", "title": "bug: crash", "category": "app", "user_id": "u2", "body": "App crashes."},
]
Output
[["t1", "t2"]]

After normalization both titles become "bug: crash" and both bodies become "app crashes." — they match.
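The normalization in this example can be traced step by step. The snippet below is an illustrative sketch that spells out the three rules literally (lowercase, strip, collapse) using a regular expression; the function name `normalize` is hypothetical and not part of the starter file.

```python
import re


def normalize(text: str) -> str:
    # Hypothetical helper applying the three rules in order.
    lowered = text.lower()                 # 1. lowercase
    stripped = lowered.strip()             # 2. strip leading/trailing whitespace
    return re.sub(r"\s+", " ", stripped)   # 3. collapse internal whitespace runs


t1 = {"title": "BUG: crash", "body": "App  crashes."}
t2 = {"title": "bug: crash", "body": "App crashes."}

key1 = (normalize(t1["title"]), normalize(t1["body"]))
key2 = (normalize(t2["title"]), normalize(t2["body"]))

print(key1)          # ('bug: crash', 'app crashes.')
print(key1 == key2)  # True
```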

Follow-up

A hash-map-based solution runs in O(n) time and space for n tickets, treating the cost of normalizing each ticket as constant. How would you adapt it if tickets arrived in a stream instead of a batch, where you needed to report a new group as soon as it forms?
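One possible direction for the streaming variant is to keep the same dictionary alive between arrivals and report a group at the moment its second member shows up. This is a sketch under stated assumptions: the class name `StreamingGrouper` and its `add` method are hypothetical, "a new group forms" is interpreted as "a key reaches exactly two tickets", and later duplicates of an already reported group are recorded silently.

```python
from typing import Optional


class StreamingGrouper:
    """Hypothetical streaming counterpart of the batch grouping function."""

    def __init__(self) -> None:
        # (normalized title, normalized body) -> ticket IDs in arrival order.
        self._ids_by_key: dict[tuple[str, str], list[str]] = {}

    @staticmethod
    def _normalize(text: str) -> str:
        # Lowercase, strip, and collapse whitespace runs to single spaces.
        return " ".join(text.lower().split())

    def add(self, ticket: dict) -> Optional[list[str]]:
        """Record one ticket; return the group if this ticket just formed it."""
        key = (self._normalize(ticket["title"]), self._normalize(ticket["body"]))
        ids = self._ids_by_key.setdefault(key, [])
        ids.append(ticket["id"])
        # A group "forms" when its size goes from 1 to 2 (assumed meaning).
        # Return a copy so callers cannot mutate internal state.
        return list(ids) if len(ids) == 2 else None

    def group_of(self, ticket_id_key: tuple[str, str]) -> list[str]:
        # Current members for an already-normalized key (empty if unseen).
        return list(self._ids_by_key.get(ticket_id_key, []))
```

Each `add` call does one dictionary lookup, so the per-ticket cost stays constant on average. Memory still grows with the number of distinct tickets seen, so a long-running stream would likely need an eviction policy, which this sketch does not attempt.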