Problem 36

Group Duplicate Support Tickets

Medium · Build · Hash Maps & Sets

Your support platform receives tickets from users describing bugs and issues. The same problem often gets submitted multiple times — sometimes word for word, sometimes with minor whitespace or capitalization differences.

Implement group_duplicate_tickets(tickets), which identifies the tickets that describe the same issue and groups them together.

Two tickets are duplicates if their title and body match after normalization: convert to lowercase, strip leading and trailing whitespace, and collapse any run of internal whitespace to a single space.
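Below is a minimal sketch of the normalization step in Python. The helper name `normalize` is illustrative and not part of the required API. With no arguments, `str.split()` splits on any run of whitespace and discards leading and trailing whitespace, so joining the pieces with a single space handles both the strip and the collapse steps in one pass. One caveat: `str.split()` treats any Unicode whitespace (for example a non-breaking space) as a separator, and this sketch assumes that matches the intended meaning of "whitespace".

```python
def normalize(text: str) -> str:
    """Lowercase, trim, and collapse internal whitespace runs to one space."""
    # split() with no separator splits on runs of whitespace (spaces, tabs,
    # newlines) and drops leading/trailing whitespace; join restores single spaces.
    return " ".join(text.split()).lower()


print(normalize("  App \t crashes.\n"))  # app crashes.
print(normalize("BUG: crash"))           # bug: crash
```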

The category and user_id fields do not affect whether two tickets are duplicates — only title and body matter.
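One way to keep category and user_id out of the comparison is to build the lookup key from the two normalized fields only. The sketch below uses a hypothetical helper, `duplicate_key`, that returns a tuple instead of a concatenated string. Concatenation could make different tickets collide. For example, title "a b" with body "c" and title "a" with body "b c" both become "a b c" when joined with a space, but their tuples stay distinct.

```python
def normalize(text: str) -> str:
    # Lowercase, trim, and collapse whitespace runs to a single space.
    return " ".join(text.split()).lower()


def duplicate_key(ticket: dict) -> tuple[str, str]:
    # Only title and body participate; id, category, and user_id are ignored.
    return (normalize(ticket["title"]), normalize(ticket["body"]))


a = {"id": "t1", "title": "Login broken", "category": "auth", "user_id": "u1", "body": "Cannot log in."}
b = {"id": "t3", "title": "Login broken", "category": "billing", "user_id": "u3", "body": "Cannot log in."}
print(duplicate_key(a) == duplicate_key(b))  # True: different category/user_id, same key
```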

Requirements
  • Implement group_duplicate_tickets(tickets: list[dict]) -> list[list[str]] in main.py.
  • Two tickets are duplicates if their normalized title and normalized body are both equal. Normalization: lowercase, strip leading/trailing whitespace, collapse internal whitespace runs to a single space.
  • Return a list of groups. Each group lists its ticket IDs in the order they appear in the input.
  • Only include groups of 2 or more tickets. Tickets with no duplicates must not appear in the output.
  • The category and user_id fields must not influence duplicate detection.
  • An empty input returns an empty list.
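The requirements above map naturally onto a single hash-map pass. The following is a minimal sketch for main.py, assuming Python 3.9+ for the built-in generic annotations. The requirements do not say how the groups themselves should be ordered. This version returns them in the order their first ticket appears, which is an assumption rather than a stated rule. Because Python dicts preserve insertion order, IDs within each group also stay in input order.

```python
def _normalize(text: str) -> str:
    # Lowercase, strip the ends, and collapse internal whitespace runs to one space.
    return " ".join(text.split()).lower()


def group_duplicate_tickets(tickets: list[dict]) -> list[list[str]]:
    # Map each (normalized title, normalized body) key to the IDs that share it.
    # Dicts keep insertion order, so IDs stay in input order and groups come
    # out in the order of each group's first ticket (an assumed ordering).
    groups: dict[tuple[str, str], list[str]] = {}
    for ticket in tickets:
        key = (_normalize(ticket["title"]), _normalize(ticket["body"]))
        groups.setdefault(key, []).append(ticket["id"])
    # Keep only keys shared by two or more tickets; singletons are dropped.
    return [ids for ids in groups.values() if len(ids) >= 2]


print(group_duplicate_tickets([]))  # []
print(group_duplicate_tickets([
    {"id": "t1", "title": "BUG: crash", "category": "app", "user_id": "u1", "body": "App  crashes."},
    {"id": "t2", "title": "bug: crash", "category": "app", "user_id": "u2", "body": "App crashes."},
]))  # [['t1', 't2']]
```

Each ticket is normalized once and inserted with one expected O(1) dictionary operation. The final filter visits each distinct key once.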
Examples
Example 1
Input
tickets = [
  {"id": "t1", "title": "Login broken", "category": "auth", "user_id": "u1", "body": "Cannot log in."},
  {"id": "t2", "title": "App crashes",  "category": "app",  "user_id": "u2", "body": "Crashes on launch."},
  {"id": "t3", "title": "Login broken", "category": "billing", "user_id": "u3", "body": "Cannot log in."},
]
Output
[["t1", "t3"]]
Note

t1 and t3 share the same normalized title and body. t2 is unique — it forms no group.

Example 2
Input
tickets = [
  {"id": "t1", "title": "BUG: crash", "category": "app", "user_id": "u1", "body": "App  crashes."},
  {"id": "t2", "title": "bug: crash", "category": "app", "user_id": "u2", "body": "App crashes."},
]
Output
[["t1", "t2"]]
Note

After normalization both titles become "bug: crash" and both bodies become "app crashes." — they match.

Example 3
Input
tickets = [
  {"id": "t1", "title": "Unique A", "category": "a", "user_id": "u1", "body": "Body A."},
  {"id": "t2", "title": "Unique B", "category": "b", "user_id": "u2", "body": "Body B."},
]
Output
[]
Note

No two tickets share the same normalized content. Output is empty.

Constraints
  • Each ticket dict has keys: id (str), title (str), category (str), user_id (str), body (str).
  • Ticket IDs are unique strings.
  • Input list may be empty.
Follow-up

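A hash-map solution runs in O(n) time and space for n tickets, treating each normalization as bounded work. Strictly, the cost is linear in the total length of all titles and bodies. How would you adapt it if tickets arrived as a stream rather than as a batch, and you needed to report a new group as soon as it forms?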
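One possible answer to the follow-up is to keep the same key-to-IDs map alive between arrivals and signal a new group the moment a key gains its second ticket. The sketch below is an illustration under that assumption. The class name `StreamingTicketGrouper` and its `add` method are hypothetical. Later duplicates that join an existing group return `None` here; a variant could report those as group updates instead.

```python
from typing import Optional


class StreamingTicketGrouper:
    """Hypothetical incremental grouper: feed tickets one at a time."""

    def __init__(self) -> None:
        # (normalized title, normalized body) -> IDs seen so far, in arrival order.
        self._seen: dict[tuple[str, str], list[str]] = {}

    @staticmethod
    def _normalize(text: str) -> str:
        # Same normalization as the batch version.
        return " ".join(text.split()).lower()

    def add(self, ticket: dict) -> Optional[list[str]]:
        key = (self._normalize(ticket["title"]), self._normalize(ticket["body"]))
        ids = self._seen.setdefault(key, [])
        ids.append(ticket["id"])
        # A group "forms" exactly when a key reaches its second ticket.
        if len(ids) == 2:
            return list(ids)  # copy so callers cannot mutate internal state
        return None


grouper = StreamingTicketGrouper()
for t in [
    {"id": "t1", "title": "Login broken", "category": "auth", "user_id": "u1", "body": "Cannot log in."},
    {"id": "t2", "title": "App crashes", "category": "app", "user_id": "u2", "body": "Crashes on launch."},
    {"id": "t3", "title": "Login broken", "category": "billing", "user_id": "u3", "body": "Cannot log in."},
]:
    new_group = grouper.add(t)
    if new_group is not None:
        print("new group:", new_group)  # new group: ['t1', 't3']
```

Each arrival costs one normalization plus one expected O(1) dictionary operation. However, the map grows with every distinct key it has ever seen. A long-running stream would likely need some eviction policy, such as a time window, and choosing one is a design decision outside this problem's stated scope.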
