Group Duplicate Support Tickets
Your support platform receives tickets from users describing bugs and issues. The same problem often gets submitted multiple times — sometimes word for word, sometimes with minor whitespace or capitalization differences.
Implement group_duplicate_tickets(tickets) that identifies which tickets describe the same issue and groups them together.
Two tickets are duplicates if their title and body match after normalization: convert to lowercase, strip leading and trailing whitespace, and collapse any run of internal whitespace to a single space.
The category and user_id fields do not affect whether two tickets are duplicates — only title and body matter.
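The normalization rule above can be sketched in a few lines. This is a minimal illustration, not a required implementation: the helper names `normalize` and `duplicate_key` are hypothetical and do not come from the problem statement. It relies on `str.split()` with no arguments, which discards leading and trailing whitespace and treats any run of whitespace as a single separator.

```python
def normalize(text: str) -> str:
    """Lowercase, trim, and collapse whitespace runs to single spaces."""
    # split() with no arguments drops leading/trailing whitespace and
    # splits on runs of whitespace, so joining with " " collapses them.
    return " ".join(text.lower().split())


def duplicate_key(ticket: dict) -> tuple[str, str]:
    """Build the comparison key from title and body only (hypothetical helper)."""
    # category and user_id are deliberately left out of the key.
    return (normalize(ticket["title"]), normalize(ticket["body"]))


print(normalize("  BUG:\t crash \n"))  # bug: crash
```

Keeping the key as a tuple, rather than concatenating title and body into one string, avoids accidental matches where text shifts between the two fields.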
- Implement group_duplicate_tickets(tickets: list[dict]) -> list[list[str]] in main.py.
- Two tickets are duplicates if their normalized title and normalized body are both equal. Normalization: lowercase, strip leading/trailing whitespace, collapse internal whitespace runs to a single space.
- Return a list of groups. Each group is a list of ticket IDs in their original order from the input.
- Only include groups of 2 or more tickets. Tickets with no duplicates must not appear in the output.
- The category and user_id fields must not influence duplicate detection.
- An empty input returns an empty list.
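One way to satisfy the requirements above is a single pass that buckets ticket IDs by their normalized title and body. The sketch below makes two assumptions. The private helper `_normalize` is a hypothetical name. The problem statement does not specify the order of groups in the output, so this version emits them in order of each group's first ticket, relying on dictionaries preserving insertion order.

```python
from collections import defaultdict


def _normalize(text: str) -> str:
    # Hypothetical helper: lowercase, trim, collapse whitespace runs.
    return " ".join(text.lower().split())


def group_duplicate_tickets(tickets: list[dict]) -> list[list[str]]:
    # Map (normalized title, normalized body) -> ticket IDs in input order.
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for ticket in tickets:
        key = (_normalize(ticket["title"]), _normalize(ticket["body"]))
        groups[key].append(ticket["id"])
    # Keep only keys seen at least twice; unique tickets are dropped.
    return [ids for ids in groups.values() if len(ids) >= 2]
```

Because IDs are appended as the input is scanned, each group keeps the original input order without any extra sorting.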
Input:
tickets = [
    {"id": "t1", "title": "Login broken", "category": "auth", "user_id": "u1", "body": "Cannot log in."},
    {"id": "t2", "title": "App crashes", "category": "app", "user_id": "u2", "body": "Crashes on launch."},
    {"id": "t3", "title": "Login broken", "category": "billing", "user_id": "u3", "body": "Cannot log in."},
]
Output:
[["t1", "t3"]]
t1 and t3 share the same normalized title and body. t2 is unique — it forms no group.
Input:
tickets = [
    {"id": "t1", "title": "BUG: crash", "category": "app", "user_id": "u1", "body": "App crashes."},
    {"id": "t2", "title": "bug: crash", "category": "app", "user_id": "u2", "body": "App crashes."},
]
Output:
[["t1", "t2"]]
After normalization both titles become "bug: crash" and both bodies become "app crashes." — they match.
Input:
tickets = [
    {"id": "t1", "title": "Unique A", "category": "a", "user_id": "u1", "body": "Body A."},
    {"id": "t2", "title": "Unique B", "category": "b", "user_id": "u2", "body": "Body B."},
]
Output:
[]
No two tickets share the same normalized content. Output is empty.
- Each ticket dict has keys: id (str), title (str), category (str), user_id (str), body (str).
- Ticket IDs are unique strings.
- Input list may be empty.
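A hash-map solution keyed on the normalized title and body runs in O(n) time and space for n tickets, treating the length of each ticket as bounded. How would you adapt it if tickets arrived in a stream instead of a batch, where you needed to report a new group as soon as it forms?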
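One possible answer to the streaming question, offered as a sketch and not a definitive design: keep the same key-to-IDs map alive between calls and report a group at the moment its second member arrives. The class name `StreamingDuplicateGrouper` and its methods `add` and `groups` are hypothetical. The sketch also assumes that "as soon as it forms" means the arrival of the second matching ticket, and that later duplicates extend the group without a further report.

```python
from typing import Optional


class StreamingDuplicateGrouper:
    """Hypothetical incremental variant of the batch grouping."""

    def __init__(self) -> None:
        # (normalized title, normalized body) -> ticket IDs in arrival order
        self._ids_by_key: dict[tuple[str, str], list[str]] = {}

    @staticmethod
    def _normalize(text: str) -> str:
        return " ".join(text.lower().split())

    def add(self, ticket: dict) -> Optional[list[str]]:
        """Record one ticket; return the new group if this ticket formed one."""
        key = (self._normalize(ticket["title"]), self._normalize(ticket["body"]))
        ids = self._ids_by_key.setdefault(key, [])
        ids.append(ticket["id"])
        # Assumption: a group "forms" exactly when it reaches two members.
        return list(ids) if len(ids) == 2 else None

    def groups(self) -> list[list[str]]:
        """Snapshot of all current groups with two or more tickets."""
        return [list(ids) for ids in self._ids_by_key.values() if len(ids) >= 2]
```

Each `add` call does a constant number of dictionary operations beyond normalizing the ticket, but memory grows with the number of distinct tickets seen, so a long-running stream would likely need some eviction policy that this sketch does not address.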