Problem 36

Group Duplicate Support Tickets

MEDIUMBUILD

Hash Maps & Sets+2

Brief

Your support platform receives tickets from users describing bugs and issues. The same problem often gets submitted multiple times — sometimes word for word, sometimes with minor whitespace or capitalization differences.

Implement group_duplicate_tickets(tickets) that identifies which tickets describe the same issue and groups them together.

Two tickets are duplicates if their title and body match after normalization: convert to lowercase, strip leading and trailing whitespace, and collapse any run of internal whitespace to a single space.

The category and user_id fields do not affect whether two tickets are duplicates — only title and body matter.

Requirements

Implement group_duplicate_tickets(tickets: list[dict]) -> list[list[str]] in main.py.
Two tickets are duplicates if their normalized title and normalized body are both equal. Normalization: lowercase, strip leading/trailing whitespace, collapse internal whitespace runs to a single space.
Return a list of groups. Each group is a list of ticket IDs in their original order from the input.
Only include groups of 2 or more tickets. Tickets with no duplicates must not appear in the output.
The category and user_id fields must not influence duplicate detection.
An empty input returns an empty list.

Examples

Example 1

Input

tickets = [
  {"id": "t1", "title": "Login broken", "category": "auth", "user_id": "u1", "body": "Cannot log in."},
  {"id": "t2", "title": "App crashes",  "category": "app",  "user_id": "u2", "body": "Crashes on launch."},
  {"id": "t3", "title": "Login broken", "category": "billing", "user_id": "u3", "body": "Cannot log in."},
]

Output

[["t1", "t3"]]

Note

t1 and t3 share the same normalized title and body. t2 is unique — it forms no group.

Example 2

Input

tickets = [
  {"id": "t1", "title": "BUG: crash", "category": "app", "user_id": "u1", "body": "App  crashes."},
  {"id": "t2", "title": "bug: crash", "category": "app", "user_id": "u2", "body": "App crashes."},
]

Output

[["t1", "t2"]]

Note

After normalization both titles become "bug: crash" and both bodies become "app crashes." — they match.

Example 3

Input

tickets = [
  {"id": "t1", "title": "Unique A", "category": "a", "user_id": "u1", "body": "Body A."},
  {"id": "t2", "title": "Unique B", "category": "b", "user_id": "u2", "body": "Body B."},
]

Output

[]

Note

No two tickets share the same normalized content. Output is empty.

Constraints

Each ticket dict has keys: id (str), title (str), category (str), user_id (str), body (str).
Ticket IDs are unique strings.
Input list may be empty.

Judge checks

python environment
python-pytest runner
5000ms judge budget

Follow-up

This solution is O(n) in time and space. How would you adapt it if tickets arrived in a stream instead of a batch — where you needed to report a new group as soon as it forms?

Hints

Console output appears after the preview logs something.