aim/.plans/007-source-provider-expansion/2026-03-20-task-1-blocker-note.md

50 lines
No EOL
2 KiB
Markdown

# Task 1 Blocker Note
## Summary
Task 1 is blocked on an unresolved contract decision, not on a failing implementation.
The focused taxonomy test suite currently passes:
- `cargo test --package aim-core --test query_resolution`
But code review keeps surfacing the same underlying issue: some GitLab and SourceForge URL shapes are ambiguous when classified from URL structure alone.
## Current Blocker
Two classes of URLs cannot be classified with high confidence using only path-shape heuristics:
1. GitLab deep paths
- Example ambiguity: a path segment may be either a subgroup slug or a resource-like segment.
- This makes URLs such as deeply nested subgroup paths indistinguishable from some non-repository resource paths without provider-aware resolution.
2. SourceForge nested `files/.../download` paths
- Some are concrete file downloads.
- Some are folder-style or version-folder download endpoints.
- URL shape alone does not reliably distinguish the two in every case.
## Why This Blocks Task 1
Task 1 is supposed to harden source taxonomy. It is not supposed to solve provider semantics end to end.
At this point, the remaining disagreement is about where ambiguity should be resolved:
- in classification, using increasingly complex heuristics
- or later, in provider-aware resolution logic
Trying to solve this entirely in Task 1 has led to oscillation between:
- permissive rules that accept too many ambiguous URLs
- conservative rules that reject URLs a reviewer considers potentially valid
## Recommended Resolution
Freeze Task 1 on the current conservative contract and move ambiguity handling into Task 2.
That keeps Task 1 scoped to explicit supported forms and lets the resolver contract decide how to treat ambiguous provider URLs with richer semantics.
## Current Practical State
- taxonomy tests are green
- GitLab, SourceForge, and direct URL shapes are covered by focused tests
- ambiguous provider URL handling remains the only unresolved review topic for Task 1