================================================================================
INPUT_OUTPUT_DEPENDENCY_MAP_5_12_26.txt
CollabORhythm / Collabtunes — Engineering Blueprint Phase
Generated: 5.12.26 | Black Claude — Blueprint Session
PURPOSE: Every input, output, and dependency for both SGC codes
STATUS: ENGINEERING PREP — blueprint phase
================================================================================

================================================================================
SECTION 1 — SGC-1 INPUT DEPENDENCIES
================================================================================

DEPENDENCY     | FILE / SOURCE                                      | REQUIRED? | PHASE NEEDED
---------------|----------------------------------------------------|-----------|--------------
SEED_URLS      | MASTER_URL_AUTHORITY_REGISTRY_5_12_26.txt          | REQUIRED  | Phase 0
NAV_STRUCTURE  | FINAL_NAVIGATION_AUTHORITY_MAP_5_12_26.txt         | REQUIRED  | Phase 0/3
RATING_DATA    | MASTER_CONTENT_RATINGS_INDEX_VOL3_LIVE_SITE        | REQUIRED  | Phase 5
CANON_DATA     | FINAL_CANON_AUTHORITY_REGISTRY_5_12_26.txt         | REQUIRED  | Phase 5
BLOCKER_DATA   | UNRESOLVED_BLOCKERS_REQUIRING_TOM_DECISIONS        | OPTIONAL  | Phase 4/5
PRIOR_RUN      | SGC1_LIVE_SITE_SNAPSHOT_[PRIOR_DATE].json          | OPTIONAL  | Phase 0 (delta)
LIVE_SITE      | https://collabtunes.com (network)                  | REQUIRED  | Phase 2/4
NETWORK        | HTTP/HTTPS access to collabtunes.com               | REQUIRED  | Phase 2/4
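
A preflight check for the file-based dependencies above can run before Phase 0, so a missing required file produces the MISSING_DEPENDENCY abort listed in Section 8 instead of failing mid-crawl. This is a minimal sketch, assuming all seed files sit in one local directory and carry exactly the names shown in the table; `REQUIRED_FILES`, `OPTIONAL_FILES`, `check_sgc1_inputs`, and the `OPTIONAL_ABSENT` note are hypothetical. PRIOR_RUN (a dated pattern) and the network dependencies are not checked here.

```python
import os

# Hypothetical mapping of dependency keys to the filenames in the table;
# real filenames (extensions, dated suffixes) may differ.
REQUIRED_FILES = {
    "SEED_URLS": "MASTER_URL_AUTHORITY_REGISTRY_5_12_26.txt",
    "NAV_STRUCTURE": "FINAL_NAVIGATION_AUTHORITY_MAP_5_12_26.txt",
    "RATING_DATA": "MASTER_CONTENT_RATINGS_INDEX_VOL3_LIVE_SITE",
    "CANON_DATA": "FINAL_CANON_AUTHORITY_REGISTRY_5_12_26.txt",
}
OPTIONAL_FILES = {
    "BLOCKER_DATA": "UNRESOLVED_BLOCKERS_REQUIRING_TOM_DECISIONS",
}


def check_sgc1_inputs(base_dir):
    """Return (errors, notes) for the file-based SGC-1 dependencies.

    A non-empty errors list means the run should abort before Phase 0.
    """
    errors, notes = [], []
    for key, name in REQUIRED_FILES.items():
        if not os.path.isfile(os.path.join(base_dir, name)):
            errors.append(f"MISSING_DEPENDENCY: {key} ({name})")
    for key, name in OPTIONAL_FILES.items():
        if not os.path.isfile(os.path.join(base_dir, name)):
            notes.append(f"OPTIONAL_ABSENT: {key} ({name})")
    return errors, notes
```

Keeping optional inputs in a separate list means their absence is recorded without ever blocking the run.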

PYTHON LIBRARIES REQUIRED:
  Third-party (install with pip):
    requests          → HTTP HEAD and GET requests
    beautifulsoup4    → HTML body parsing (imported as bs4)
    lxml              → BeautifulSoup parser backend
  Standard library (no install needed):
    json              → output serialization
    csv               → optional CSV export
    time              → rate limiting (time.sleep)
    hashlib           → URL fingerprinting for dedupe
    collections.deque → crawl queue
    urllib.parse      → URL normalization
    os                → file system operations
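
The standard-library pieces above (urllib.parse, hashlib, collections.deque) combine into a deduplicating crawl queue roughly as follows. This is a sketch under stated assumptions: "normalization" here means lower-casing scheme and host, dropping default ports and fragments, and stripping trailing slashes from non-root paths; the crawl is assumed to stay on the exact host collabtunes.com. The actual SGC-1 rules are not specified in this map, and `normalize_url`, `fingerprint`, and `CrawlQueue` are hypothetical names. Fetching (requests) and parsing (BeautifulSoup) are left out so the example runs offline.

```python
import hashlib
from collections import deque
from urllib.parse import urlsplit, urlunsplit

ALLOWED_HOST = "collabtunes.com"  # assumption: crawl never leaves this host


def normalize_url(url):
    """Canonicalize a URL so trivially different spellings dedupe together."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep a non-default port; drop :80 for http and :443 for https.
    if port and (scheme, port) not in (("http", 80), ("https", 443)):
        host = f"{host}:{port}"
    path = parts.path or "/"
    if path != "/":
        path = path.rstrip("/") or "/"
    # Fragments never reach the server, so they are discarded.
    return urlunsplit((scheme, host, path, parts.query, ""))


def fingerprint(url):
    """SHA-256 hex digest of the normalized URL, used as the dedupe key."""
    return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()


class CrawlQueue:
    """Breadth-first crawl frontier with dedupe and a page cap."""

    def __init__(self, max_pages):
        self.max_pages = max_pages
        self.queue = deque()
        self.seen = set()  # fingerprints of every URL ever enqueued
        self.max_pages_reached = False

    def add(self, url):
        """Enqueue an on-site URL once; return True if it was added."""
        norm = normalize_url(url)
        if urlsplit(norm).hostname != ALLOWED_HOST:
            return False
        fp = fingerprint(norm)
        if fp in self.seen:
            return False
        if len(self.seen) >= self.max_pages:
            self.max_pages_reached = True  # surfaces as MAX_PAGES_REACHED
            return False
        self.seen.add(fp)
        self.queue.append(norm)
        return True

    def pop(self):
        """Return the next URL in FIFO order, or None when the queue is empty."""
        return self.queue.popleft() if self.queue else None
```

A FIFO deque gives breadth-first order, so pages closest to the seed URLs are captured first if the page cap cuts the crawl short.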

================================================================================
SECTION 2 — SGC-1 OUTPUT FILES
================================================================================

OUTPUT FILE                                              | FORMAT | DESTINATION
---------------------------------------------------------|--------|------------
SGC1_LIVE_SITE_SNAPSHOT_[DATE]_[TIME].json               | JSON   | /outputs/
SGC1_LIVE_SITE_SNAPSHOT_[DATE]_[TIME]_SUMMARY.txt        | TXT    | /outputs/
SGC1_RUN_MANIFEST_[DATE]_[TIME].txt                      | TXT    | /outputs/
SGC1_CHECKPOINT_[DATE]_[TIME]_START.json                 | JSON   | /logs/
SGC1_CHECKPOINT_[DATE]_[TIME]_PHASE2_COMPLETE.json       | JSON   | /logs/
SGC1_CHECKPOINT_[DATE]_[TIME]_PHASE4_COMPLETE.json       | JSON   | /logs/
SGC1_CHECKPOINT_[DATE]_[TIME]_COMPLETE.json              | JSON   | /logs/

OUTPUTS NEVER CREATED BY SGC-1 (explicit prohibition):
  Any file at collabtunes.com
  Any modification to seed files
  Any file outside /outputs/ or /logs/
  Any file with generic name (final.json, output.txt, etc.)
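
The naming and destination rules above can be enforced before any file is opened for writing. A minimal sketch, assuming `[DATE]` follows the project's M_D_YY style (as in `5_12_26`), `[TIME]` is 24-hour HHMM, and `outputs/` and `logs/` are directories under a run root rather than filesystem-root paths; `stamp`, `output_path`, `ALLOWED_DIRS`, and `PREFIXES` are hypothetical. Requiring an `SGC1_`/`SGC2_` prefix is an assumed stand-in for "no generic names", based on every listed output using one.

```python
import os
from datetime import datetime

ALLOWED_DIRS = ("outputs", "logs")  # assumption: directories under a run root
PREFIXES = ("SGC1_", "SGC2_")       # every listed output uses one of these


def stamp(now=None):
    """Render [DATE]_[TIME] as M_D_YY_HHMM (assumed format)."""
    now = now or datetime.now()
    return f"{now.month}_{now.day}_{now:%y}_{now:%H%M}"


def output_path(run_root, dest, filename):
    """Return a path inside run_root/dest, or raise ValueError if the write
    would break the destination or naming rules."""
    if dest not in ALLOWED_DIRS:
        raise ValueError(f"destination {dest!r} is not outputs/ or logs/")
    # Requiring the SGC prefix rejects generic names like final.json.
    if not filename.startswith(PREFIXES):
        raise ValueError(f"generic or unprefixed filename: {filename!r}")
    base = os.path.realpath(os.path.join(run_root, dest))
    target = os.path.realpath(os.path.join(base, filename))
    # Reject names such as "SGC1_x/../../y" that escape the allowed directory.
    if os.path.dirname(target) != base:
        raise ValueError(f"path escapes {dest}/: {filename!r}")
    return target
```

For example, `stamp(datetime(2026, 5, 12, 14, 30))` yields `5_12_26_1430`, giving `SGC1_LIVE_SITE_SNAPSHOT_5_12_26_1430.json` under this assumed format.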

================================================================================
SECTION 3 — SGC-2 INPUT DEPENDENCIES
================================================================================

DEPENDENCY        | FILE / SOURCE                                    | REQUIRED? | PHASE NEEDED
------------------|--------------------------------------------------|-----------|--------------
REPO_ROOT         | COLLABTUNES_PROJECT_ROOT (filesystem)            | REQUIRED  | Phase 1
NAMING_RULES      | MASTER_NAMING_STANDARD_5_12_26.txt               | REQUIRED  | Phase 2
REPO_PLAN         | REPOSITORY_STRUCTURE_PLAN_5_12_26.txt            | REQUIRED  | Phase 5
DEPENDENCY_MAP    | CROSS_SYSTEM_DEPENDENCY_MAP_5_12_26.txt          | REQUIRED  | Phase 6
GEN_READINESS     | GENERATOR_INPUT_READINESS_REPORT_5_12_26.txt     | REQUIRED  | Phase 6
CATEGORY_CODES    | COLLABTUNES_OUTPUT_NAMING_RULES_PERMANENT        | REQUIRED  | Phase 2
ZIP_MANIFEST      | Any MANIFEST file inside a ZIP                   | OPTIONAL  | Phase 3
PRIOR_RUN         | SGC2_REPO_INVENTORY_[PRIOR_DATE].json            | OPTIONAL  | Phase 0 (delta)

SENSITIVE FILES — NEVER OPENED (catalog path only):
  DEFAMATION_RISK_REGISTRY_VOL1.txt
  CREATOR_INTERVIEW_TRANSCRIPT*.txt
  MASTER_DUMPS*.zip
  Any file matching MASTER_DUMPS pattern
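
The sensitive-file rule is easiest to honor if the decision is made from the filename alone, before any open() or zipfile call. A sketch using `fnmatch.fnmatchcase`, assuming "any file matching MASTER_DUMPS pattern" means the name contains `MASTER_DUMPS` anywhere and that matching is case-sensitive; `SENSITIVE_PATTERNS` and `is_sensitive` are hypothetical names.

```python
import fnmatch
import os

# Patterns from the sensitive-file list; "*MASTER_DUMPS*" is an assumed
# reading of "any file matching MASTER_DUMPS pattern".
SENSITIVE_PATTERNS = (
    "DEFAMATION_RISK_REGISTRY_VOL1.txt",
    "CREATOR_INTERVIEW_TRANSCRIPT*.txt",
    "MASTER_DUMPS*.zip",
    "*MASTER_DUMPS*",
)


def is_sensitive(path):
    """Decide from the basename only; the file itself is never opened."""
    name = os.path.basename(path)
    return any(fnmatch.fnmatchcase(name, pat) for pat in SENSITIVE_PATTERNS)
```

When `is_sensitive` returns True, the catalog entry would hold the path only: no size-from-content, no hash, no ZIP listing.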

PYTHON LIBRARIES REQUIRED (all standard library; no installs needed):
  os              → directory walk
  os.path         → path operations
  zipfile         → ZIP inspection (read mode only)
  json            → output serialization
  re              → regex for filename parsing
  hashlib         → file fingerprinting (never applied to SENSITIVE files)
  datetime        → timestamp handling
  collections     → defaultdict for grouping
  os.stat         → file mtime reading (st_mtime)

================================================================================
SECTION 4 — SGC-2 OUTPUT FILES
================================================================================

OUTPUT FILE                                              | FORMAT | DESTINATION
---------------------------------------------------------|--------|------------
SGC2_REPO_INVENTORY_[DATE]_[TIME].json                   | JSON   | /outputs/
SGC2_REPO_INVENTORY_[DATE]_[TIME]_SUMMARY.txt            | TXT    | /outputs/
SGC2_RUN_MANIFEST_[DATE]_[TIME].txt                      | TXT    | /outputs/
SGC2_CHECKPOINT_[DATE]_[TIME]_START.json                 | JSON   | /logs/
SGC2_CHECKPOINT_[DATE]_[TIME]_PHASE1_COMPLETE.json       | JSON   | /logs/
SGC2_CHECKPOINT_[DATE]_[TIME]_PHASE3_COMPLETE.json       | JSON   | /logs/
SGC2_CHECKPOINT_[DATE]_[TIME]_PHASE6_COMPLETE.json       | JSON   | /logs/
SGC2_CHECKPOINT_[DATE]_[TIME]_COMPLETE.json              | JSON   | /logs/

OUTPUTS NEVER CREATED BY SGC-2:
  Any modification to source files
  Any file outside /outputs/ or /logs/
  Any extraction of SENSITIVE file content
  Any file with generic name

================================================================================
SECTION 5 — DEPENDENCY BETWEEN SGC-1 AND SGC-2
================================================================================

INDEPENDENT: SGC-1 and SGC-2 can run independently. Neither requires the
             other's output to function.

COMBINED USE CASE:
  SGC-2 runs first → produces REPO_INVENTORY
  SGC-1 runs second → uses REPO_INVENTORY to know which seed files are AUTHORITATIVE
  Combined output → Production Claude gets full dual-sided picture

FEEDING SGC-1 FROM SGC-2 OUTPUT:
  SGC-2 output identifies the most current (AUTHORITATIVE) versions of:
    MASTER_URL_AUTHORITY_REGISTRY → SGC-1 uses as seed
    FINAL_NAVIGATION_AUTHORITY_MAP → SGC-1 uses as nav reference
    MASTER_CONTENT_RATINGS_INDEX → SGC-1 uses for rating assignment
  SGC-1's --seed-file argument can accept the AUTHORITATIVE path identified by SGC-2
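
How SGC-1 consumes SGC-2's inventory depends on the inventory JSON schema, which this map does not define. Assuming a hypothetical top-level `files` list whose entries carry `name`, `path`, and `authority` keys, the hand-off could look like this; `find_authoritative` and `sgc1_seed_args` are illustrative names only.

```python
import json


def find_authoritative(inventory, prefix):
    """Return the path of the single AUTHORITATIVE file whose name starts
    with prefix; raise LookupError if there is none or more than one."""
    hits = [
        f["path"]
        for f in inventory.get("files", [])  # assumed schema
        if f.get("authority") == "AUTHORITATIVE"
        and f.get("name", "").startswith(prefix)
    ]
    if len(hits) != 1:
        raise LookupError(f"{prefix}: expected 1 AUTHORITATIVE file, found {len(hits)}")
    return hits[0]


def sgc1_seed_args(inventory_path):
    """Build the SGC-1 --seed-file argument from an SGC-2 inventory file."""
    with open(inventory_path, encoding="utf-8") as fh:
        inventory = json.load(fh)
    seed = find_authoritative(inventory, "MASTER_URL_AUTHORITY_REGISTRY")
    return ["--seed-file", seed]
```

Raising on zero or multiple AUTHORITATIVE matches surfaces an ambiguous inventory instead of silently seeding the crawl from the wrong file.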

CONFLICT CROSS-REFERENCE:
  SGC-1 conflict IDs use the format: CC-LIVE-NNN (zero-padded, e.g. CC-LIVE-001)
  SGC-2 conflict IDs use the format: CC-REPO-NNN (zero-padded, e.g. CC-REPO-001)
  Known blocker IDs (from UNRESOLVED_BLOCKERS) should be linked:
    CC-LIVE-001 → maps to BLOCK-H04 (Lady Weaver duplicate URL)
    CC-LIVE-002 → maps to BLOCK-H01 (Chapter drift)
    CC-LIVE-003 → maps to BLOCK-M01 (YouTube URL collision)
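
The ID scheme and blocker cross-reference fit in a small lookup table. This sketch assumes IDs are zero-padded to at least three digits, as in CC-LIVE-001; the three links are copied from the list above, and `conflict_id`, `linked_blocker`, and `BLOCKER_LINKS` are hypothetical names.

```python
import re

# Known links, copied from the cross-reference list.
BLOCKER_LINKS = {
    "CC-LIVE-001": "BLOCK-H04",  # Lady Weaver duplicate URL
    "CC-LIVE-002": "BLOCK-H01",  # Chapter drift
    "CC-LIVE-003": "BLOCK-M01",  # YouTube URL collision
}

ID_PATTERN = re.compile(r"^CC-(LIVE|REPO)-(\d{3,})$")


def conflict_id(source, n):
    """Format the nth conflict for 'LIVE' (SGC-1) or 'REPO' (SGC-2)."""
    if source not in ("LIVE", "REPO"):
        raise ValueError(f"unknown source {source!r}")
    return f"CC-{source}-{n:03d}"


def linked_blocker(cid):
    """Return the linked BLOCK-* ID, or None if no link is known yet."""
    if not ID_PATTERN.match(cid):
        raise ValueError(f"malformed conflict id {cid!r}")
    return BLOCKER_LINKS.get(cid)
```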

================================================================================
SECTION 6 — WHAT EACH CODE PRODUCES FOR PRODUCTION CLAUDE
================================================================================

SGC-1 GIVES PRODUCTION CLAUDE:
  ✅ Complete map of what is live on the site as of the run timestamp
  ✅ HTTP status of every URL (200 / 404 / redirect)
  ✅ Which nav sources include each page
  ✅ Which pages lack rating gates (GATE_MISSING flags)
  ✅ All discovered chapter drift pages with exact label/URL mismatch
  ✅ All URL conflicts with canonical designations where known
  ✅ All orphaned pages (live but not in nav)
  ✅ All JS-rendered pages needing manual capture
  ✅ Sorted flag list (CRITICAL → HIGH → MEDIUM → LOW)
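
Two of the items above reduce to a set difference and a keyed sort. The sketch assumes each flag is a dict with `severity`, `type`, and `url` keys (a hypothetical shape), that URLs from the crawl and from the nav sources were normalized the same way before comparison, and that orphans get an illustrative `ORPHANED` type at MEDIUM severity; neither the type name nor the severity is defined in this map.

```python
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def orphan_flags(live_urls, nav_sources):
    """Flag pages that are live but appear in no nav source.

    live_urls: iterable of URLs that returned HTTP 200.
    nav_sources: dict mapping a nav source name to the URLs it links.
    """
    in_nav = set().union(*nav_sources.values())
    return [
        {"severity": "MEDIUM", "type": "ORPHANED", "url": url}  # assumed values
        for url in sorted(set(live_urls) - in_nav)
    ]


def sort_flags(flags):
    """CRITICAL -> HIGH -> MEDIUM -> LOW; ties broken by type, then URL."""
    return sorted(
        flags, key=lambda f: (SEVERITY_ORDER[f["severity"]], f["type"], f["url"])
    )
```

Breaking ties by type and URL keeps the flag list stable between runs, which makes delta comparisons against a prior snapshot easier.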

SGC-2 GIVES PRODUCTION CLAUDE:
  ✅ Complete inventory of all project files — no file is unknown
  ✅ Authority level for every file (AUTHORITATIVE / REFERENCE / DEPRECATED)
  ✅ Version chains — which file is the current version of each document
  ✅ Naming violations — files that break the permanent naming rules
  ✅ Missing manifests — ZIPs without manifest files
  ✅ Generator input readiness — what is ready vs missing for AIO generation
  ✅ Dependency chain status — what is COMPLETE / MISSING / BLOCKED
  ✅ Folder structure compliance — what is in the right place vs misplaced
  ✅ Sensitive files inventoried (paths only — never content)
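
Version chains can be built from the dated suffix the project's filenames already carry (for example `_5_12_26` in `MASTER_NAMING_STANDARD_5_12_26.txt`). A sketch, assuming the suffix is always month_day_two-digit-year immediately before the extension and that the newest date is the current version; undated files are skipped. `version_chains` is a hypothetical name, and the sketch ignores VOL numbers or any other versioning the naming standard may define.

```python
import re
from collections import defaultdict
from datetime import date

# BASE_M_D_YY.ext, e.g. MASTER_NAMING_STANDARD_5_12_26.txt (assumed pattern)
DATED = re.compile(
    r"^(?P<base>.+?)_(?P<m>\d{1,2})_(?P<d>\d{1,2})_(?P<y>\d{2})(?P<ext>\.\w+)$"
)


def version_chains(filenames):
    """Group dated filenames by base name + extension.

    Each chain is sorted oldest to newest, so chain[-1] is the current version.
    Undated names are skipped (they may be naming violations instead).
    """
    chains = defaultdict(list)
    for name in filenames:
        m = DATED.match(name)
        if not m:
            continue
        try:
            when = date(2000 + int(m["y"]), int(m["m"]), int(m["d"]))
        except ValueError:
            continue  # e.g. month 13: not a real date suffix
        chains[m["base"] + m["ext"]].append((when, name))
    return {key: [n for _, n in sorted(items)] for key, items in chains.items()}
```

Sorting on a parsed date rather than on the raw string matters here: as text, `5_12_26` sorts before `12_1_25`, which would pick the wrong current version.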

COMBINED:
  Production Claude no longer needs to manually read 28+ files.
  It receives two structured data objects and can query them directly.
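  Session setup time: reduced to loading two JSON outputs instead of reading 28+ files.
  Risk of using an outdated or wrong source file: greatly reduced, because SGC-2
  marks the AUTHORITATIVE version of each document.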

================================================================================
SECTION 7 — DATA FLOW DIAGRAM (ASCII)
================================================================================

  [ collabtunes.com LIVE SITE ]          [ COLLABTUNES_PROJECT_ROOT ]
           |                                         |
           v                                         v
  [ SGC-1: LIVE SITE GATHERER ]         [ SGC-2: REPO GATHERER ]
    Phases 0-8                            Phases 0-9
           |                                         |
           v                                         v
  SGC1_LIVE_SITE_SNAPSHOT.json          SGC2_REPO_INVENTORY.json
  SGC1_LIVE_SITE_SNAPSHOT_SUMMARY.txt   SGC2_REPO_INVENTORY_SUMMARY.txt
  SGC1_RUN_MANIFEST.txt                 SGC2_RUN_MANIFEST.txt
  [checkpoint files in /logs/]          [checkpoint files in /logs/]
           |                                         |
           +-------------------+---------------------+
                               |
                               v
                    [ PRODUCTION CLAUDE ]
              Receives structured dual-sided picture:
              - What is live (SGC-1 output)
              - What is in the repo (SGC-2 output)
              Executes work without manual re-reading of 28+ files.

================================================================================
SECTION 8 — FAILURE MODES AND WHAT THEY AFFECT
================================================================================

FAILURE                          | AFFECTED CODE | FALLBACK
---------------------------------|---------------|----------------------------------
Network unreachable              | SGC-1         | Abort gracefully, write ERROR to manifest
Seed file not found              | SGC-1         | Abort with MISSING_DEPENDENCY error
Repo root not found              | SGC-2         | Abort with MISSING_DEPENDENCY error
ZIP corrupt / unreadable         | SGC-2         | Skip ZIP, flag CORRUPT_ZIP, continue
TXT file encoding error          | SGC-2         | Skip body scan, catalog file only
Rate limit hit (collabtunes.com) | SGC-1         | Increase sleep, retry up to 3 times
Checkpoint write fails           | Both          | Continue run, note in log, no checkpoint
Max pages reached                | SGC-1         | Stop crawl, flag MAX_PAGES_REACHED in manifest
Max zip depth reached            | SGC-2         | Stop recursion, flag MAX_DEPTH_REACHED
Sensitive file detected          | SGC-2         | Catalog path only, never open
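
The rate-limit fallback ("increase sleep, retry up to 3 times") can be a small wrapper around whatever performs the request. A sketch, assuming a rate limit shows up as HTTP 429 and that the sleep doubles on each retry from a hypothetical 2-second base; `fetch_with_backoff` is a hypothetical helper, and `fetch` and `sleep` are injectable so it runs without network access.

```python
import time


def fetch_with_backoff(fetch, url, base_delay=2.0, max_retries=3, sleep=time.sleep):
    """Call fetch(url) and retry on HTTP 429 with a doubling delay.

    fetch must return an object with a .status_code attribute (for example a
    requests.Response). Returns (response, retries_used); the final response
    is returned even if it is still 429 so the caller can flag it.
    """
    delay = base_delay
    response = fetch(url)
    retries = 0
    while response.status_code == 429 and retries < max_retries:
        sleep(delay)
        delay *= 2  # "increase sleep" between attempts
        retries += 1
        response = fetch(url)
    return response, retries
```

In SGC-1, `fetch` could be `lambda u: requests.get(u, timeout=10)`; the wrapper itself never writes anything, so it stays within the read-only rules.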

NEVER on failure:
  Write to the live site
  Modify source files
  Delete any file
  Skip writing the error to run_log
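
The checkpoint fallback ("continue run, note in log") and the rule that errors always reach run_log fit in one helper: write the checkpoint, and on failure log the error instead of raising. A sketch, assuming run_log is a standard `logging.Logger` and the payload is JSON-serializable; `write_checkpoint` and the `CHECKPOINT_WRITE_FAILED` tag are hypothetical. It writes to a `.tmp` file and then renames with `os.replace`, so a half-written checkpoint never appears under the final name; the `.tmp` file is deliberately left in place on failure, because the rules above forbid deleting files.

```python
import json
import logging
import os


def write_checkpoint(path, payload, run_log: logging.Logger):
    """Write payload as JSON to path; return True on success.

    Failures are logged to run_log and swallowed so the run continues.
    """
    tmp = path + ".tmp"
    try:
        with open(tmp, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2)
        os.replace(tmp, path)  # atomic rename on the same filesystem
        return True
    except (OSError, TypeError, ValueError) as exc:
        # Never skip the log entry, even though the run continues.
        run_log.error("CHECKPOINT_WRITE_FAILED %s: %s", path, exc)
        return False
```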

================================================================================
END INPUT_OUTPUT_DEPENDENCY_MAP_5_12_26.txt
================================================================================
