================================================================================
MIXED_CLAUDE_HEAVY_CODING_START_PACKAGE_README_5_12_26.txt
CollabORhythm / Collabtunes — Engineering Hardening Phase
Generated: 5.12.26 | Black Claude — Final Engineering Hardening
PURPOSE: The single document Mixed Claude reads first before writing any code.
         Everything needed to begin — in one place — no ambiguity.
STATUS: ACTIVE — primary onboarding document for heavy coding phase
================================================================================

================================================================================
WHO YOU ARE AND WHAT YOU ARE DOING
================================================================================

You are Mixed Claude in heavy coding mode.
Your job: BUILD the two self-gathering Python scripts for the Collabtunes project.
You are NOT redesigning the website. NOT reopening canon debates.
NOT building HTML pages. NOT touching staging files.

You are engineering two Python tools:
  SGC-1: Crawls collabtunes.com → produces a structured site snapshot
  SGC-2: Scans the project repo → produces a structured file inventory

These tools exist so that future Claude sessions do not need to manually
re-read 28+ TXT files at the start of every session. Instead, they run
these tools and receive structured JSON they can query directly.

================================================================================
READ THESE FILES IN THIS EXACT ORDER
================================================================================

Package 1 — Blueprint (6_TXT_COLLABTUNES_GENERATOR_SELF_GATHERING_CODES_BLUEPRINT_5_12_26.zip):
  1. FINAL_SELF_GATHERING_CODE_SPEC_5_12_26.txt          ← master spec, shared principles
  2. CODE_BUILD_ORDER_AND_MODULE_PLAN_5_12_26.txt         ← 28 modules, build sequence
  3. SELF_GATHERING_CODE_1_FLOWMAP_5_12_26.txt            ← SGC-1 phase-by-phase
  4. SELF_GATHERING_CODE_2_FLOWMAP_5_12_26.txt            ← SGC-2 phase-by-phase
  5. INPUT_OUTPUT_DEPENDENCY_MAP_5_12_26.txt              ← I/O contracts
  6. SAFE_RUN_AND_ROLLBACK_PROCEDURES_5_12_26.txt         ← safety procedures

Package 2 — Hardening (FINAL_ENGINEERING_HARDENING_PACKAGE_5_12_26.zip):
  7. THIS FILE                                            ← you are here
  8. FINAL_MODULE_DEPENDENCY_VALIDATION_5_12_26.txt       ← exact import contracts
  9. FILESYSTEM_AND_OUTPUT_STRUCTURE_FINAL_5_12_26.txt    ← exact file/path layout
  10. ERROR_HANDLING_AND_LOGGING_SPEC_FINAL_5_12_26.txt   ← every error, its handler
  11. DRY_RUN_AND_LIVE_RUN_SAFETY_MATRIX_5_12_26.txt      ← per-operation mode matrix
  12. SELF_GATHERING_CODE_TESTING_STRATEGY_5_12_26.txt    ← test plan, test cases

Read all 12. Then code. Not before.

================================================================================
THE SHORT VERSION (if you've read the full docs and need a fast reference)
================================================================================

TWO CODES. BOTH PYTHON. BOTH READ-ONLY. BOTH DEFAULT TO DRY_RUN.

SGC-1 (sgc1_main.py):
  Input:  MASTER_URL_AUTHORITY_REGISTRY + live collabtunes.com
  Output: SGC1_LIVE_SITE_SNAPSHOT_{run_id}.json + .txt summary + manifest
  Does:   HTTP HEAD all URLs → classify → crossref nav → selective GET → detect conflicts → export
  Never:  Writes to site. Makes POST/PUT/DELETE. Opens DEFAMATION_RISK_REGISTRY.

SGC-2 (sgc2_main.py):
  Input:  COLLABTUNES_PROJECT_ROOT filesystem
  Output: SGC2_REPO_INVENTORY_{run_id}.json + .txt summary + manifest
  Does:   Walk dir → guard sensitive → parse filenames → inspect ZIPs → scan TXT headers → validate
  Never:  Opens DEFAMATION_RISK_REGISTRY. Modifies any file. Writes outside /outputs/ or /logs/.

BOTH:
  Default = DRY_RUN. LIVE_RUN requires --mode LIVE_RUN plus typing "yes" at the confirmation gate.
  All outputs timestamped. Never overwrites prior output.
  Checkpoint written after each major phase. Resumable from any checkpoint.
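
A minimal sketch of the mode selection and confirmation gate described under BOTH.
The function names parse_mode and confirm_live_run are hypothetical, and falling
back to DRY_RUN on any answer other than "yes" is an assumption; the binding gate
behavior and print format are in DRY_RUN_AND_LIVE_RUN_SAFETY_MATRIX_5_12_26.txt and
ERROR_HANDLING_AND_LOGGING_SPEC_FINAL_5_12_26.txt.

```python
import argparse

VALID_MODES = ("DRY_RUN", "LIVE_RUN")


def parse_mode(argv):
    """Parse --mode; with no flag given, the mode is DRY_RUN."""
    parser = argparse.ArgumentParser(description="SGC mode selection (sketch)")
    parser.add_argument("--mode", choices=VALID_MODES, default="DRY_RUN")
    return parser.parse_args(argv).mode


def confirm_live_run(mode, ask=input):
    """Return the mode that is actually allowed to run.

    LIVE_RUN is granted only when the operator types exactly "yes".
    Any other answer falls back to DRY_RUN (assumption: the safer choice).
    The `ask` parameter exists so tests can replace the interactive prompt.
    """
    if mode != "LIVE_RUN":
        return "DRY_RUN"
    answer = ask('Type "yes" to confirm LIVE_RUN: ')
    return "LIVE_RUN" if answer.strip() == "yes" else "DRY_RUN"
```

Passing the prompt function in as a parameter keeps the gate testable without a
terminal, which matters because the gate must be exercised in the test suite.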

================================================================================
SECTION 1 — WHAT YOU BUILD (in order)
================================================================================

STAGE 1 — SHARED FOUNDATION (build these first):
  shared/shared_config.py
  shared/shared_logger.py
  shared/shared_output_writer.py
  shared/shared_checkpoint.py
  shared/shared_naming_validator.py
  → Run STAGE 1 tests before moving on (tests 1.x through 4.x)

STAGE 2 — SGC-1 MODULES:
  sgc1/sgc1_args.py
  sgc1/sgc1_seed_loader.py
  sgc1/sgc1_url_normalizer.py
  sgc1/sgc1_http_requester.py
  sgc1/sgc1_nav_crossref.py
  sgc1/sgc1_body_parser.py
  sgc1/sgc1_routing_classifier.py
  sgc1/sgc1_conflict_detector.py
  sgc1/sgc1_deduplicator.py
  sgc1/sgc1_exporter.py
  sgc1/sgc1_main.py
  → Run STAGE 2 tests (5.x through 8.x) + SGC-1 integration dry run (12.x)

STAGE 3 — SGC-2 MODULES:
  sgc2/sgc2_args.py
  sgc2/sgc2_dir_walker.py
  sgc2/sgc2_sensitive_guard.py
  sgc2/sgc2_filename_parser.py
  sgc2/sgc2_zip_inspector.py
  sgc2/sgc2_txt_scanner.py
  sgc2/sgc2_folder_validator.py
  sgc2/sgc2_dependency_mapper.py
  sgc2/sgc2_deduplicator.py
  sgc2/sgc2_exporter.py
  sgc2/sgc2_main.py
  → Run STAGE 3 tests (9.x through 11.x) + SGC-2 integration dry run (13.x)

STAGE 4 — INTEGRATION:
  run_both.py
  → Run safety invariant tests (14.x) after first LIVE_RUN

================================================================================
SECTION 2 — CRITICAL RULES (DO NOT VIOLATE)
================================================================================

NEVER violate these. They are the entire safety architecture.

  RULE 1: DRY_RUN IS THE DEFAULT.
    In DRY_RUN: no network calls, no file writes to /outputs/, and exactly one dry-run report written to /logs/.
    LIVE_RUN requires --mode LIVE_RUN AND typing "yes" at the gate.

  RULE 2: ALL WRITES GO THROUGH shared_output_writer.
    Never use open(filepath, 'w') directly.
    Always call validate_output_path() → write_json() or write_txt().
    This is what enforces the write-boundary invariant.
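
A hedged sketch of how RULE 2 could be enforced. validate_output_path and
write_json are named in this document, but their signatures, the allowed_dirs
parameter, the default directory names, and WriteBoundaryError are all
assumptions here; FINAL_MODULE_DEPENDENCY_VALIDATION_5_12_26.txt has the real
interface. Note that Path.is_relative_to requires Python 3.9 or later.

```python
import json
from pathlib import Path

# Hypothetical defaults; the real layout is fixed by
# FILESYSTEM_AND_OUTPUT_STRUCTURE_FINAL_5_12_26.txt.
DEFAULT_ALLOWED_DIRS = (Path("outputs"), Path("logs"))


class WriteBoundaryError(Exception):
    """Illustrative exception; real error codes come from the registry."""


def validate_output_path(candidate, allowed_dirs=DEFAULT_ALLOWED_DIRS):
    """Return the resolved path if it sits inside an allowed directory."""
    # resolve() collapses ".." segments, so traversal tricks are caught.
    resolved = Path(candidate).resolve()
    for allowed in allowed_dirs:
        if resolved.is_relative_to(Path(allowed).resolve()):
            return resolved
    raise WriteBoundaryError(f"write outside allowed directories: {resolved}")


def write_json(candidate, payload, allowed_dirs=DEFAULT_ALLOWED_DIRS):
    """Validate first, then write; refuses to overwrite an existing file."""
    target = validate_output_path(candidate, allowed_dirs)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" raises FileExistsError instead of replacing prior output.
    with open(target, "x", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
    return target
```

Opening with mode "x" is one way to back the "never overwrites prior output"
guarantee at the filesystem level rather than relying on naming alone.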

  RULE 3: SENSITIVE FILES ARE NEVER OPENED.
    sgc2_sensitive_guard.partition_file_list() runs before any file reading.
    Files matching SENSITIVE_FILENAME_PATTERNS are cataloged by path only.
    opened_files_log is checked at the end. Any hit = ABORT.
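
One possible shape for the guard in RULE 3, as a sketch only. partition_file_list,
SENSITIVE_FILENAME_PATTERNS, and opened_files_log are named in this document; the
return shape (two lists), the case-insensitive substring match on the file name,
and the helper names is_sensitive and audit_opened_files are assumptions.

```python
from pathlib import Path

SENSITIVE_FILENAME_PATTERNS = [
    "DEFAMATION_RISK_REGISTRY",
    "CREATOR_INTERVIEW_TRANSCRIPT",
    "MASTER_DUMPS",
]


def is_sensitive(path):
    """Match on the file name only; the file itself is never opened."""
    name = Path(path).name.upper()
    return any(pattern in name for pattern in SENSITIVE_FILENAME_PATTERNS)


def partition_file_list(paths):
    """Split paths into (readable, sensitive) without touching contents."""
    readable, sensitive = [], []
    for path in paths:
        (sensitive if is_sensitive(path) else readable).append(path)
    return readable, sensitive


def audit_opened_files(opened_files_log):
    """Return every sensitive path that was opened; non-empty means ABORT."""
    return [path for path in opened_files_log if is_sensitive(path)]
```

Only the readable list should ever reach a module that opens files; the sensitive
list is carried forward as paths for the inventory.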

  RULE 4: SGC-1 MAKES ONLY GET AND HEAD REQUESTS.
    The assert in sgc1_http_requester fires before any request.
    If the method is not GET or HEAD: SITE_WRITE_DETECTED abort.
    This guard is not optional. It is not removed for "convenience."
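
A sketch of the method guard in RULE 4. It uses an explicit raise rather than a
bare assert statement, because Python strips assert statements when run with the
-O flag; that substitution is a choice made for this example, not something the
spec states. The names guard_method and SiteWriteDetected are hypothetical; only
the SITE_WRITE_DETECTED code comes from this document.

```python
ALLOWED_METHODS = frozenset({"GET", "HEAD"})


class SiteWriteDetected(Exception):
    """Carries the SITE_WRITE_DETECTED error code named in RULE 4."""

    error_code = "SITE_WRITE_DETECTED"


def guard_method(method):
    """Return the normalized method, or abort before any request is built."""
    normalized = method.upper()
    if normalized not in ALLOWED_METHODS:
        raise SiteWriteDetected(f"{SiteWriteDetected.error_code}: {normalized}")
    return normalized
```

Calling guard_method as the first statement of the request function means no
socket is opened for a disallowed method.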

  RULE 5: THE ORCHESTRATORS ARE NEVER IMPORTED.
    sgc1_main.py, sgc2_main.py, and run_both.py are entry points only.
    No other module imports them. No circular dependencies.
    They import everything. Nothing imports them.
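
RULE 5 can be checked mechanically. The sketch below parses module source with
the standard ast module and reports any import of an orchestrator. The function
names and the dict-of-sources input are assumptions made for illustration; the
authoritative import contracts are in FINAL_MODULE_DEPENDENCY_VALIDATION_5_12_26.txt.

```python
import ast

ORCHESTRATORS = {"sgc1_main", "sgc2_main", "run_both"}


def imported_modules(source):
    """Return the final name component of everything the source imports."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[-1] for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.module:
                names.add(node.module.split(".")[-1])
            # Covers "from sgc1 import sgc1_main".
            names.update(alias.name for alias in node.names)
    return names


def find_orchestrator_imports(sources):
    """Map of module name -> source text in; list of (module, orchestrator) out."""
    violations = []
    for module_name, source in sources.items():
        for target in sorted(imported_modules(source) & ORCHESTRATORS):
            violations.append((module_name, target))
    return violations
```

An empty result is the expected outcome; any entry names the offending module
and the orchestrator it imports.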

  RULE 6: ALL ERROR CODES ARE FROM THE REGISTRY.
    ERROR_HANDLING_AND_LOGGING_SPEC_FINAL_5_12_26.txt has the exact list.
    No improvised error_code strings. Use the exact strings as defined.

  RULE 7: ALL OUTPUT FILENAMES FOLLOW THE NAMING STANDARD.
    shared_output_writer.generate_unique_filename() enforces naming.
    No generic names (output.json, result.txt). Always timestamped run_id.
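
A sketch of timestamped, collision-free naming for RULE 7. The function name
generate_unique_filename is from this document; its signature, the run_id format
(a UTC timestamp), the make_run_id helper, and the numeric suffix used on
collision are assumptions. MASTER_NAMING_STANDARD_5_12_26.txt and
FILESYSTEM_AND_OUTPUT_STRUCTURE_FINAL_5_12_26.txt define the real formula.

```python
from datetime import datetime, timezone
from pathlib import Path


def make_run_id(now=None):
    """Hypothetical run_id format: compact UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("%Y%m%dT%H%M%SZ")


def generate_unique_filename(directory, prefix, run_id, extension):
    """Build {prefix}_{run_id}.{extension}; never return an existing path."""
    directory = Path(directory)
    candidate = directory / f"{prefix}_{run_id}.{extension}"
    counter = 1
    while candidate.exists():
        # Add a counter rather than overwrite prior output.
        candidate = directory / f"{prefix}_{run_id}_{counter:02d}.{extension}"
        counter += 1
    return candidate
```

For example, generate_unique_filename("outputs", "SGC1_LIVE_SITE_SNAPSHOT",
make_run_id(), "json") yields a path of the form
SGC1_LIVE_SITE_SNAPSHOT_{run_id}.json under the given directory.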

================================================================================
SECTION 3 — WHAT THE EXISTING PROJECT CONTEXT IS
================================================================================

You are building for a real project called Collaborhythm / Collabtunes.
Website: collabtunes.com
Creator: Tom Jensen
34 albums: 10 Song Lists + 24 Set Lists
121+ live pages
Rating system: G / PG / PG-13 / R / NC-17 / X
Visual canon: midnight navy + parchment + gold — no white text, no green, no gamer colors

The project has extensive existing documentation in 28 TXT files spread across
4 ZIP packages. Your tools (SGC-1 and SGC-2) will eventually let future Claude
sessions REPLACE manually reading all 28 files with running these two scripts.

You are building infrastructure. The content of the site is not your concern.
The structure, safety, and correctness of these two Python tools IS your concern.

Key authority files your code will read as inputs:
  MASTER_URL_AUTHORITY_REGISTRY_5_12_26.txt    — seed for SGC-1 (127+ URLs)
  FINAL_NAVIGATION_AUTHORITY_MAP_5_12_26.txt   — nav structure for SGC-1
  MASTER_CONTENT_RATINGS_INDEX_VOL3            — ratings data for SGC-1
  FINAL_CANON_AUTHORITY_REGISTRY_5_12_26.txt   — canon data for SGC-1
  MASTER_NAMING_STANDARD_5_12_26.txt           — filename rules for SGC-2
  REPOSITORY_STRUCTURE_PLAN_5_12_26.txt        — expected folder structure for SGC-2
  CROSS_SYSTEM_DEPENDENCY_MAP_5_12_26.txt      — dependency chains for SGC-2
  GENERATOR_INPUT_READINESS_REPORT_5_12_26.txt — generator inputs for SGC-2

Sensitive files that must NEVER be opened:
  DEFAMATION_RISK_REGISTRY_VOL1.txt
  CREATOR_INTERVIEW_TRANSCRIPT*.txt
  MASTER_DUMPS*.zip

================================================================================
SECTION 4 — HARDCODED VALUES TO EMBED IN CODE
================================================================================

These values are locked from authority documents. Embed them as constants.

In shared_config.py:
  SENSITIVE_FILENAME_PATTERNS = [
      "DEFAMATION_RISK_REGISTRY",
      "CREATOR_INTERVIEW_TRANSCRIPT",
      "MASTER_DUMPS"
  ]

In sgc1_conflict_detector.py — KNOWN_BLOCKERS dict (partial — includes all from canon registry):
  "CC-LW"    → {blocker: "BLOCK-H04", urls: ["/20-35-the-lady-weaver/", "/36-35-lady-weaver/"]}
  "CC-CH18"  → {blocker: "BLOCK-H02", urls: ["/18-of-35-business-plan/", "/18-of-35-project-summaries/"]}
  "CC-CH19"  → {blocker: "BLOCK-H03", urls: ["/19-of-35-manifesto-and-copyright-notice/"]}
  "CC-YT-URL"→ {blocker: "BLOCK-M01", urls: ["/lyric-videos-...-video/", "/lyric-videos-...-youtube-video/"]}
  "CC-DRIFT" → {blocker: "BLOCK-H01", description: "Chapter labels 18B–34 are +1 ahead of URL slugs"}
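
As a sketch, the KNOWN_BLOCKERS listing could be expressed as a Python dict with a
lookup helper. Every entry here carries a urls list (an assumption made so lookups
stay uniform), and blockers_for_url is a hypothetical helper. The CC-YT-URL entry
is left out of this example because its URLs are elided in this document; take
the full values from the canon registry.

```python
KNOWN_BLOCKERS = {
    "CC-LW": {
        "blocker": "BLOCK-H04",
        "urls": ["/20-35-the-lady-weaver/", "/36-35-lady-weaver/"],
    },
    "CC-CH18": {
        "blocker": "BLOCK-H02",
        "urls": ["/18-of-35-business-plan/", "/18-of-35-project-summaries/"],
    },
    "CC-CH19": {
        "blocker": "BLOCK-H03",
        "urls": ["/19-of-35-manifesto-and-copyright-notice/"],
    },
    "CC-DRIFT": {
        "blocker": "BLOCK-H01",
        "urls": [],  # Not tied to specific URLs; see description.
        "description": "Chapter labels 18B–34 are +1 ahead of URL slugs",
    },
    # "CC-YT-URL" (BLOCK-M01) omitted: URLs are elided in this README.
}


def blockers_for_url(url_path):
    """Return (conflict_code, blocker_id) pairs whose urls list holds the path."""
    return [
        (code, entry["blocker"])
        for code, entry in KNOWN_BLOCKERS.items()
        if url_path in entry["urls"]
    ]
```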

In sgc2_folder_validator.py — EXPECTED_FOLDERS list:
  ["00_OPERATIONAL_RULES", "01_LIVE_CAPTURE", "02_CANON", "03_RATINGS",
   "04_BLOCKERS", "05_URL_MAPS", "06_NAV_STABILIZATION", "07_DEPLOYMENT",
   "08_PLACEHOLDERS", "09_HTML_PROTOTYPES", "10_QA", "11_REGISTRY",
   "12_MANIFESTS", "13_SOURCE_ZIPS", "14_GENERATED_OUTPUT"]

In sgc2_dependency_mapper.py — GENERATOR_REQUIRED_INPUTS list:
  ["mood_settings_ratings_explicit_for_all_34_albums",
   "MASTER_CONTENT_RATINGS_INDEX_VOL3",
   "FINAL_CANON_AUTHORITY_REGISTRY",
   "MASTER_URL_AUTHORITY_REGISTRY",
   "HTML_TESTER_NUMBER_TWO_FIXED_COLOR"]

In sgc1_routing_classifier.py — gate assignments:
  "G"    → "NONE"
  "PG"   → "NONE"
  "PG-13"→ "PG13_REQUIRED"
  "R"    → "R_REQUIRED"
  "NC-17"→ "NC17_REQUIRED"
  "X"    → "X_REQUIRED"
  "PENDING" → "UNKNOWN_GATE"
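
The gate assignments map directly onto a dict lookup. In this sketch the names
GATE_BY_RATING, gate_for_rating, and requires_gate are hypothetical, and sending
any unrecognized rating to "UNKNOWN_GATE" is an assumption (the safest reading
when the spec is silent).

```python
GATE_BY_RATING = {
    "G": "NONE",
    "PG": "NONE",
    "PG-13": "PG13_REQUIRED",
    "R": "R_REQUIRED",
    "NC-17": "NC17_REQUIRED",
    "X": "X_REQUIRED",
    "PENDING": "UNKNOWN_GATE",
}


def gate_for_rating(rating):
    """Look up the gate; unrecognized ratings are treated as UNKNOWN_GATE."""
    return GATE_BY_RATING.get(rating.strip().upper(), "UNKNOWN_GATE")


def requires_gate(rating):
    """True for every rating whose gate is anything other than NONE."""
    return gate_for_rating(rating) != "NONE"
```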

In sgc1_url_normalizer.py — page type patterns (partial):
  /song-list-[N]/           → ALBUM_AIO
  /set-list-[N]/            → ALBUM_AIO
  /set-list-[N]-[title]/    → ALBUM_AIO
  /[N]-of-35-[slug]/        → SONGBOOK
  /how-i-got-here*/         → HGIH
  /1---34-[slug]/           → QUICKGUIDE
  /1-to-34-[slug]/          → QUICKGUIDE
  / (root)                  → NAV
  /128-section-*/           → NAV
  /switchboard-*/           → NAV
  /fast-scroll-*/           → NAV
  /read-my-stuff-*/         → READMYSTUFF
  /future-*/                → PLACEHOLDER
  /coming-soon*/            → PLACEHOLDER
  /html-test*/              → DEV
  /practice-head/           → DEV
  default                   → UNKNOWN
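
One way to turn the pattern table into code is an ordered list of regular
expressions, first match wins. The regex translations below are assumptions
([N] read as one or more digits, [slug] and [title] as lowercase letters, digits,
and hyphens, a trailing * as a prefix match), and classify_page_type is a
hypothetical name. The table is marked partial, so the real list will be longer.

```python
import re

# (pattern, page_type) pairs, checked in order; first match wins.
PAGE_TYPE_PATTERNS = [
    (re.compile(r"^/$"), "NAV"),
    (re.compile(r"^/song-list-\d+/$"), "ALBUM_AIO"),
    (re.compile(r"^/set-list-\d+(-[a-z0-9-]+)?/$"), "ALBUM_AIO"),
    (re.compile(r"^/\d+-of-35-[a-z0-9-]+/$"), "SONGBOOK"),
    (re.compile(r"^/how-i-got-here"), "HGIH"),
    (re.compile(r"^/1---34-"), "QUICKGUIDE"),
    (re.compile(r"^/1-to-34-"), "QUICKGUIDE"),
    (re.compile(r"^/(128-section|switchboard|fast-scroll)-"), "NAV"),
    (re.compile(r"^/read-my-stuff-"), "READMYSTUFF"),
    (re.compile(r"^/future-"), "PLACEHOLDER"),
    (re.compile(r"^/coming-soon"), "PLACEHOLDER"),
    (re.compile(r"^/html-test"), "DEV"),
    (re.compile(r"^/practice-head/$"), "DEV"),
]


def classify_page_type(url_path):
    """Return the page type for a normalized URL path, or UNKNOWN."""
    for pattern, page_type in PAGE_TYPE_PATTERNS:
        if pattern.search(url_path):
            return page_type
    return "UNKNOWN"
```

Keeping the patterns in a list rather than a dict makes the match order explicit,
which matters once overlapping patterns are added.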

================================================================================
SECTION 5 — KNOWN LIVE BLOCKERS TO FLAG IN SGC-1 OUTPUT
================================================================================

SGC-1 must automatically flag each of these, at the severity listed (CRITICAL or HIGH), if detected on the live site.
These are confirmed from UNRESOLVED_BLOCKERS_REQUIRING_TOM_DECISIONS_5_12_26.txt.

BLOCK-L01: HOW I GOT HERE full dirty and full Talk w/Claude AI — ungated on live site
  Detect: URL matches /how-i-got-here-full-dirty/ or /how-i-got-here-full-talk-wclaudeai/
  AND has_js_gate == False
  Flag: GATE_MISSING + CRITICAL

BLOCK-L02: NC-17 + X Quick Guide pages — ungated on live site
  Detect: URL matches /1-to-34-quick-guide-23-to-nc-17/ or /1-to-34-quick-guide-x/
  AND has_js_gate == False
  Flag: GATE_MISSING + CRITICAL

BLOCK-L03: Full Texts of Lyrics — ungated on live site
  Detect: URL matches /8-of-35-full-texts-of-lyrics/
  AND has_js_gate == False
  Flag: GATE_MISSING + HIGH

BLOCK-H05: Revenue Streams broken URL
  Detect: Any nav source links to /practice-head/ as a content destination
  Flag: BROKEN_NAV_LINK + HIGH
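
The four detection rules above can be sketched as data plus two small functions.
The URLs, flags, and severities are copied from this section; the function names,
the exact-path comparison used for "URL matches", and the shape of the returned
records are assumptions. The real output schema is in
FINAL_SELF_GATHERING_CODE_SPEC_5_12_26.txt.

```python
GATE_BLOCKERS = {
    "BLOCK-L01": {
        "urls": {"/how-i-got-here-full-dirty/",
                 "/how-i-got-here-full-talk-wclaudeai/"},
        "severity": "CRITICAL",
    },
    "BLOCK-L02": {
        "urls": {"/1-to-34-quick-guide-23-to-nc-17/",
                 "/1-to-34-quick-guide-x/"},
        "severity": "CRITICAL",
    },
    "BLOCK-L03": {
        "urls": {"/8-of-35-full-texts-of-lyrics/"},
        "severity": "HIGH",
    },
}


def flag_gate_blockers(url_path, has_js_gate):
    """Return GATE_MISSING records for a listed URL that has no JS gate."""
    if has_js_gate:
        return []
    return [
        {"blocker": blocker_id, "flag": "GATE_MISSING",
         "severity": rule["severity"]}
        for blocker_id, rule in GATE_BLOCKERS.items()
        if url_path in rule["urls"]
    ]


def flag_broken_nav(nav_link_targets):
    """Return a BLOCK-H05 record if any nav source links to /practice-head/."""
    if "/practice-head/" in nav_link_targets:
        return [{"blocker": "BLOCK-H05", "flag": "BROKEN_NAV_LINK",
                 "severity": "HIGH"}]
    return []
```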

================================================================================
SECTION 6 — QUICK REFERENCE: WHAT EACH FILE IN THIS PACKAGE COVERS
================================================================================

FILE 1 (this file; FILES 1 to 6 here are items 7 to 12 in the reading order) — MIXED_CLAUDE_HEAVY_CODING_START_PACKAGE_README:
  The master onboarding doc. Read first. Contains everything else in summary form.

FILE 2 — FINAL_MODULE_DEPENDENCY_VALIDATION:
  Every module's import list and public interface (function names + signatures).
  Use this when writing any import statement or function call across modules.
  The proof that no circular dependency exists is here. Trust it.

FILE 3 — FILESYSTEM_AND_OUTPUT_STRUCTURE_FINAL:
  Exact directory layout for code + tests + outputs + logs.
  Import path conventions. Output filename formulas. Forbidden write paths.
  Use this when creating any file or directory.

FILE 4 — ERROR_HANDLING_AND_LOGGING_SPEC_FINAL:
  Every error code (exact string), its severity, and its handler behavior.
  Log entry format (exact JSON schema).
  Confirmation gate print format.
  Dry run report format.
  Use this when writing any try/except, log call, or error handler.

FILE 5 — DRY_RUN_AND_LIVE_RUN_SAFETY_MATRIX:
  Per-operation table: what happens in DRY_RUN vs LIVE_RUN for every phase.
  Mode guard implementation (exactly where the if/else goes in orchestrators).
  Confirmation gate implementation code.
  Progress indicator rules.
  Use this when implementing any phase that has different behavior by mode.

FILE 6 — SELF_GATHERING_CODE_TESTING_STRATEGY:
  Every test case (46 total), grouped by module and stage.
  Test fixture requirements.
  Build-gate test order (must all pass before next stage).
  Use this when writing tests and deciding when to proceed.

================================================================================
SECTION 7 — WHAT GOOD LOOKS LIKE (DELIVERY CRITERIA)
================================================================================

Mixed Claude's work is complete when:

  ✅ All 28 modules exist in the correct directory structure
  ✅ Both sgc1_main.py and sgc2_main.py run to completion in DRY_RUN mode
  ✅ DRY_RUN produces the correct report in /logs/ and nothing in /outputs/
  ✅ All 46 test cases pass (or each failure is documented with a specific known limitation)
  ✅ LIVE_RUN (on test fixtures) produces valid JSON and TXT in /outputs/
  ✅ Output JSON validates against the schema in FINAL_SELF_GATHERING_CODE_SPEC
  ✅ All three integrity log entries present in every LIVE_RUN output
  ✅ Sensitive file test confirms DEFAMATION_RISK_REGISTRY is never opened
  ✅ No circular dependency exists (verified by tracing import graph)
  ✅ run_both.py executes SGC-2 then SGC-1 cleanly in sequence

DELIVERY FORMAT:
  All code files in a ZIP named:
  28_PY_COLLABTUNES_GENERATOR_SELF_GATHERING_CODES_V1_[DATE].zip
  With contents:
    shared/ (5 files)
    sgc1/ (11 files)
    sgc2/ (11 files)
    run_both.py
    tests/ (test scripts + fixtures)
    README_HOW_TO_RUN_SGC_CODES.txt (brief usage instructions)

================================================================================
SECTION 8 — WHEN IN DOUBT
================================================================================

If something is unclear, the authority hierarchy is:
  1. FINAL_SELF_GATHERING_CODE_SPEC_5_12_26.txt (master spec — top authority)
  2. The relevant flowmap (SGC-1 or SGC-2)
  3. FINAL_MODULE_DEPENDENCY_VALIDATION (interface contracts)
  4. ERROR_HANDLING_AND_LOGGING_SPEC_FINAL (error behavior)
  5. DRY_RUN_AND_LIVE_RUN_SAFETY_MATRIX (mode behavior)

If a document contradicts the spec: the spec wins.
If the spec is silent: default to the safest possible interpretation.
"Safest" means: do less, read-only, log the uncertainty, don't guess.

NEVER:
  Invent error codes not in the registry
  Modify any project file from the code
  Open a sensitive file for any reason
  Make a non-GET/HEAD request to collabtunes.com
  Write output outside /outputs/ or /logs/
  Add a module-level mutable variable (except opened_files_log — already defined)
  Proceed past a failing test

================================================================================
YOU ARE READY. BUILD STAGE 1 FIRST.
================================================================================

  Create: 15_GATHERING_TOOLS/shared/shared_config.py
  Test:   python tests/test_shared_config.py → ALL PASS
  Then:   shared_logger.py → tests → shared_output_writer.py → tests → etc.

The blueprint is complete. The engineering layer is hardened.
The interfaces are defined. The safety matrix is locked.
The tests are written. The delivery criteria are clear.

Build it.

================================================================================
END MIXED_CLAUDE_HEAVY_CODING_START_PACKAGE_README_5_12_26.txt
================================================================================
