================================================================================
SAFE_RUN_AND_ROLLBACK_PROCEDURES_5_12_26.txt
CollabORhythm / Collabtunes — Engineering Blueprint Phase
Generated: 5.12.26 | Black Claude — Blueprint Session
PURPOSE: How to safely run, verify, and roll back both SGC codes
STATUS: ENGINEERING PREP — blueprint phase
================================================================================

RULE: Defaults protect everything.
DRY_RUN is always the default. You must explicitly opt into LIVE_RUN.
Neither code may write to the live site or modify any source file under any circumstances.
SGC-1 only reads the live site (read-only crawl).

================================================================================
SECTION 1 — RUN MODES EXPLAINED
================================================================================

DRY_RUN (DEFAULT):
  What happens: All phases execute. All logic runs. Nothing is written to /outputs/.
                Everything is written to /logs/ only.
  Who should use: First time running either code. Any time unsure.
  Output: /logs/SGC[1|2]_DRY_RUN_[DATE]_[TIME].txt (what would have happened)
  How to invoke: python sgc1.py (no flag needed — DRY_RUN is default)

LIVE_RUN:
  What happens: All phases execute. Outputs written to /outputs/.
                Checkpoints written to /logs/.
  Who should use: After DRY_RUN has been reviewed and output looks correct.
  How to invoke: python sgc1.py --mode LIVE_RUN
  Gate: Code prints a summary of what it will do and asks for confirmation before Phase 2 begins
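
  EXAMPLE SKETCH: one way the DRY_RUN default could be enforced at argument
  parsing. The --mode flag and its two values come from this document; the
  function name parse_run_mode is hypothetical and is not taken from the SGC code.

```python
import argparse


def parse_run_mode(argv):
    """Return the run mode; DRY_RUN unless LIVE_RUN is explicitly requested."""
    parser = argparse.ArgumentParser(prog="sgc1.py")
    parser.add_argument(
        "--mode",
        choices=["DRY_RUN", "LIVE_RUN"],
        default="DRY_RUN",  # safe default: omitting the flag can never go live
    )
    args = parser.parse_args(argv)
    return args.mode
```

  Because the default lives in the parser, a typo such as --mode LIVE is
  rejected by argparse instead of silently falling through to a live run.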

CONFIRMATION GATE (LIVE_RUN only):
  Before Phase 2 begins, code prints:
    "SGC-1 LIVE RUN SUMMARY:
     Seed URLs loaded: [N]
     Network target: https://collabtunes.com
     Output dir: [path]
     Mode: LIVE_RUN
     Proceed? [yes/no]"
  If input ≠ "yes": abort immediately. No network calls made.
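
  EXAMPLE SKETCH: a minimal version of the confirmation gate, assuming the
  summary values are already known to the caller. The function name
  confirm_live_run and the injectable ask parameter are hypothetical; stripping
  surrounding whitespace from the answer is also an assumption.

```python
def confirm_live_run(seed_url_count, output_dir, ask=input):
    """Print the LIVE_RUN summary and return True only for the answer "yes"."""
    print("SGC-1 LIVE RUN SUMMARY:")
    print(f" Seed URLs loaded: {seed_url_count}")
    print(" Network target: https://collabtunes.com")
    print(f" Output dir: {output_dir}")
    print(" Mode: LIVE_RUN")
    answer = ask("Proceed? [yes/no] ")
    # Anything other than the literal "yes" counts as a refusal,
    # including "y", "YES", and an empty line.
    return answer.strip() == "yes"
```

  The caller is expected to exit before making any network call when this
  returns False.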

================================================================================
SECTION 2 — PRE-RUN CHECKLIST (Before running either code)
================================================================================

FOR SGC-1:
  [ ] Network access to collabtunes.com confirmed (ping or curl test)
  [ ] MASTER_URL_AUTHORITY_REGISTRY_5_12_26.txt exists at expected path
  [ ] FINAL_NAVIGATION_AUTHORITY_MAP_5_12_26.txt exists at expected path
  [ ] /outputs/ directory writable
  [ ] /logs/ directory writable
  [ ] No prior run in progress (check /logs/ for PARTIAL checkpoint)
  [ ] Rate limit set to minimum 1.5s between requests (do not lower this)
  [ ] Time available: estimate 1 min per 10 URLs for HEAD, 3 min per 10 for GET
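
  EXAMPLE SKETCH: the time estimate above as arithmetic. It assumes every URL
  is fetched with the given method; the function name estimate_minutes is
  hypothetical. For the 121 known live pages (Section 7) this gives roughly
  12 minutes for a HEAD pass and roughly 36 minutes for a GET pass.

```python
def estimate_minutes(url_count, method):
    """Estimate run time from the checklist rule: 1 min per 10 URLs (HEAD), 3 min per 10 (GET)."""
    minutes_per_10_urls = {"HEAD": 1, "GET": 3}[method]
    return url_count / 10 * minutes_per_10_urls
```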

FOR SGC-2:
  [ ] COLLABTUNES_PROJECT_ROOT path confirmed and readable
  [ ] MASTER_NAMING_STANDARD_5_12_26.txt exists
  [ ] REPOSITORY_STRUCTURE_PLAN_5_12_26.txt exists
  [ ] CROSS_SYSTEM_DEPENDENCY_MAP_5_12_26.txt exists
  [ ] GENERATOR_INPUT_READINESS_REPORT_5_12_26.txt exists
  [ ] /outputs/ and /logs/ writable
  [ ] Sensitive file list loaded (DEFAMATION_RISK_REGISTRY, etc.)
  [ ] No prior run in progress
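
  EXAMPLE SKETCH: the file and directory items of both checklists could be
  automated along these lines. The function name preflight and the label
  NOT_WRITABLE are hypothetical; MISSING_DEPENDENCY is the code from Section 8.
  Network and sensitive-list checks are not covered here.

```python
import os
from pathlib import Path


def preflight(required_files, writable_dirs):
    """Return a list of problems; an empty list means these checks pass."""
    problems = []
    for file_path in required_files:
        if not Path(file_path).is_file():
            problems.append(f"MISSING_DEPENDENCY: {file_path}")
    for dir_path in writable_dirs:
        # The directory must exist and the current user must be able to write to it.
        if not (Path(dir_path).is_dir() and os.access(dir_path, os.W_OK)):
            problems.append(f"NOT_WRITABLE: {dir_path}")
    return problems
```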

================================================================================
SECTION 3 — STEP-BY-STEP SAFE RUN PROCEDURE
================================================================================

STEP A — RUN DRY_RUN FIRST (always):
  python sgc1.py --root [path] --output-dir ./outputs/
  Review /logs/SGC1_DRY_RUN_*.txt
  Confirm: phase list, URL count, expected conflict count look correct
  Confirm (SGC-2 runs): sensitive files detected and flagged (not opened)
  If output looks wrong: fix code, do not run LIVE_RUN yet

STEP B — REVIEW DRY_RUN OUTPUT:
  Open DRY_RUN log
  Check: "WHAT WOULD HAVE BEEN WRITTEN TO /outputs/"
  Verify: file names follow naming standard
  Verify: no unexpected files listed
  Verify: run_log shows no ABORT triggers

STEP C — RUN LIVE_RUN:
  python sgc1.py --root [path] --output-dir ./outputs/ --mode LIVE_RUN
  Type "yes" at confirmation gate
  Let run complete — do not interrupt
  If interrupted: note checkpoint file path for resume

STEP D — VERIFY OUTPUT:
  Open /outputs/SGC1_LIVE_SITE_SNAPSHOT_[DATE]_[TIME].json
  Check: total_items_found > 100 (if ≤ 100, something went wrong; Section 7 sets the full threshold at ≥ 121)
  Check: total_conflicts_found reasonable (expect 7-15 based on known blockers)
  Open TXT SUMMARY — review CRITICAL flags section
  Compare CRITICAL flags against known live blockers from UNRESOLVED_BLOCKERS file
    Expected CRITICAL flags:
      GATE_MISSING on HOW I GOT HERE X pages (BLOCK-L01)
      GATE_MISSING on Quick Guide NC-17/X (BLOCK-L02)
      GATE_MISSING on Full Texts of Lyrics (BLOCK-L03)
      STATUS_MISMATCH if any expected LIVE page returns 404
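
  EXAMPLE SKETCH: the two numeric checks in Step D, assuming that
  total_items_found and total_conflicts_found are top-level keys of the
  snapshot JSON (the document names the fields but not their position). The
  function names load_snapshot and check_snapshot are hypothetical.

```python
import json


def load_snapshot(path):
    """Parse a snapshot JSON file into a dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def check_snapshot(snapshot):
    """Return a list of warnings for a parsed SGC-1 snapshot dict."""
    warnings = []
    if snapshot.get("total_items_found", 0) <= 100:
        warnings.append("total_items_found is not above 100")
    conflicts = snapshot.get("total_conflicts_found", 0)
    if not 7 <= conflicts <= 15:
        warnings.append(f"total_conflicts_found {conflicts} outside expected 7-15")
    return warnings
```

  A warning here is a prompt to open the TXT summary and compare CRITICAL
  flags by hand, not a replacement for that review.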

STEP E — ARCHIVE RUN OUTPUT:
  Do not delete prior run outputs — they are the baseline for next comparison
  Rename nothing — timestamp in filename is the version control

================================================================================
SECTION 4 — RESUME FROM PARTIAL RUN
================================================================================

IF SGC-1 or SGC-2 is interrupted mid-run:

STEP R1 — LOCATE CHECKPOINT FILE:
  Check /logs/ for most recent SGC[1|2]_CHECKPOINT_[DATE]_[TIME]_*_PARTIAL.json
  Open file and check the step field, e.g. {"step": "PHASE[N]_COMPLETE"}, to see where the run stopped

STEP R2 — RESUME COMMAND:
  python sgc1.py --resume [path to the checkpoint file located in Step R1]
  Example name, following the Step R1 pattern: SGC1_CHECKPOINT_[DATE]_[TIME]_PHASE2_COMPLETE_PARTIAL.json
  Code skips Phase 0 through completed phase
  Resumes from next phase using data in checkpoint
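
  EXAMPLE SKETCH: locating the newest checkpoint and working out which phase
  to resume from. It assumes the checkpoint is a JSON object with a step field
  of the form PHASE[N]_COMPLETE, as described in Step R1. The function names
  latest_checkpoint and next_phase are hypothetical.

```python
import json
import re
from pathlib import Path


def latest_checkpoint(log_dir, code="SGC1"):
    """Return the most recently modified checkpoint file in log_dir, or None."""
    candidates = sorted(
        Path(log_dir).glob(f"{code}_CHECKPOINT_*.json"),
        key=lambda p: p.stat().st_mtime,
    )
    return candidates[-1] if candidates else None


def next_phase(checkpoint_path):
    """Read the step field and return the phase number to resume from."""
    data = json.loads(Path(checkpoint_path).read_text(encoding="utf-8"))
    match = re.fullmatch(r"PHASE(\d+)_COMPLETE", data["step"])
    if match is None:
        # Refuse to guess: an unreadable step means a fresh run (Step R4).
        raise ValueError(f"unrecognized step value: {data['step']!r}")
    return int(match.group(1)) + 1
```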

STEP R3 — VERIFY RESUME STATE:
  Check run_log in final output
  Confirm: no phase was run twice
  Confirm: resume_from value matches checkpoint step

STEP R4 — IF RESUME FAILS:
  Note what caused the failure before restarting
  Delete only the partial outputs of the failed run (keep checkpoint files and prior complete outputs)
  Start fresh run from Phase 0

================================================================================
SECTION 5 — ROLLBACK PROCEDURES FOR SGC OUTPUTS
================================================================================

SITUATION 1: SGC-1 output shows incorrect data (wrong URLs, wrong statuses)
  CAUSE LIKELY: Stale seed file OR network issue during crawl
  ROLLBACK:
    Step 1: Delete the incorrect SGC-1 output from /outputs/
    Step 2: Do NOT delete the checkpoint files (needed for diagnosis)
    Step 3: Verify seed file is the AUTHORITATIVE version from MASTER_URL_AUTHORITY_REGISTRY
    Step 4: Run SGC-2 first to confirm which seed file is current
    Step 5: Re-run SGC-1 with correct seed file path

SITUATION 2: SGC-2 output misidentifies AUTHORITATIVE files
  CAUSE LIKELY: Naming convention not followed on some files
  ROLLBACK:
    Step 1: Open SGC2_REPO_INVENTORY.json
    Step 2: Find item with incorrect authority tag
    Step 3: Check the actual file's header scan vs naming convention
    Step 4: If code logic is wrong: fix regex in Phase 2.3, re-run
    Step 5: If file is actually misnamed: rename it per MASTER_NAMING_STANDARD
             (rename action is manual, done by human — not by the code)
    Step 6: Re-run SGC-2

SITUATION 3: Code accidentally wrote to a source file (should be impossible)
  CAUSE: Bug in file I/O logic
  ROLLBACK:
    Step 1: STOP — do not run any more code until this is diagnosed
    Step 2: Check mtime on all source files in COLLABTUNES_PROJECT_ROOT
    Step 3: Identify which file was modified
    Step 4: Restore from most recent backup (ROLLBACK_[SYSTEM]_PRE_[OP]_[DATE].zip)
            These backups are defined in ROLLBACK_AND_RECOVERY_PROCEDURES_5_12_26.txt
    Step 5: Fix the code bug before any further runs

SITUATION 4: SGC-1 made write requests to collabtunes.com (must never happen)
  CAUSE: Critical bug — should be architecturally impossible
  ROLLBACK:
    Step 1: Immediately inspect Yola activity log for any unexpected changes
    Step 2: Revert any changed pages using ROLLBACK procedures from production files
    Step 3: Pull the code from use immediately
    Step 4: Full audit of HTTP request code before any re-deployment
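
  EXAMPLE SKETCH: one way to make write requests hard to issue by accident is
  to route every HTTP call through a single check that permits only GET and
  HEAD, the two methods named in the Section 2 time estimate. This is an
  illustration, not the SGC-1 request code; the names ALLOWED_METHODS,
  SiteWriteDetected, and guard_method are hypothetical.

```python
ALLOWED_METHODS = frozenset({"GET", "HEAD"})


class SiteWriteDetected(RuntimeError):
    """Raised when code attempts a request method that could modify the site."""


def guard_method(method):
    """Return the normalized method name, or raise if it is not read-only."""
    normalized = method.upper()
    if normalized not in ALLOWED_METHODS:
        raise SiteWriteDetected(f"SITE_WRITE_DETECTED: {normalized} blocked")
    return normalized
```

  Calling guard_method before every request gives the audit in Step 4 a
  single place to inspect.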

================================================================================
SECTION 6 — OVERWRITE PREVENTION RULES
================================================================================

RULE 1 — NEVER overwrite an existing output file:
  Before writing any output, check if filename already exists in /outputs/
  IF exists: append _CONFLICT_[SEQUENCE] to filename before writing
  Never silently overwrite
  Exception: checkpoint files in /logs/ may be appended to, but are never overwritten
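
  EXAMPLE SKETCH: Rule 1 expressed as a helper that picks a free filename and
  then opens it in exclusive-create mode, so that a file appearing between the
  check and the write causes an error instead of an overwrite. Placing the
  _CONFLICT_ suffix before the file extension is an assumption; the function
  names non_clobbering_path and write_new are hypothetical.

```python
from pathlib import Path


def non_clobbering_path(path):
    """Return path if unused, otherwise the first free _CONFLICT_<n> variant."""
    path = Path(path)
    if not path.exists():
        return path
    seq = 1
    while True:
        candidate = path.with_name(f"{path.stem}_CONFLICT_{seq}{path.suffix}")
        if not candidate.exists():
            return candidate
        seq += 1


def write_new(path, text):
    """Write text to a new file only; never replace an existing one."""
    target = non_clobbering_path(path)
    # Mode "x" raises FileExistsError if the file was created in the meantime.
    with open(target, "x", encoding="utf-8") as fh:
        fh.write(text)
    return target
```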

RULE 2 — Output filename uniqueness enforcement:
  Every output filename includes [DATE]_[TIME] to one-second precision
  IF two runs happen within the same second: append _[RUN_SEQUENCE] counter

RULE 3 — Source file modification guard (both codes):
  After Phase 0 loads files: record mtime for each loaded file
  After each phase: re-check mtime of loaded files
  IF any mtime changed: write INTEGRITY_ERROR to run_log and abort
  This is a hard abort — no recovery, no retry
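
  EXAMPLE SKETCH: the mtime guard in Rule 3, assuming the list of loaded
  source files is available after Phase 0. The function names record_mtimes
  and changed_files are hypothetical. Treating a file that has disappeared as
  changed is also an assumption.

```python
import os


def record_mtimes(paths):
    """Snapshot the modification time (in nanoseconds) of each loaded file."""
    return {str(p): os.stat(p).st_mtime_ns for p in paths}


def changed_files(baseline):
    """Return the files whose mtime differs from the baseline or that vanished."""
    changed = []
    for path, mtime_ns in baseline.items():
        try:
            if os.stat(path).st_mtime_ns != mtime_ns:
                changed.append(path)
        except FileNotFoundError:
            changed.append(path)
    return changed
```

  A non-empty result after any phase corresponds to INTEGRITY_ERROR and a hard
  abort. An mtime comparison cannot see a write that restores the old
  timestamp; comparing content hashes would be a stricter variant.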

RULE 4 — Output directory isolation:
  Both codes write ONLY to /outputs/ and /logs/
  Any attempt to write outside these directories → ABORT immediately
  This is enforced via path validation before every file write operation:
    assert output_path.startswith(ALLOWED_OUTPUT_ROOT), "WRITE OUTSIDE ALLOWED PATH"
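
  EXAMPLE SKETCH: a fuller form of the Rule 4 path check covering both allowed
  directories. It resolves the path first so that ".." segments and symlinks
  cannot escape, compares whole path components rather than a string prefix
  (a prefix test would accept a sibling such as /outputs_old), and raises an
  exception rather than using assert, because assert statements are skipped
  when Python runs with -O. The function name validate_write_path is
  hypothetical.

```python
from pathlib import Path


def validate_write_path(output_path, allowed_roots):
    """Return the resolved path if it lies inside an allowed root, else raise."""
    resolved = Path(output_path).resolve()
    for root in allowed_roots:
        if Path(root).resolve() in resolved.parents:
            return resolved
    raise PermissionError(f"WRITE_OUTSIDE_PATH: {resolved}")
```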

================================================================================
SECTION 7 — VERIFICATION CHECKLIST (Post-run)
================================================================================

SGC-1 POST-RUN VERIFICATION:
  [ ] total_items_found ≥ 121 (known live page count)
  [ ] LIVE status count ≥ 100
  [ ] CRITICAL flags present for known live blockers (BLOCK-L01, L02, L03)
  [ ] Chapter drift items found: between 15 and 19 (known: 17 drifted)
  [ ] No item has both rating = LOCKED and flag = GATE_MISSING
      (locked + ungated = contradiction — investigate)
  [ ] Conflict count ≥ 7 (known: 7 URL conflicts in URL registry)
  [ ] No collabtunes.com page appears in /outputs/ as a downloaded file
  [ ] run_log shows "SITE_INTEGRITY_VERIFIED: read-only crawl confirmed"

SGC-2 POST-RUN VERIFICATION:
  [ ] file count ≥ 28 (known: 28 TXT files in current packages)
  [ ] AUTHORITATIVE files identified for each major topic (canon, ratings, URL maps)
  [ ] DEPRECATED files flagged (original HTML files without FIXED_COLOR)
  [ ] mood_settings_ratings_explicit_for_all_34_albums: found or CRITICAL_MISSING flag
  [ ] DEFAMATION_RISK_REGISTRY: in sensitive_files[], NOT in files[]
  [ ] All ZIPs: either in_manifest = true OR flag MISSING_MANIFEST
  [ ] Version chains show VOL3 as AUTHORITATIVE for ratings, canon, summaries
  [ ] run_log shows no INTEGRITY_ERROR entries
  [ ] No source file mtimes changed (run_log: "REPO_INTEGRITY_VERIFIED")
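
  EXAMPLE SKETCH: the DEFAMATION_RISK_REGISTRY item above as a check on a
  parsed inventory. This assumes, without confirmation from the SGC-2 design,
  that the inventory is a dict whose files and sensitive_files entries are
  lists of file-name strings. The function name check_sensitive_isolation is
  hypothetical.

```python
def check_sensitive_isolation(inventory, marker="DEFAMATION_RISK_REGISTRY"):
    """True if the marker appears in sensitive_files and nowhere in files."""
    in_sensitive = any(marker in name for name in inventory.get("sensitive_files", []))
    in_files = any(marker in name for name in inventory.get("files", []))
    return in_sensitive and not in_files
```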

================================================================================
SECTION 8 — ERROR CODE REFERENCE
================================================================================

ERROR_CODE            | MEANING                                    | ACTION
----------------------|--------------------------------------------|------------------
MISSING_DEPENDENCY    | Required input file not found              | Locate file, update path arg
NETWORK_UNREACHABLE   | Cannot connect to collabtunes.com          | Check internet; retry later
RATE_LIMIT_HIT        | Too many requests to Yola                  | Increase sleep, retry
CORRUPT_ZIP           | ZIP file could not be opened               | Flag for Tom, skip and continue
ENCODING_ERROR        | TXT file not UTF-8                         | Skip body scan, catalog only
MAX_PAGES_REACHED     | Hit --max-pages cap                        | Increase cap or accept partial
MAX_DEPTH_REACHED     | Nested ZIP too deep                        | Increase max-zip-depth or accept
WRITE_OUTSIDE_PATH    | Code tried to write outside /outputs/      | ABORT — critical code bug
INTEGRITY_ERROR       | Source file mtime changed during run       | ABORT — investigate immediately
WRITE_ERROR           | Output file 0 bytes                        | Re-run; check disk space
SENSITIVE_OPENED      | Code tried to open DEFAMATION file         | ABORT — critical code bug
SITE_WRITE_DETECTED   | Non-GET/HEAD request sent to live site     | ABORT — critical code bug
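
EXAMPLE SKETCH: the table separates codes that stop the run from codes that
allow it to continue or be retried. A lookup like the following keeps that
distinction in one place. The set contents are copied from the ACTION column
above; the names ABORT_CODES and must_abort are hypothetical.

```python
# Codes whose ACTION column begins with ABORT in the table above.
ABORT_CODES = frozenset({
    "WRITE_OUTSIDE_PATH",
    "INTEGRITY_ERROR",
    "SENSITIVE_OPENED",
    "SITE_WRITE_DETECTED",
})


def must_abort(error_code):
    """Return True if the run has to stop immediately for this error code."""
    return error_code in ABORT_CODES
```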

================================================================================
END SAFE_RUN_AND_ROLLBACK_PROCEDURES_5_12_26.txt
================================================================================
