================================================================================
FILESYSTEM_AND_OUTPUT_STRUCTURE_FINAL_5_12_26.txt
CollabORhythm / Collabtunes — Engineering Hardening Phase
Generated: 5.12.26 | Black Claude — Final Engineering Hardening
PURPOSE: Exact, literal filesystem layout for both SGC codes and all outputs.
         No ambiguity. Mixed Claude builds exactly this structure.
STATUS: ENGINEERING PREP — authoritative layout
================================================================================

================================================================================
SECTION 1 — CODE DIRECTORY LAYOUT
================================================================================

All SGC code lives under:
  COLLABTUNES_PROJECT_ROOT/15_GATHERING_TOOLS/

Full tree:

  15_GATHERING_TOOLS/
  │
  ├── shared/
  │   ├── __init__.py               (empty — makes it a package)
  │   ├── shared_config.py
  │   ├── shared_logger.py
  │   ├── shared_output_writer.py
  │   ├── shared_checkpoint.py
  │   └── shared_naming_validator.py
  │
  ├── sgc1/
  │   ├── __init__.py               (empty)
  │   ├── sgc1_args.py
  │   ├── sgc1_seed_loader.py
  │   ├── sgc1_url_normalizer.py
  │   ├── sgc1_http_requester.py
  │   ├── sgc1_nav_crossref.py
  │   ├── sgc1_body_parser.py
  │   ├── sgc1_routing_classifier.py
  │   ├── sgc1_conflict_detector.py
  │   ├── sgc1_deduplicator.py
  │   ├── sgc1_exporter.py
  │   └── sgc1_main.py
  │
  ├── sgc2/
  │   ├── __init__.py               (empty)
  │   ├── sgc2_args.py
  │   ├── sgc2_dir_walker.py
  │   ├── sgc2_sensitive_guard.py
  │   ├── sgc2_filename_parser.py
  │   ├── sgc2_zip_inspector.py
  │   ├── sgc2_txt_scanner.py
  │   ├── sgc2_folder_validator.py
  │   ├── sgc2_dependency_mapper.py
  │   ├── sgc2_deduplicator.py
  │   ├── sgc2_exporter.py
  │   └── sgc2_main.py
  │
  ├── outputs/                      (created by code if absent — writable)
  │   └── [all SGC output files land here]
  │
  ├── logs/                         (created by code if absent — writable)
  │   └── [all checkpoint and log files land here]
  │
  └── run_both.py

================================================================================
SECTION 2 — IMPORT PATH RULES
================================================================================

Because shared/, sgc1/, and sgc2/ are sibling packages under 15_GATHERING_TOOLS/,
and the code is always invoked from that directory (see INVOCATION below),
imports resolve as follows:

FROM sgc1/ modules:
  from shared.shared_config import ALLOWED_OUTPUT_ROOT, ...
  from shared.shared_logger import RunLogger
  from shared.shared_output_writer import write_json, write_txt, ...
  from shared.shared_checkpoint import write_checkpoint, read_checkpoint
  from shared.shared_naming_validator import parse_filename, ...

FROM sgc2/ modules:
  from shared.shared_config import ...          (same as above)
  from shared.shared_naming_validator import ... (same as above)

FROM sgc1_main.py importing sgc1 siblings:
  from sgc1.sgc1_args import parse_args
  from sgc1.sgc1_seed_loader import load_seed_urls, filter_for_crawl, ...
  (etc. — module name = file name without .py)

FROM sgc2_main.py importing sgc2 siblings:
  from sgc2.sgc2_args import parse_args
  from sgc2.sgc2_dir_walker import walk_directory, ...
  (etc.)

FROM run_both.py:
  from sgc1.sgc1_main import main as sgc1_main
  from sgc2.sgc2_main import main as sgc2_main

INVOCATION (from 15_GATHERING_TOOLS/ directory):
  python -m sgc1.sgc1_main [args]
  python -m sgc2.sgc2_main [args]
  python run_both.py [args]

================================================================================
SECTION 3 — OUTPUT FILE NAMING SCHEMA
================================================================================

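All output filenames follow the Collabtunes naming standard.
Every output filename is constructed at runtime from a run_id:

  run_id = f"SGC{code_number}_{date}_{time}"
  The date is written M_D_YY and the time is 24-hour HHMM, as in the examples.
  Example: SGC1_5_12_26_1430 (code 1, May 12 2026, 2:30 PM)
  Example: SGC2_5_12_26_0915 (code 2, May 12 2026, 9:15 AM)

  NOTE: Every template below already starts with SGC{code_number}_, so the
  prefix is written only once. Within a template, {run_id} contributes its
  {date}_{time} portion, exactly as the filename examples show.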
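
The run_id formula above can be sketched as follows. build_run_id is a
hypothetical helper name, and the date is assumed to be month, day, and
two-digit year without zero padding, which is what the 5_12_26 example suggests.

```python
from datetime import datetime


def build_run_id(code_number: int, when: datetime) -> str:
    """Return a run_id such as SGC1_5_12_26_1430 (hypothetical helper)."""
    # Assumption: month and day are not zero-padded; the year has two digits.
    date = f"{when.month}_{when.day}_{when.year % 100:02d}"
    # 24-hour clock, zero-padded, matching the 0915 example.
    time = f"{when.hour:02d}{when.minute:02d}"
    return f"SGC{code_number}_{date}_{time}"


print(build_run_id(1, datetime(2026, 5, 12, 14, 30)))  # SGC1_5_12_26_1430
```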

OUTPUT FILES — SGC-1 (written to /outputs/):

  Primary JSON:
    SGC1_LIVE_SITE_SNAPSHOT_{run_id}.json
    Example: SGC1_LIVE_SITE_SNAPSHOT_5_12_26_1430.json

  Human-readable TXT:
    SGC1_LIVE_SITE_SNAPSHOT_{run_id}_SUMMARY.txt
    Example: SGC1_LIVE_SITE_SNAPSHOT_5_12_26_1430_SUMMARY.txt

  Run manifest:
    SGC1_RUN_MANIFEST_{run_id}.txt
    Example: SGC1_RUN_MANIFEST_5_12_26_1430.txt

CHECKPOINT FILES — SGC-1 (written to /logs/):

  Start checkpoint:
    SGC1_CHECKPOINT_{run_id}_START.json
  Phase 2 complete:
    SGC1_CHECKPOINT_{run_id}_PHASE2_COMPLETE.json
  Phase 4 complete:
    SGC1_CHECKPOINT_{run_id}_PHASE4_COMPLETE.json
  Final complete:
    SGC1_CHECKPOINT_{run_id}_COMPLETE.json
  Dry run log:
    SGC1_DRY_RUN_{run_id}.txt

OUTPUT FILES — SGC-2 (written to /outputs/):

  Primary JSON:
    SGC2_REPO_INVENTORY_{run_id}.json
  Human-readable TXT:
    SGC2_REPO_INVENTORY_{run_id}_SUMMARY.txt
  Run manifest:
    SGC2_RUN_MANIFEST_{run_id}.txt

CHECKPOINT FILES — SGC-2 (written to /logs/):

  SGC2_CHECKPOINT_{run_id}_START.json
  SGC2_CHECKPOINT_{run_id}_PHASE1_COMPLETE.json
  SGC2_CHECKPOINT_{run_id}_PHASE3_COMPLETE.json
  SGC2_CHECKPOINT_{run_id}_PHASE6_COMPLETE.json
  SGC2_CHECKPOINT_{run_id}_COMPLETE.json
  SGC2_DRY_RUN_{run_id}.txt
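
The templates above could be expanded in code roughly as shown here. This is a
sketch, not the shared_output_writer implementation: output_names, log_names,
and the two lookup tables are hypothetical, and stamp is assumed to be the
date_time portion (for example 5_12_26_1430) so that the SGC prefix appears
once, as in the SGC-1 filename examples.

```python
# Hypothetical lookup tables derived from the templates in this section.
OUTPUT_KINDS = {1: "LIVE_SITE_SNAPSHOT", 2: "REPO_INVENTORY"}
CHECKPOINT_PHASES = {1: (2, 4), 2: (1, 3, 6)}


def output_names(code_number: int, stamp: str) -> list:
    """Names of the three primary files written to outputs/."""
    prefix = f"SGC{code_number}"
    kind = OUTPUT_KINDS[code_number]
    return [
        f"{prefix}_{kind}_{stamp}.json",
        f"{prefix}_{kind}_{stamp}_SUMMARY.txt",
        f"{prefix}_RUN_MANIFEST_{stamp}.txt",
    ]


def log_names(code_number: int, stamp: str) -> list:
    """Names of the checkpoint files and dry run log written to logs/."""
    prefix = f"SGC{code_number}"
    labels = ["START"]
    labels += [f"PHASE{n}_COMPLETE" for n in CHECKPOINT_PHASES[code_number]]
    labels.append("COMPLETE")
    names = [f"{prefix}_CHECKPOINT_{stamp}_{label}.json" for label in labels]
    names.append(f"{prefix}_DRY_RUN_{stamp}.txt")
    return names
```

For example, output_names(1, "5_12_26_1430") yields the three SGC-1 example
filenames listed earlier in this section.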

COLLISION PREVENTION:
  If a file with the same name already exists in /outputs/:
    shared_output_writer.generate_unique_filename() appends _01, _02, etc.
    Example: SGC1_LIVE_SITE_SNAPSHOT_5_12_26_1430_01.json
  This never overwrites an existing file; it always creates a new one.
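
A minimal sketch of the suffixing rule above. The function name comes from this
document, but the (directory, filename) signature and the loop are assumptions
made for illustration; the suffix is placed before the extension, as in the
_01.json example.

```python
import os


def generate_unique_filename(directory: str, filename: str) -> str:
    """Return filename, or filename with _01, _02, ... if it already exists."""
    stem, ext = os.path.splitext(filename)
    candidate = filename
    counter = 0
    # Keep incrementing the suffix until no file with that name exists.
    while os.path.exists(os.path.join(directory, candidate)):
        counter += 1
        candidate = f"{stem}_{counter:02d}{ext}"
    return candidate
```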

================================================================================
SECTION 4 — OUTPUT FILE SIZES (EXPECTED RANGES)
================================================================================

These are engineering estimates based on known site scale (121+ pages, 28+ repo files).
Mixed Claude uses these to detect pathological failures (e.g. 0-byte output).

SGC-1 outputs (121+ URLs, 7-15 conflicts, 10-25 flags):
  Primary JSON:         250 KB – 1.5 MB   (depends on body content captured)
  Summary TXT:          50 KB – 200 KB
  Run Manifest:         2 KB – 10 KB
  Each checkpoint:      100 KB – 800 KB

SGC-2 outputs (28+ files, 8 ZIPs, 15+ folders):
  Primary JSON:         100 KB – 500 KB
  Summary TXT:          30 KB – 150 KB
  Run Manifest:         2 KB – 8 KB
  Each checkpoint:      50 KB – 300 KB

FAILURE THRESHOLD: Any output file < 1 KB after a successful run = WRITE_ERROR.
  Trigger: shared_output_writer verifies size after every write.
  Action: log WRITE_ERROR, continue run (do not abort for this alone).
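
The post-write size check could look roughly like this. verify_output_size is
a hypothetical name, log stands in for any callable that takes (level, message)
in the style of logger.log, and 1 KB is assumed to mean 1024 bytes.

```python
import os

MIN_OUTPUT_BYTES = 1024  # assumption: "1 KB" means 1024 bytes


def verify_output_size(filepath, log):
    """Return True if the file meets the size floor; log WRITE_ERROR if not."""
    # A missing file is treated the same as a 0-byte file.
    size = os.path.getsize(filepath) if os.path.isfile(filepath) else 0
    if size < MIN_OUTPUT_BYTES:
        log("WRITE_ERROR", f"{filepath} is {size} bytes (< {MIN_OUTPUT_BYTES})")
        return False  # the caller continues the run; this alone does not abort
    return True
```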

================================================================================
SECTION 5 — FORBIDDEN WRITE PATHS
================================================================================

These paths must NEVER be written to by either code.
Enforced by calling validate_output_path() before every write.

FORBIDDEN:
  Any path under COLLABTUNES_PROJECT_ROOT/ except 15_GATHERING_TOOLS/outputs/
                                                and 15_GATHERING_TOOLS/logs/
  https://collabtunes.com (network — never a write target)
  Any path containing "DEFAMATION_RISK_REGISTRY"
  Any path containing "CREATOR_INTERVIEW_TRANSCRIPT"
  Any path that resolves outside COLLABTUNES_PROJECT_ROOT/

ALLOWED:
  15_GATHERING_TOOLS/outputs/  — all primary output files
  15_GATHERING_TOOLS/logs/     — all checkpoint and log files

ENFORCEMENT MECHANISM:
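  In shared_output_writer.validate_output_path():
    target = os.path.realpath(filepath)  # resolves ".." and symlinks
    roots = (ALLOWED_OUTPUT_ROOT, ALLOWED_LOG_ROOT)
    # Append os.sep so a sibling such as ".../outputs_old/" cannot match.
    if not any(target.startswith(os.path.realpath(r) + os.sep) for r in roots):
        logger.log("ERROR", f"WRITE OUTSIDE ALLOWED PATH: {filepath}")
        raise SystemExit(1)

  A failed check logs ERROR and raises SystemExit immediately.
  An explicit check is used rather than assert, because Python removes
  assert statements when run with the -O flag.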
  This is the last line of defense against accidental writes.
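
The FORBIDDEN list also names two path fragments that must never appear in a
write target. A self-contained sketch that combines that rule with the
allowed-root check follows; is_write_allowed and FORBIDDEN_MARKERS are
hypothetical names, and the roots are passed in explicitly so the example does
not depend on shared_config.

```python
import os

# Fragments taken from the FORBIDDEN list in this section.
FORBIDDEN_MARKERS = ("DEFAMATION_RISK_REGISTRY", "CREATOR_INTERVIEW_TRANSCRIPT")


def is_write_allowed(filepath, allowed_roots):
    """Return True only if filepath resolves inside one of allowed_roots."""
    target = os.path.realpath(filepath)  # collapses ".." and follows symlinks
    if any(marker in target for marker in FORBIDDEN_MARKERS):
        return False
    for root in allowed_roots:
        # The trailing separator stops ".../outputs_old" matching ".../outputs".
        if target.startswith(os.path.realpath(root) + os.sep):
            return True
    return False
```

Returning a boolean keeps the sketch easy to test; a caller following this
section would log ERROR and raise SystemExit when the result is False.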

================================================================================
SECTION 6 — DIRECTORY CREATION RULES
================================================================================

DIRECTORIES THE CODE MAY CREATE:
  /outputs/ — if it does not exist at startup
  /logs/    — if it does not exist at startup

HOW TO CREATE:
  if not os.path.isdir(path):
      os.makedirs(path, exist_ok=True)
      logger.log("SETUP", f"Created directory: {path}")
  Log the creation only when the directory was actually created.

DIRECTORIES THE CODE MUST NEVER CREATE:
  Any new folder in COLLABTUNES_PROJECT_ROOT/ (outside 15_GATHERING_TOOLS/)
  Any subfolder inside /outputs/ or /logs/ (flat structure only)
  Any temporary directory, inside or outside the project root

RATIONALE:
  /outputs/ and /logs/ are the only writable zones.
  Flat structure in each — no subdirectories — prevents path confusion.
  All files in /outputs/ are directly accessible without navigation.

================================================================================
SECTION 7 — FILE RETENTION RULES
================================================================================

NEVER DELETE:
  Any file in /logs/ — checkpoints are permanent records
  Any prior output in /outputs/ — versioned by timestamp, never overwritten

AGING POLICY (for Mixed Claude's awareness — not implemented in code):
  After 30 days, checkpoints in /logs/ may be manually archived
  After 30 days, prior /outputs/ versions may be manually archived
  These are manual operations by Tom — the code never purges files

OVERWRITE PREVENTION (code-enforced):
  generate_unique_filename() guarantees no two outputs share a name
  validate_output_path() refuses to write to an existing path
    (in addition to the allowed-root check in Section 5)
  Together, these two checks prevent the code from overwriting any existing file
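
One possible complement to the two checks above, offered as an assumption
rather than as part of the specified design: opening the target in Python's
exclusive-creation mode makes the write itself fail if the path already exists,
which also covers the gap between choosing a name and writing to it.
write_new_text_file is a hypothetical helper name.

```python
def write_new_text_file(filepath, text):
    """Create filepath and write text; raise FileExistsError if it exists."""
    # Mode "x" is exclusive creation: open() fails if the path already exists.
    with open(filepath, "x", encoding="utf-8") as handle:
        handle.write(text)
```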

================================================================================
END FILESYSTEM_AND_OUTPUT_STRUCTURE_FINAL_5_12_26.txt
================================================================================
