================================================================================
SELF_GATHERING_CODE_TESTING_STRATEGY_5_12_26.txt
CollabORhythm / Collabtunes — Engineering Hardening Phase
Generated: 5.12.26 | Black Claude — Final Engineering Hardening
PURPOSE: Every test Mixed Claude must run during and after heavy coding.
         Ordered by build stage. No test framework required — Python only.
STATUS: ENGINEERING PREP — deterministic test plan
================================================================================

TESTING PHILOSOPHY:
  No test framework (pytest, unittest) required for v1.
  All tests are runnable with: python test_[module].py
  Tests are discrete scripts — one per module — in 15_GATHERING_TOOLS/tests/
  A test either prints PASS or FAIL with a specific message.
  Mixed Claude writes tests as it writes each module.
  All tests must pass before moving to the next build stage.

================================================================================
SECTION 1 — STAGE 1 TESTS (Shared Modules)
================================================================================

── TEST: shared_config ───────────────────────────────────────────────────────
File: tests/test_shared_config.py

  TC-1.1: All required constants exist and are the correct type
    assert isinstance(ALLOWED_OUTPUT_ROOT, str)
    assert isinstance(ALLOWED_LOG_ROOT, str)
    assert isinstance(SENSITIVE_FILENAME_PATTERNS, list)
    assert isinstance(RATE_LIMIT_SECONDS, float)
    assert RATE_LIMIT_SECONDS >= 1.0  # must be polite
    assert len(APPROVED_CATEGORIES) >= 15
    assert "CANON" in APPROVED_CATEGORIES
    assert "LIVE_CAPTURE" in APPROVED_CATEGORIES
    PASS condition: all asserts pass

  TC-1.2: No config value is None
    for attr in dir(config):
        if not attr.startswith('_'):
            assert getattr(config, attr) is not None
    PASS condition: all asserts pass

  TC-1.3: Sensitive patterns are non-empty strings
    for p in SENSITIVE_FILENAME_PATTERNS:
        assert isinstance(p, str) and len(p) > 0
    PASS condition: all asserts pass

── TEST: shared_naming_validator ─────────────────────────────────────────────
File: tests/test_shared_naming_validator.py

  TC-2.1: parse_filename — good filename
    result = parse_filename("FINAL_CANON_AUTHORITY_REGISTRY_5_12_26.txt")
    assert result["date_token"] == "5_12_26"
    assert "CANON" in result["category"] or "AUTHORITY" in result["category"]
    PASS condition: date_token and category extracted correctly

  TC-2.2: parse_filename — ZIP with count
    result = parse_filename("11_TXT_COLLABTUNES_RATINGS_CANON_5_12_26.zip")
    assert result["file_count"] == 11
    assert result["date_token"] == "5_12_26"
    PASS condition: file_count extracted as int

  TC-2.3: validate_filename — compliant
    result = validate_filename("MASTER_URL_AUTHORITY_REGISTRY_VOL1_5_12_26.txt")
    assert result["compliant"] == True
    assert len(result["violations"]) == 0

  TC-2.4: validate_filename — banned name
    result = validate_filename("final.zip")
    assert result["compliant"] == False
    assert len(result["violations"]) > 0

  TC-2.5: infer_authority_from_filename — ACTIVE_CANON
    result = infer_authority_from_filename("COLLABTUNES_OUTPUT_NAMING_RULES_ACTIVE_CANON_5_12_26.txt")
    assert result == "LOCKED"

  TC-2.6: infer_authority_from_filename — DEPRECATED
    result = infer_authority_from_filename("OLD_HTML_ARCHIVE_5_11_26.zip")
    assert result in ["DEPRECATED", "UNKNOWN"]  # either acceptable — ARCHIVE maps to DEPRECATED

  TC-2.7: parse_filename — no date token
    result = parse_filename("random_file_no_date.txt")
    assert result["date_token"] is None
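
A parse_filename consistent with TC-2.1 through TC-2.7 could look like the sketch below. The regexes assume the conventions visible in the fixtures (a trailing M_D_YY date token before the extension, an optional leading "<count>_TXT_" on ZIP names); the real shared_naming_validator may differ.

```python
import re

# Illustrative sketch only — category keywords and regexes are assumptions
# drawn from the test fixtures above, not from the real module.
DATE_RE = re.compile(r"(\d{1,2}_\d{1,2}_\d{2})\.[A-Za-z0-9]+$")
COUNT_RE = re.compile(r"^(\d+)_TXT_")

def parse_filename(name):
    date_m = DATE_RE.search(name)
    count_m = COUNT_RE.match(name)
    # Tokenize the stem to pick out known category keywords.
    tokens = re.sub(r"\.[A-Za-z0-9]+$", "", name).upper().split("_")
    return {
        "date_token": date_m.group(1) if date_m else None,
        "file_count": int(count_m.group(1)) if count_m else None,
        "category": [t for t in tokens if t in {"CANON", "AUTHORITY", "RATINGS"}],
    }
```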

── TEST: shared_output_writer ────────────────────────────────────────────────
File: tests/test_shared_output_writer.py

  TC-3.1: validate_output_path — allowed path passes
    validate_output_path("./outputs/test_file.json")
    PASS condition: no AssertionError raised

  TC-3.2: validate_output_path — forbidden path raises
    try:
        validate_output_path("/etc/passwd")
        FAIL — should have raised
    except AssertionError:
        PASS

  TC-3.3: validate_output_path — existing file raises
    Create a temp file in /outputs/. Then:
    try:
        validate_output_path(temp_file_path)
        FAIL — should have raised (file exists)
    except AssertionError:
        PASS (file already exists — overwrite prevented)
    Cleanup: delete temp file.

  TC-3.4: write_json — creates valid JSON file
    data = {"test_key": "test_value", "items": [1, 2, 3]}
    write_json("./outputs/test_write_json.json", data, mock_logger)
    assert os.path.exists("./outputs/test_write_json.json")
    with open("./outputs/test_write_json.json") as f:
        parsed = json.load(f)
    assert parsed["test_key"] == "test_value"
    Cleanup: delete test file.

  TC-3.5: generate_unique_filename — no collision
    fn1 = generate_unique_filename("SGC1_TEST", ".json", "./outputs/")
    fn2 = generate_unique_filename("SGC1_TEST", ".json", "./outputs/")
    assert fn1 != fn2
    PASS condition: two calls produce different names
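
One way to satisfy TC-3.1 through TC-3.5 is sketched below. ALLOWED_ROOTS stands in for the shared_config constants, and the timestamp-plus-counter scheme is just one collision-avoidance choice.

```python
import itertools
import os
import time

# Sketch of the output-writer guards; ALLOWED_ROOTS is a stand-in for the
# shared_config values ALLOWED_OUTPUT_ROOT / ALLOWED_LOG_ROOT.
ALLOWED_ROOTS = (os.path.abspath("./outputs"), os.path.abspath("./logs"))
_seq = itertools.count()

def validate_output_path(path):
    full = os.path.abspath(path)
    # Guard 1: never write outside the allowed roots.
    assert any(full.startswith(root + os.sep) for root in ALLOWED_ROOTS), \
        f"PATH_OUTSIDE_ALLOWED_ROOTS: {path}"
    # Guard 2: never overwrite an existing file.
    assert not os.path.exists(full), f"OVERWRITE_BLOCKED: {path}"

def generate_unique_filename(prefix, ext, out_dir):
    # Nanosecond timestamp plus a per-process counter: collision-free
    # within a single run.
    return os.path.join(out_dir, f"{prefix}_{time.time_ns()}_{next(_seq)}{ext}")
```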

── TEST: shared_logger ───────────────────────────────────────────────────────
File: tests/test_shared_logger.py

  TC-4.1: log() produces correct format
    logger = RunLogger("TEST_001", "DRY_RUN", "./logs/")
    logger.log("PHASE_0", "Test message", severity="INFO")
    log = logger.get_log()
    assert len(log) == 1
    assert log[0]["step"] == "PHASE_0"
    assert log[0]["severity"] == "INFO"
    assert log[0]["error_code"] is None

  TC-4.2: log_error() sets error_code
    logger.log_error("PHASE_2", "REQUEST_TIMEOUT", "Timeout on /song-list-1/")
    log = logger.get_log()
    last = log[-1]
    assert last["error_code"] == "REQUEST_TIMEOUT"
    assert last["severity"] == "ERROR"

  TC-4.3: log_abort() raises SystemExit
    try:
        logger.log_abort("Test abort")
        FAIL — should have raised
    except SystemExit:
        PASS
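
A RunLogger matching the behavior these cases assert might look like this sketch; the constructor signature mirrors TC-4.1, but any field beyond those asserted above is a guess.

```python
import sys
import time

# Illustrative RunLogger — entry fields follow what TC-4.1 through TC-4.3
# assert; the real shared_logger may carry more state.
class RunLogger:
    def __init__(self, run_id, mode, log_dir):
        self.run_id, self.mode, self.log_dir = run_id, mode, log_dir
        self._entries = []

    def log(self, step, message, severity="INFO", error_code=None):
        self._entries.append({
            "ts": time.time(), "step": step, "message": message,
            "severity": severity, "error_code": error_code,
        })

    def log_error(self, step, error_code, message):
        self.log(step, message, severity="ERROR", error_code=error_code)

    def log_abort(self, message):
        # Record the abort, then terminate with a non-zero exit status.
        self.log("ABORT", message, severity="FATAL")
        sys.exit(1)

    def get_log(self):
        return list(self._entries)
```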

================================================================================
SECTION 2 — STAGE 2 TESTS (SGC-1 Modules)
================================================================================

── TEST: sgc1_url_normalizer ─────────────────────────────────────────────────
File: tests/test_sgc1_url_normalizer.py

  TC-5.1: normalize — adds https prefix
    assert normalize("collabtunes.com/song-list-1/") == "https://collabtunes.com/song-list-1/"

  TC-5.2: normalize — adds trailing slash
    assert normalize("https://collabtunes.com/song-list-1") == "https://collabtunes.com/song-list-1/"

  TC-5.3: normalize — lowercase slug
    assert normalize("https://collabtunes.com/SONG-LIST-1/") == "https://collabtunes.com/song-list-1/"

  TC-5.4: classify_page_type — Song List
    assert classify_page_type("https://collabtunes.com/song-list-5/") == "ALBUM_AIO"

  TC-5.5: classify_page_type — Set List with title slug
    assert classify_page_type("https://collabtunes.com/set-list-22-inherent-absence/") == "ALBUM_AIO"

  TC-5.6: classify_page_type — Songbook chapter
    assert classify_page_type("https://collabtunes.com/7-of-35-quick-guide-g-to-x/") == "SONGBOOK"

  TC-5.7: classify_page_type — dev page
    assert classify_page_type("https://collabtunes.com/html-test1/") == "DEV"

  TC-5.8: classify_page_type — placeholder
    assert classify_page_type("https://collabtunes.com/future-ai-search-tools/") == "PLACEHOLDER"

  TC-5.9: to_slug
    assert to_slug("https://collabtunes.com/song-list-1/") == "/song-list-1/"
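
The normalizer behavior in TC-5.1 through TC-5.3 and TC-5.9 can be sketched with urllib; this version always forces https and ignores query strings, which may or may not match the real sgc1_url_normalizer.

```python
from urllib.parse import urlparse

# Sketch of normalize/to_slug consistent with the asserts above.
def normalize(url):
    if not url.startswith(("http://", "https://")):
        url = "https://" + url          # TC-5.1: add scheme
    parsed = urlparse(url)
    path = parsed.path.lower()          # TC-5.3: lowercase slug
    if not path.endswith("/"):
        path += "/"                     # TC-5.2: trailing slash
    return f"https://{parsed.netloc.lower()}{path}"

def to_slug(url):
    # TC-5.9: the path component of the normalized URL.
    return urlparse(normalize(url)).path
```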

── TEST: sgc1_http_requester (DRY_RUN simulation — no live network) ──────────
File: tests/test_sgc1_http_requester.py

  TC-6.1: Method guard — POST raises
    Using mock, attempt to trigger a POST request.
    The guard assert should fire before requests.post is called.
    PASS condition: AssertionError raised with a message containing "SITE_WRITE_DETECTED"

  TC-6.2: Timeout handling — mock timeout
    Using unittest.mock.patch to mock requests.head to raise Timeout.
    Call head_request(url, timeout=1, logger=mock_logger)
    After two timeouts: result["error"] == "TIMEOUT_ERROR"
    PASS condition: no uncaught exception, correct error in result

  TC-6.3: Rate limit enforcement
    Record timestamps before and after two sequential head_request calls.
    elapsed = t2 - t1
    assert elapsed >= RATE_LIMIT_SECONDS * 0.9  # allow 10% tolerance
    NOTE: Use mock responses — don't hit real network in tests.
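
The two guards in TC-6.1 and TC-6.3 can be exercised offline with the network call stubbed out. RATE_LIMIT_SECONDS is shrunk here only to keep the example fast (shared_config requires >= 1.0), and _do_head stands in for the real requests.head call:

```python
import time

RATE_LIMIT_SECONDS = 0.25   # test stand-in; the real config requires >= 1.0
_last_request_at = [0.0]

def _do_head(url):
    # Stand-in for the real requests.head(...) call — no network here.
    return {"status": 200, "url": url}

def guarded_request(method, url):
    # Hard guard: this tool must never write to the site.
    assert method in ("GET", "HEAD"), f"SITE_WRITE_DETECTED: {method} {url}"
    # Polite rate limit: wait until RATE_LIMIT_SECONDS since the last call.
    wait = RATE_LIMIT_SECONDS - (time.monotonic() - _last_request_at[0])
    if wait > 0:
        time.sleep(wait)
    _last_request_at[0] = time.monotonic()
    return _do_head(url)
```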

── TEST: sgc1_routing_classifier ─────────────────────────────────────────────
File: tests/test_sgc1_routing_classifier.py

  TC-7.1: assign_rating — locked canon entry
    canon_data has SL1 as PG-13. ratings_data has SL1 as PG-13.
    result = assign_rating("https://collabtunes.com/song-list-1/", "Song List 1", canon_data, ratings_data)
    assert result == "PG-13"

  TC-7.2: assign_gate — PG-13 rating
    assert assign_gate("PG-13") == "PG13_REQUIRED"

  TC-7.3: assign_gate — G rating
    assert assign_gate("G") == "NONE"

  TC-7.4: assign_gate — X rating
    assert assign_gate("X") == "X_REQUIRED"

  TC-7.5: detect_gate_missing — R page with no gate
    result_item = {"rating": "R", "has_js_gate": False, "url": "..."}
    assert detect_gate_missing(result_item) == True

  TC-7.6: detect_gate_missing — G page (no gate needed)
    result_item = {"rating": "G", "has_js_gate": False, "url": "..."}
    assert detect_gate_missing(result_item) == False
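
The gate rules in TC-7.2 through TC-7.6 imply a mapping like the one below; the PG and R entries are inferred by analogy and should be checked against the real routing rules.

```python
# Gate mapping inferred from the test cases above — PG and R are guesses.
GATE_BY_RATING = {
    "G": "NONE",
    "PG": "NONE",
    "PG-13": "PG13_REQUIRED",
    "R": "R_REQUIRED",
    "X": "X_REQUIRED",
}

def assign_gate(rating):
    return GATE_BY_RATING.get(rating, "UNKNOWN")

def detect_gate_missing(item):
    # A page is flagged when its rating demands a gate but no JS gate exists.
    return assign_gate(item["rating"]) != "NONE" and not item["has_js_gate"]
```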

── TEST: sgc1_conflict_detector ──────────────────────────────────────────────
File: tests/test_sgc1_conflict_detector.py

  TC-8.1: detect_chapter_drift — known drifted chapter
    url = "https://collabtunes.com/18-of-35-business-plan-and-21-page-summary/"
    nav_label = 19  # label says 19, URL says 18
    conflict = detect_chapter_drift(url, nav_label)
    assert conflict is not None
    assert conflict["conflict_type"] == "CHAPTER_DRIFT"

  TC-8.2: detect_chapter_drift — non-drifted chapter
    url = "https://collabtunes.com/7-of-35-quick-guide-g-to-x/"
    nav_label = 7
    conflict = detect_chapter_drift(url, nav_label)
    assert conflict is None

  TC-8.3: link_to_known_blocker — Lady Weaver conflict
    conflict = {"conflict_type": "URL_SLUG_MISMATCH",
                "item_a": "/20-35-the-lady-weaver/",
                "item_b": "/36-35-lady-weaver/"}
    result = link_to_known_blocker(conflict)
    assert result.get("known_blocker_id") is not None
    # should link to BLOCK-H04
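
Chapter-drift detection as exercised in TC-8.1/8.2 can be sketched as below, assuming songbook slugs carry an "<n>-of-35-" prefix as the fixtures show; the real sgc1_conflict_detector likely parses more variants (TC-8.3's "-35-" slug without "of", for instance, would need a looser regex).

```python
import re

# Assumed slug convention: "/<chapter>-of-35-<title>/".
CHAPTER_RE = re.compile(r"/(\d+)-of-35-")

def detect_chapter_drift(url, nav_label):
    m = CHAPTER_RE.search(url)
    if m is None or int(m.group(1)) == int(nav_label):
        return None
    return {
        "conflict_type": "CHAPTER_DRIFT",
        "url_chapter": int(m.group(1)),
        "nav_label": int(nav_label),
        "url": url,
    }
```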

================================================================================
SECTION 3 — STAGE 3 TESTS (SGC-2 Modules)
================================================================================

── TEST: sgc2_sensitive_guard ────────────────────────────────────────────────
File: tests/test_sgc2_sensitive_guard.py

  TC-9.1: is_sensitive — detects defamation registry
    assert is_sensitive("DEFAMATION_RISK_REGISTRY_VOL1.txt") == True

  TC-9.2: is_sensitive — safe file
    assert is_sensitive("FINAL_CANON_AUTHORITY_REGISTRY_5_12_26.txt") == False

  TC-9.3: is_sensitive — case-insensitive detection
    assert is_sensitive("defamation_risk_registry_vol1.txt") == True

  TC-9.4: partition_file_list — separates correctly
    raw = [
        {"filename": "DEFAMATION_RISK_REGISTRY_VOL1.txt"},
        {"filename": "FINAL_CANON_AUTHORITY_REGISTRY_5_12_26.txt"},
        {"filename": "CREATOR_INTERVIEW_TRANSCRIPT_CLAUDE.txt"},
    ]
    safe, sensitive = partition_file_list(raw)
    assert len(safe) == 1
    assert len(sensitive) == 2
    assert safe[0]["filename"] == "FINAL_CANON_AUTHORITY_REGISTRY_5_12_26.txt"

  TC-9.5: verify_sensitive_not_opened — clean log passes
    verify_sensitive_not_opened(sensitive_files, opened_files_log=[], logger=mock_logger)
    PASS condition: no exception

  TC-9.6: verify_sensitive_not_opened — dirty log aborts
    sensitive = [{"filepath": "/path/DEFAMATION_RISK_REGISTRY_VOL1.txt"}]
    opened = ["/path/DEFAMATION_RISK_REGISTRY_VOL1.txt"]
    try:
        verify_sensitive_not_opened(sensitive, opened, mock_logger)
        FAIL — should abort
    except SystemExit:
        PASS

── TEST: sgc2_zip_inspector ──────────────────────────────────────────────────
File: tests/test_sgc2_zip_inspector.py
NOTE: Tests use a small synthetic test ZIP created at test startup.

  TC-10.1: inspect_zip — returns member list
    result = inspect_zip("tests/fixtures/test.zip", max_depth=2, current_depth=0, logger=mock_logger)
    assert "members" in result
    assert result["member_count"] > 0

  TC-10.2: inspect_zip — extracts manifest text
    result = inspect_zip("tests/fixtures/test_with_manifest.zip", ...)
    assert result["manifest_text"] is not None
    assert len(result["manifest_text"]) > 0

  TC-10.3: inspect_zip — corrupt ZIP returns error flag
    result = inspect_zip("tests/fixtures/corrupt.zip", ...)
    NOTE: corrupt.zip is a 10-byte file with invalid ZIP header.
    Expect: result is empty dict OR exception caught and error flag set.
    PASS condition: no uncaught exception, logged as CORRUPT_ZIP

  TC-10.4: detect_missing_manifests — detects absent manifest
    zip_list = [{"filepath": "test.zip", "manifest_text": None}]
    missing = detect_missing_manifests(zip_list)
    assert "test.zip" in missing
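
The corrupt-ZIP behavior in TC-10.3 hinges on catching zipfile.BadZipFile; the sketch below demonstrates the pattern on in-memory bytes (inspect_zip_bytes is a test-only name, not the real inspect_zip signature).

```python
import io
import zipfile

def inspect_zip_bytes(data):
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            members = zf.namelist()
            return {"members": members, "member_count": len(members)}
    except zipfile.BadZipFile:
        # Corrupt archive: return an error flag instead of letting the
        # exception escape; the real module would also log CORRUPT_ZIP.
        return {"error": "CORRUPT_ZIP"}

# Build a valid ZIP and a 10-byte invalid blob entirely in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name in ("a.txt", "b.txt", "c.txt"):
        zf.writestr(name, "placeholder")
good_bytes = buf.getvalue()
corrupt_bytes = b"NOTAZIP!!!"          # 10 bytes, invalid ZIP header
```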

── TEST: sgc2_folder_validator ───────────────────────────────────────────────
File: tests/test_sgc2_folder_validator.py

  TC-11.1: validate_structure — all folders present
    folder_list = [{"folder_name": f} for f in EXPECTED_FOLDERS]
    result = validate_structure(folder_list, root_path=".")
    assert len(result["missing"]) == 0
    assert len(result["present"]) == len(EXPECTED_FOLDERS)

  TC-11.2: validate_structure — missing folder detected
    folder_list = [{"folder_name": "02_CANON"}]  # only one present
    result = validate_structure(folder_list, root_path=".")
    assert "00_OPERATIONAL_RULES" in result["missing"]

  TC-11.3: validate_file_placement — correct placement
    file_item = {"filename": "FINAL_CANON_AUTHORITY_REGISTRY_5_12_26.txt",
                 "parent_folder": "02_CANON",
                 "category": ["CANON"]}
    result = validate_file_placement(file_item, EXPECTED_FOLDERS)
    assert result == "CORRECT"

  TC-11.4: validate_file_placement — misplaced file
    file_item = {"filename": "SONG_LIST_1_AIO.html",
                 "parent_folder": "02_CANON",
                 "category": ["HTML"]}
    result = validate_file_placement(file_item, EXPECTED_FOLDERS)
    assert result == "MISPLACED"
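
validate_structure as asserted in TC-11.1/11.2 reduces to set membership; EXPECTED_FOLDERS below is a two-entry stand-in for the real list, and root_path is accepted but unused in this sketch.

```python
# Stand-in for the real shared_config folder list.
EXPECTED_FOLDERS = ["00_OPERATIONAL_RULES", "02_CANON"]

def validate_structure(folder_list, root_path="."):
    # root_path is unused here; the real module may also stat the disk.
    found = {f["folder_name"] for f in folder_list}
    return {
        "present": [f for f in EXPECTED_FOLDERS if f in found],
        "missing": [f for f in EXPECTED_FOLDERS if f not in found],
    }
```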

================================================================================
SECTION 4 — INTEGRATION TESTS (Full Dry Run)
================================================================================

── TEST: SGC-1 Full Dry Run ───────────────────────────────────────────────────
File: tests/test_sgc1_integration_dry_run.py

  TC-12.1: Dry run completes without error
    Run: python -m sgc1.sgc1_main --mode DRY_RUN
         --seed-file tests/fixtures/test_url_registry.txt
         --output-dir tests/outputs/
    Expected:
      Exit code 0
      Dry run report exists in /logs/
      No files in tests/outputs/ (a dry run writes nothing there)
    PASS condition: exit 0, report file exists, tests/outputs/ empty

  TC-12.2: Dry run report has correct content
    Open the dry run report from TC-12.1.
    assert "WHAT WOULD HAVE HAPPENED" in content
    assert "FILES THAT WOULD BE WRITTEN" in content
    assert "TO EXECUTE:" in content

  TC-12.3: Dry run does not make network calls
    Using mock, patch requests.head to raise an exception with a marker.
    Run dry run. Confirm the exception was never raised.
    PASS condition: mock never called (no network calls in dry run)
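
The run-and-check pattern behind TC-12.1 can be a small subprocess helper; the '-c' program below stands in for the real 'python -m sgc1.sgc1_main --mode DRY_RUN ...' invocation.

```python
import subprocess
import sys

def run_and_check(argv):
    """Run a command, capture its output, return (exit_code, stdout, stderr)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr

# Stand-in command; a real integration test would pass
# [sys.executable, "-m", "sgc1.sgc1_main", "--mode", "DRY_RUN", ...].
code, out, err = run_and_check([sys.executable, "-c", "print('dry run ok')"])
```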

── TEST: SGC-2 Full Dry Run ───────────────────────────────────────────────────
File: tests/test_sgc2_integration_dry_run.py

  TC-13.1: Dry run on test fixture directory completes without error
    Create tests/fixtures/test_repo/ with a few test TXT and ZIP files.
    Run: python -m sgc2.sgc2_main --mode DRY_RUN
         --root tests/fixtures/test_repo/
         --output-dir tests/outputs/
    Expected: exit 0, dry run report in /logs/, tests/outputs/ empty.

  TC-13.2: Sensitive file detection in dry run
    Add a file named "DEFAMATION_RISK_REGISTRY_VOL1.txt" to test_repo/
    Run dry run. In report:
    assert "SENSITIVE" in report_content
    assert "DEFAMATION_RISK_REGISTRY_VOL1.txt" in report_content
    assert "not opened" in report_content.lower()

  TC-13.3: Source file integrity maintained
    Record mtime of all files in test_repo/ before run.
    Run SGC-2 (dry or live on test fixture).
    Confirm all mtimes unchanged after run.
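
TC-13.3's integrity check is a before/after mtime snapshot; snapshot_mtimes and changed_files below are illustrative names for that pattern.

```python
import os

def snapshot_mtimes(root):
    """Map every file path under root to its modification time."""
    snap = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            snap[path] = os.path.getmtime(path)
    return snap

def changed_files(before, after):
    """Paths whose mtime changed (or that disappeared) since 'before'."""
    return sorted(p for p in before if after.get(p) != before[p])
```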

================================================================================
SECTION 5 — SAFETY INVARIANT TESTS
================================================================================

These tests verify the safety invariants from DRY_RUN_AND_LIVE_RUN_SAFETY_MATRIX.
Run these after every LIVE_RUN on the real project.

  TC-14.1: No writes outside /outputs/ or /logs/
    After any run:
    Check all new files on disk since run started.
    assert all new files are under /outputs/ or /logs/
    PASS condition: no unexpected new files anywhere else
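
TC-14.1 can be implemented as a file-set diff against a pre-run baseline; file_set and unexpected_new_files are illustrative names, and the allowed roots mirror the invariant above.

```python
import os

def file_set(root="."):
    """Every file path under root, as a set, for before/after comparison."""
    return {os.path.join(dp, f) for dp, _d, fs in os.walk(root) for f in fs}

def unexpected_new_files(before, after, allowed=("./outputs", "./logs")):
    """New paths since 'before' that are NOT under an allowed root."""
    allowed_abs = tuple(os.path.abspath(a) + os.sep for a in allowed)
    return sorted(p for p in after - before
                  if not os.path.abspath(p).startswith(allowed_abs))
```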

  TC-14.2: SGC-1 run_log contains SITE_INTEGRITY_VERIFIED
    Load SGC1_LIVE_SITE_SNAPSHOT_{run_id}.json
    run_log = output["run_log"]
    messages = [e["message"] for e in run_log]
    assert any("SITE_INTEGRITY_VERIFIED" in m for m in messages)

  TC-14.3: SGC-2 run_log contains REPO_INTEGRITY_VERIFIED and SENSITIVE_FILES_CLEAN
    Load SGC2_REPO_INVENTORY_{run_id}.json
    run_log = output["run_log"]
    messages = [e["message"] for e in run_log]
    assert any("REPO_INTEGRITY_VERIFIED" in m for m in messages)
    assert any("SENSITIVE_FILES_CLEAN" in m for m in messages)

  TC-14.4: All output files have non-trivial size
    For each file in /outputs/ from this run:
    assert os.path.getsize(filepath) > 1000  # at least ~1 KB — catches empty or truncated writes

  TC-14.5: Output JSON is valid and schema-compliant
    with open(json_output_path) as f:
        data = json.load(f)
    assert "run_id" in data
    assert "items" in data
    assert "conflicts" in data
    assert "flags" in data
    assert "run_log" in data
    assert isinstance(data["items"], list)

================================================================================
SECTION 6 — TEST FIXTURE REQUIREMENTS
================================================================================

Mixed Claude must create these test fixtures before running tests:

  tests/fixtures/
  ├── test_url_registry.txt          Small MASTER_URL_AUTHORITY_REGISTRY subset
  │                                  Include: 10 LIVE ✅, 2 BROKEN ❌, 3 PENDING ⏳, 1 DEV 🔧
  ├── test_nav_reference.txt         Small FINAL_NAVIGATION_AUTHORITY_MAP subset
  │                                  Include: 3 sections, 2 drifted chapters
  ├── test_repo/                     Minimal project repo for SGC-2 tests
  │   ├── 02_CANON/
  │   │   └── FINAL_CANON_AUTHORITY_REGISTRY_FRAGMENT_5_12_26.txt
  │   ├── 03_RATINGS/
  │   │   └── MASTER_CONTENT_RATINGS_INDEX_FRAGMENT_VOL3_5_12_26.txt
  │   ├── DEFAMATION_RISK_REGISTRY_VOL1.txt  ← sensitive file — never open
  │   └── test_archive.zip
  ├── test.zip                       Valid ZIP with 3 TXT members (no manifest)
  ├── test_with_manifest.zip         Valid ZIP with manifest TXT + 2 members
  └── corrupt.zip                    10-byte invalid file (not a real ZIP)

Fixture creation:
  tests/create_fixtures.py — a one-time setup script that creates all fixtures.
  Run before first test execution: python tests/create_fixtures.py
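
The ZIP fixtures in the tree above can be produced with zipfile; this sketch covers only test.zip and corrupt.zip, with placeholder member names and bodies (the real create_fixtures.py must also build the TXT registries and test_repo/).

```python
import os
import zipfile

def create_zip_fixtures(fixture_dir):
    os.makedirs(fixture_dir, exist_ok=True)
    # test.zip — valid ZIP with 3 TXT members, no manifest.
    with zipfile.ZipFile(os.path.join(fixture_dir, "test.zip"), "w") as zf:
        for name in ("one.txt", "two.txt", "three.txt"):
            zf.writestr(name, "placeholder body\n")
    # corrupt.zip — 10 bytes that are not a valid ZIP header.
    with open(os.path.join(fixture_dir, "corrupt.zip"), "wb") as f:
        f.write(b"NOTAZIP!!!")
```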

================================================================================
SECTION 7 — TEST EXECUTION ORDER
================================================================================

Mixed Claude runs tests in this sequence after each build stage:

STAGE 1 BUILD COMPLETE → run tests 1.x through 4.x
  python tests/test_shared_config.py
  python tests/test_shared_naming_validator.py
  python tests/test_shared_output_writer.py
  python tests/test_shared_logger.py
  ALL MUST PASS before proceeding to Stage 2.

STAGE 2 BUILD COMPLETE → run tests 5.x through 8.x
  python tests/test_sgc1_url_normalizer.py
  python tests/test_sgc1_http_requester.py
  python tests/test_sgc1_routing_classifier.py
  python tests/test_sgc1_conflict_detector.py
  ALL MUST PASS before proceeding to integration.

SGC-1 ASSEMBLED → run integration tests 12.x
  python tests/test_sgc1_integration_dry_run.py
  ALL MUST PASS before beginning Stage 3.

STAGE 3 BUILD COMPLETE → run tests 9.x through 11.x
  python tests/test_sgc2_sensitive_guard.py
  python tests/test_sgc2_zip_inspector.py
  python tests/test_sgc2_folder_validator.py
  ALL MUST PASS.

SGC-2 ASSEMBLED → run integration tests 13.x
  python tests/test_sgc2_integration_dry_run.py
  ALL MUST PASS.

AFTER FIRST LIVE_RUN → run safety invariant tests 14.x
  python tests/test_safety_invariants.py
  ALL MUST PASS.

FAILURE PROTOCOL:
  Any test failure → STOP. Fix the issue. Re-run failed test. Re-run all
  tests in that stage. Only proceed when all pass.
  Never proceed with a known test failure.

================================================================================
END SELF_GATHERING_CODE_TESTING_STRATEGY_5_12_26.txt
================================================================================
