================================================================================
FINAL_MODULE_DEPENDENCY_VALIDATION_5_12_26.txt
CollabORhythm / Collabtunes — Engineering Hardening Phase
Generated: 5.12.26 | Black Claude — Final Engineering Hardening
PURPOSE: Validate every module-to-module dependency in both SGC codes.
         Confirm: no circular deps, no missing imports, no ambiguous interfaces.
         Mixed Claude reads this before writing a single import statement.
STATUS: ENGINEERING PREP — deterministic reference, not theory
================================================================================

================================================================================
SECTION 1 — DEPENDENCY VALIDATION RULES
================================================================================

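RULE DV-01: No module may import a module from a different code group
  (SGC-1 modules must not import SGC-2 modules, and vice versa)
  Exception: both groups may import from /shared/
  Exception: run_both.py is a top-level launcher outside both groups and
             imports only the two orchestrators (sgc2_main, sgc1_main)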
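
EXAMPLE (illustrative sketch, not part of the contract): DV-01 can be checked
mechanically from the IMPORTS lines in this file. The helper names group_of and
dv01_violations are hypothetical, and the sketch assumes that group membership
can be read off the filename prefix (sgc1_, sgc2_, shared_).

```python
# Sketch of a DV-01 check. Group membership is inferred from the filename
# prefix, which is an assumption of this example.

def group_of(module: str) -> str:
    """Return the code group implied by a module name's prefix."""
    for prefix in ("sgc1_", "sgc2_", "shared_"):
        if module.startswith(prefix):
            return prefix.rstrip("_")
    return "external"  # stdlib, third-party, or launcher modules


def dv01_violations(imports: dict[str, list[str]]) -> list[tuple[str, str]]:
    """List (importer, imported) pairs that cross between SGC-1 and SGC-2."""
    violations = []
    for importer, imported_modules in imports.items():
        src = group_of(importer)
        if src not in ("sgc1", "sgc2"):
            continue  # only SGC-1 and SGC-2 importers are checked here
        for imported in imported_modules:
            dst = group_of(imported)
            if dst in ("sgc1", "sgc2") and dst != src:
                violations.append((importer, imported))
    return violations
```

Feeding it the IMPORTS lines of Sections 3 and 4 should return an empty list
if the graph is as documented.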

RULE DV-02: No circular dependencies permitted at any level
  If A imports B, B must not import A (direct)
  If A imports B and B imports C, C must not import A (transitive)

RULE DV-03: Every function referenced across module boundaries must be
  explicitly listed in the exporting module's PUBLIC INTERFACE below.
  Private functions (prefixed _) are not part of the interface.

RULE DV-04: All shared module constants are READ-ONLY after init.
  No module modifies shared_config values at runtime.
  Configuration is set once at startup — not changed mid-run.

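RULE DV-05: opened_files_log in sgc2_txt_scanner.py is the ONLY
  mutable cross-module state. It is read by sgc2_main.py only, which
  passes it as an argument to verify_sensitive_not_opened() in
  sgc2_sensitive_guard.py; the guard never imports sgc2_txt_scanner.
  No other module holds mutable state accessible to another module.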

================================================================================
SECTION 2 — SHARED MODULE PUBLIC INTERFACES
================================================================================

── shared_config.py ──────────────────────────────────────────────────────────
IMPORTS: none
EXPORTS (all are constants — read-only):
  ALLOWED_OUTPUT_ROOT          str
  ALLOWED_LOG_ROOT             str
  SENSITIVE_FILENAME_PATTERNS  list[str]
  BANNED_FILENAMES             list[str]
  APPROVED_CATEGORIES          list[str]
  PAGE_TYPES                   list[str]
  AUTHORITY_LEVELS             list[str]
  RATE_LIMIT_SECONDS           float   (1.5)
  BODY_RATE_LIMIT_SECONDS      float   (2.0)
  DEFAULT_MAX_PAGES            int     (200)
  DEFAULT_MAX_ZIP_DEPTH        int     (2)
  TXT_HEADER_SCAN_CHARS        int     (2000)
  TXT_FULL_READ_MAX_BYTES      int     (50000)
CIRCULAR RISK: NONE (no imports)
VALIDATION: ✅ CLEAN

── shared_logger.py ──────────────────────────────────────────────────────────
IMPORTS: shared_config
EXPORTS:
  class RunLogger:
    __init__(run_id: str, mode: str, log_dir: str) → None
    log(step: str, message: str, severity: str = "INFO") → None
    log_error(step: str, error_code: str, message: str) → None
    log_abort(reason: str) → NoReturn   (raises SystemExit)
    get_log() → list[dict]
    write_log_to_file() → None
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN
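
EXAMPLE (illustrative sketch, not part of the contract): one way the RunLogger
interface above could be satisfied. The entry field names, the exit code 1, the
"CRITICAL" severity label, and the log filename pattern are all assumptions of
this sketch; only the method signatures come from this file.

```python
import json
import os
from datetime import datetime, timezone
from typing import NoReturn


class RunLogger:
    """In-memory run log matching the signatures listed above (sketch)."""

    def __init__(self, run_id: str, mode: str, log_dir: str) -> None:
        self.run_id = run_id
        self.mode = mode
        self.log_dir = log_dir
        self._entries = []  # private: other modules use get_log()

    def log(self, step: str, message: str, severity: str = "INFO") -> None:
        self._entries.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "run_id": self.run_id,
            "step": step,
            "severity": severity,
            "message": message,
        })

    def log_error(self, step: str, error_code: str, message: str) -> None:
        self.log(step, f"[{error_code}] {message}", severity="ERROR")

    def log_abort(self, reason: str) -> NoReturn:
        # Record the reason first, then terminate via SystemExit.
        self.log("ABORT", reason, severity="CRITICAL")
        raise SystemExit(1)

    def get_log(self) -> list[dict]:
        return list(self._entries)  # copy, so callers cannot mutate the log

    def write_log_to_file(self) -> None:
        os.makedirs(self.log_dir, exist_ok=True)
        path = os.path.join(self.log_dir, f"{self.run_id}_log.json")
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(self._entries, handle, indent=2)
```

Returning a copy from get_log() keeps the logger's internal list out of the
cross-module mutable state that RULE DV-05 forbids.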

── shared_output_writer.py ───────────────────────────────────────────────────
IMPORTS: shared_config, shared_logger
EXPORTS:
  validate_output_path(filepath: str) → bool   (raises AssertionError if unsafe)
  write_json(filepath: str, data: dict, logger: RunLogger) → None
  write_txt(filepath: str, text: str, logger: RunLogger) → None
  generate_unique_filename(base: str, ext: str, dir: str) → str
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

── shared_checkpoint.py ──────────────────────────────────────────────────────
IMPORTS: shared_config, shared_output_writer, shared_logger
EXPORTS:
  write_checkpoint(run_id: str, phase: str, data: dict, log_dir: str) → None
  read_checkpoint(checkpoint_filepath: str) → dict
  detect_partial_run(log_dir: str, run_prefix: str) → list[str]
  checkpoint_phase_key(phase_number: int) → str
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

── shared_naming_validator.py ────────────────────────────────────────────────
IMPORTS: shared_config
EXPORTS:
  DATE_PATTERN          re.Pattern  (compiled regex — read-only)
  VOL_PATTERN           re.Pattern
  AUTHORITY_TAG_PATTERN re.Pattern
  parse_filename(filename: str) → dict
    returns: {file_count, category, date_token, vol_number,
              authority_tag, project_tag, purpose}
  validate_filename(filename: str) → dict
    returns: {compliant: bool, violations: list[str]}
  infer_authority_from_filename(filename: str) → str
    returns: one of AUTHORITY_LEVELS
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

================================================================================
SECTION 3 — SGC-1 MODULE PUBLIC INTERFACES
================================================================================

── sgc1_args.py ──────────────────────────────────────────────────────────────
IMPORTS: argparse, shared_config
EXPORTS:
  parse_args() → argparse.Namespace
    .mode: str             ("DRY_RUN" | "LIVE_RUN")
    .include_x: bool
    .max_pages: int
    .output_dir: str
    .seed_file: str
    .nav_file: str
    .resume: str | None
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

── sgc1_seed_loader.py ───────────────────────────────────────────────────────
IMPORTS: shared_config, shared_logger
EXPORTS:
  load_seed_urls(filepath: str) → list[dict]
    each dict: {url, slug, expected_status, label, rating, section, nav_flag}
  filter_for_crawl(seed_urls: list[dict], include_x: bool) → list[dict]
  load_nav_reference(filepath: str) → dict
    returns: {sections, chapter_drift_map, routing_logic}
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

── sgc1_url_normalizer.py ────────────────────────────────────────────────────
IMPORTS: urllib.parse, shared_config
EXPORTS:
  BASE_URL: str  ("https://collabtunes.com")
  normalize(url: str) → str
  to_slug(url: str) → str
  classify_page_type(url: str, label: str = "") → str
    returns: one of PAGE_TYPES
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN
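
EXAMPLE (illustrative sketch, not part of the contract): this file fixes the
signatures of normalize and to_slug but not their rules. The sketch below
assumes normalization means: resolve against BASE_URL, lowercase the host, drop
the fragment, and strip a trailing slash. The "home" slug for the site root is
also an assumption.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

BASE_URL = "https://collabtunes.com"


def normalize(url: str) -> str:
    """Return an absolute URL with lowercase host, no fragment, no trailing slash."""
    absolute = urljoin(BASE_URL + "/", url.strip())
    parts = urlsplit(absolute)
    path = parts.path.rstrip("/") or "/"  # keep "/" for the site root
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       path, parts.query, ""))


def to_slug(url: str) -> str:
    """Return the last path segment, or "home" for the site root (assumed)."""
    path = urlsplit(normalize(url)).path
    return path.strip("/").split("/")[-1] or "home"
```

Under these assumptions, normalize("/About/") gives
"https://collabtunes.com/About" and to_slug("/chapters/chapter-12/") gives
"chapter-12".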

── sgc1_http_requester.py ────────────────────────────────────────────────────
IMPORTS: requests, time, shared_config, shared_logger
EXPORTS:
  head_request(url: str, timeout: int, logger: RunLogger) → dict
    returns: {status_code, redirect_url, response_time, error}
  get_request(url: str, timeout: int, logger: RunLogger) → dict
    returns: {status_code, content, response_time, error}
  batch_head_requests(urls: list[str], rate_limit: float,
                      logger: RunLogger) → list[dict]
INTERNAL GUARD: asserts method in ("GET", "HEAD") before every request
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

── sgc1_nav_crossref.py ──────────────────────────────────────────────────────
IMPORTS: shared_config, shared_logger
EXPORTS:
  parse_nav_sources(live_capture_data: dict,
                    nav_reference: dict) → dict[str, set]
    returns: {"homepage": set, "128-nav": set, "quicklinks": set}
  crossref_url(url: str, nav_source_urls: dict) → list[str]
  detect_orphans(results: list[dict],
                 nav_source_urls: dict) → list[str]
  detect_nav_duplicates(nav_source_urls: dict) → list[dict]
  detect_cross_source_slug_mismatches(nav_source_urls: dict,
                                      label_map: dict) → list[dict]
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

── sgc1_body_parser.py ───────────────────────────────────────────────────────
IMPORTS: bs4 (BeautifulSoup), lxml, shared_config
EXPORTS:
  needs_body_crawl(result_item: dict,
                   prior_captured_set: set) → bool
  extract_body_data(html_content: str, url: str) → dict
    returns: {page_title, h1_text, meta_description, internal_links,
              word_count, has_rating_badge, has_js_gate, visible_text_excerpt}
  detect_js_rendered(html_content: str) → bool
  extract_internal_links(soup: BeautifulSoup, base_url: str) → list[str]
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

── sgc1_routing_classifier.py ────────────────────────────────────────────────
IMPORTS: shared_config, shared_logger
EXPORTS:
  assign_rating(url: str, label: str,
                canon_data: dict, ratings_data: dict) → str
  assign_gate(rating: str) → str
  assign_safe_route(rating: str, has_gate_in_body: bool) → str
  detect_gate_missing(result_item: dict) → bool
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

── sgc1_conflict_detector.py ─────────────────────────────────────────────────
IMPORTS: shared_config, shared_logger
EXPORTS:
  KNOWN_BLOCKERS: dict   (read-only mapping of conflict IDs to blocker data)
  detect_status_mismatch(result_item: dict,
                         expected_status: str) → dict | None
  detect_chapter_drift(url: str,
                       nav_label_number: int) → dict | None
  build_chapter_drift_map(results: list[dict]) → list[dict]
  link_to_known_blocker(conflict: dict) → dict
  deduplicate_conflicts(conflicts: list[dict]) → list[dict]
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

── sgc1_deduplicator.py ──────────────────────────────────────────────────────
IMPORTS: shared_config
EXPORTS:
  find_url_duplicates(results: list[dict]) → list[list[dict]]
  designate_canonical(duplicate_group: list[dict],
                      url_registry: dict) → list[dict]
  remove_result_duplicates(results: list[dict]) → list[dict]
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

── sgc1_exporter.py ──────────────────────────────────────────────────────────
IMPORTS: json, shared_output_writer, shared_config, shared_logger
EXPORTS:
  build_json_output(results: list[dict], conflicts: list[dict],
                    flags: list[dict], run_log: list[dict],
                    run_meta: dict) → dict
  build_txt_summary(results: list[dict], conflicts: list[dict],
                    flags: list[dict], run_meta: dict) → str
  build_manifest(results: list[dict], conflicts: list[dict],
                 flags: list[dict], output_files: list[str],
                 run_meta: dict) → str
  write_all(json_data: dict, txt_data: str, manifest_data: str,
            run_id: str, output_dir: str, logger: RunLogger) → None
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

── sgc1_main.py (ORCHESTRATOR) ───────────────────────────────────────────────
IMPORTS: sgc1_args, sgc1_seed_loader, sgc1_url_normalizer,
         sgc1_http_requester, sgc1_nav_crossref, sgc1_body_parser,
         sgc1_routing_classifier, sgc1_conflict_detector,
         sgc1_deduplicator, sgc1_exporter,
         shared_config, shared_logger, shared_checkpoint, shared_output_writer
EXPORTS: main() → None
CIRCULAR RISK: NONE (imported only by the run_both.py launcher, which no module imports)
VALIDATION: ✅ CLEAN

================================================================================
SECTION 4 — SGC-2 MODULE PUBLIC INTERFACES
================================================================================

── sgc2_args.py ──────────────────────────────────────────────────────────────
IMPORTS: argparse, shared_config
EXPORTS:
  parse_args() → argparse.Namespace
    .mode: str
    .root: str
    .output_dir: str
    .max_zip_depth: int
    .skip_sensitive: bool
    .resume: str | None
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

── sgc2_dir_walker.py ────────────────────────────────────────────────────────
IMPORTS: os, os.path, stat, shared_config, shared_logger
EXPORTS:
  walk_directory(root_path: str) → tuple[list[dict], list[dict]]
    returns: (raw_file_list, folder_list)
    each file dict: {filepath, filename, extension, size_bytes,
                     mtime, parent_folder, file_type}
    each folder dict: {folderpath, folder_name, item_count}
  record_mtime_baseline(raw_file_list: list[dict]) → dict[str, float]
  verify_mtimes_unchanged(baseline: dict, logger: RunLogger) → list[str]
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN
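
EXAMPLE (illustrative sketch, not part of the contract): the mtime baseline
pair can be read as a before/after integrity check on the scanned tree. The
sketch assumes each file dict carries the filepath and mtime keys listed above,
that a vanished file counts as changed, and that the logger exposes log_error
as in shared_logger. The step name and error code are made up.

```python
import os


def record_mtime_baseline(raw_file_list: list[dict]) -> dict[str, float]:
    """Map each filepath to the mtime captured by walk_directory()."""
    return {item["filepath"]: item["mtime"] for item in raw_file_list}


def verify_mtimes_unchanged(baseline: dict, logger) -> list[str]:
    """Return the filepaths whose current mtime differs from the baseline."""
    changed = []
    for filepath, recorded_mtime in baseline.items():
        try:
            current_mtime = os.stat(filepath).st_mtime
        except OSError:
            current_mtime = None  # file vanished or became unreadable
        if current_mtime != recorded_mtime:
            changed.append(filepath)
            # "MTIME_CHANGED" is a made-up error code for this sketch.
            logger.log_error("verify_mtimes", "MTIME_CHANGED", filepath)
    return changed
```

An empty return value at the end of a run is evidence that the scan did not
modify anything it inspected.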

── sgc2_sensitive_guard.py ───────────────────────────────────────────────────
IMPORTS: shared_config, shared_logger
EXPORTS:
  is_sensitive(filename: str) → bool
  partition_file_list(raw_file_list: list[dict]) → tuple[list[dict], list[dict]]
    returns: (safe_files, sensitive_files)
  verify_sensitive_not_opened(sensitive_files: list[dict],
                               opened_files_log: list[str],
                               logger: RunLogger) → None
    raises SystemExit if any sensitive file was opened
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN
CRITICAL NOTE: This module must be called BEFORE any file reading logic.
               Ordering enforced in sgc2_main.py — partition runs before
               txt_scanner or zip_inspector process anything.
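
EXAMPLE (illustrative sketch, not part of the contract): a possible shape for
the three guard functions. The pattern and banned-name values are placeholders
(the real lists live in shared_config), and treating the patterns as
case-insensitive shell-style globs is an assumption.

```python
from fnmatch import fnmatch

# Placeholder values; the real lists are defined in shared_config.
SENSITIVE_FILENAME_PATTERNS = ["*password*", "*.pem", "*.key"]
BANNED_FILENAMES = [".env"]


def is_sensitive(filename: str) -> bool:
    """True if the name is banned outright or matches a sensitive pattern."""
    name = filename.lower()
    if name in (banned.lower() for banned in BANNED_FILENAMES):
        return True
    return any(fnmatch(name, pattern.lower())
               for pattern in SENSITIVE_FILENAME_PATTERNS)


def partition_file_list(raw_file_list: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split the walker output into (safe_files, sensitive_files)."""
    safe, sensitive = [], []
    for item in raw_file_list:
        (sensitive if is_sensitive(item["filename"]) else safe).append(item)
    return safe, sensitive


def verify_sensitive_not_opened(sensitive_files: list[dict],
                                opened_files_log: list[str],
                                logger) -> None:
    """Abort the run if any sensitive filepath appears in the opened log."""
    opened = set(opened_files_log)
    leaked = [f["filepath"] for f in sensitive_files if f["filepath"] in opened]
    if leaked:
        logger.log_abort(f"sensitive files were opened: {leaked}")
```

Because the guard receives opened_files_log as an argument, it needs no import
of sgc2_txt_scanner.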

── sgc2_filename_parser.py ───────────────────────────────────────────────────
IMPORTS: shared_naming_validator, shared_config
EXPORTS:
  parse_all(safe_file_list: list[dict]) → list[dict]
    enriches each item with: category[], date_token, vol_number,
    authority_tag, project_tag, purpose, compliant, violations[]
  check_compliance(safe_file_list: list[dict]) → list[dict]
    returns: violations[] list
  infer_authority(file_item: dict) → str
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

── sgc2_zip_inspector.py ─────────────────────────────────────────────────────
IMPORTS: zipfile, shared_config, shared_logger
EXPORTS:
  inspect_zip(filepath: str, max_depth: int,
              current_depth: int, logger: RunLogger) → dict
    returns: {member_count, members[], manifest_text, nested_zips[]}
  parse_manifest_text(manifest_text: str) → dict
  crossref_manifest_vs_members(manifest_data: dict,
                                members: list[dict]) → list[dict]
  detect_missing_manifests(zip_list: list[dict]) → list[str]
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN
CONSTRAINT: Opens ZIPs in read-only mode ('r') only — never write mode.
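
EXAMPLE (illustrative sketch, not part of the contract): read-only inspection
with the standard zipfile module. Several details are assumptions: the manifest
is recognised by a name ending in "manifest.txt", depth counting starts at 0,
nested archives are read into memory and returned as full sub-reports, and the
member dict keys are made up. The sketch also lets the first argument be a
file-like object so nested archives can be inspected without extraction, which
is looser than the str type in the contract.

```python
import io
import zipfile


def inspect_zip(filepath, max_depth: int, current_depth: int = 0,
                logger=None) -> dict:
    """List a ZIP's members without extracting or modifying anything."""
    result = {"member_count": 0, "members": [],
              "manifest_text": None, "nested_zips": []}
    with zipfile.ZipFile(filepath, "r") as archive:  # 'r' = read-only
        for info in archive.infolist():
            if info.is_dir():
                continue
            name = info.filename
            result["members"].append({"name": name,
                                      "size_bytes": info.file_size})
            if name.lower().endswith("manifest.txt"):
                raw = archive.read(info)
                result["manifest_text"] = raw.decode("utf-8", errors="replace")
            elif name.lower().endswith(".zip") and current_depth < max_depth:
                nested = io.BytesIO(archive.read(info))  # in memory only
                result["nested_zips"].append(
                    inspect_zip(nested, max_depth, current_depth + 1, logger))
    result["member_count"] = len(result["members"])
    return result
```

Nested archives beyond max_depth still appear in members but are not opened.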

── sgc2_txt_scanner.py ───────────────────────────────────────────────────────
IMPORTS: shared_config, shared_logger
EXPORTS:
  opened_files_log: list[str]   ← MUTABLE — accessed by sgc2_main only
  scan_txt(filepath: str, logger: RunLogger) → dict
    returns: {doc_title, status_line, generated_line, purpose_line,
              supersedes_refs[], prior_version_refs[], authority}
  extract_authority_from_status(status_line: str) → str
  build_supersedes_graph(txt_metadata_list: list[dict]) → dict
    returns: {filepath: {supersedes[], superseded_by}}
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN
STATE NOTE: opened_files_log is the only cross-module mutable state in the
            entire codebase. sgc2_main reads it once at end of run for
            integrity verification. No other module touches it.
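
EXAMPLE (illustrative sketch, not part of the contract): how scan_txt might
feed opened_files_log. The sketch assumes the path is recorded before the file
is opened, that only the first TXT_HEADER_SCAN_CHARS characters are read, and
that header lines start with the literal prefixes "STATUS:", "Generated:" and
"PURPOSE:". It returns only three of the documented keys.

```python
TXT_HEADER_SCAN_CHARS = 2000  # value listed for shared_config

# The single permitted module-level mutable list (RULE DV-05 / IC-03).
opened_files_log: list[str] = []

_HEADER_PREFIXES = (
    ("status_line", "STATUS:"),
    ("generated_line", "Generated:"),
    ("purpose_line", "PURPOSE:"),
)


def scan_txt(filepath: str, logger=None) -> dict:
    """Read the header region of a TXT file and pick out its metadata lines."""
    opened_files_log.append(filepath)  # recorded before the file is touched
    with open(filepath, "r", encoding="utf-8", errors="replace") as handle:
        header = handle.read(TXT_HEADER_SCAN_CHARS)
    meta = {key: None for key, _ in _HEADER_PREFIXES}
    for line in header.splitlines():
        stripped = line.strip()
        for key, prefix in _HEADER_PREFIXES:
            if meta[key] is None and stripped.startswith(prefix):
                meta[key] = stripped  # first match wins
    return meta
```

Appending before the open() call means a failed read still leaves a trace for
the end-of-run integrity check.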

── sgc2_folder_validator.py ──────────────────────────────────────────────────
IMPORTS: shared_config, shared_logger
EXPORTS:
  EXPECTED_FOLDERS: list[str]   (read-only — from REPOSITORY_STRUCTURE_PLAN)
  validate_structure(folder_list: list[dict],
                     root_path: str) → dict
    returns: {present[], missing[], unexpected[]}
  validate_file_placement(file_item: dict,
                          expected_folders: list[str]) → str
    returns: "CORRECT" | "MISPLACED" | "UNEXPECTED_FOLDER"
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

── sgc2_dependency_mapper.py ─────────────────────────────────────────────────
IMPORTS: shared_config, shared_logger
EXPORTS:
  DEPENDENCY_CHAINS: dict       (read-only — defines 5 known chains)
  GENERATOR_REQUIRED_INPUTS: list[str]  (read-only)
  assess_chain(chain_name: str,
               file_inventory: list[dict]) → dict
    returns: {status: "COMPLETE"|"MISSING"|"BLOCKED", missing_files[]}
  assess_generator_inputs(file_inventory: list[dict]) → list[dict]
    returns: [{input_name, file_found, authority, status}]
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

── sgc2_deduplicator.py ──────────────────────────────────────────────────────
IMPORTS: shared_config, shared_naming_validator
EXPORTS:
  group_by_base_name(file_list: list[dict]) → dict[str, list[dict]]
  tag_authority_within_group(group: list[dict]) → list[dict]
  build_version_chains(groups: dict) → list[dict]
    each chain: {base_name, versions[], current}
  find_orphan_files(file_list: list[dict]) → list[dict]
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

── sgc2_exporter.py ──────────────────────────────────────────────────────────
IMPORTS: json, shared_output_writer, shared_config, shared_logger
EXPORTS:
  build_json_output(files, folders, zips, chains, dep_chains,
                    gen_inputs, conflicts, flags, sensitive_files,
                    run_log, run_meta) → dict
  build_txt_summary(...same args...) → str
  build_manifest(counts: dict, output_files: list[str],
                 run_meta: dict, top_flags: list[dict]) → str
  write_all(json_data: dict, txt_data: str, manifest_data: str,
            run_id: str, output_dir: str, logger: RunLogger) → None
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

── sgc2_main.py (ORCHESTRATOR) ───────────────────────────────────────────────
IMPORTS: sgc2_args, sgc2_dir_walker, sgc2_sensitive_guard,
         sgc2_filename_parser, sgc2_zip_inspector, sgc2_txt_scanner,
         sgc2_folder_validator, sgc2_dependency_mapper,
         sgc2_deduplicator, sgc2_exporter,
         shared_config, shared_logger, shared_checkpoint, shared_output_writer
EXPORTS: main() → None
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

── run_both.py ───────────────────────────────────────────────────────────────
IMPORTS: sgc2_main, sgc1_main
EXPORTS: run_both() → None   (runs SGC-2 first, then SGC-1)
CIRCULAR RISK: NONE
VALIDATION: ✅ CLEAN

================================================================================
SECTION 5 — CIRCULAR DEPENDENCY PROOF
================================================================================

Full transitive closure check — verified by tracing every import chain:

shared_config         → (nothing) ✅
shared_logger         → shared_config ✅
shared_output_writer  → shared_config, shared_logger
  ANALYSIS: shared_config is reached twice (directly, and again via
            shared_logger). That is a diamond, not a loop.
            shared_config imports nothing.
            shared_logger imports only shared_config.
            RESULT: No loop. shared_config is a leaf node. ✅

shared_checkpoint     → shared_config, shared_output_writer, shared_logger
  ANALYSIS: All three terminate at shared_config (leaf). ✅

shared_naming_validator → shared_config ✅

All SGC-1 modules → shared_config and/or shared_logger only
  ANALYSIS: No SGC-1 module imports another SGC-1 module, except
            sgc1_main, which imports all of them and is itself imported
            only by run_both.py. ✅
  EXCEPTION CHECK: sgc1_exporter and sgc1_main import shared_output_writer.
    shared_output_writer imports shared_config and shared_logger.
    shared_logger imports shared_config.
    shared_config imports nothing. CHAIN TERMINATES. ✅
  EXCEPTION CHECK: sgc1_main imports shared_checkpoint.
    shared_checkpoint terminates at shared_config (see above). ✅

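All SGC-2 modules → shared_config and/or shared_logger only, except:
  sgc2_filename_parser additionally imports shared_naming_validator.
    shared_naming_validator imports shared_config. TERMINATES. ✅
  sgc2_deduplicator additionally imports shared_naming_validator.
    Same chain. TERMINATES. ✅
  sgc2_exporter and sgc2_main additionally import shared_output_writer.
    shared_output_writer terminates at shared_config (see above). ✅
  sgc2_main additionally imports shared_checkpoint and every other
    SGC-2 module, and is itself imported only by run_both.py. ✅

run_both → sgc2_main, sgc1_main
  ANALYSIS: Nothing imports run_both, and neither orchestrator imports
            the other. TERMINATES. ✅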

RESULT: ZERO circular dependencies in the entire module graph. ✅

================================================================================
SECTION 6 — INTERFACE CONTRACT RULES FOR MIXED CLAUDE
================================================================================

RULE IC-01: Do not add new exports to a module without updating this file.
  This file is the interface contract. Deviations create drift.

RULE IC-02: Function signatures must match exactly as listed above.
  Parameter names, types, and return types are fixed contracts.
  Rename nothing without updating all callers.

RULE IC-03: opened_files_log (sgc2_txt_scanner) is the only permitted
  module-level mutable list. All other module-level state is read-only.
  Do not introduce new mutable module-level state.

RULE IC-04: KNOWN_BLOCKERS (sgc1_conflict_detector), DEPENDENCY_CHAINS and
  GENERATOR_REQUIRED_INPUTS (sgc2_dependency_mapper), and EXPECTED_FOLDERS
  (sgc2_folder_validator) are hardcoded from the authority documents.
  They are in-code dicts and lists, not config files. Values must match:
    KNOWN_BLOCKERS ← FINAL_CANON_AUTHORITY_REGISTRY open conflicts section
    DEPENDENCY_CHAINS ← CROSS_SYSTEM_DEPENDENCY_MAP_5_12_26.txt chains A–E
    GENERATOR_REQUIRED_INPUTS ← GENERATOR_INPUT_READINESS_REPORT_5_12_26.txt
    EXPECTED_FOLDERS ← REPOSITORY_STRUCTURE_PLAN_5_12_26.txt folder list

RULE IC-05: All log_abort() calls terminate the program via SystemExit.
  Do not catch SystemExit anywhere except in a test harness.
  If a module needs to signal a critical failure, it calls logger.log_abort().
  It does not return or raise any other exception for critical failures.
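
EXAMPLE (illustrative sketch, not part of the contract): the one place IC-05
allows SystemExit to be caught is a test harness. The function
require_seed_urls and the _StubLogger class are hypothetical stand-ins; only
the log_abort behaviour (raise SystemExit) comes from this file.

```python
import unittest


class _StubLogger:
    """Test double that mimics RunLogger.log_abort()."""

    def log_abort(self, reason: str):
        raise SystemExit(reason)


def require_seed_urls(seed_urls: list, logger) -> list:
    """Hypothetical module code: a critical failure goes through log_abort()."""
    if not seed_urls:
        logger.log_abort("seed list is empty")
    return seed_urls


class AbortBehaviourTest(unittest.TestCase):
    def test_empty_seed_list_aborts(self):
        # Only the harness catches SystemExit; production code never does.
        with self.assertRaises(SystemExit):
            require_seed_urls([], _StubLogger())

    def test_non_empty_seed_list_passes_through(self):
        urls = [{"url": "https://collabtunes.com"}]
        self.assertEqual(require_seed_urls(urls, _StubLogger()), urls)
```

The module function has no try/except of its own, so an abort propagates
straight to the process exit in a real run.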

RULE IC-06: validate_output_path() is called before EVERY file write.
  No module writes a file directly — all writes go through shared_output_writer.
  This is the safety invariant for overwrite prevention.

================================================================================
SECTION 7 — VALIDATION SUMMARY
================================================================================

Total modules:                    28
Modules with circular deps:        0  ✅
Modules with missing interfaces:   0  ✅
Modules violating DV-01:           0  ✅
Modules with mutable shared state: 1  (sgc2_txt_scanner.opened_files_log — ACCEPTABLE)
Modules importing outside group:   0  ✅  (all cross-group via /shared/ only; run_both is a launcher outside both groups)
Orchestrators imported by non-launcher modules: 0  ✅  (sgc1_main and sgc2_main are imported only by run_both; run_both is never imported)

VERDICT: Module dependency graph is valid, acyclic, and safe to code from.

================================================================================
END FINAL_MODULE_DEPENDENCY_VALIDATION_5_12_26.txt
================================================================================
