CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

CRITICAL: Efficiency and Minimal Documentation

KEEP IT LEAN:

FILE MANAGEMENT:

END OF SESSION CLEANUP:

CRITICAL: Windows Environment

This project runs on Windows. You MUST follow these guidelines:

Python Command

Unicode Handling

File Paths

Project Overview

This is a Region Redistribution Project for Step Up For Students, a scholarship funding organization. The goal is to equitably redistribute 2,617 Florida private schools across Regional Managers following staff changes.

Critical Context: Redistribution Methodology (Final Implementation):

The final implementation uses a single-pass redistribution with all 20 RMs (15 Standard RMs + 5 SrRMs, including the 3 new RMs) and predetermined new RM locations (St Petersburg, Kissimmee, Lakeland) based on prior gap analysis.

The algorithm subdivides large counties FIRST (DADE, BROWARD, ORANGE, HILLSBOROUGH), then assigns remaining counties whole, ensuring 100% geographic contiguity and 80%+ balance rate.

NOTE: Original methodology planned a 4-stage process (baseline with 17 RMs → gap analysis → determine new locations → final with 20 RMs), but the final implementation consolidated this into a single redistribution pass with predetermined placements.

Plain-English Summary (ELI5)

What we're fixing: Some RMs have too many schools; some drive too far. We're redrawing the map so work and driving are fair.

How we split the map: Group nearby schools by zip code chunks. Each RM gets chunks that touch each other (contiguous territories). No zip code is ever split between RMs - this makes the map clean and easy to visualize.

What "fair" means: Give everyone about the same number of schools, adjusted for Senior RMs who carry fewer by design. Close to equal is the goal.

Driving rule: Keep every school within 75 miles. If impossible, allow up to 100 miles with documented exception and smart batching plan.

Visits: Everyone does 52 visits a quarter. Batch trips when possible. Repeat visits count.

Special case: Assistant Director covers Hialeah + Hialeah Gardens only (92 schools).

Keeping the peace: Preserve existing relationships where possible, but fairness and distance come first. Changes are explained clearly by zip area.

Why this works: Zip-based areas are easy to explain and map. Equal school counts feel fair. The 75/100-mile rule balances travel feasibility with coverage needs. Rural folks get flexibility with documented exceptions. All RMs handle new school onboarding (essential job skill).

Key Benefits

The final redistribution delivers measurable operational improvements:

1. Realistic Travel Planning

2. Significant Travel Reduction

30.6% reduction in 100-mile roundtrips (trips over 50 miles one-way): Operational impact:

3. Improved Visit Batching Efficiency

With shorter average distances and more compact territories:

4. Zero Cross-Territory Travel

100% contiguity means RMs never drive through another RM's territory to reach their own schools:

5. Clean Mapping (Every ZIP = One RM)

Each 5-digit ZIP code belongs to exactly one RM:

Data Files

Core Data Files

schools.csv - Primary school database (2,617 schools)

rms.csv - Regional Manager roster
- RM: Standard Regional Manager (fte_factor = 1.0, aspirational target 150 schools, range 130-170)
- SrRM: Senior Regional Manager (fte_factor = 0.6, aspirational target 90 schools, range 78-102, reduced caseload by design)
- AD: Assistant Director (fte_factor = blank, Hialeah-only assignment)

distances.csv - RM-to-school road miles

Redistribution Model Logic

Geographic Assignment Rules (CRITICAL FOR MAPPING)

Zip Code Exclusivity:

Contiguity Requirements (NON-NEGOTIABLE):
- If a county is assigned whole to an RM, it must be adjacent to that RM's existing territory (or be their starting seed)
- If a zip is reassigned, it must be on the border of the receiving RM's territory AND adjacent to at least one zip already in their territory
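The exclusivity rule (each ZIP belongs to exactly one RM) can be checked mechanically. The sketch below is illustrative only: the function name, the in-memory `assignments` shape, and the sample ZIPs are assumptions, not the project's actual code or file format.

```python
from collections import defaultdict


def find_shared_zips(assignments):
    """Return ZIPs claimed by more than one RM.

    `assignments` maps an RM name to an iterable of 5-digit ZIP strings.
    This shape is hypothetical and chosen only for the example.
    """
    owners = defaultdict(set)
    for rm, zips in assignments.items():
        for zip_code in zips:
            owners[zip_code].add(rm)
    # Any ZIP with two or more owners violates exclusivity.
    return {z: sorted(rms) for z, rms in owners.items() if len(rms) > 1}


# Made-up sample data: ZIP "00002" is claimed twice.
example = {"RM A": ["00001", "00002"], "RM B": ["00002", "00003"]}
conflicts = find_shared_zips(example)
print(conflicts)
```

An empty result means every ZIP has a single owner.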

Assignment Hierarchy (Preference Order):

1. Prefer Whole Counties: Assign entire counties as single units whenever possible
   - Whole county assignments are cleaner, easier to explain, and more stable
   - Only break down a county into zips when necessary for balance or contiguity
2. Break Counties into Zips When Needed:
   - Top 4 counties by school count (DADE, BROWARD, ORANGE, HILLSBOROUGH) subdivided at ZIP level
   - These are the only counties subdivided in the final implementation
   - Counties may be broken down mid-optimization if needed to achieve balance
   - When breaking a county, maintain contiguity for all affected territories
3. Preserve Contiguity in All Cases:
   - Before any zip reassignment, verify it maintains contiguity for both donor and recipient RMs
   - A zip can only be removed from an RM if doing so doesn't fragment their remaining territory
   - A zip can only be added to an RM if it's adjacent to their existing territory
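The donor/recipient rule in item 3 can be expressed as a small check. This is a minimal sketch under stated assumptions: `can_reassign`, `is_connected`, and the adjacency dictionary are hypothetical names and shapes, and the sample graph is invented.

```python
from collections import deque


def is_connected(nodes, adjacency):
    """True if `nodes` form one connected component (empty set counts as connected)."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor in nodes and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == nodes


def can_reassign(zip_code, donor, recipient, adjacency):
    """Allow a move only if the ZIP touches the recipient and the donor stays whole."""
    if zip_code not in donor:
        return False
    touches_recipient = any(n in recipient for n in adjacency.get(zip_code, ()))
    donor_after = set(donor) - {zip_code}
    return touches_recipient and is_connected(donor_after, adjacency)


# Invented adjacency: A-B, B-C, B-D, C-D.
adjacency = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B", "D"], "D": ["B", "C"]}
donor, recipient = {"A", "B", "C"}, {"D"}
print(can_reassign("C", donor, recipient, adjacency))  # donor keeps A-B
print(can_reassign("B", donor, recipient, adjacency))  # would strand A from C
```

Note that this sketch lets a donor give away its last ZIP; a real check would probably forbid that.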

Contiguity Validation:

Key Constraints (Non-Negotiable)

1. Distance Rule: 125-mile maximum constraint (hard limit); 75 miles preferred but not enforced
2. Visit Floor: 52 school visits per quarter per RM (repeat visits count) - this is set by leadership and non-negotiable
3. Carve-out: Assistant Director covers Hialeah + Hialeah Gardens only (92 schools)
4. Contiguity: Each RM territory must be geographically contiguous (county-level validation)
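As an illustration of the distance rule, the sketch below classifies each RM's farthest school against the 75-mile preference and the 125-mile limit. The tuple layout, function names, labels, and sample mileages are assumptions made for the example; distances.csv may be structured differently.

```python
PREFERRED_MILES = 75.0   # preferred, not enforced
MAX_MILES = 125.0        # maximum constraint


def classify_distance(miles):
    """Label a single RM-to-school distance."""
    if miles <= PREFERRED_MILES:
        return "preferred"
    if miles <= MAX_MILES:
        return "within_limit"
    return "violation"


def worst_by_rm(distances):
    """Map each RM to (farthest distance, label).

    `distances` is an iterable of (rm, school_id, miles) tuples (hypothetical shape).
    """
    worst = {}
    for rm, _school_id, miles in distances:
        worst[rm] = max(miles, worst.get(rm, 0.0))
    return {rm: (miles, classify_distance(miles)) for rm, miles in worst.items()}


# Made-up sample rows.
sample = [
    ("RM 1", "S1", 40.0),
    ("RM 1", "S2", 80.0),
    ("RM 2", "S3", 133.8),
    ("RM 3", "S4", 75.0),
]
report = worst_by_rm(sample)
print(report)
```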

Equity Definition

Algorithm Stages (Final Implementation)

NOTE: The final implementation uses a single-pass redistribution with all 20 RMs (15 Standard RMs + 5 SrRMs, including the 3 new RMs) and predetermined new RM locations (St Petersburg, Kissimmee, Lakeland).

1. Setup & Data Loading:
   - Load 20 field RMs (15 Standard @ 1.0 FTE + 5 Senior @ 0.6 FTE) + 1 AD
   - Load 2,617 schools across Florida
   - Remove Hialeah + Hialeah Gardens carve-out (92 schools → AD Carla Hernandez)
   - Calculate distributable schools: 2,525 schools
   - Calculate total FTE: 18.0 (15 × 1.0 + 5 × 0.6, excluding AD)
   - Calculate schools per FTE: 2,525 ÷ 18.0 = 140.3 per FTE (calculated target)
   - Set aspirational targets: 150 schools for RMs, 90 schools for SrRMs (for presentations/goals)
   - Set calculated FTE-proportional targets: 140 for RMs (140.3 × 1.0), 84 for SrRMs (140.3 × 0.6)
   - Tolerance ranges: 130-170 for RMs (±20 around aspirational 150), 78-102 for SrRMs (±12 around aspirational 90)
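The arithmetic in this step can be reproduced in a few lines. The figures come from the step itself; the variable names are illustrative and not taken from the project's code.

```python
TOTAL_SCHOOLS = 2617
CARVE_OUT_SCHOOLS = 92            # Hialeah + Hialeah Gardens, covered by the AD
STANDARD_RMS, SENIOR_RMS = 15, 5
STANDARD_FTE, SENIOR_FTE = 1.0, 0.6

# Schools left to distribute among the 20 field RMs.
distributable = TOTAL_SCHOOLS - CARVE_OUT_SCHOOLS

# Total capacity in FTE, excluding the AD.
total_fte = STANDARD_RMS * STANDARD_FTE + SENIOR_RMS * SENIOR_FTE
schools_per_fte = distributable / total_fte

# Calculated FTE-proportional targets, rounded to whole schools.
calculated_targets = {
    "RM": round(schools_per_fte * STANDARD_FTE),
    "SrRM": round(schools_per_fte * SENIOR_FTE),
}

# Tolerance ranges are centred on the aspirational targets (150 and 90).
tolerance_ranges = {"RM": (150 - 20, 150 + 20), "SrRM": (90 - 12, 90 + 12)}

print(distributable, round(total_fte, 1), round(schools_per_fte, 1))
print(calculated_targets, tolerance_ranges)
```

Note that the tolerance ranges are built around the aspirational targets, while territory growth balances against the lower calculated targets.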

2. Build Adjacency Data:
   - Build county adjacency matrix for contiguity validation (all 66 FL counties)
   - Build ZIP adjacency ONLY for the 4 subdivided counties (DADE, BROWARD, ORANGE, HILLSBOROUGH)
   - ZIP adjacency not needed for the 62 counties assigned whole
   - Adjacency data used for contiguity validation throughout assignment

3. Subdivide Tier 1 Counties (FIRST - reverse of original plan):
   - Tier 1 Counties (MUST subdivide): DADE, BROWARD, ORANGE, HILLSBOROUGH
   - For each Tier 1 county:
     - Identify RMs with home ZIP in that county
     - Seed each RM with their home ZIP
     - Grow territories contiguously using nearest-neighbor logic
     - Assign adjacent ZIPs to RM furthest below their target
     - Balance using FTE-proportional targets (calculated 140/84)
     - Maintain strict contiguity at all times (BFS/adjacency-based growth)
   - Result: Large counties subdivided among 2-3 RMs each with contiguous ZIP territories
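The seeded growth described in this step might look roughly like the sketch below. It is a simplified illustration, not the project's implementation: the function name, data shapes, and sample ZIPs are hypothetical, and ties among frontier ZIPs are broken alphabetically rather than by the nearest-neighbor distance the real algorithm uses.

```python
def grow_territories(seeds, targets, school_counts, adjacency):
    """Grow contiguous ZIP territories from each RM's home ZIP.

    seeds: {rm: home_zip}; targets: {rm: target school count};
    school_counts: {zip: schools}; adjacency: {zip: iterable of neighbor zips}.
    """
    owner = {zip_code: rm for rm, zip_code in seeds.items()}
    load = {rm: school_counts[zip_code] for rm, zip_code in seeds.items()}
    while True:
        # Frontier: unassigned ZIPs touching each RM's current territory.
        frontier = {rm: set() for rm in seeds}
        for zip_code, rm in owner.items():
            for neighbor in adjacency.get(zip_code, ()):
                if neighbor not in owner:
                    frontier[rm].add(neighbor)
        candidates = [rm for rm in seeds if frontier[rm]]
        if not candidates:
            break
        # The RM furthest below its target claims the next ZIP.
        rm = max(candidates, key=lambda r: targets[r] - load[r])
        chosen = min(frontier[rm])  # simplification: alphabetical tie-break
        owner[chosen] = rm
        load[rm] += school_counts[chosen]
    return owner, load


# Invented county: five ZIPs in a row, one RM seeded at each end.
adjacency = {"Z1": ["Z2"], "Z2": ["Z1", "Z3"], "Z3": ["Z2", "Z4"],
             "Z4": ["Z3", "Z5"], "Z5": ["Z4"]}
school_counts = {"Z1": 40, "Z2": 50, "Z3": 50, "Z4": 30, "Z5": 30}
owner, load = grow_territories(
    seeds={"RM A": "Z1", "SrRM B": "Z5"},
    targets={"RM A": 140, "SrRM B": 84},
    school_counts=school_counts,
    adjacency=adjacency,
)
print(owner, load)
```

Because a ZIP is only ever claimed from an RM's frontier, every territory stays contiguous by construction.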

4. Seed Whole Counties (Home County Assignment):
   - For RMs not in Tier 1 counties:
     - Assign RM's home county as starting territory (whole county, not subdivided)
     - Ensures each RM has geographic anchor in familiar region
     - Counties assigned as atomic units

5. Assign Remaining Whole Counties:
   - Assign unassigned counties (62 non-Tier-1 counties) to nearest RM with capacity
   - Maintain contiguity (counties must be adjacent to RM's existing territory)
   - Respect 125-mile distance constraint
   - Prefer whole county assignments for clean boundaries
   - Grow territories outward from seeds using greedy nearest-neighbor
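One way to picture the "nearest RM with capacity" choice for a single county is sketched below. All names and data shapes are hypothetical, and treating "capacity" as the upper tolerance bound is an assumption of this example, since the source does not define it.

```python
MAX_MILES = 125.0


def pick_rm_for_county(county, county_schools, territories, loads,
                       upper_bounds, county_adjacency, miles_to_county):
    """Return the nearest eligible RM for `county`, or None if nobody qualifies.

    An RM is eligible when the county borders its territory, the added schools
    keep it at or under its upper bound (assumed meaning of "capacity"), and
    the county is within the distance limit.
    """
    eligible = []
    for rm, owned in territories.items():
        adjacent = any(n in owned for n in county_adjacency.get(county, ()))
        has_capacity = loads[rm] + county_schools <= upper_bounds[rm]
        miles = miles_to_county.get((rm, county), float("inf"))
        if adjacent and has_capacity and miles <= MAX_MILES:
            eligible.append((miles, rm))
    return min(eligible)[1] if eligible else None


# Invented example: county C2 borders both RMs, but only one has room.
territories = {"RM North": {"C1"}, "RM South": {"C3"}}
choice = pick_rm_for_county(
    county="C2",
    county_schools=30,
    territories=territories,
    loads={"RM North": 160, "RM South": 120},
    upper_bounds={"RM North": 170, "RM South": 170},
    county_adjacency={"C2": ["C1", "C3"]},
    miles_to_county={("RM North", "C2"): 40.0, ("RM South", "C2"): 60.0},
)
print(choice)
```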

6. Handle Remaining Unassigned Schools:
   - Address any schools in ZIPs not yet assigned (edge cases, data quality issues)
   - Assign to nearest RM with available capacity
   - Ensure 100% coverage (all 2,617 schools assigned)

7. Validate & Document:
   - Run county-level contiguity checks (all 21 RMs pass)
   - Calculate final balance: 16/20 RMs within tolerance (80% balance rate)
   - Document 4 RMs outside tolerance with geographic justification
   - Generate final metrics, visualizations, and change logs
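The balance rate in this step is the share of field RMs whose school count falls inside their tolerance range. A minimal sketch, using invented RM names and loads (not the project's results):

```python
TOLERANCE = {"RM": (130, 170), "SrRM": (78, 102)}


def balance_report(loads, roles):
    """Return (balance rate, {rm: load} for RMs outside tolerance).

    loads: {rm: assigned schools}; roles: {rm: "RM" or "SrRM"}.
    """
    outside = {}
    for rm, schools in loads.items():
        low, high = TOLERANCE[roles[rm]]
        if not (low <= schools <= high):
            outside[rm] = schools
    rate = (len(loads) - len(outside)) / len(loads)
    return rate, outside


# Made-up sample: two of four RMs fall outside their range.
loads = {"RM 1": 150, "RM 2": 128, "SrRM 1": 95, "SrRM 2": 104}
roles = {"RM 1": "RM", "RM 2": "RM", "SrRM 1": "SrRM", "SrRM 2": "SrRM"}
rate, outside = balance_report(loads, roles)
print(rate, outside)
```

With the final figures quoted above, 16 of 20 RMs in range gives a rate of 0.80.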

Validation Gates

- Run graph connectivity check on each RM's zips/counties
- Territory must form single connected component in adjacency graph
- No disconnected islands or enclaves
- Final result: 21/21 territories contiguous (100% pass rate)
- Final result: 80% balance rate achieved (16/20 RMs within tolerance)
- 4 RMs outside tolerance have geographic justification (rural underload, new RM territory constraints)
- Tolerance: ±20 around aspirational target for RMs (130-170), ±12 for SrRMs (78-102)
- 75 miles preferred but not enforced
- Final result: 17/20 RMs (85%) within 125-mile constraint; 3 minor violations (125.1-133.8 mi) with geographic justification
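The connectivity gate amounts to counting connected components within each territory. The sketch below shows one way to do that with a breadth-first search; the function names and the toy adjacency graph are assumptions for illustration.

```python
from collections import deque


def connected_components(territory, adjacency):
    """Split a territory (set of zips or counties) into connected components."""
    remaining = set(territory)
    components = []
    while remaining:
        start = remaining.pop()
        component = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency.get(current, ()):
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def is_contiguous(territory, adjacency):
    """A territory passes only if it forms exactly one component."""
    return len(connected_components(territory, adjacency)) == 1


# Toy graph: A-B-C in a row, D isolated.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
print(is_contiguous({"A", "B", "C"}, adjacency))
print(is_contiguous({"A", "C"}, adjacency))
print(len(connected_components({"A", "B", "D"}, adjacency)))
```

An empty territory has zero components and therefore fails the check in this sketch.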

Key Parameters

Staffing & Capacity

Targets & Tolerances

Aspirational targets:
- Standard RM: 150 schools (range 130-170, ±20)
- Senior RM: 90 schools (range 78-102, ±12)

Calculated FTE-proportional targets:
- Standard RM: 140 schools (140.3 × 1.0)
- Senior RM: 84 schools (140.3 × 0.6)

Distance Constraints

County Subdivision

Operational

Zip-Level Assignment Logic (Subdivided Counties Only)

IMPORTANT: Zip-level assignment is ONLY used for the 4 subdivided counties (DADE, BROWARD, ORANGE, HILLSBOROUGH). The other 62 counties are assigned whole as atomic units.

For the 4 subdivided counties:

1. Identify unique 5-digit zip codes within the county
2. Count schools per zip
3. Build adjacency using Census ZCTA (ZIP Code Tabulation Area) polygons:
   - Use TIGER/Line ZCTAs (e.g., tl_2023_us_zcta520)
   - Clip to Florida and filter to the specific county
   - Build adjacency graph (Queen adjacency: polygons sharing a boundary or a single point)
4. Assign ZIPs contiguously to RMs:
   - Start with RM home ZIP as seed
   - Grow territory by adding adjacent ZIPs
   - Maintain strict contiguity throughout
   - Balance using FTE-proportional targets
5. Result: Each subdivided county split among 2-3 RMs with contiguous ZIP territories
6. Assignment: Each ZIP (and all schools within it) assigned to exactly one RM
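To make the Queen adjacency idea in step 3 concrete, the sketch below treats two polygons as neighbors when they share at least one vertex. This is a deliberate simplification: it assumes topologically clean polygons whose shared boundaries use identical vertices, whereas real ZCTA shapefiles would normally be processed with a GIS library. The function name and the toy squares are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def queen_adjacency(polygons):
    """Build {zip: set of neighbor zips} from shared vertices.

    polygons: {zip_code: [(x, y), ...]} listing each polygon's vertices.
    Sharing an edge or just a corner both count (Queen contiguity).
    """
    by_vertex = defaultdict(set)
    for zip_code, ring in polygons.items():
        for vertex in ring:
            by_vertex[vertex].add(zip_code)
    adjacency = {zip_code: set() for zip_code in polygons}
    for zips in by_vertex.values():
        for a, b in combinations(sorted(zips), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Toy unit squares: A and B share an edge, A and C share only a corner,
# D touches nothing.
polygons = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(1, 1), (2, 1), (2, 2), (1, 2)],
    "D": [(3, 3), (4, 3), (4, 4), (3, 4)],
}
adjacency = queen_adjacency(polygons)
print(adjacency)
```

Under Rook adjacency, by contrast, A and C would not be neighbors because they meet only at a corner.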

Note: The final implementation does NOT create a separate zipcells.csv file. ZIP assignments are embedded directly in finalrm_assignments.json.

Dynamic Change Management

Ongoing School Changes (Between Annual Redistributions)

New Schools (~30/year):

School Closures (~15/year):

Annual Rebalancing:

Adding New RMs:

Operational Context

Visit Requirements

Relationship & History Considerations

Data Quality & Updates

Common Data Issues

1. Missing Enrollment Values: Some schools have #N/A in enrollment field
2. Geocoding: All schools need valid lat/lon; some may require geocoding from address
3. Zip Cell Construction: Must merge sparse zips (<5 schools) into adjacent zips for contiguity
4. Current Assignments: Reflect 6-month-old overload situation in Orlando corridor
5. Distance Matrix: Road miles preferred over straight-line distance
6. DOE Data Lag: School moves/changes may not be immediately reflected
7. County/ZIP Mismatches: Known data quality errors in schools.csv:
   - CAREER PREP ACADEMY OF SOUTH FLORIDA106: ZIP 32246 (Jacksonville) but county = DADE (should be DUVAL)
   - CAREER PREP ACADEMY OF NORTHSIDE: ZIP 33157 (Miami) but county = DUVAL (should be DADE)
   - These must be corrected in source data before redistribution
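Two of these issues lend themselves to simple automated checks. The sketch below parses enrollment values while tolerating #N/A, and flags rows whose county disagrees with the majority county for the same ZIP. The column names (`school_name`, `zip`, `county`) and the sample rows are assumptions, not the actual schools.csv schema, and majority voting cannot catch a mismatch when a ZIP has only one school.

```python
from collections import Counter, defaultdict


def parse_enrollment(raw):
    """Return enrollment as int, or None for blank or #N/A values."""
    raw = (raw or "").strip()
    if raw in ("", "#N/A"):
        return None
    return int(raw)


def flag_county_zip_mismatches(rows):
    """Flag rows whose county differs from the most common county for their ZIP.

    rows: list of dicts with hypothetical keys "school_name", "zip", "county".
    Returns (school_name, zip, recorded county, majority county) tuples.
    """
    counties_by_zip = defaultdict(Counter)
    for row in rows:
        counties_by_zip[row["zip"]][row["county"]] += 1
    flagged = []
    for row in rows:
        majority, _count = counties_by_zip[row["zip"]].most_common(1)[0]
        if row["county"] != majority:
            flagged.append((row["school_name"], row["zip"], row["county"], majority))
    return flagged


# Made-up rows: School C carries the wrong county for its ZIP.
rows = [
    {"school_name": "School A", "zip": "32246", "county": "DUVAL"},
    {"school_name": "School B", "zip": "32246", "county": "DUVAL"},
    {"school_name": "School C", "zip": "32246", "county": "DADE"},
]
flagged = flag_county_zip_mismatches(rows)
print(flagged, parse_enrollment("#N/A"), parse_enrollment("250"))
```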

Working with This Data

Data Validation Checklist (from Data Intake Checklist)

Expected Deliverables

Core Data Files:

Final Outputs:

Interactive Visualizations:

Documentation:

Note: The implementation does NOT create a separate zipcells.csv file, exception lists with batching plans, or individual RM one-pagers. ZIP assignments are embedded in finalrm_assignments.json, and visualizations are interactive HTML rather than static documents.

API Keys & External Services

IMPORTANT: API keys are stored in a .env file (not committed to the repo).

MapBox Directions API (current solution):

Census ZCTA Data:

Documentation Structure

This repository uses interconnected Markdown documentation: