CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
CRITICAL: Efficiency and Minimal Documentation
KEEP IT LEAN:
- NO interim progress summaries - Do not create files like "KRISTIN_FIX_SUMMARY.md", "NEXT_STEPS.md", "CONTIGUITY_FAILURE.md", etc.
- NO analysis scripts that duplicate work - Once analysis is done, delete the script
- NO reports on every step - Progress toward the end goal is not documentation
- ONLY keep essential files: Final data CSVs, core implementation scripts, and permanent documentation
- Work with efficiency in mind - Minimize file creation, avoid redundant analysis
- REPLACE, DO NOT VERSION - Always overwrite final_redistribution_results.csv; never create _step1, _step2, _tuned, etc.
- ONE FINAL FILE - There should only be ONE final_redistribution_results.csv at any time
- All scripts should save to the same filename, replacing the previous version
- At the end of each work session, ALWAYS prompt the user to review and delete unnecessary scripts
- List all temporary/interim scripts created during the session
- Ask user which ones to keep vs. delete before committing
CRITICAL: Windows Environment
This project runs on Windows. You MUST follow these guidelines:
Python Command
- Use python NOT python3 (Windows aliases)
- Python 3.13.6 is installed
Unicode Handling
- NEVER use Unicode characters in print() statements (checkmarks ✓, arrows →, emojis, etc.)
- Windows console uses CP1252 encoding which does NOT support most Unicode
- ALWAYS use plain ASCII: "OK" not "✓", "-->" not "→", "[!]" not "⚠"
- When writing to CSV files, use encoding='utf-8' or encoding='utf-8-sig'
- When reading CSV files, use encoding='utf-8-sig' to handle BOM
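The Unicode and CSV rules above can be sketched as a few small helpers. This is a minimal illustration, not the project's actual code: the function names and the replacement table are hypothetical, and the symbols are written as escape sequences so the source file itself stays plain ASCII.

```python
import csv

# ASCII stand-ins for symbols the CP1252 console cannot print (hypothetical table).
ASCII_REPLACEMENTS = {"\u2713": "OK", "\u2192": "-->", "\u26a0": "[!]"}


def to_console_safe(text: str) -> str:
    """Swap known symbols for ASCII, then replace any other non-ASCII character with '?'."""
    for symbol, plain in ASCII_REPLACEMENTS.items():
        text = text.replace(symbol, plain)
    return text.encode("ascii", errors="replace").decode("ascii")


def write_rows(path: str, rows: list[dict]) -> None:
    """Write dict rows to CSV as UTF-8 with a BOM (utf-8-sig)."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def read_rows(path: str) -> list[dict]:
    """Read CSV rows; utf-8-sig strips a leading BOM so the first header stays clean."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))
```

Wrapping status messages as print(to_console_safe(message)) keeps console output within what CP1252 can display.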
File Paths
- Use forward slashes / or double backslashes \\ in paths
- Absolute paths are required for most file operations
Project Overview
This is a Region Redistribution Project for Step Up For Students, a scholarship funding organization. The goal is to equitably redistribute 2,617 Florida private schools across Regional Managers following staff changes.
Critical Context:
- Four regions (Volusia, Seminole, South Orange, Osceola) had coverage gaps for 6+ months, creating chronic overload
- Current roster: 17 existing RMs + 1 AD (Hialeah carve-out)
- Need to determine if/where 3 new RMs should be hired based on data-driven analysis
- This is a crowd-friendly project - outcomes must be explainable to non-technical audiences (former teachers/educators)
The final implementation uses a single-pass redistribution with all 20 field RMs (12 existing Standard RMs + 5 SrRMs + 3 new RMs) and predetermined new RM locations based on prior gap analysis:
- New RM #1: St Petersburg, FL (ZIP 33701, PINELLAS County)
- New RM #2: Kissimmee, FL (ZIP 32808, OSCEOLA/ORANGE area)
- New RM #3: Lakeland, FL (ZIP 33805, POLK County)
The algorithm subdivides large counties FIRST (DADE, BROWARD, ORANGE, HILLSBOROUGH), then assigns remaining counties whole, ensuring 100% geographic contiguity and 80%+ balance rate.
NOTE: Original methodology planned a 4-stage process (baseline with 17 RMs → gap analysis → determine new locations → final with 20 RMs), but the final implementation consolidated this into a single redistribution pass with predetermined placements.
Plain-English Summary (ELI5)
What we're fixing: Some RMs have too many schools; some drive too far. We're redrawing the map so work and driving are fair.
How we split the map: Group nearby schools by zip code chunks. Each RM gets chunks that touch each other (contiguous territories). No zip code is ever split between RMs - this makes the map clean and easy to visualize.
What "fair" means: Give everyone about the same number of schools, adjusted for Senior RMs who carry fewer by design. Close to equal is the goal.
Driving rule: Keep every school within 75 miles. If impossible, allow up to 100 miles with documented exception and smart batching plan.
Visits: Everyone does 52 visits a quarter. Batch trips when possible. Repeat visits count.
Special case: Assistant Director covers Hialeah + Hialeah Gardens only (92 schools).
Keeping the peace: Preserve existing relationships where possible, but fairness and distance come first. Changes explained clearly by zip areas.
Why this works: Zip-based areas are easy to explain and map. Equal school counts feel fair. The 75/100-mile rule balances travel feasibility with coverage needs. Rural folks get flexibility with documented exceptions. All RMs handle new school onboarding (essential job skill).
Key Benefits
The final redistribution delivers measurable operational improvements:
1. Realistic Travel Planning
- Uses actual driving distances from MapBox Directions API, not straight-line measurements
- Every distance constraint reflects real-world road travel (highways, local roads, geographic barriers)
- Enables accurate travel time and expense planning
2. Significant Travel Reduction
30.6% reduction in 100-mile roundtrips (trips over 50 miles one-way):
- Before redistribution: 520 trips over 50 miles (20.2% of all RM-school assignments)
- After redistribution: 361 trips over 50 miles (14.3% of all RM-school assignments)
- Result: 159 fewer long-distance trips requiring rental cars
- Reduced monthly travel expenses (trips under 50 miles use personal vehicle reimbursement)
- Less RM fatigue from long drives
- More sustainable workload for field staff
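As a quick arithmetic check, the headline reduction follows directly from the before and after trip counts quoted above. The variable names are illustrative only.

```python
trips_before = 520  # trips over 50 miles one-way, before redistribution
trips_after = 361   # same measure, after redistribution

fewer_trips = trips_before - trips_after
reduction_pct = round(100 * fewer_trips / trips_before, 1)

print(fewer_trips)    # 159
print(reduction_pct)  # 30.6
```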
3. Improved Visit Batching Efficiency
With shorter average distances and more compact territories:
- 4-visits-per-day becomes more achievable (schools clustered within 15-mile radius)
- Lower per-visit travel cost
- More time with schools, less time on the road
- Better quarterly visit compliance (52 visits/quarter mandatory target)
4. Zero Cross-Territory Travel
100% contiguity means RMs never drive through another RM's territory to reach their own schools:
- Cleaner territorial boundaries with no enclaves or islands
- No confusion about "who should visit this school?"
- Easier for schools to identify their assigned RM
- Stable, defensible territories that make geographic sense to RMs and schools
5. Clean Mapping (Every ZIP = One RM)
Each 5-digit ZIP code belongs to exactly one RM:
- Schools can easily look up their RM by ZIP code
- Territory boundaries are clear and visual on maps
- No split ZIPs creating confusion
- Enables simple ZIP-based lookup tools for schools and families
Data Files
Core Data Files
schools.csv - Primary school database (2,617 schools)
- All schools are private schools participating in the Step Up scholarship program
- Contains: name, DOE_code, streetaddress, streetaddress2, city, zip, county, lat, lon, enrollment, size_tier, current_rm, assigned_rm, is_active
- DOE_code is the unique identifier
- Size tiers: Micro (1-30), Small (30-100), Medium (100-300), Large (300+)
- Note: Some enrollment values may be #N/A and need cleaning
- is_active added for future closure tracking (all currently true)
- assigned_rm: Final RM assignment from redistribution algorithm
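The enrollment cleaning and size tiers described above could be handled along these lines. This is a sketch under stated assumptions: the listed tier ranges overlap at 30, 100, and 300, so the code assumes a boundary value belongs to the higher tier (consistent with "Large (300+)"), and both function names are hypothetical.

```python
from typing import Optional


def parse_enrollment(raw: str) -> Optional[int]:
    """Return enrollment as int, or None for blanks, '#N/A', or other non-numeric text."""
    cleaned = raw.strip().replace(",", "")
    if not cleaned.isdigit():
        return None
    return int(cleaned)


def size_tier(enrollment: Optional[int]) -> Optional[str]:
    """Map enrollment to a tier; boundary values go to the higher tier (assumption)."""
    if enrollment is None or enrollment < 1:
        return None
    if enrollment < 30:
        return "Micro"
    if enrollment < 100:
        return "Small"
    if enrollment < 300:
        return "Medium"
    return "Large"
```

Rows where parse_enrollment returns None can be listed for manual cleanup rather than silently defaulted, since size does not affect equity weighting anyway.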
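rms.csv - RM roster with home locations and FTE factors
- Contains: name, role_type, fte_factor, home_address, home_city, home_zip, home_lat, home_lon
- Role types: Standard RM (1.0 FTE), Senior RM (0.6 FTE), AD (excluded from equity calculations)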
- All RMs are full-time employees with FIXED role types (cannot be changed)
- Initial roster (before new hires): 12 Standard RMs, 5 Senior RMs, 1 AD = 18 total, 15.0 FTE capacity
- Final roster (after 3 new RM hires): 15 Standard RMs, 5 Senior RMs, 1 AD = 21 total, 18.0 FTE capacity (15 × 1.0 + 5 × 0.6, excluding AD)
- New RM locations (data-driven): St Petersburg (PINELLAS), Kissimmee (OSCEOLA/ORANGE), Lakeland (POLK)
distances.csv - RM-to-school road miles
- Covers every RM-school pair within 125-mile radius
- Uses MapBox Directions API for actual driving distances
- Essential for enforcing 125-mile maximum distance constraint (75-mile preferred)
Redistribution Model Logic
Geographic Assignment Rules (CRITICAL FOR MAPPING)
Zip Code Exclusivity:
- Each 5-digit zip code belongs to exactly ONE zip cell
- Each zip cell is assigned to exactly ONE RM
- No zip code can be split between multiple RMs (ensures clean graphical representation)
- During optimization, a zip can be moved from one zip cell to another, which changes its RM assignment
- This makes territories easy to visualize on a map with clean boundaries
- Every RM territory MUST form a single contiguous geographic region
- No "islands" or disconnected pockets allowed
- Contiguity is enforced at the county and zip code level:
- Adjacency defined by: shared borders (Queen adjacency - includes corner touches)
Assignment Hierarchy (Preference Order):
1. Prefer Whole Counties: Assign entire counties as single units whenever possible
   - Whole county assignments are cleaner, easier to explain, and more stable
   - Only break down a county into zips when necessary for balance or contiguity
2. Break Counties into Zips When Needed:
   - Top 4 counties by school count (DADE, BROWARD, ORANGE, HILLSBOROUGH) subdivided at ZIP level
   - These are the only counties subdivided in the final implementation
   - Counties may be broken down mid-optimization if needed to achieve balance
   - When breaking a county, maintain contiguity for all affected territories
3. Preserve Contiguity in All Cases:
   - Before any zip reassignment, verify it maintains contiguity for both donor and recipient RMs
   - A zip can only be removed from an RM if doing so doesn't fragment their remaining territory
   - A zip can only be added to an RM if it's adjacent to their existing territory
Contiguity Validation:
- Use Census ZCTA (ZIP Code Tabulation Area) boundaries to determine adjacency
- Build adjacency matrix: For each zip pair, determine if they share a border
- For county-level adjacency, use county boundary polygons
- Before accepting any reassignment, run contiguity check on resulting territories
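The donor and recipient checks described above can be sketched with a breadth-first search over the adjacency graph. This is an illustrative sketch, not the project's implementation: the function names are hypothetical, and adjacency is assumed to be a symmetric dict mapping each ZIP (or county) to the set of its neighbors.

```python
from collections import deque


def is_contiguous(units: set[str], adjacency: dict[str, set[str]]) -> bool:
    """True if `units` form one connected component under `adjacency` (BFS)."""
    if not units:
        return True  # an empty territory has nothing to fragment
    start = next(iter(units))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, set()):
            if neighbor in units and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == units


def can_move(unit: str, donor: set[str], recipient: set[str],
             adjacency: dict[str, set[str]]) -> bool:
    """A move is allowed only if the unit borders the recipient's territory
    and the donor's remaining territory stays in one piece."""
    if unit not in donor:
        return False
    touches_recipient = any(n in recipient for n in adjacency.get(unit, set()))
    return touches_recipient and is_contiguous(donor - {unit}, adjacency)
```

Because the recipient is already contiguous and the moved unit borders it, the recipient's new territory is contiguous by construction; only the donor needs the full BFS check.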
Key Constraints (Non-Negotiable)
1. Distance Rule: 125-mile maximum constraint (hard limit); 75 miles preferred but not enforced
2. Visit Floor: 52 school visits per quarter per RM (repeat visits count) - this is set by leadership and non-negotiable
3. Carve-out: Assistant Director covers Hialeah + Hialeah Gardens only (92 schools)
4. Contiguity: Each RM territory must be geographically contiguous (county-level validation)
Equity Definition
- Fairness Unit: Headcount (each school = 1)
- Normalization: Sr. RM = 0.6 FTE, Standard RM = 1.0 FTE, AD excluded from equity calculations
- Target: Equal schools per FTE after removing Hialeah + Hialeah Gardens carve-out (92 schools to AD)
- Tolerance: Each RM within ±10-15 schools of FTE-normalized target
- Important: School size does NOT affect equity weighting - no up-weighting for large schools (large schools often have better finance teams and need less handholding)
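The FTE-normalized target in the equity definition above works out as follows, using the final roster figures stated elsewhere in this document (15 Standard RMs, 5 Senior RMs, 2,617 schools, 92 carved out). The constant names are illustrative.

```python
TOTAL_SCHOOLS = 2617
CARVE_OUT_SCHOOLS = 92            # Hialeah + Hialeah Gardens, covered by the AD
STANDARD_RMS, SENIOR_RMS = 15, 5
STANDARD_FTE, SENIOR_FTE = 1.0, 0.6

distributable = TOTAL_SCHOOLS - CARVE_OUT_SCHOOLS                   # 2525
total_fte = STANDARD_RMS * STANDARD_FTE + SENIOR_RMS * SENIOR_FTE   # 18.0
schools_per_fte = distributable / total_fte                         # about 140.3

standard_target = round(schools_per_fte * STANDARD_FTE)             # 140
senior_target = round(schools_per_fte * SENIOR_FTE)                 # 84
```

The AD is left out of total_fte because the carve-out schools are removed before the division.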
Algorithm Stages (Final Implementation)
NOTE: The final implementation uses a single-pass redistribution with all 20 field RMs (12 existing Standard RMs + 5 SrRMs + 3 new RMs) and predetermined new RM locations (St Petersburg, Kissimmee, Lakeland).
1. Setup & Data Loading:
   - Load 20 field RMs (15 Standard @ 1.0 FTE + 5 Senior @ 0.6 FTE) + 1 AD
   - Load 2,617 schools across Florida
   - Remove Hialeah + Hialeah Gardens carve-out (92 schools → AD Carla Hernandez)
   - Calculate distributable schools: 2,525 schools
   - Calculate total FTE: 18.0 (15 × 1.0 + 5 × 0.6, excluding AD)
   - Calculate schools per FTE: 2,525 ÷ 18.0 = 140.3 per FTE (calculated target)
   - Set aspirational targets: 150 schools for RMs, 90 schools for SrRMs (for presentations/goals)
   - Set calculated FTE-proportional targets: 140 for RMs (140.3 × 1.0), 84 for SrRMs (140.3 × 0.6)
   - Tolerance ranges: 130-170 for RMs (±20 around aspirational 150), 78-102 for SrRMs (±12 around aspirational 90)
2. Build Adjacency Data:
   - Build county adjacency matrix for contiguity validation (all 66 FL counties)
   - Build ZIP adjacency ONLY for the 4 subdivided counties (DADE, BROWARD, ORANGE, HILLSBOROUGH)
   - ZIP adjacency not needed for the 62 counties assigned whole
   - Adjacency data used for contiguity validation throughout assignment
3. Subdivide Tier 1 Counties (FIRST - reverse of original plan):
   - Tier 1 Counties (MUST subdivide): DADE, BROWARD, ORANGE, HILLSBOROUGH
   - For each Tier 1 county:
     - Identify RMs with home ZIP in that county
     - Seed each RM with their home ZIP
     - Grow territories contiguously using nearest-neighbor logic
     - Assign adjacent ZIPs to RM furthest below their target
     - Balance using FTE-proportional targets (calculated 140/84)
     - Maintain strict contiguity at all times (BFS/adjacency-based growth)
   - Result: Large counties subdivided among 2-3 RMs each with contiguous ZIP territories
4. Seed Whole Counties (Home County Assignment):
   - For RMs not in Tier 1 counties:
     - Assign RM's home county as starting territory (whole county, not subdivided)
     - Ensures each RM has geographic anchor in familiar region
     - Counties assigned as atomic units
5. Assign Remaining Whole Counties:
   - Assign unassigned counties (62 non-Tier-1 counties) to nearest RM with capacity
   - Maintain contiguity (counties must be adjacent to RM's existing territory)
   - Respect 125-mile distance constraint
   - Prefer whole county assignments for clean boundaries
   - Grow territories outward from seeds using greedy nearest-neighbor
6. Handle Remaining Unassigned Schools:
   - Address any schools in ZIPs not yet assigned (edge cases, data quality issues)
   - Assign to nearest RM with available capacity
   - Ensure 100% coverage (all 2,617 schools assigned)
7. Validate & Document:
   - Run county-level contiguity checks (all 21 RMs pass)
   - Calculate final balance: 16/20 RMs within tolerance (80% balance rate)
   - Document 4 RMs outside tolerance with geographic justification
   - Generate final metrics, visualizations, and change logs
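The seeded growth used when subdividing Tier 1 counties can be illustrated with a small greedy loop. This is a simplified sketch and not the project's implementation: grow_territories is a hypothetical name, each RM is assumed to have a distinct home ZIP, and the next ZIP is chosen as the lowest-numbered adjacent ZIP rather than the nearest one, because distances are not modeled here.

```python
def grow_territories(
    seeds: dict[str, str],
    targets: dict[str, float],
    school_counts: dict[str, int],
    adjacency: dict[str, set[str]],
) -> dict[str, str]:
    """Return {zip: rm} after greedy contiguous growth from each RM's home ZIP."""
    assignment = {home_zip: rm for rm, home_zip in seeds.items()}
    load = {rm: school_counts.get(home_zip, 0) for rm, home_zip in seeds.items()}

    while True:
        # Collect, per RM, the unassigned ZIPs that border its current territory.
        frontier: dict[str, set[str]] = {}
        for zip_code, rm in assignment.items():
            for neighbor in adjacency.get(zip_code, set()):
                if neighbor not in assignment:
                    frontier.setdefault(rm, set()).add(neighbor)
        if not frontier:
            return assignment  # nothing left that any RM can reach

        # The RM furthest below target grows next (ties broken by name).
        rm = min(frontier, key=lambda r: (load[r] - targets[r], r))
        chosen = min(frontier[rm])  # simplification: lowest ZIP, not nearest
        assignment[chosen] = rm
        load[rm] += school_counts.get(chosen, 0)
```

Every ZIP added borders the RM's existing territory, so each territory stays contiguous by construction. ZIPs that no seed can reach remain unassigned, which corresponds to the leftover handling in step 6.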
Validation Gates
- Contiguity: Every RM territory forms a single contiguous region (MANDATORY - no exceptions)
- Equity: Target 80%+ balance rate (16/20 RMs within tolerance)
- Distance: 125-mile maximum constraint (hard limit)
- Coverage: 100% of schools assigned (2,617 schools, zero unassigned)
- Mapping Clarity: Every ZIP code assigned to exactly one RM (clean territorial boundaries)
Key Parameters
Staffing & Capacity
- Final roster: 15 Standard RMs (1.0 FTE, including the 3 new RMs) + 5 Senior RMs (0.6 FTE) + 1 AD = 21 total
- Total field FTE: 18.0 (15 × 1.0 + 5 × 0.6, excluding AD carve-out)
- Schools per FTE: 140.3 (2,525 distributable schools ÷ 18.0 FTE)
- Sr. RM factor: 0.6 FTE (reduced caseload by design)
Targets & Tolerances
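- Aspirational targets (for presentations/goals): 150 schools for RMs, 90 schools for SrRMs
- Calculated FTE-proportional targets (for equity calculations): 140 for RMs, 84 for SrRMs
- Tolerance ranges: 130-170 for RMs, 78-102 for SrRMs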
- Balance goal: 80%+ RMs within tolerance (achieved: 16/20 = 80%)
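The balance rate can be computed by checking each RM's school count against the tolerance range for its role (130-170 for RMs, 78-102 for SrRMs, as given in the setup step). This is a sketch only: the role labels "Standard" and "Senior" and the function names are assumptions.

```python
# Tolerance ranges by role, inclusive on both ends.
TOLERANCE = {"Standard": (130, 170), "Senior": (78, 102)}


def within_tolerance(role: str, school_count: int) -> bool:
    """True if the count falls inside the role's tolerance range."""
    low, high = TOLERANCE[role]
    return low <= school_count <= high


def balance_rate(rms: list[tuple[str, int]]) -> float:
    """Share of RMs whose school count falls inside their role's tolerance range."""
    in_range = sum(within_tolerance(role, count) for role, count in rms)
    return in_range / len(rms)
```

With 16 of 20 RMs in range this returns 0.8, which meets the 80% goal.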
Distance Constraints
- Maximum distance: 125 miles (hard constraint for all assignments)
- Preferred distance: 75 miles (encouraged but not enforced)
- Distance calculation radius: 125 miles (for API efficiency and contiguity)
- 4-in-a-day radius: 15 miles (for visit batching optimization)
County Subdivision
- Counties requiring subdivision: Top 4 only (DADE, BROWARD, ORANGE, HILLSBOROUGH)
- Sparse ZIP threshold: <10 schools (for ZIP clustering in subdivided counties)
- All other counties: Assigned whole (62 counties as atomic units)
Operational
- Fiscal year: July-June (annual rebalancing each July)
- Carve-out: 92 schools (Hialeah + Hialeah Gardens → AD Carla Hernandez)
Zip-Level Assignment Logic (Subdivided Counties Only)
IMPORTANT: Zip-level assignment is ONLY used for the 4 subdivided counties (DADE, BROWARD, ORANGE, HILLSBOROUGH). The other 62 counties are assigned whole as atomic units.
For the 4 subdivided counties:
1. Identify unique 5-digit zip codes within the county
2. Count schools per zip
3. Build adjacency using Census ZCTA (ZIP Code Tabulation Area) polygons:
   - Use TIGER/Line ZCTAs (e.g., tl_2023_us_zcta520)
   - Clip to Florida and filter to the specific county
   - Build adjacency graph (Queen adjacency: polygons sharing boundary/point)
4. Assign ZIPs contiguously to RMs:
   - Start with RM home ZIP as seed
   - Grow territory by adding adjacent ZIPs
   - Maintain strict contiguity throughout
   - Balance using FTE-proportional targets
5. Result: Each subdivided county split among 2-3 RMs with contiguous ZIP territories
6. Assignment: Each ZIP (and all schools within it) assigned to exactly one RM
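Queen adjacency (step 3 above) can be illustrated without a geometry library by treating two polygons as neighbors when they share at least one vertex. This is a toy sketch: it assumes neighboring polygons are digitized with matching vertices, the queen_adjacency name and the input layout are hypothetical, and real ZCTA shapefiles would normally be read with a GIS library.

```python
from itertools import combinations


def queen_adjacency(
    polygons: dict[str, list[tuple[float, float]]]
) -> dict[str, set[str]]:
    """Two polygons are neighbors if they share at least one vertex
    (covers both shared edges and corner touches)."""
    vertex_sets = {zip_code: set(coords) for zip_code, coords in polygons.items()}
    adjacency: dict[str, set[str]] = {zip_code: set() for zip_code in polygons}
    for a, b in combinations(polygons, 2):
        if vertex_sets[a] & vertex_sets[b]:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency
```

The resulting dict of neighbor sets is symmetric, so it can feed a contiguity check or a growth step directly.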
Note: The final implementation does NOT create a separate zip_cells.csv file. ZIP assignments are embedded directly in final_rm_assignments.json.
Dynamic Change Management
Ongoing School Changes (Between Annual Redistributions)
New Schools (~30/year):
- Automatically assigned to RM covering that zip code
- New school additions can create imbalances - this is acceptable
- Annual rebalancing will correct accumulated imbalances
- All RMs responsible for onboarding new schools (essential job skill)
- School removed from RM count when is_active set to false
- May create "stranded zips" that fall below 5 schools
- Annual rebalancing will handle zip cell restructuring
- Full redistribution run annually (each July at fiscal year start)
- Accounts for accumulated new schools, closures, and any RM staff changes
- Uses same algorithm as initial redistribution
- If 1-2 new RMs are hired, use local rebalancing only
- New RM takes load from neighboring RMs within their geographic area
- Avoid full statewide redistribution to minimize disruption
- Add new RMs to rms.csv with home location and role type
- Calculate distances from new RM(s) to schools within 100-mile radius
- Run light redistribution affecting only zip cells within reasonable distance of new RM
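Between annual runs, the two routine updates above (assigning a new school by ZIP and dropping closed schools from counts) reduce to a lookup and a filtered count. This is a minimal sketch with hypothetical function names, and it assumes is_active has already been parsed into a boolean.

```python
from collections import Counter
from typing import Optional


def assign_new_school(zip_code: str, zip_to_rm: dict[str, str]) -> Optional[str]:
    """Return the RM covering this ZIP, or None if the ZIP has no owner yet."""
    return zip_to_rm.get(zip_code.strip()[:5])


def active_counts(schools: list[dict]) -> Counter:
    """Count schools per assigned_rm, skipping rows where is_active is false."""
    return Counter(s["assigned_rm"] for s in schools if s["is_active"])
```

A None result from assign_new_school flags a ZIP that no RM owns yet, which needs a manual decision or must wait for the annual rebalancing.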
Operational Context
Visit Requirements
- 52 school visits per quarter per RM (non-negotiable, set by leadership)
- Repeat visits to same school count toward 52
- Seasonal patterns exist (schools closed June-July, holiday breaks) but RMs must manage
- "4-in-a-day" batching encouraged where possible (schools within 15 miles of each other)
Relationship & History Considerations
- Relationship history does NOT override geography - too messy to incorporate into logic
- Existing RM-school relationships preserved where possible, but equity and distance take priority
- current_rm field used for churn analysis and rollout planning only
Data Quality & Updates
- School information updated by Florida DOE
- Updates announced but specific changes not detailed
- Address/zip changes handled at annual review with whatever is current on file
- Geocoding may be needed for schools with missing or invalid lat/lon
Common Data Issues
1. Missing Enrollment Values: Some schools have #N/A in enrollment field
2. Geocoding: All schools need valid lat/lon; some may require geocoding from address
3. Zip Cell Construction: Must merge sparse zips (<5 schools) into adjacent zips for contiguity
4. Current Assignments: Reflect 6-month-old overload situation in Orlando corridor
5. Distance Matrix: Road miles preferred over straight-line distance
6. DOE Data Lag: School moves/changes may not be immediately reflected
7. County/ZIP Mismatches: Known data quality errors in schools.csv:
   - CAREER PREP ACADEMY OF SOUTH FLORIDA106: ZIP 32246 (Jacksonville) but county = DADE (should be DUVAL)
   - CAREER PREP ACADEMY OF NORTHSIDE: ZIP 33157 (Miami) but county = DUVAL (should be DADE)
   - These must be corrected in source data before redistribution
Working with This Data
Data Validation Checklist (from Data Intake Checklist)
- All schools have DOE_code, address, zip, county
- All schools have lat/lon (geocode missing)
- Zips match city/county (fix obvious mismatches)
- Enrollment present; size_tier computed
- current_rm present (for churn reporting)
- RM home_lat/home_lon present
- No duplicate schools by DOE_code or exact address
- No coords outside Florida bounds
- Every ZIP code assigned to exactly one RM (clean territorial boundaries)
- Distances computed for all RM-school pairs within 125 miles; sanity-check outliers (>200 miles)
- Count totals: schools.csv total = 2,617; Baseline RM count = 18 (17 RMs + 1 AD)
- Carve-out tagged: Hialeah + Hialeah Gardens schools (92 schools → Carla Hernandez AD)
- Baseline redistribution: 2,525 schools across 17 RMs (15.0 FTE)
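A few of the checklist items above (required fields, duplicate DOE_code, coordinates inside Florida, total count) can be automated in one pass. This is a sketch under assumptions: the bounding box is a rough approximation of Florida chosen for illustration, and validate_schools is a hypothetical helper name.

```python
REQUIRED_FIELDS = ("DOE_code", "zip", "county", "lat", "lon")
# Rough bounding box for Florida; these limits are an assumption for illustration.
FL_LAT_RANGE = (24.3, 31.1)
FL_LON_RANGE = (-87.7, -79.8)


def validate_schools(rows: list[dict], expected_total: int = 2617) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if len(rows) != expected_total:
        problems.append(f"expected {expected_total} schools, found {len(rows)}")

    seen_codes = set()
    for row in rows:
        code = str(row.get("DOE_code", "")).strip()
        missing = [f for f in REQUIRED_FIELDS if not str(row.get(f, "")).strip()]
        if missing:
            problems.append(f"{code or '?'}: missing {', '.join(missing)}")
        if code and code in seen_codes:
            problems.append(f"{code}: duplicate DOE_code")
        seen_codes.add(code)
        try:
            lat, lon = float(row["lat"]), float(row["lon"])
        except (KeyError, TypeError, ValueError):
            if "lat" not in missing and "lon" not in missing:
                problems.append(f"{code}: non-numeric lat/lon")
            continue  # cannot range-check without numeric coordinates
        in_lat = FL_LAT_RANGE[0] <= lat <= FL_LAT_RANGE[1]
        in_lon = FL_LON_RANGE[0] <= lon <= FL_LON_RANGE[1]
        if not (in_lat and in_lon):
            problems.append(f"{code}: coordinates outside Florida bounds")
    return problems
```

Printing the returned list (in plain ASCII) gives a short punch list to resolve before any redistribution run.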
Expected Deliverables
Core Data Files:
- schools.csv - 2,617 schools with final RM assignments (404KB)
- rms.csv - 21 RMs with home locations and FTE factors (1.7KB)
- county_adjacency.csv - County adjacency graph for contiguity validation (2.7KB)
- zcta_adjacency_census.csv - ZIP adjacency graph from Census data (18KB)
- county_data.csv - County centroids for distance calculations (3.3KB)
- distances.csv - RM-to-school road miles within 125-mile radius (1.4MB)
- florida_zip_adjacency.json - ZIP adjacency data for validation (48KB)
- final_rm_assignments.json - Detailed per-RM metrics: schools, counties, ZIPs, targets, balance, diffs (20KB)
- final_rm_summary.csv - Summary table of all RM assignments (910B)
- assignment_changes.csv - Change log vs. current assignments for churn analysis (90KB)
- INDEX.html - Master dashboard with project overview, key metrics, and links (10KB)
- rm_assignments_table.html - Interactive sortable table of all RM assignments (6KB)
- zip_level_territory_map.html - Interactive Folium map showing all territories with color coding (1.6MB)
- embedded_data.js - Embedded JSON data for interactive table (19KB)
- README.md - Project overview and final results summary
- CLAUDE.md - Complete technical specification (this document)
- SCHOOL_STATUS_MANAGEMENT.md - Procedures for handling school closures and annual rebalancing
Note: The implementation does NOT create a separate zip_cells.csv file, exception lists with batching plans, or individual RM one-pagers. ZIP assignments are embedded in final_rm_assignments.json, and visualizations are interactive HTML rather than static documents.
API Keys & External Services
IMPORTANT: API keys are stored in .env file (not committed to repo)
- Google Maps API key: GOOGLE_MAPS_API_KEY (reference only, using MapBox instead)
- MapBox access token: MAPBOX_ACCESS_TOKEN (used for Directions API)
- Free tier: 100,000 requests/month
- Use for calculating road miles between RM home locations and schools
- Calculate distances only within 125-mile straight-line radius to optimize API usage
- Road miles preferred over straight-line distance
- Use TIGER/Line ZCTAs for Florida zip code boundaries
- Dataset: tl_2023_us_zcta520 (or latest year available)
- Needed ONLY for building ZIP adjacency in the 4 subdivided counties (DADE, BROWARD, ORANGE, HILLSBOROUGH)
- Not required for the 62 counties assigned whole
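The 125-mile straight-line prefilter mentioned above can be done with the haversine formula before any Directions API call is made. This is a sketch, not the project's code: the function names are hypothetical, and the input dicts are assumed to carry numeric home_lat/home_lon and lat/lon values.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def pairs_within_radius(rms: list[dict], schools: list[dict],
                        max_miles: float = 125.0) -> list[tuple[str, str]]:
    """Return (rm name, DOE_code) pairs close enough to be worth a road-miles lookup."""
    pairs = []
    for rm in rms:
        for school in schools:
            miles = haversine_miles(rm["home_lat"], rm["home_lon"],
                                    school["lat"], school["lon"])
            if miles <= max_miles:
                pairs.append((rm["name"], school["DOE_code"]))
    return pairs
```

Road distance is never shorter than straight-line distance, so this filter cannot discard a pair that would have satisfied the 125-mile road limit; it only avoids spending API requests on pairs that are certain to fail.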
Documentation Structure
This repository uses interconnected Markdown documentation:
- Region Redistribution Project - Data Considerations.md: Data schema and requirements
- Region Redistribution Project - Redistribution Modeling Logic.md: Complete algorithm specification with ELI5 section
- Region Redistribution Project - Data Intake Checklist.md: Data validation requirements and deliverables
- Region Redistribution Project - Simulation Runs Log.md: Template for tracking simulation runs with parameters and outcomes