# ROM Library Organization

**Status**: In Progress
**Started**: 2026-03-09
**Location**: R:\ (38.8 TB)

## Goal

Phase 1: Inventory all ROMs across multiple gaming systems
Phase 2: Detect duplicates via MD5 hashing
Phase 3: Identify missing ROMs from No-Intro/Redump sets (future)

## Library Structure

```
R:/
├── Rom Sets (Organized)/
│   ├── Nintendo/
│   ├── Sony/
│   ├── Sega/
│   ├── Microsoft/
│   ├── Atari/
│   ├── Arcade/
│   ├── Computers/
│   └── Misc Consoles/
└── Rom Sets (Somewhat Organized)/
```

## Quick Scan Results (2026-03-09)

- **Total**: 98,601 items, 1,701 GB
- **Top by count**: Commodore 64 (24,349), Atari (10,935), MAME (8,651)
- **Top by size**: PSN ISO Pack (672 GB), Nintendo 3DS (412 GB), TurboGrafx-CD (234 GB)

## By Manufacturer

| Manufacturer | Items | Size |
|--------------|-------|------|
| Computers | 47,327 | 61.89 GB |
| Arcade | 12,951 | 32.97 GB |
| Atari | 12,399 | 2.56 GB |
| Nintendo | 12,017 | 467.24 GB |
| Sony | 3,106 | 672.40 GB |
| Sega | 2,747 | 3.54 GB |
| Microsoft | 1,661 | 0.05 GB |

## Disc vs Cartridge Systems

- **Disc systems** (count folders): PSX (1,516), PS3 (77), PS VITA (6), Saturn (3)
- **Cartridge systems** (count files): NES (1,592), SNES, Genesis, GBA, etc.
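The disc-vs-cartridge counting rule above can be sketched as follows. This is an illustrative sketch, not the actual `rom-quick-scan.py` code; the `DISC_SYSTEMS` set and `count_items` name are assumptions for the example.

```python
from pathlib import Path

# Disc systems store one game per subfolder (multi-track images, cue/bin
# pairs), so counting files would overcount; cartridge systems are one
# file per game. Set membership here is illustrative.
DISC_SYSTEMS = {"PSX", "PS3", "PS VITA", "Saturn"}

def count_items(system_dir: Path) -> int:
    """Count one item per subfolder for disc systems, one per file otherwise."""
    if system_dir.name in DISC_SYSTEMS:
        return sum(1 for p in system_dir.iterdir() if p.is_dir())
    return sum(1 for p in system_dir.rglob("*") if p.is_file())
```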
## Scripts

- `tools/rom-quick-scan.py` - Quick count (completed)
- `tools/rom-full-scan.py` - Duplicate detection (overnight scan)

## Output Files

- `rom-inventory/rom-inventory.json` - Quick scan
- `rom-inventory/rom-full-*.json` - Full scan with duplicates

## Notes

- Hash only files under 50 MB (speed vs coverage tradeoff)
- Node gateway has a 30s timeout - use background processes for long scans
- No-Intro DAT files available at https://datomatic.no-intro.org/

## Full Scan Results (2026-04-09)

**Status:** Complete

| Metric | Value |
|--------|-------|
| Total files | 773,442 |
| Total size | 21.9 TB |
| Files hashed | 756,454 |
| Skipped (too large) | 16,987 |
| **Duplicates found** | **44,844** |

**Runtime:** 13 hours

**Output:** `rom-inventory/rom-full-scan.json`

**Next steps:**
1. Analyze the JSON to identify duplicate clusters
2. Determine which systems have the most duplicates
3. Create a cleanup plan (manual review vs auto-delete)
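The MD5-based duplicate detection with the 50 MB size cap can be sketched roughly as below. This is a minimal sketch assuming the approach described in the notes; `MAX_HASH_SIZE` and `find_duplicates` are illustrative names, not the actual `rom-full-scan.py` implementation.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Skip files over 50 MB (speed vs coverage tradeoff from the notes above).
MAX_HASH_SIZE = 50 * 1024 * 1024

def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by MD5 digest; return only groups of 2+."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for f in root.rglob("*"):
        if not f.is_file() or f.stat().st_size > MAX_HASH_SIZE:
            continue
        h = hashlib.md5()
        with f.open("rb") as fh:
            # Read in 1 MB chunks to keep memory flat on large files.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                h.update(chunk)
        by_hash[h.hexdigest()].append(f)
    return {d: paths for d, paths in by_hash.items() if len(paths) > 1}
```

Grouping by digest rather than comparing files pairwise keeps the scan linear in the number of files, which matters at the ~750k-file scale reported above.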