# ROM Library Organization

**Status**: In Progress
**Started**: 2026-03-09
**Location**: R:\ (38.8 TB)

## Goal

Phase 1: Inventory all ROMs across multiple gaming systems
Phase 2: Detect duplicates via MD5 hashing
Phase 3: Identify missing ROMs from No-Intro/Redump sets (future)

## Library Structure

```
R:/
├── Rom Sets (Organized)/
│   ├── Nintendo/
│   ├── Sony/
│   ├── Sega/
│   ├── Microsoft/
│   ├── Atari/
│   ├── Arcade/
│   ├── Computers/
│   └── Misc Consoles/
└── Rom Sets (Somewhat Organized)/
```

## Quick Scan Results (2026-03-09)

- **Total**: 98,601 items, 1,701 GB
- **Top by count**: Commodore 64 (24,349), Atari (10,935), MAME (8,651)
- **Top by size**: PSN ISO Pack (672 GB), Nintendo 3DS (412 GB), TurboGrafx-CD (234 GB)

## By Manufacturer

| Manufacturer | Items | Size |
|--------------|-------|------|
| Computers | 47,327 | 61.89 GB |
| Arcade | 12,951 | 32.97 GB |
| Atari | 12,399 | 2.56 GB |
| Nintendo | 12,017 | 467.24 GB |
| Sony | 3,106 | 672.40 GB |
| Sega | 2,747 | 3.54 GB |
| Microsoft | 1,661 | 0.05 GB |

## Disc vs Cartridge Systems

- **Disc systems** (count folders): PSX (1,516), PS3 (77), PS VITA (6), Saturn (3)
- **Cartridge systems** (count files): NES (1,592), SNES, Genesis, GBA, etc.
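The disc-vs-cartridge counting rule above can be sketched as follows. This is an illustrative sketch, not the actual `rom-quick-scan.py` code; the `DISC_SYSTEMS` set and `count_items` name are assumptions for the example.

```python
from pathlib import Path

# Disc systems store one game per subfolder (multi-track images, cue/bin
# pairs), so counting files would overcount; cartridge systems are one
# file per game. Set membership here is illustrative.
DISC_SYSTEMS = {"PSX", "PS3", "PS VITA", "Saturn"}

def count_items(system_dir: Path) -> int:
    """Count one item per subfolder for disc systems, one per file otherwise."""
    if system_dir.name in DISC_SYSTEMS:
        return sum(1 for p in system_dir.iterdir() if p.is_dir())
    return sum(1 for p in system_dir.rglob("*") if p.is_file())
```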
## Scripts

- `tools/rom-quick-scan.py` - Quick count (completed)
- `tools/rom-full-scan.py` - Duplicate detection (overnight scan)

## Output Files

- `rom-inventory/rom-inventory.json` - Quick scan
- `rom-inventory/rom-full-*.json` - Full scan with duplicates

## Notes

- Hash only files under 50 MB (speed vs coverage tradeoff)
- Node gateway has a 30s timeout - use background processes for long scans
- No-Intro DAT files available at https://datomatic.no-intro.org/

## Full Scan Results (2026-04-09)

**Status:** Complete

| Metric | Value |
|--------|-------|
| Total files | 773,442 |
| Total size | 21.9 TB |
| Files hashed | 756,454 |
| Skipped (too large) | 16,987 |
| **Duplicates found** | **44,844** |

**Runtime:** 13 hours

**Output:** `rom-inventory/rom-full-scan.json`

**Next steps:**
1. Analyze the JSON to identify duplicate clusters
2. Determine which systems have the most duplicates
3. Create a cleanup plan (manual review vs auto-delete)
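The MD5-based duplicate detection with the 50 MB size cap can be sketched roughly as below. This is a minimal sketch assuming the approach described in the notes; `MAX_HASH_SIZE` and `find_duplicates` are illustrative names, not the actual `rom-full-scan.py` implementation.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Skip files over 50 MB (speed vs coverage tradeoff from the notes above).
MAX_HASH_SIZE = 50 * 1024 * 1024

def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by MD5 digest; return only groups of 2+."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for f in root.rglob("*"):
        if not f.is_file() or f.stat().st_size > MAX_HASH_SIZE:
            continue
        h = hashlib.md5()
        with f.open("rb") as fh:
            # Read in 1 MB chunks to keep memory flat on large files.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                h.update(chunk)
        by_hash[h.hexdigest()].append(f)
    return {d: paths for d, paths in by_hash.items() if len(paths) > 1}
```

Grouping by digest rather than comparing files pairwise keeps the scan linear in the number of files, which matters at the ~750k-file scale reported above.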