# Anomaly Detection Starter Kit
This module seeds a machine-learning workflow for flagging unusual behavior in LASUCA's process data (steam flow, turbine RPM, conveyor load cells, etc.). It focuses on unsupervised anomaly detection so you can start surfacing outliers without labeled fault data.
## Project structure
```
ml/anomaly_detection/
├── README.md                   # Project overview and next steps
├── requirements.txt            # Python dependencies for the pipeline
└── src/
    ├── __init__.py             # Marks the package
    ├── data_loader.py          # Helpers for reading & validating time-series data
    ├── feature_engineering.py  # Domain feature transformations and rolling stats
    ├── train.py                # CLI script to fit an Isolation Forest model
    └── detect.py               # CLI script to score new data with the trained model
```
## Quick start
1. **Create a virtual environment** inside the repository root and install dependencies:

   ```powershell
   python -m venv .venv
   .\.venv\Scripts\Activate.ps1
   pip install -r ml/anomaly_detection/requirements.txt
   ```
|
|
2. **Prepare a CSV export** with at least the following columns:
|
|
- `timestamp`: ISO 8601 timestamp or anything `pandas.to_datetime` can parse.
|
|
- Sensor columns: numerical fields such as `steam_tph`, `turbine_rpm`, `conveyor_tph`.
|
|
|
|
Additional metadata columns (e.g., `area`, `equipment`) are optional and help slice metrics later.
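The expected input can be sketched with pandas. The column names below mirror the examples above; `data_loader.py` is assumed (not confirmed) to run checks along these lines:

```python
# Sketch: a tiny CSV matching the expected schema, plus basic sanity checks.
import io
import pandas as pd

csv_text = """timestamp,steam_tph,turbine_rpm,conveyor_tph,area
2024-01-01T00:00:00,118.2,3010,540.5,boiler
2024-01-01T00:01:00,119.0,3005,538.1,boiler
"""

df = pd.read_csv(io.StringIO(csv_text))
df["timestamp"] = pd.to_datetime(df["timestamp"])  # must parse cleanly

# Reasonable pre-flight checks before feeding the pipeline
assert df["timestamp"].is_monotonic_increasing
assert df[["steam_tph", "turbine_rpm", "conveyor_tph"]].notna().all().all()
```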
3. **Train a baseline model**:

   ```powershell
   python ml/anomaly_detection/src/train.py ^
     --data data/clean/process_snapshot.csv ^
     --timestamp-column timestamp ^
     --features steam_tph turbine_rpm conveyor_tph ^
     --model-out ml/anomaly_detection/models/isolation_forest.joblib
   ```
The script standardizes numeric columns, fits an Isolation Forest, and saves the pipeline along with a CSV of anomaly scores.
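In sketch form, that training step likely looks like the following (an assumed scikit-learn implementation for illustration, not the actual `train.py` source; hyperparameters are placeholders):

```python
# Sketch: standardize features, fit an Isolation Forest, as one pipeline.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the steam_tph / turbine_rpm / conveyor_tph columns
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", IsolationForest(n_estimators=200, contamination=0.01, random_state=42)),
])
pipeline.fit(X)

# score_samples: higher = more typical, lower = more anomalous
scores = pipeline.score_samples(X)
```

In the real script the fitted pipeline would then be persisted, e.g. with `joblib.dump(pipeline, model_out)`, so the detection step can reload it.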
4. **Score fresh data** (e.g., a streaming batch or another day):

   ```powershell
   python ml/anomaly_detection/src/detect.py ^
     --data data/clean/process_snapshot_new.csv ^
     --model ml/anomaly_detection/models/isolation_forest.joblib ^
     --timestamp-column timestamp ^
     --features steam_tph turbine_rpm conveyor_tph ^
     --output data/clean/process_snapshot_new_scored.csv
   ```
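The scoring path can be sketched as follows (assumed logic; in `detect.py` the pipeline would come from `joblib.load(args.model)` rather than being fit inline):

```python
# Sketch: score new rows with a previously fitted pipeline and flag outliers.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

features = ["steam_tph", "turbine_rpm", "conveyor_tph"]
rng = np.random.default_rng(0)
train_df = pd.DataFrame(rng.normal(size=(300, 3)), columns=features)

# Stand-in for joblib.load(args.model)
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", IsolationForest(random_state=0)),
]).fit(train_df)

new_df = pd.DataFrame(rng.normal(size=(50, 3)), columns=features)
new_df["anomaly_score"] = pipeline.score_samples(new_df[features])  # lower = more unusual
new_df["is_anomaly"] = pipeline.predict(new_df[features]) == -1     # -1 marks outliers
# new_df.to_csv(output_path, index=False) would write the scored file
```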
## Roadmap ideas
| Phase | Goal | Details |
|-------|------|---------|
| Baseline | Clean data + Isolation Forest | Validate signals, calculate rolling mean/std, track top anomalies per asset & shift. |
| Enhancements | Context-aware detection | Separate models per unit (boiler, milling line), include load-based normalization, add a feedback loop for dismissed alerts. |
| Advanced | Forecast + residual alerts | Train LSTM/Prophet forecasts and alert on residuals; integrate maintenance work orders. |
## Data tips
- **Resample** fast signals to a consistent cadence (e.g., 1 min) to smooth control jitter.
- **Align units** (e.g., convert all steam flows to TPH) before feeding models.
- **Label known events** (downtime, maintenance) to benchmark the detector and reduce false positives.
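The first two tips can be sketched with pandas (the signal name and units below are illustrative, not actual historian tags):

```python
# Sketch: downsample a 1 Hz signal to 1-minute means, then convert units.
import numpy as np
import pandas as pd

# 10 minutes of 1 Hz steam flow in kg/h (synthetic)
idx = pd.date_range("2024-01-01", periods=600, freq="s")
raw = pd.DataFrame(
    {"steam_kgph": 100_000 + np.random.default_rng(1).normal(0, 500, size=600)},
    index=idx,
)

per_min = raw.resample("1min").mean()                   # smooth control jitter
per_min["steam_tph"] = per_min["steam_kgph"] / 1000.0   # kg/h -> t/h
```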
## Next steps
1. Pull a week of reconciled historian data into `data/clean/`.
2. Run `train.py` to create an initial anomaly score CSV.
3. Visualize results in the existing dashboards or a Jupyter notebook (e.g., a scatter of anomaly score vs. timestamp, grouped by equipment).
4. Iterate on feature engineering: rolling gradients, energy-per-ton, turbine slip ratios, etc.
5. Deploy: schedule the detection script (cron/Windows Task Scheduler) and push alerts via email or dashboard badges.
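For step 3, a minimal notebook-style sketch (synthetic scores; the `anomaly_score` and `equipment` column names assume the scored-CSV layout described above):

```python
# Sketch: scatter anomaly score vs. timestamp, one series per equipment.
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs in scripts/CI
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
scored = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=200, freq="1min"),
    "anomaly_score": rng.normal(-0.4, 0.1, size=200),
    "equipment": rng.choice(["boiler", "mill"], size=200),
})

fig, ax = plt.subplots()
for name, grp in scored.groupby("equipment"):
    ax.scatter(grp["timestamp"], grp["anomaly_score"], s=8, label=name)
ax.set_xlabel("timestamp")
ax.set_ylabel("anomaly_score")
ax.legend()
fig.savefig("anomaly_scatter.png")
```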
Feel free to extend the pipeline with deep-learning models, model-registry integration, or streaming inference as the project matures.