# Copilot Instructions

## Project Overview
- Repo covers a lightweight unsupervised anomaly detection pipeline focused on historian CSV exports.
- Core package lives in `src/`; CLI entry points `train.py` and `detect.py` orchestrate data prep, feature engineering, and model scoring.
- `requirements.txt` pins the analytics stack (`pandas`, `scikit-learn`, `numpy`, `joblib`); assume Python 3.10+ with a virtualenv per the README.

## Data Loading & Validation
- Reuse `src/data_loader.py::load_timeseries` with `DataLoadConfig` to ensure consistent timestamp parsing, optional timezone localization, and feature inference (see the loader sketch at the end of this file).
- When adding new ingestion logic, funnel it through `load_timeseries` or extend it; downstream code relies on `df.attrs["feature_columns"]` being populated for inference overrides.
- Raise `DataValidationError` for user-facing data issues instead of generic exceptions so the CLIs can surface clear messages.

## Feature Engineering Patterns
- `feature_engineering.build_feature_matrix` is the single entry point for derived features; it controls rolling stats (`add_rolling_statistics`) and rate-of-change (`add_rate_of_change`).
- Rolling windows are expressed with pandas offset aliases (default `5T`, `15T`, `60T`); keep new feature names suffix-based so persisted artifacts stay discoverable (see the rolling-window sketch at the end of this file).
- Always pass through `timestamp_column` and any `id_columns`; the helper filters non-numeric fields automatically.

## Training Workflow (`src/train.py`)
- The CLI expects PowerShell-friendly invocation (`^` line continuations) and creates artifact bundles with the pipeline plus metadata.
- `fit_pipeline` wraps `StandardScaler` + `IsolationForest` with configurable contamination, estimator count, and random state; extend via the existing Pipeline to avoid breaking saved artifacts (see the pipeline sketch at the end of this file).
- `generate_scores` writes anomaly flags plus ranking; extra columns must come from the non-feature portion of `feature_df`.
- Outputs default to `ml/anomaly_detection/models/` and `ml/anomaly_detection/outputs/`; use `ensure_parent_dir` before writing new files.

## Detection Workflow (`src/detect.py`)
- Loads the joblib artifact and rehydrates its config (rolling flags, windows) when building features; keep the artifact schema stable across changes.
- Supports overrides for timestamp, feature, and id columns; mirror option names when adding parameters to maintain parity with the training CLI.
- `--keep-features` toggles whether engineered columns are retained in the scored CSV; preserve this pattern when expanding outputs (see the scoring sketch at the end of this file).
- If you add new anomaly criteria, integrate with the existing `alert_threshold` / `top_n` flow instead of inventing parallel mechanisms.

## Project Conventions & Tips
- Scripts use local imports (e.g., `from data_loader import ...`); when creating new modules, keep them under `src/` and import similarly so they run via `python ml/anomaly_detection/src/`
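
## Illustrative Sketches
The snippets below are hedged, non-authoritative sketches of the conventions above; any name or signature marked as an assumption should be checked against the actual code in `src/`.

For the Data Loading & Validation section, this is roughly how new ingestion code might funnel through `load_timeseries` and surface `DataValidationError`. The `DataLoadConfig` fields and the CSV path shown are assumptions for illustration only; the real signature lives in `src/data_loader.py`.

```python
# Hypothetical usage sketch; DataLoadConfig field names and the input path are
# assumptions, not the repo's confirmed API -- check src/data_loader.py.
from data_loader import DataLoadConfig, DataValidationError, load_timeseries

config = DataLoadConfig(
    timestamp_column="timestamp",  # assumed field name
    tz="UTC",                      # assumed optional timezone localization knob
)

try:
    df = load_timeseries("exports/historian_dump.csv", config)
except DataValidationError as exc:
    # User-facing data problems surface as DataValidationError so the CLIs can
    # print a clear message instead of a traceback.
    raise SystemExit(f"Input rejected: {exc}")

# Downstream feature code expects the inferred feature list to be attached here.
feature_columns = df.attrs["feature_columns"]
```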
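
For the Feature Engineering Patterns section, the suffix-based rolling-window convention can be shown with plain pandas. This is not the repo's `add_rolling_statistics`; it is a standalone sketch of the naming pattern over offset-alias windows, assuming the frame has a `DatetimeIndex`.

```python
import pandas as pd

def rolling_statistics_sketch(
    df: pd.DataFrame,
    columns: list[str],
    windows: tuple[str, ...] = ("5T", "15T", "60T"),
) -> pd.DataFrame:
    """Illustrative only: suffix-named rolling features over a DatetimeIndex."""
    out = df.copy()
    for window in windows:
        for col in columns:
            rolled = out[col].rolling(window)
            # Suffix-based names keep persisted artifacts discoverable.
            out[f"{col}_mean_{window}"] = rolled.mean()
            out[f"{col}_std_{window}"] = rolled.std()
    return out
```

A rate-of-change helper would presumably follow the same convention, with its own suffix per window, so scored outputs stay self-describing.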
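
For the Training Workflow section, the `StandardScaler` + `IsolationForest` combination maps onto a standard scikit-learn `Pipeline`. The function name, defaults, artifact keys, and file name below are assumptions about what `fit_pipeline` builds and persists, not the repo's confirmed schema.

```python
# Sketch of a scaler + IsolationForest pipeline as described above. Defaults,
# artifact keys, and the output file name are illustrative assumptions.
import joblib
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def fit_pipeline_sketch(feature_matrix, contamination=0.01, n_estimators=200, random_state=42):
    pipeline = Pipeline([
        ("scaler", StandardScaler()),
        ("model", IsolationForest(
            contamination=contamination,
            n_estimators=n_estimators,
            random_state=random_state,
        )),
    ])
    pipeline.fit(feature_matrix)
    return pipeline

# Persisting the pipeline together with its feature config lets detect.py
# rehydrate rolling flags and windows from a single artifact, e.g.:
#   artifact = {"pipeline": pipeline,
#               "metadata": {"rolling_windows": ["5T", "15T", "60T"]}}
#   joblib.dump(artifact, "ml/anomaly_detection/models/isolation_forest.joblib")
```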
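
For the Detection Workflow section, a sketch of loading the artifact and applying an `alert_threshold` / `top_n` flow. The artifact layout mirrors the training sketch above, and the output column names are assumptions; `generate_scores` and `detect.py` define the real ones.

```python
import joblib
import pandas as pd

def score_sketch(
    artifact_path: str,
    feature_df: pd.DataFrame,
    feature_columns: list[str],
    alert_threshold: float = 0.0,
    top_n: int | None = None,
) -> pd.DataFrame:
    """Illustrative scoring flow; column names and artifact keys are assumptions."""
    artifact = joblib.load(artifact_path)
    pipeline = artifact["pipeline"]  # assumed layout from the training sketch

    scored = feature_df.copy()
    # For IsolationForest, lower decision_function values are more anomalous.
    scored["anomaly_score"] = pipeline.decision_function(feature_df[feature_columns])
    scored["is_anomaly"] = scored["anomaly_score"] < alert_threshold
    scored["anomaly_rank"] = scored["anomaly_score"].rank(method="first")

    if top_n is not None:
        # Reuse the existing top_n flow instead of inventing a parallel filter.
        scored = scored.nsmallest(top_n, "anomaly_score")
    return scored
```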