controls-web/ai_agents/anomaly_detection/.github/copilot-instructions.md
2026-02-17 09:29:34 -06:00

Copilot Instructions

Project Overview

  • Repo covers a lightweight unsupervised anomaly detection pipeline focused on historian CSV exports.
  • Core package lives in src/; CLI entry points train.py and detect.py orchestrate data prep, feature engineering, and model scoring.
  • requirements.txt pins the analytics stack (pandas, scikit-learn, numpy, joblib); assume Python 3.10+ in a virtualenv, as described in the README.

Data Loading & Validation

  • Reuse src/data_loader.py::load_timeseries with DataLoadConfig to ensure consistent timestamp parsing, optional timezone localization, and feature inference.
  • When adding new ingestion logic, funnel it through load_timeseries or extend it; downstream code relies on df.attrs["feature_columns"] being populated for inference overrides.
  • Raise DataValidationError for user-facing data issues instead of generic exceptions so CLIs can surface clear messages.
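
The loading contract above can be sketched roughly as follows. This is a simplified stand-in, not the code in src/data_loader.py: the DataLoadConfig field names (timestamp_column, timezone), the ValueError base class, and the "every numeric column is a feature" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from io import StringIO
from typing import Optional

import pandas as pd


class DataValidationError(ValueError):
    """Stand-in for the project's user-facing data error (base class assumed)."""


@dataclass
class DataLoadConfig:
    # Field names are illustrative assumptions, not the real config schema.
    timestamp_column: str = "timestamp"
    timezone: Optional[str] = None


def load_timeseries(source, config: DataLoadConfig) -> pd.DataFrame:
    """Read a historian CSV, parse timestamps, and record inferred features."""
    df = pd.read_csv(source)
    if config.timestamp_column not in df.columns:
        raise DataValidationError(
            f"Timestamp column {config.timestamp_column!r} not found in input"
        )
    ts = pd.to_datetime(df[config.timestamp_column])
    if config.timezone is not None:
        ts = ts.dt.tz_localize(config.timezone)  # optional localization
    df[config.timestamp_column] = ts
    df = df.sort_values(config.timestamp_column).reset_index(drop=True)

    # Assumed inference rule: every numeric column is a feature. attrs are
    # set last so they sit on the frame that is actually returned.
    features = list(df.select_dtypes(include="number").columns)
    if not features:
        raise DataValidationError("No numeric feature columns found")
    df.attrs["feature_columns"] = features
    return df


csv_text = (
    "timestamp,tag_a,tag_b,unit\n"
    "2026-01-01 00:05,1.5,11,A\n"
    "2026-01-01 00:00,1.0,10,A\n"
)
frame = load_timeseries(StringIO(csv_text), DataLoadConfig(timezone="UTC"))
print(frame.attrs["feature_columns"])
```

Raising a dedicated exception type lets a CLI catch it in one place and print the message without a traceback, while unexpected errors still surface normally.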

Feature Engineering Patterns

  • feature_engineering.build_feature_matrix is the single entry point for derived features; it controls rolling stats (add_rolling_statistics) and rate-of-change (add_rate_of_change).
  • Rolling windows are expressed with pandas offset aliases (default 5T, 15T, 60T, i.e. 5, 15, and 60 minutes; pandas 2.2+ deprecates T in favor of min); keep new feature names suffix-based so persisted artifacts stay discoverable.
  • Always pass through timestamp_column and any id_columns; the helper filters non-numeric fields automatically.
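
As a rough illustration of the pattern above, the sketch below builds time-based rolling statistics and a per-second rate of change with suffix-based column names. The `_sketch` functions, their signatures, and the exact suffix format are hypothetical; the real helpers in feature_engineering may differ. Windows are spelled with `min` rather than `T` so the example runs without deprecation warnings on newer pandas.

```python
import pandas as pd


def add_rolling_statistics_sketch(df, feature_columns, timestamp_column, windows):
    out = df.copy()  # leave the caller's frame untouched
    indexed = out.set_index(timestamp_column)
    for window in windows:
        for col in feature_columns:
            rolled = indexed[col].rolling(window)  # time-based window
            # Suffix-based names: <feature>_<stat>_<window>
            out[f"{col}_mean_{window}"] = rolled.mean().to_numpy()
            out[f"{col}_std_{window}"] = rolled.std().to_numpy()
    return out


def add_rate_of_change_sketch(df, feature_columns, timestamp_column):
    out = df.copy()
    seconds = out[timestamp_column].diff().dt.total_seconds()
    for col in feature_columns:
        out[f"{col}_roc"] = out[col].diff() / seconds  # units per second
    return out


def build_feature_matrix_sketch(
    df,
    feature_columns,
    timestamp_column,
    id_columns=(),
    windows=("5min", "15min", "60min"),
):
    # Pass through the timestamp and id columns alongside the features.
    keep = [timestamp_column, *id_columns, *feature_columns]
    base = df[keep].sort_values(timestamp_column).reset_index(drop=True)
    with_rolling = add_rolling_statistics_sketch(
        base, feature_columns, timestamp_column, windows
    )
    return add_rate_of_change_sketch(with_rolling, feature_columns, timestamp_column)


raw = pd.DataFrame(
    {
        "timestamp": pd.date_range("2026-01-01", periods=6, freq="min"),
        "unit": ["A"] * 6,
        "tag_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)
features = build_feature_matrix_sketch(
    raw, ["tag_a"], "timestamp", id_columns=["unit"], windows=("5min",)
)
print(features.columns.tolist())
```

Time-based windows require a sorted datetime index, which is why the sketch sorts by timestamp before rolling.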

Training Workflow (src/train.py)

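  • CLI examples use Windows-friendly invocation with caret (^) line continuations (^ is the cmd.exe continuation character; PowerShell itself uses a backtick), and the CLI creates artifact bundles containing the pipeline plus metadata.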
  • fit_pipeline wraps StandardScaler + IsolationForest with configurable contamination, estimators, and random-state; extend via the existing Pipeline to avoid breaking saved artifacts.
  • generate_scores writes anomaly flags plus ranking; extra columns must come from the non-feature portion of feature_df.
  • Outputs default to ml/anomaly_detection/models/ and ml/anomaly_detection/outputs/; use ensure_parent_dir before writing new files.
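
A minimal sketch of the training flow described above, assuming a two-step scikit-learn Pipeline and a dict-shaped artifact. The `_sketch` function signatures, the output column names (anomaly_score, is_anomaly, rank), the metadata keys, and the body of ensure_parent_dir are illustrative assumptions, not the project's actual schema.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def ensure_parent_dir(path):
    """Create the parent directory of `path` if needed (assumed behavior)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


def fit_pipeline_sketch(features, contamination=0.05, n_estimators=200, random_state=42):
    pipeline = Pipeline(
        steps=[
            ("scaler", StandardScaler()),
            ("model", IsolationForest(
                contamination=contamination,
                n_estimators=n_estimators,
                random_state=random_state,
            )),
        ]
    )
    return pipeline.fit(features)


def generate_scores_sketch(pipeline, feature_df, feature_columns):
    X = feature_df[feature_columns]
    # Extra output columns come from the non-feature part of feature_df.
    scored = feature_df.drop(columns=feature_columns).copy()
    # decision_function is lower for outliers; negate so higher = more anomalous.
    scored["anomaly_score"] = -pipeline.decision_function(X)
    scored["is_anomaly"] = pipeline.predict(X) == -1
    scored["rank"] = (
        scored["anomaly_score"].rank(ascending=False, method="first").astype(int)
    )
    return scored


rng = np.random.default_rng(0)
feature_df = pd.DataFrame(
    {
        "timestamp": pd.date_range("2026-01-01", periods=201, freq="min"),
        "tag_a": np.append(rng.normal(0, 1, 200), 25.0),   # last row is an
        "tag_b": np.append(rng.normal(0, 1, 200), -25.0),  # injected outlier
    }
)
feature_columns = ["tag_a", "tag_b"]
pipeline = fit_pipeline_sketch(feature_df[feature_columns])
scored = generate_scores_sketch(pipeline, feature_df, feature_columns)

with tempfile.TemporaryDirectory() as tmp:
    artifact_path = ensure_parent_dir(Path(tmp) / "models" / "artifact.joblib")
    bundle = {
        "pipeline": pipeline,
        "metadata": {"feature_columns": feature_columns, "rolling_windows": ["5min"]},
    }
    joblib.dump(bundle, artifact_path)
    restored = joblib.load(artifact_path)

print(scored.sort_values("rank").head(3))
```

Keeping the scaler and the model inside one Pipeline means the saved artifact carries its own preprocessing, so detection cannot accidentally score unscaled data.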

Detection Workflow (src/detect.py)

  • Loads the joblib artifact and rehydrates config (rolling flags, windows) when building features; keep artifact schema stable across changes.
  • Supports overrides for timestamp, features, and id columns; mirror option names if adding parameters to maintain parity with the training CLI.
  • --keep-features toggles whether engineered columns are retained in the scored CSV; preserve this pattern when expanding outputs.
  • If you add new anomaly criteria, integrate with the existing alert_threshold / top_n flow instead of inventing parallel mechanisms.
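
One way the alert_threshold / top_n flow and the --keep-features toggle could fit together is sketched below. Only --keep-features is named in these instructions; the --alert-threshold and --top-n flag spellings, the anomaly_score and alert column names, and the "flag if either criterion matches" rule are assumptions for illustration.

```python
import argparse

import pandas as pd


def build_parser():
    parser = argparse.ArgumentParser(description="Detection CLI sketch")
    # Flag spellings below are assumed, except --keep-features.
    parser.add_argument("--alert-threshold", type=float, default=None)
    parser.add_argument("--top-n", type=int, default=None)
    parser.add_argument("--keep-features", action="store_true")
    return parser


def apply_alerts_sketch(
    scored, feature_columns, alert_threshold=None, top_n=None, keep_features=False
):
    out = scored.copy()  # do not mutate the caller's frame
    alert = pd.Series(False, index=out.index)
    if alert_threshold is not None:
        alert |= out["anomaly_score"] >= alert_threshold
    if top_n is not None:
        alert.loc[out["anomaly_score"].nlargest(top_n).index] = True
    out["alert"] = alert
    if not keep_features:
        # Drop engineered/feature columns from the scored output.
        out = out.drop(columns=list(feature_columns))
    return out


args = build_parser().parse_args(["--alert-threshold", "0.8", "--top-n", "2"])
scored = pd.DataFrame(
    {
        "timestamp": pd.date_range("2026-01-01", periods=4, freq="min"),
        "tag_a": [1.0, 9.0, 2.0, 12.0],
        "anomaly_score": [0.1, 0.7, 0.3, 0.9],
    }
)
result = apply_alerts_sketch(
    scored, ["tag_a"], args.alert_threshold, args.top_n, args.keep_features
)
print(result)
```

A new anomaly criterion would slot in as another condition that updates the same alert mask, rather than producing a second flag column with its own CLI options.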

Project Conventions & Tips

  • Scripts use local imports (e.g., from data_loader import ...); keep new modules under src/ and import them the same way so scripts still run via python ml/anomaly_detection/src/<script>.py.
  • Favor pandas-native operations and avoid mutating input frames in place—helpers copy data before augmentation.
  • Gracefully handle timezone-aware timestamps by checking dtype (pandas.api.types) as done in feature helpers.
  • There are no bundled tests; when adding features, demonstrate usage via docstrings or README snippets and validate with small CSV fixtures.
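  • README commands target Windows shells; follow the README's caret continuations and backslash paths when documenting new CLI usage (note that ^ is the cmd.exe continuation character, while PowerShell itself uses a backtick).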
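
The copy-before-augment and timezone-check conventions above might look like the following in a helper. This is a sketch: the function name is hypothetical, and converting aware timestamps to naive UTC is only one possible policy, not necessarily what the feature helpers do.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def normalize_timestamps_sketch(df, timestamp_column):
    """Return a copy whose timestamp column is timezone-naive UTC."""
    out = df.copy()  # helpers copy before augmenting
    ts = out[timestamp_column]
    if not is_datetime64_any_dtype(ts):
        ts = pd.to_datetime(ts)  # parse strings or objects
    if isinstance(ts.dtype, pd.DatetimeTZDtype):
        # Timezone-aware: convert to UTC, then drop the tz info.
        ts = ts.dt.tz_convert("UTC").dt.tz_localize(None)
    out[timestamp_column] = ts
    return out


aware = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2026-01-01 06:00", "2026-01-01 07:00"]
        ).tz_localize("America/Chicago"),
        "tag_a": [1.0, 2.0],
    }
)
naive = normalize_timestamps_sketch(aware, "timestamp")
print(naive["timestamp"].tolist())
```

A small fixture like `aware` doubles as the kind of lightweight validation suggested above when no bundled tests exist.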