remediation_manager module

Auto-remediation manager for critical incidents.

  • Persists incidents to data/incidents_log.json (append-only JSON lines)

  • Triggers remediation actions based on incident type

  • Adds Grafana annotations for visibility (best-effort)

  • Implements simple 15-minute recurrence detection and an adaptive threshold-bump hook

Environment variables (optional):

  • GRAFANA_URL, GRAFANA_API_TOKEN for annotations

Notes:

  • File I/O is constrained under data/ per workspace safety rules

  • All operations are best-effort; never raise from public APIs
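The persistence and recurrence behaviour described above can be illustrated with a minimal sketch. This is not the module's actual code: the helper names (`append_incident`, `is_recurring`), the record fields (`incident_id`, `name`, `ts`), the id scheme, and the reading of "recurrence" as "the same incident name logged at least twice within 15 minutes" are all assumptions made for illustration.

```python
import json
import time
import uuid
from pathlib import Path

RECURRENCE_WINDOW_SECONDS = 15 * 60  # the 15-minute window named in the module notes


def append_incident(log_path, name, metric, value, threshold, now=None):
    """Append one incident as a single JSON line; return its id, or "" on failure."""
    record = {
        "incident_id": uuid.uuid4().hex,  # hypothetical id scheme
        "name": name,
        "metric": metric,
        "value": value,
        "threshold": threshold,
        "ts": time.time() if now is None else now,  # hypothetical field name
    }
    try:
        path = Path(log_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        # Mode "a" only ever adds to the end of the file (append-only log).
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
        return record["incident_id"]
    except (OSError, TypeError, ValueError):
        return ""  # best-effort: report failure without raising


def is_recurring(log_path, name, now=None):
    """Return True if `name` was logged at least twice inside the window."""
    now = time.time() if now is None else now
    hits = 0
    try:
        with Path(log_path).open(encoding="utf-8") as fh:
            for line in fh:
                try:
                    record = json.loads(line)
                    age = now - float(record["ts"])
                except (ValueError, TypeError, KeyError):
                    continue  # skip malformed or incomplete lines
                if record.get("name") == name and 0 <= age <= RECURRENCE_WINDOW_SECONDS:
                    hits += 1
    except OSError:
        return False  # a missing or unreadable log means "no recurrence"
    return hits >= 2
```

A caller could check `is_recurring(...)` right after `append_incident(...)` and, when it returns True, invoke whatever threshold-bump hook the deployment provides. How the real module wires these steps together is not stated in this page.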

remediation_manager.handle_critical_incident(name, metric, value, threshold, details=None)

Main entrypoint: log incident, attempt remediation, annotate Grafana, and return incident_id.

name examples: "High Error Rate", "High Latency", "DB Connection Errors"

metric examples: "error_rate_percent", "latency_seconds", "db_connection_errors"

Return type:

str

Parameters:

name, metric, value, threshold, details (optional, defaults to None)
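As a usage illustration, the sketch below shows one way a monitoring check might call the entrypoint when a threshold is breached. The function `report_if_breached` is hypothetical, and passing a dict as `details` is an assumption, since this page does not state the expected type. The handler is injected as an argument so the example runs on its own; in practice it would presumably be `remediation_manager.handle_critical_incident`.

```python
from typing import Any, Callable, Optional


def report_if_breached(
    handler: Callable[..., Any],
    error_rate_percent: float,
    threshold: float,
) -> Optional[str]:
    """Call `handler` only when the error rate exceeds the threshold."""
    if error_rate_percent <= threshold:
        return None  # nothing to report
    # Argument names follow the documented signature:
    # handle_critical_incident(name, metric, value, threshold, details=None)
    return handler(
        name="High Error Rate",
        metric="error_rate_percent",
        value=error_rate_percent,
        threshold=threshold,
        details={"source": "example-check"},  # assumed to accept a dict
    )
```

Because the documented API is best-effort and never raises, the caller needs no try/except around the handler call; it only has to deal with the returned incident id.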
remediation_manager.get_incidents(limit=50)
Return type:

List[Dict[str, Any]]

Parameters:

limit (int)
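This page does not describe how `get_incidents` selects or orders records. As a rough sketch only, a reader over an append-only JSON Lines log could keep the last `limit` entries as shown below. The function name `read_recent_incidents`, the oldest-to-newest ordering, and the choice to skip malformed lines are assumptions, not documented behaviour.

```python
import json
from collections import deque
from typing import Any, Dict, List


def read_recent_incidents(log_path: str, limit: int = 50) -> List[Dict[str, Any]]:
    """Return up to `limit` of the most recently appended records, oldest first."""
    if limit <= 0:
        return []
    recent: deque = deque(maxlen=limit)  # older entries fall off the left side
    try:
        with open(log_path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except ValueError:
                    continue  # skip lines that are not valid JSON
                if isinstance(record, dict):
                    recent.append(record)
    except OSError:
        return []  # best-effort: a missing log yields an empty list
    return list(recent)
```

The bounded `deque` keeps memory use proportional to `limit` rather than to the size of the log, which matters for a file that only grows.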