metrics module
Prometheus metrics primitives and helpers.
- metrics.note_active_user(user_id)
Record that a specific user was active recently, and update the gauge.
This uses a simple per-process in-memory set. It is a best-effort indicator and does not attempt cross-process aggregation, which is good enough for basic dashboards and tests.
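The per-process set described above might look roughly like the following. This is a minimal sketch, not the module's actual code: the class name `ActiveUsers`, its `note` method, and the plain integer `gauge` attribute (standing in for a Prometheus gauge) are all assumptions, and expiry of users who are no longer "recent" is not modeled.

```python
import threading


class ActiveUsers:
    """Hypothetical per-process tracker; `gauge` stands in for a Prometheus gauge."""

    def __init__(self):
        self._lock = threading.Lock()
        self._users = set()
        self.gauge = 0

    def note(self, user_id):
        """Remember the user and refresh the gauge value; never raises."""
        try:
            with self._lock:
                self._users.add(user_id)
                # A set de-duplicates, so repeated activity counts once.
                self.gauge = len(self._users)
        except Exception:
            # Best-effort: a metrics failure must not break the caller.
            pass
```

Because the set lives in process memory, each worker process would report only the users it has seen itself.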
- metrics.note_request_started()
Increment the in-flight requests gauge (best-effort).
- metrics.note_request_finished()
Decrement the in-flight requests gauge (never negative).
- metrics.get_active_requests_count()
Return the current in-flight request count (best-effort).
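The "never negative" behaviour of the in-flight gauge can be illustrated with a small clamped counter. This is a sketch under assumptions: `InFlightGauge` and its method names are hypothetical, and a plain integer stands in for the Prometheus gauge.

```python
import threading


class InFlightGauge:
    """Hypothetical in-process counter standing in for a Prometheus gauge."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def started(self):
        with self._lock:
            self._count += 1

    def finished(self):
        with self._lock:
            # Clamp at zero so an unmatched "finished" cannot go negative.
            self._count = max(0, self._count - 1)

    def count(self):
        with self._lock:
            return self._count
```

Clamping trades accuracy for robustness: a missed `started` call is silently absorbed rather than leaving the dashboard with a negative value.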
- metrics.get_current_memory_usage()
Return current process RSS in MB (best-effort).
- metrics.get_recent_errors_count(minutes=5)
Return the number of 5xx errors recorded within the last N minutes, where N is the minutes argument (default 5).
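One plausible way to count errors inside a sliding window is to keep a bounded deque of timestamps. The sketch below is illustrative only: `RecentErrors`, its `clock` parameter, and the `maxlen` bound are assumptions, not details taken from the module.

```python
import time
from collections import deque


class RecentErrors:
    """Hypothetical sliding-window counter for 5xx errors."""

    def __init__(self, clock=time.monotonic, maxlen=10000):
        self._clock = clock
        # Bounded so a burst of errors cannot grow memory without limit.
        self._stamps = deque(maxlen=maxlen)

    def note_error(self):
        self._stamps.append(self._clock())

    def count(self, minutes=5):
        cutoff = self._clock() - minutes * 60
        return sum(1 for stamp in self._stamps if stamp >= cutoff)
```

Injecting the clock keeps the window logic testable without sleeping.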
- metrics.get_top_slow_endpoints(limit=5, window_seconds=None)
Return the slowest endpoints observed recently (best-effort).
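A ranking like this could be computed from recorded samples as sketched below. Several assumptions apply: the helper name `top_slow_endpoints` and the `samples` input shape are hypothetical, "slowest" is taken to mean the worst single duration per endpoint, and the `window_seconds` filter is not modeled.

```python
import heapq


def top_slow_endpoints(samples, limit=5):
    """Return up to `limit` (endpoint, worst_duration_seconds) pairs, slowest first.

    `samples` is a hypothetical iterable of (endpoint, duration_seconds) tuples.
    """
    worst = {}
    for endpoint, duration in samples:
        # Keep only the worst duration seen for each endpoint.
        if duration > worst.get(endpoint, float("-inf")):
            worst[endpoint] = duration
    return heapq.nlargest(limit, worst.items(), key=lambda item: item[1])
```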
- metrics.get_slowest_endpoint()
Return a formatted string describing the slowest endpoint recently seen.
- metrics.note_deployment_started(summary='Service starting up')
Mark the start of a deployment and emit an informational alert.
- metrics.note_deployment_shutdown(summary='Service shutting down')
Emit a shutdown deployment event (does not reset latency grace period).
- metrics.get_avg_response_time_seconds()
Return the smoothed average HTTP response time (seconds).
- metrics.record_request_outcome(status_code, duration_seconds, *, source=None, handler=None, command=None, cache_hit=None, status_label=None, method=None, path=None)
Record a single HTTP request outcome across services:
  - Increments total requests
  - Increments failed requests (status >= 500)
  - Updates the EWMA average response time gauge
  - Performs lightweight anomaly detection and emits internal alerts when thresholds are exceeded
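The counter and EWMA bookkeeping listed above can be sketched as follows. The class `RequestStats`, the smoothing factor `alpha`, and seeding the average with the first sample are assumptions made for illustration; the anomaly detection and alerting steps are not shown.

```python
class RequestStats:
    """Hypothetical in-process stand-in for the request counters and EWMA gauge."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha          # weight given to the newest sample
        self.total = 0
        self.failed = 0
        self.ewma_seconds = None    # no samples seen yet

    def record(self, status_code, duration_seconds):
        self.total += 1
        if status_code >= 500:
            self.failed += 1
        if self.ewma_seconds is None:
            # Seed the average with the first observation.
            self.ewma_seconds = float(duration_seconds)
        else:
            self.ewma_seconds = (
                self.alpha * duration_seconds
                + (1 - self.alpha) * self.ewma_seconds
            )
```

A larger `alpha` makes the average react faster to recent requests, while a smaller one smooths out short spikes.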
- metrics.update_health_gauges(*, mongo_connected=None, ping_ms=None, indexes_total=None, latency_ewma_ms=None)
Best-effort bridge from /healthz payload into Prometheus gauges.
- metrics.record_startup_stage_metric(stage, duration_ms)
Expose per-stage startup duration (milliseconds) via Prometheus.
- metrics.record_startup_total_metric(duration_ms)
Expose total startup duration (milliseconds) via Prometheus.
- metrics.record_http_request(method, endpoint, status_code, duration_seconds, *, path=None)
Record HTTP request metrics for SLO calculations.
Increments http_requests_total{method,endpoint,status} and observes http_request_duration_seconds{method,endpoint}.
This function is best-effort and never raises.
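The "best-effort and never raises" contract can be expressed as a small decorator. This is a sketch, not the module's implementation: `best_effort`, `record_http_request_sketch`, and the dictionary-based `registry` argument (standing in for Prometheus counters and histograms) are all hypothetical.

```python
import functools


def best_effort(func):
    """Wrap a metrics helper so that any exception is swallowed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Metrics must never break request handling.
            return None
    return wrapper


@best_effort
def record_http_request_sketch(method, endpoint, status_code, duration_seconds, registry):
    """Count the request and store its duration in a plain-dict registry."""
    key = (method, endpoint, str(status_code))
    counts = registry["http_requests_total"]
    counts[key] = counts.get(key, 0) + 1
    durations = registry["http_request_duration_seconds"]
    durations.setdefault((method, endpoint), []).append(float(duration_seconds))
```

Swallowing every exception hides bugs in the metrics code itself, so a real implementation would likely log the failure at debug level before returning.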
- metrics.record_request_queue_delay(method, endpoint, delay_seconds)
Record request queue delay (best-effort, never raises).
- metrics.get_boot_monotonic()
Return the process boot monotonic timestamp captured by metrics import.
- metrics.mark_startup_complete()
Mark startup as complete and set app_startup_seconds/startup_completed gauges.
Safe no-op if metrics are unavailable.
- metrics.note_first_request_latency(duration_seconds=None)
Record the latency from process boot to first completed HTTP request.
If duration_seconds is None, compute against get_boot_monotonic().
- metrics.get_process_uptime_seconds()
Return approximate process uptime in seconds using perf_counter baseline.
This is computed as perf_counter() - get_boot_monotonic() to yield elapsed time since the baseline captured at import. It is best-effort and monotonic.
- metrics.record_dependency_init(dependency, duration_seconds)
Observe initialization time for a named dependency (Histogram).
- metrics.record_db_operation(operation, duration_seconds, *, status='ok')
Record latency + count for database hot path operations.
- metrics.get_uptime_percentage()
Compute uptime percentage based on request counters.
Uptime ≈ 1 - (failed / total), expressed as a percentage. If counters are unavailable or total == 0, return 100.0.
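As a worked illustration of the formula, the helper below takes the two counter values as plain numbers. The function name `uptime_percentage`, passing `None` to represent unavailable counters, and clamping the result to the 0 to 100 range are assumptions.

```python
def uptime_percentage(total, failed):
    """Return 100 * (1 - failed / total), or 100.0 when there is no data."""
    if not total or failed is None or total <= 0:
        # Unavailable counters or no traffic yet: report full uptime.
        return 100.0
    ratio = 1.0 - (failed / total)
    # Clamp in case the counters are momentarily inconsistent.
    return max(0.0, min(100.0, 100.0 * ratio))
```

For example, 1 failed request out of 4 gives 1 - 0.25 = 0.75, reported as 75.0.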
- metrics.set_adaptive_observability_gauges(*, error_rate_threshold_percent=None, latency_threshold_seconds=None, current_error_rate_percent=None, current_latency_avg_seconds=None)
Update adaptive observability gauges. No-ops if gauges unavailable.
- metrics.set_external_error_rate_percent(value)
Update the external error rate gauge (best-effort).
- metrics.track_file_saved(user_id, language, size_bytes)
Record a file_saved business event.
Uses a structured log entry for rich context and a lightweight Prometheus counter for volume.