metrics module

Prometheus metrics primitives and helpers.

metrics.emit_event(event, severity='info', **fields)

Parameters:

event (str)
severity (str)

metrics.generate_latest()

metrics.track_performance(operation, labels=None)

Parameters:

operation (str)
labels (Dict[str, str] | None)

metrics.metrics_endpoint_bytes()

Return type: bytes

metrics.metrics_content_type()

Return type: str

metrics.note_active_user(user_id)

Record that a specific user was active recently, and update the gauge.

This uses a simple per-process in-memory set. It is a best-effort indicator that does not attempt cross-process aggregation, which is good enough for basic dashboards and tests.

Return type: None
Parameters: user_id (int)
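
The per-process set described above can be sketched as follows. This is a minimal illustration, not the module's code: the class name ActiveUsers and the gauge_value attribute are hypothetical, and any expiry of "recent" users is not modeled because the documentation does not describe it.

```python
import threading


class ActiveUsers:
    """Hypothetical stand-in for a per-process active-user tracker."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._seen = set()
        self.gauge_value = 0  # stands in for a Prometheus gauge

    def note(self, user_id: int) -> None:
        # Adding to a set is idempotent, so repeat activity by the
        # same user does not inflate the gauge.
        with self._lock:
            self._seen.add(user_id)
            self.gauge_value = len(self._seen)


tracker = ActiveUsers()
for uid in (7, 7, 42):
    tracker.note(uid)
print(tracker.gauge_value)
```

Because the set lives in one process, each worker in a multi-process deployment would report only the users it has seen itself.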

metrics.note_request_started()

Increment the in-flight requests gauge (best-effort).

Return type: None

metrics.note_request_finished()

Decrement the in-flight requests gauge (never negative).

Return type: None
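
A clamped counter is one way to get the "never negative" behavior. The sketch below assumes a plain integer guarded by a lock; the class InFlightCounter is hypothetical and does not reflect the module's internals.

```python
import threading


class InFlightCounter:
    """Hypothetical in-flight request counter that never drops below zero."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value = 0

    def started(self) -> None:
        with self._lock:
            self._value += 1

    def finished(self) -> None:
        # Clamp at zero so an unmatched "finished" call cannot
        # push the counter negative.
        with self._lock:
            self._value = max(0, self._value - 1)

    def count(self) -> int:
        with self._lock:
            return self._value


counter = InFlightCounter()
counter.finished()  # unmatched finish: stays at 0
counter.started()
counter.started()
counter.finished()
print(counter.count())
```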

metrics.get_active_requests_count()

Return the current in-flight request count (best-effort).

Return type: int

metrics.get_current_memory_usage()

Return current process RSS in MB (best-effort).

Return type: float

metrics.get_recent_errors_count(minutes=5)

Return the number of 5xx errors recorded during the last N minutes, where N is the minutes argument.

Return type: int
Parameters: minutes (int)
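
A sliding-window count like this can be sketched with a deque of timestamps. Everything here is an assumption for illustration: the class RecentErrors, its one-hour retention limit, and the explicit now argument (added so the example is deterministic) are not part of the documented API.

```python
import time
from collections import deque
from typing import Optional


class RecentErrors:
    """Hypothetical recorder of 5xx timestamps with a windowed count."""

    def __init__(self, max_age_seconds: float = 3600.0) -> None:
        self._stamps = deque()
        self._max_age = max_age_seconds

    def record(self, status_code: int, now: Optional[float] = None) -> None:
        # Only server errors (5xx) are tracked.
        if status_code >= 500:
            self._stamps.append(time.monotonic() if now is None else now)

    def count(self, minutes: int = 5, now: Optional[float] = None) -> int:
        now = time.monotonic() if now is None else now
        # Discard entries older than the retention limit.
        while self._stamps and now - self._stamps[0] > self._max_age:
            self._stamps.popleft()
        cutoff = now - minutes * 60
        return sum(1 for stamp in self._stamps if stamp >= cutoff)


errors = RecentErrors()
errors.record(500, now=0.0)
errors.record(404, now=10.0)   # ignored: not a 5xx
errors.record(503, now=400.0)
print(errors.count(minutes=5, now=500.0))
```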

metrics.get_top_slow_endpoints(limit=5, window_seconds=None)

Return the slowest endpoints observed recently (best-effort).

Return type:

List[Dict[str, Any]]

Parameters:

limit (int)
window_seconds (int | None)
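
To illustrate how limit and window_seconds might interact, here is a rough sketch over a list of samples. The sample tuple layout, the ranking by maximum duration, the dictionary keys, and the now argument are all assumptions; the documentation does not specify the shape of the returned dictionaries.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical sample layout: (timestamp, endpoint, duration_seconds)
Sample = Tuple[float, str, float]


def top_slow_endpoints(
    samples: List[Sample],
    limit: int = 5,
    window_seconds: Optional[int] = None,
    now: float = 0.0,
) -> List[Dict[str, Any]]:
    worst: Dict[str, float] = {}
    for ts, endpoint, duration in samples:
        # Skip samples that fall outside the requested window.
        if window_seconds is not None and now - ts > window_seconds:
            continue
        worst[endpoint] = max(worst.get(endpoint, 0.0), duration)
    ranked = sorted(worst.items(), key=lambda item: item[1], reverse=True)
    return [
        {"endpoint": name, "max_duration_seconds": seconds}
        for name, seconds in ranked[:limit]
    ]


samples = [(0, "/a", 0.2), (50, "/b", 1.5), (90, "/a", 0.9), (95, "/c", 0.1)]
print(top_slow_endpoints(samples, limit=2, now=100))
```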

metrics.get_slowest_endpoint()

Return a formatted string describing the slowest endpoint recently seen.

Return type: str

metrics.note_deployment_started(summary='Service starting up')

Mark the start of a deployment and emit an informational alert.

Return type: None
Parameters: summary (str)

metrics.note_deployment_shutdown(summary='Service shutting down')

Emit a shutdown deployment event (does not reset the latency grace period).

Return type: None
Parameters: summary (str)

metrics.get_avg_response_time_seconds()

Return the smoothed average HTTP response time (seconds).

Return type: float

metrics.record_request_outcome(status_code, duration_seconds, *, source=None, handler=None, command=None, cache_hit=None, status_label=None, method=None, path=None)

Record a single HTTP request outcome across services:

Increments total requests
Increments failed requests (status >= 500)
Updates the EWMA average response time gauge
Performs lightweight anomaly detection and emits internal alerts when thresholds are exceeded

Return type:

Parameters:

status_code (int)
duration_seconds (float)
source (str | None)
handler (str | None)
command (str | None)
cache_hit (bool | str | None)
status_label (str | None)
method (str | None)
path (str | None)
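
The counter and EWMA updates listed above can be sketched as follows. This is only an illustration: the class RequestStats, the smoothing factor alpha, and seeding the average with the first sample are assumptions, and anomaly detection is left out.

```python
from typing import Optional


class RequestStats:
    """Hypothetical holder for request counters and an EWMA of latency."""

    def __init__(self, alpha: float = 0.2) -> None:
        self.alpha = alpha          # weight given to the newest sample (assumed)
        self.total = 0
        self.failed = 0
        self.ewma_seconds: Optional[float] = None

    def record(self, status_code: int, duration_seconds: float) -> None:
        self.total += 1
        if status_code >= 500:
            self.failed += 1
        if self.ewma_seconds is None:
            # The first sample seeds the average.
            self.ewma_seconds = duration_seconds
        else:
            self.ewma_seconds = (
                self.alpha * duration_seconds
                + (1.0 - self.alpha) * self.ewma_seconds
            )


stats = RequestStats(alpha=0.5)
stats.record(200, 1.0)
stats.record(500, 3.0)
print(stats.total, stats.failed, stats.ewma_seconds)
```

A larger alpha makes the average react faster to new samples at the cost of more noise.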

metrics.update_health_gauges(*, mongo_connected=None, ping_ms=None, indexes_total=None, latency_ewma_ms=None)

Best-effort bridge from the /healthz payload into Prometheus gauges.

Return type:

Parameters:

mongo_connected (bool | None)
ping_ms (float | None)
indexes_total (float | None)
latency_ewma_ms (float | None)

metrics.record_startup_stage_metric(stage, duration_ms)

Expose per-stage startup duration (milliseconds) via Prometheus.

Return type:

Parameters:

stage (str)
duration_ms (float | None)

metrics.record_startup_total_metric(duration_ms)

Expose total startup duration (milliseconds) via Prometheus.

Return type: None
Parameters: duration_ms (float | None)

metrics.record_http_request(method, endpoint, status_code, duration_seconds, *, path=None)

Record HTTP request metrics for SLO calculations:

Increments http_requests_total{method,endpoint,status}
Observes http_request_duration_seconds{method,endpoint}

This function is best-effort and never raises.

Return type:

Parameters:

method (str)
endpoint (str | None)
status_code (int)
duration_seconds (float)
path (str | None)
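
A "best-effort, never raises" contract is commonly implemented by swallowing exceptions around the recording logic. The decorator below is a sketch of that pattern under assumptions: best_effort, record_sample, and the dictionary used as a counter store are hypothetical and stand in for the real Prometheus objects.

```python
import functools
from typing import Any, Callable, Dict, Optional, Tuple


def best_effort(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap func so that any exception is swallowed and None is returned."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return func(*args, **kwargs)
        except Exception:
            # Metrics must never break the request being measured.
            return None

    return wrapper


@best_effort
def record_sample(
    store: Dict[Tuple[str, str, str], int],
    method: str,
    endpoint: Optional[str],
    status_code: int,
) -> None:
    key = (method.upper(), endpoint or "unknown", str(status_code))
    store[key] = store.get(key, 0) + 1


store: Dict[Tuple[str, str, str], int] = {}
record_sample(store, "get", None, 200)
record_sample(store, None, "/x", 200)  # bad input is ignored, not raised
print(store)
```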

metrics.record_request_queue_delay(method, endpoint, delay_seconds)

Record request queue delay (best-effort, never raises).

Return type:

Parameters:

method (str)
endpoint (str | None)
delay_seconds (float)

metrics.record_outbound_request_duration(service, endpoint, status, duration_seconds)

Return type:

Parameters:

service (str | None)
endpoint (str | None)
status (str | None)
duration_seconds (float)

metrics.increment_outbound_retry(service, endpoint)

Return type:

Parameters:

service (str | None)
endpoint (str | None)

metrics.set_circuit_state(service, endpoint, state_value)

Return type:

Parameters:

service (str | None)
endpoint (str | None)
state_value (float)

metrics.set_circuit_success_rate(service, endpoint, value)

Return type:

Parameters:

service (str | None)
endpoint (str | None)
value (float)

metrics.get_boot_monotonic()

Return the process boot monotonic timestamp captured when the metrics module was imported.

Return type: float

metrics.mark_startup_complete()

Mark startup as complete and set the app_startup_seconds and startup_completed gauges.

Safe no-op if metrics are unavailable.

Return type: None

metrics.note_first_request_latency(duration_seconds=None)

Record the latency from process boot to the first completed HTTP request.

If duration_seconds is None, compute it against get_boot_monotonic().

Return type: None
Parameters: duration_seconds (float | None)

metrics.get_process_uptime_seconds()

Return approximate process uptime in seconds, using a perf_counter baseline.

This is computed as perf_counter() - get_boot_monotonic(), which yields the time elapsed since the baseline captured at import. It is best-effort and monotonic.

Return type: float

metrics.record_dependency_init(dependency, duration_seconds)

Observe initialization time for a named dependency (Histogram).

Return type:

Parameters:

dependency (str)
duration_seconds (float)

metrics.record_db_operation(operation, duration_seconds, *, status='ok')

Record latency and count for database hot-path operations.

Return type:

Parameters:

operation (str)
duration_seconds (float)
status (str | None)

metrics.get_uptime_percentage()

Compute uptime percentage based on request counters.

Uptime is approximately 1 - (failed / total), expressed as a percentage. If counters are unavailable or total == 0, return 100.0.

Return type: float
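
The formula and its fallback can be worked through in a few lines. This sketch assumes the counters are passed in as plain numbers (None when unavailable) and that the result is clamped to the range 0 to 100; the function name uptime_percentage is hypothetical.

```python
from typing import Optional


def uptime_percentage(total: Optional[float], failed: Optional[float]) -> float:
    # With no traffic or no counters there is nothing to hold against
    # the service, so report full uptime.
    if not total or total <= 0:
        return 100.0
    failed = failed or 0.0
    percent = (1.0 - failed / total) * 100.0
    # Clamp in case the counters are momentarily inconsistent (assumed).
    return max(0.0, min(100.0, percent))


print(uptime_percentage(200, 1))
print(uptime_percentage(0, 0))
```

For example, 1 failed request out of 200 gives (1 - 1/200) × 100, or about 99.5 percent.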

metrics.set_adaptive_observability_gauges(*, error_rate_threshold_percent=None, latency_threshold_seconds=None, current_error_rate_percent=None, current_latency_avg_seconds=None)

Update adaptive observability gauges. No-op if the gauges are unavailable.

Return type:

Parameters:

error_rate_threshold_percent (float | None)
latency_threshold_seconds (float | None)
current_error_rate_percent (float | None)
current_latency_avg_seconds (float | None)

metrics.set_external_error_rate_percent(value)

Update the external error rate gauge (best-effort).

Return type: None
Parameters: value (float | None)

metrics.track_file_saved(user_id, language, size_bytes)

Record a file_saved business event.

Uses a structured log for rich context and a lightweight Prometheus counter for volume.

Return type:

Parameters:

user_id (int)
language (str)
size_bytes (int)

metrics.track_search_performed(user_id, query, results_count)

Record a search event without logging the raw query (privacy by default).

Return type:

Parameters:

user_id (int)
query (str)
results_count (int)
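
One way to record a search without the raw query is to log only derived, non-identifying fields. The sketch below keeps just the query length; the function search_event_fields and every field name in the returned dictionary are assumptions, not the module's actual log schema.

```python
from typing import Any, Dict


def search_event_fields(user_id: int, query: str, results_count: int) -> Dict[str, Any]:
    """Build hypothetical structured-log fields that omit the raw query text."""
    return {
        "event": "search_performed",
        "user_id": user_id,
        # Only the length is kept; the query text itself is never stored.
        "query_length": len(query),
        "results_count": results_count,
    }


fields = search_event_fields(1, "secret token", 3)
print(fields)
```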

metrics.track_github_sync(user_id, files_count, success)

Record a github_sync event (aggregate outcome only).

Return type:

Parameters:

user_id (int)
files_count (int)
success (bool)