From 03c11fd54a6f15a7a1da369891f54e22d540e4f1 Mon Sep 17 00:00:00 2001
From: Michele Cereda
Date: Tue, 4 Jun 2024 00:24:41 +0200
Subject: [PATCH] feat(kb/prometheus): filter out metrics

---
 .vscode/settings.json        |  3 ++
 knowledge base/prometheus.md | 97 ++++++++++++++++++++++++++++--------
 2 files changed, 78 insertions(+), 22 deletions(-)

diff --git a/.vscode/settings.json b/.vscode/settings.json
index 0a4a5e1..f65bec0 100644
--- a/.vscode/settings.json
+++ b/.vscode/settings.json
@@ -65,6 +65,7 @@
     "adduser",
     "airgap",
     "airgapped",
+    "alertmanager",
     "apiserver",
     "asciicast",
     "asciinema",
@@ -214,6 +215,7 @@
     "openmediavault",
     "openpgp",
     "opentofu",
+    "openwrt",
     "opkg",
     "ossec",
     "pacman",
@@ -264,6 +266,7 @@
     "setfattr",
     "siem",
     "slurm",
+    "snmp",
     "spiffe",
     "sshfs",
     "sshpass",
diff --git a/knowledge base/prometheus.md b/knowledge base/prometheus.md
index 11f02ad..616df90 100644
--- a/knowledge base/prometheus.md
+++ b/knowledge base/prometheus.md
@@ -1,7 +1,9 @@
 # Prometheus
 
-Monitoring and alerting system that collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.
-Metrics can also be pushed using plugins, in the event hosts are behind a firewall or prohibited from opening ports by security policy.
+Monitoring and alerting system that collects metrics from configured targets at given intervals, evaluates rule
+expressions, displays the results, and can trigger alerts when specified conditions are observed.
+Metrics can also be pushed using plugins, in the event hosts are behind a firewall or prohibited from opening ports by
+security policy.
 
 ## Table of contents
 
@@ -9,6 +11,7 @@ Metrics can also be pushed using plugins, in the event hosts are behind a firewa
 1. [Extras](#extras)
 1. [Configuration](#configuration)
 1. [Queries](#queries)
+1. [Filter metrics](#filter-metrics)
 1. [Further readings](#further-readings)
 1. [Sources](#sources)
 
@@ -17,14 +20,18 @@ Metrics can also be pushed using plugins, in the event hosts are behind a firewa
 Prometheus is composed by its **server**, the **Alertmanager** and its **exporters**.
 
 Alerting rules can be created within Prometheus, and configured to send custom alerts to _Alertmanager_.
-Alertmanager then processes and handles the alerts, including sending notifications through different mechanisms or third-party services.
+Alertmanager then processes and handles the alerts, including sending notifications through different mechanisms or
+third-party services.
 
-The _exporters_ can be libraries, processes, devices, or anything else exposing metrics so that they can be scraped by Prometheus.
-Such metrics are usually made available at the `/metrics` endpoint, which allows them to be scraped directly from Prometheus without the need of an agent.
+The _exporters_ can be libraries, processes, devices, or anything else exposing metrics so that they can be scraped by
+Prometheus.
+Such metrics are usually made available at the `/metrics` endpoint, which allows them to be scraped directly by
+Prometheus without the need for an agent.
 
 ### Extras
 
-As welcomed addition, [Grafana] can be configured to use Prometheus as a backend of its in order to provide data visualization and dashboarding functions on the data it provides.
+As a welcome addition, [Grafana] can be configured to use Prometheus as a backend in order to provide data
+visualization and dashboarding functions on the data it provides.
 
 ## Configuration
 
@@ -45,14 +52,20 @@ scrape_configs:
   - job_name: router
     static_configs:
       - targets: [ 'openwrt.local:9100' ]
+    metric_relabel_configs:
+      - source_labels: [__name__]
+        action: keep
+        regex: '(node_cpu)'
 ```
 
 ## Queries
 
 Prometheus' query syntax is [PromQL].
 
-All data is stored as time series, each one identified by a metric name, e.g. `node_filesystem_avail_bytes` for available filesystem space.
-Metrics' names can be used in the expressions to select all of the time series with this name and produce an **instant vector**.
+All data is stored as time series, each one identified by a metric name, e.g. `node_filesystem_avail_bytes` for
+available filesystem space.
+Metrics' names can be used in the expressions to select all of the time series with this name and produce an
+**instant vector**.
 
 Time series can be filtered using selectors and labels (sets of key-value pairs):
 
@@ -77,8 +90,40 @@ When using time ranges, the vector returned will be a **range vector**.
 
 ![advanced query](prometheus%20advanced%20query.png)
 
-Labels are used to filter the job and the mode. `node_cpu_seconds_total` returns a **counter**, and the irate() function calculates the **per-second rate of change** based on the last two data points of the range interval.
-To calculate the overall CPU usage, the idle mode of the metric is used. Since idle percent of a processor is the opposite of a busy processor, the irate value is subtracted from 1. To make it a percentage, it is multiplied by 100.
+Labels are used to filter the job and the mode. `node_cpu_seconds_total` returns a **counter**, and the irate() function
+calculates the **per-second rate of change** based on the last two data points of the range interval.
+To calculate the overall CPU usage, the idle mode of the metric is used. Since idle percent of a processor is the
+opposite of a busy processor, the irate value is subtracted from 1. To make it a percentage, it is multiplied by 100.
+
+## Filter metrics
+
+Refer to [How relabeling in Prometheus works], [Scrape selective metrics in Prometheus] and
+[Dropping metrics at scrape time with Prometheus].
+
+Use [metric relabeling configurations][metric_relabel_configs] to select which series to ingest **after** scraping:
+
+```diff
+ scrape_configs:
+   - job_name: router
+     …
++    metric_relabel_configs:
++      - # do *not* record metrics whose name matches the regex
++        # in this case, those whose name starts with 'node_disk_'
++        source_labels: [ __name__ ]
++        action: drop
++        regex: node_disk_.*
+   - job_name: hosts
+     …
++    metric_relabel_configs:
++      - # *only* record metrics whose source label values, joined by ';', match the regex
++        # in this case, those whose name starts with 'node_cpu_' with cpu=1 and mode=user
++        source_labels:
++          - __name__
++          - cpu
++          - mode
++        regex: node_cpu_.*;1;user
++        action: keep
+```
 
 ## Further readings
 
@@ -86,7 +131,7 @@ To calculate the overall CPU usage, the idle mode of the metric is used. Since i
 - [Github]
 - [`docker/monitoring`][docker/monitoring]
 - [Node exporter]
-- [SMNP exporter]
+- [SNMP exporter]
 - [`ordaa/boinc_exporter`][ordaa/boinc_exporter]
 - [Grafana]
 
@@ -100,20 +145,15 @@ All the references in the [further readings] section, plus the following:
 - [`prometheus/node_exporter`][prometheus/node_exporter]
 - [`prometheus/snmp_exporter`][prometheus/snmp_exporter]
 - [How I monitor my OpenWrt router with Grafana Cloud and Prometheus]
+- [Scrape selective metrics in Prometheus]
+- [Dropping metrics at scrape time with Prometheus]
+- [How relabeling in Prometheus works]
 
-
-[functions]: https://prometheus.io/docs/prometheus/latest/querying/functions/
-[github]: https://github.com/prometheus/prometheus
-[node exporter guide]: https://prometheus.io/docs/guides/node-exporter/
-[prometheus/node_exporter]: https://github.com/prometheus/node_exporter
-[prometheus/snmp_exporter]: https://github.com/prometheus/snmp_exporter
-[promql]: https://prometheus.io/docs/prometheus/latest/querying/basics/
-[website]: https://prometheus.io/
-
 [further readings]: #further-readings
 
@@ -123,10 +163,23 @@ All the references in the [further readings] section, plus the following:
 [snmp exporter]: snmp%20exporter.md
 
-[docker/monitoring]: ../docker/monitoring/README.md
+[docker/monitoring]: ../containers/monitoring/README.md
+
+
+[functions]: https://prometheus.io/docs/prometheus/latest/querying/functions/
+[github]: https://github.com/prometheus/prometheus
+[node exporter guide]: https://prometheus.io/docs/guides/node-exporter/
+[prometheus/node_exporter]: https://github.com/prometheus/node_exporter
+[prometheus/snmp_exporter]: https://github.com/prometheus/snmp_exporter
+[promql]: https://prometheus.io/docs/prometheus/latest/querying/basics/
+[website]: https://prometheus.io/
+[metric_relabel_configs]: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#metric_relabel_configs
 
+[dropping metrics at scrape time with prometheus]: https://www.robustperception.io/dropping-metrics-at-scrape-time-with-prometheus/
 [getting started with prometheus]: https://opensource.com/article/18/12/introduction-prometheus
 [how i monitor my openwrt router with grafana cloud and prometheus]: https://grafana.com/blog/2021/02/09/how-i-monitor-my-openwrt-router-with-grafana-cloud-and-prometheus/
 [ordaa/boinc_exporter]: https://gitlab.com/ordaa/boinc_exporter
+[scrape selective metrics in prometheus]: https://docs.last9.io/docs/how-to-scrape-only-selective-metrics-in-prometheus
 [snmp monitoring and easing it with prometheus]: https://medium.com/@openmohan/snmp-monitoring-and-easing-it-with-prometheus-b157c0a42c0c
+[how relabeling in prometheus works]: https://grafana.com/blog/2022/03/21/how-relabeling-in-prometheus-works/
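
The `keep` and `drop` rules in the Filter metrics section depend on two details of Prometheus relabeling: the values of `source_labels` are joined with the separator (`;` by default) before matching, and the regex is anchored at both ends. The Python sketch below only imitates that behaviour for illustration. `relabel_filter` is a hypothetical helper, not a Prometheus API, and Python's `re` module stands in for the RE2 engine Prometheus uses, which should behave the same for patterns this simple.

```python
import re


def relabel_filter(labels, source_labels, regex, action, separator=";"):
    """Return True if the series would be ingested, False if it would be dropped.

    Hypothetical helper imitating the 'keep' and 'drop' relabel actions.
    """
    # Source label values are concatenated with the separator; a label that
    # is not set on the series contributes an empty string.
    value = separator.join(labels.get(name, "") for name in source_labels)
    # The regex must match the whole concatenated value (full anchoring).
    matched = re.fullmatch(regex, value) is not None
    if action == "keep":
        return matched
    if action == "drop":
        return not matched
    raise ValueError(f"unsupported action: {action}")


# Example series: CPU 11 in user mode (values chosen for illustration).
series = {"__name__": "node_cpu_seconds_total", "cpu": "11", "mode": "user"}
sources = ["__name__", "cpu", "mode"]

# A loose pattern lets any text sit between the name, the '1' and 'user'.
loose = relabel_filter(series, sources, r"node_cpu_.*1.*user.*", "keep")
# Spelling out the separator pins the match to cpu=1 and mode=user exactly.
strict = relabel_filter(series, sources, r"node_cpu_.*;1;user", "keep")

print(loose, strict)  # prints: True False
```

Under these assumptions, a loose pattern such as `node_cpu_.*1.*user.*` also keeps `cpu="11"`, while `node_cpu_.*;1;user` keeps only `cpu="1"`. For the same reason, a pattern like `(node_cpu)` keeps only a series named exactly `node_cpu`, not `node_cpu_seconds_total`.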