feat(kb/prometheus): filter out metrics

This commit is contained in:
Michele Cereda
2024-06-04 00:24:41 +02:00
parent 107aa0993d
commit 03c11fd54a
2 changed files with 78 additions and 22 deletions

View File

@@ -65,6 +65,7 @@
"adduser",
"airgap",
"airgapped",
"alertmanager",
"apiserver",
"asciicast",
"asciinema",
@@ -214,6 +215,7 @@
"openmediavault",
"openpgp",
"opentofu",
"openwrt",
"opkg",
"ossec",
"pacman",
@@ -264,6 +266,7 @@
"setfattr",
"siem",
"slurm",
"snmp",
"spiffe",
"sshfs",
"sshpass",

View File

@@ -1,7 +1,9 @@
# Prometheus
Monitoring and alerting system that collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.<br/>
Metrics can also be pushed using plugins, in the event hosts are behind a firewall or prohibited from opening ports by security policy.
Monitoring and alerting system that collects metrics from configured targets at given intervals, evaluates rule
expressions, displays the results, and can trigger alerts when specified conditions are observed.<br/>
Metrics can also be pushed using plugins, in the event hosts are behind a firewall or prohibited from opening ports by
security policy.
## Table of contents <!-- omit in toc -->
@@ -9,6 +11,7 @@ Metrics can also be pushed using plugins, in the event hosts are behind a firewa
1. [Extras](#extras)
1. [Configuration](#configuration)
1. [Queries](#queries)
1. [Filter metrics](#filter-metrics)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -17,14 +20,18 @@ Metrics can also be pushed using plugins, in the event hosts are behind a firewa
Prometheus is composed by its **server**, the **Alertmanager** and its **exporters**.
Alerting rules can be created within Prometheus, and configured to send custom alerts to _Alertmanager_.<br/>
Alertmanager then processes and handles the alerts, including sending notifications through different mechanisms or third-party services.
Alertmanager then processes and handles the alerts, including sending notifications through different mechanisms or
third-party services.
The _exporters_ can be libraries, processes, devices, or anything else exposing metrics so that they can be scraped by Prometheus.<br/>
Such metrics are usually made available at the `/metrics` endpoint, which allows them to be scraped directly from Prometheus without the need of an agent.
The _exporters_ can be libraries, processes, devices, or anything else exposing metrics so that they can be scraped by
Prometheus.<br/>
Such metrics are usually made available at the `/metrics` endpoint, which allows them to be scraped directly from
Prometheus without the need of an agent.
### Extras
As welcomed addition, [Grafana] can be configured to use Prometheus as a backend of its in order to provide data visualization and dashboarding functions on the data it provides.
As welcomed addition, [Grafana] can be configured to use Prometheus as a backend of its in order to provide data
visualization and dashboarding functions on the data it provides.
## Configuration
@@ -45,14 +52,20 @@ scrape_configs:
- job_name: router
static_configs:
- targets: [ 'openwrt.local:9100' ]
metric_relabel_configs:
- source_labels: [__name__]
action: keep
regex: '(node_cpu)'
```
## Queries
Prometheus' query syntax is [PromQL].
All data is stored as time series, each one identified by a metric name, e.g. `node_filesystem_avail_bytes` for available filesystem space.<br/>
Metrics' names can be used in the expressions to select all of the time series with this name and produce an **instant vector**.
All data is stored as time series, each one identified by a metric name, e.g. `node_filesystem_avail_bytes` for
available filesystem space.<br/>
Metrics' names can be used in the expressions to select all of the time series with this name and produce an
**instant vector**.
Time series can be filtered using selectors and labels (sets of key-value pairs):
@@ -77,8 +90,40 @@ When using time ranges, the vector returned will be a **range vector**.
![advanced query](prometheus%20advanced%20query.png)
Labels are used to filter the job and the mode. `node_cpu_seconds_total` returns a **counter**, and the irate() function calculates the **per-second rate of change** based on the last two data points of the range interval.<br/>
To calculate the overall CPU usage, the idle mode of the metric is used. Since idle percent of a processor is the opposite of a busy processor, the irate value is subtracted from 1. To make it a percentage, it is multiplied by 100.
Labels are used to filter the job and the mode. `node_cpu_seconds_total` returns a **counter**, and the irate() function
calculates the **per-second rate of change** based on the last two data points of the range interval.<br/>
To calculate the overall CPU usage, the idle mode of the metric is used. Since idle percent of a processor is the
opposite of a busy processor, the irate value is subtracted from 1. To make it a percentage, it is multiplied by 100.
## Filter metrics
Refer [How relabeling in Prometheus works], [Scrape selective metrics in Prometheus] and
[Dropping metrics at scrape time with Prometheus].
Use [metric relabeling configurations][metric_relabel_configs] to select which series to ingest **after** scraping:
```diff
scrape_configs:
- job_name: router
+ metric_relabel_configs:
+ - # do *not* record metrics which name matches the regex
+ # in this case, those which name starts with 'node_disk_'
+ source_labels: [ __name__ ]
+ action: drop
+ regex: node_disk_.*
- job_name: hosts
+ metric_relabel_configs:
+ - # *only* record metrics which name matches the regex
+ # in this case, those which name starts with 'node_cpu_' with cpu=1 and mode=user
+ source_labels:
+ - __name__
+ - cpu
+ - mode
+ regex: node_cpu_.*1.*user.*
+ action: keep
```
## Further readings
@@ -86,7 +131,7 @@ To calculate the overall CPU usage, the idle mode of the metric is used. Since i
- [Github]
- [`docker/monitoring`][docker/monitoring]
- [Node exporter]
- [SMNP exporter]
- [SNMP exporter]
- [`ordaa/boinc_exporter`][ordaa/boinc_exporter]
- [Grafana]
@@ -100,20 +145,15 @@ All the references in the [further readings] section, plus the following:
- [`prometheus/node_exporter`][prometheus/node_exporter]
- [`prometheus/snmp_exporter`][prometheus/snmp_exporter]
- [How I monitor my OpenWrt router with Grafana Cloud and Prometheus]
- [Scrape selective metrics in Prometheus]
- [Dropping metrics at scrape time with Prometheus]
- [How relabeling in Prometheus works]
<!--
References
Reference
═╬═Time══
-->
<!-- Upstream -->
[functions]: https://prometheus.io/docs/prometheus/latest/querying/functions/
[github]: https://github.com/prometheus/prometheus
[node exporter guide]: https://prometheus.io/docs/guides/node-exporter/
[prometheus/node_exporter]: https://github.com/prometheus/node_exporter
[prometheus/snmp_exporter]: https://github.com/prometheus/snmp_exporter
[promql]: https://prometheus.io/docs/prometheus/latest/querying/basics/
[website]: https://prometheus.io/
<!-- In-article sections -->
[further readings]: #further-readings
@@ -123,10 +163,23 @@ All the references in the [further readings] section, plus the following:
[snmp exporter]: snmp%20exporter.md
<!-- Files -->
[docker/monitoring]: ../docker/monitoring/README.md
[docker/monitoring]: ../containers/monitoring/README.md
<!-- Upstream -->
[functions]: https://prometheus.io/docs/prometheus/latest/querying/functions/
[github]: https://github.com/prometheus/prometheus
[node exporter guide]: https://prometheus.io/docs/guides/node-exporter/
[prometheus/node_exporter]: https://github.com/prometheus/node_exporter
[prometheus/snmp_exporter]: https://github.com/prometheus/snmp_exporter
[promql]: https://prometheus.io/docs/prometheus/latest/querying/basics/
[website]: https://prometheus.io/
[metric_relabel_configs]: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#metric_relabel_configs
<!-- Others -->
[dropping metrics at scrape time with prometheus]: https://www.robustperception.io/dropping-metrics-at-scrape-time-with-prometheus/
[getting started with prometheus]: https://opensource.com/article/18/12/introduction-prometheus
[how i monitor my openwrt router with grafana cloud and prometheus]: https://grafana.com/blog/2021/02/09/how-i-monitor-my-openwrt-router-with-grafana-cloud-and-prometheus/
[ordaa/boinc_exporter]: https://gitlab.com/ordaa/boinc_exporter
[scrape selective metrics in prometheus]: https://docs.last9.io/docs/how-to-scrape-only-selective-metrics-in-prometheus
[snmp monitoring and easing it with prometheus]: https://medium.com/@openmohan/snmp-monitoring-and-easing-it-with-prometheus-b157c0a42c0c
[how relabeling in prometheus works]: https://grafana.com/blog/2022/03/21/how-relabeling-in-prometheus-works/