From 03c11fd54a6f15a7a1da369891f54e22d540e4f1 Mon Sep 17 00:00:00 2001
From: Michele Cereda <cereda.michele@gmail.com>
Date: Tue, 4 Jun 2024 00:24:41 +0200
Subject: [PATCH] feat(kb/prometheus): filter out metrics

---
 .vscode/settings.json        |  3 ++
 knowledge base/prometheus.md | 97 ++++++++++++++++++++++++++++--------
 2 files changed, 78 insertions(+), 22 deletions(-)
diff --git a/.vscode/settings.json b/.vscode/settings.json
index 0a4a5e1..f65bec0 100644
--- a/.vscode/settings.json
+++ b/.vscode/settings.json
@@ -65,6 +65,7 @@
         "adduser",
         "airgap",
         "airgapped",
+        "alertmanager",
         "apiserver",
         "asciicast",
         "asciinema",
@@ -214,6 +215,7 @@
         "openmediavault",
         "openpgp",
         "opentofu",
+        "openwrt",
         "opkg",
         "ossec",
         "pacman",
@@ -264,6 +266,7 @@
         "setfattr",
         "siem",
         "slurm",
+        "snmp",
         "spiffe",
         "sshfs",
         "sshpass",
diff --git a/knowledge base/prometheus.md b/knowledge base/prometheus.md
index 11f02ad..616df90 100644
--- a/knowledge base/prometheus.md	
+++ b/knowledge base/prometheus.md	
@@ -1,7 +1,9 @@
 # Prometheus
 
-Monitoring and alerting system that collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.<br/>
-Metrics can also be pushed using plugins, in the event hosts are behind a firewall or prohibited from opening ports by security policy.
+Monitoring and alerting system that collects metrics from configured targets at given intervals, evaluates rule
+expressions, displays the results, and can trigger alerts when specified conditions are observed.<br/>
+Metrics can also be pushed using plugins, in the event hosts are behind a firewall or prohibited from opening ports by
+security policy.
 
 ## Table of contents <!-- omit in toc -->
 
@@ -9,6 +11,7 @@ Metrics can also be pushed using plugins, in the event hosts are behind a firewa
    1. [Extras](#extras)
 1. [Configuration](#configuration)
 1. [Queries](#queries)
+1. [Filter metrics](#filter-metrics)
 1. [Further readings](#further-readings)
 1. [Sources](#sources)
 
@@ -17,14 +20,18 @@ Metrics can also be pushed using plugins, in the event hosts are behind a firewa
 Prometheus is composed by its **server**, the **Alertmanager** and its **exporters**.
 
 Alerting rules can be created within Prometheus, and configured to send custom alerts to _Alertmanager_.<br/>
-Alertmanager then processes and handles the alerts, including sending notifications through different mechanisms or third-party services.
+Alertmanager then processes and handles the alerts, including sending notifications through different mechanisms or
+third-party services.
 
-The _exporters_ can be libraries, processes, devices, or anything else exposing metrics so that they can be scraped by Prometheus.<br/>
-Such metrics are usually made available at the `/metrics` endpoint, which allows them to be scraped directly from Prometheus without the need of an agent.
+The _exporters_ can be libraries, processes, devices, or anything else exposing metrics so that they can be scraped by
+Prometheus.<br/>
+Such metrics are usually made available at the `/metrics` endpoint, which allows them to be scraped directly from
+Prometheus without the need of an agent.
 
 ### Extras
 
-As welcomed addition, [Grafana] can be configured to use Prometheus as a backend of its in order to provide data visualization and dashboarding functions on the data it provides.
+As welcomed addition, [Grafana] can be configured to use Prometheus as a backend of its in order to provide data
+visualization and dashboarding functions on the data it provides.
 
 ## Configuration
 
@@ -45,14 +52,20 @@ scrape_configs:
   - job_name: router
     static_configs:
       - targets: [ 'openwrt.local:9100' ]
+    metric_relabel_configs:
+      - source_labels: [__name__]
+        action: keep
+        regex: '(node_cpu)'
 ```
 
 ## Queries
 
 Prometheus' query syntax is [PromQL].
 
-All data is stored as time series, each one identified by a metric name, e.g. `node_filesystem_avail_bytes` for available filesystem space.<br/>
-Metrics' names can be used in the expressions to select all of the time series with this name and produce an **instant vector**.
+All data is stored as time series, each one identified by a metric name, e.g. `node_filesystem_avail_bytes` for
+available filesystem space.<br/>
+Metrics' names can be used in the expressions to select all of the time series with this name and produce an
+**instant vector**.
 
 Time series can be filtered using selectors and labels (sets of key-value pairs):
 
@@ -77,8 +90,40 @@ When using time ranges, the vector returned will be a **range vector**.
 
 ![advanced query](prometheus%20advanced%20query.png)
 
-Labels are used to filter the job and the mode. `node_cpu_seconds_total` returns a **counter**, and the irate() function calculates the **per-second rate of change** based on the last two data points of the range interval.<br/>
-To calculate the overall CPU usage, the idle mode of the metric is used. Since idle percent of a processor is the opposite of a busy processor, the irate value is subtracted from 1. To make it a percentage, it is multiplied by 100.
+Labels are used to filter the job and the mode. `node_cpu_seconds_total` returns a **counter**, and the irate() function
+calculates the **per-second rate of change** based on the last two data points of the range interval.<br/>
+To calculate the overall CPU usage, the idle mode of the metric is used. Since idle percent of a processor is the
+opposite of a busy processor, the irate value is subtracted from 1. To make it a percentage, it is multiplied by 100.
+
+## Filter metrics
+
+Refer [How relabeling in Prometheus works], [Scrape selective metrics in Prometheus] and
+[Dropping metrics at scrape time with Prometheus].
+
+Use [metric relabeling configurations][metric_relabel_configs] to select which series to ingest **after** scraping:
+
+```diff
+ scrape_configs:
+   - job_name: router
+     …
++    metric_relabel_configs:
++      - # do *not* record metrics which name matches the regex
++        # in this case, those which name starts with 'node_disk_'
++        source_labels: [ __name__ ]
++        action: drop
++        regex: node_disk_.*
+   - job_name: hosts
+     …
++    metric_relabel_configs:
++      - # *only* record metrics which name matches the regex
++        # in this case, those which name starts with 'node_cpu_' with cpu=1 and mode=user
++        source_labels:
++          - __name__
++          - cpu
++          - mode
++        regex: node_cpu_.*1.*user.*
++        action: keep
+```
 
 ## Further readings
 
@@ -86,7 +131,7 @@ To calculate the overall CPU usage, the idle mode of the metric is used. Since i
 - [Github]
 - [`docker/monitoring`][docker/monitoring]
 - [Node exporter]
-- [SMNP exporter]
+- [SNMP exporter]
 - [`ordaa/boinc_exporter`][ordaa/boinc_exporter]
 - [Grafana]
 
@@ -100,20 +145,15 @@ All the references in the [further readings] section, plus the following:
 - [`prometheus/node_exporter`][prometheus/node_exporter]
 - [`prometheus/snmp_exporter`][prometheus/snmp_exporter]
 - [How I monitor my OpenWrt router with Grafana Cloud and Prometheus]
+- [Scrape selective metrics in Prometheus]
+- [Dropping metrics at scrape time with Prometheus]
+- [How relabeling in Prometheus works]
 
 <!--
-  References
+  Reference
+  ═╬═Time══
   -->
 
-<!-- Upstream -->
-[functions]: https://prometheus.io/docs/prometheus/latest/querying/functions/
-[github]: https://github.com/prometheus/prometheus
-[node exporter guide]: https://prometheus.io/docs/guides/node-exporter/
-[prometheus/node_exporter]: https://github.com/prometheus/node_exporter
-[prometheus/snmp_exporter]: https://github.com/prometheus/snmp_exporter
-[promql]: https://prometheus.io/docs/prometheus/latest/querying/basics/
-[website]: https://prometheus.io/
-
 <!-- In-article sections -->
 [further readings]: #further-readings
 
@@ -123,10 +163,23 @@ All the references in the [further readings] section, plus the following:
 [snmp exporter]: snmp%20exporter.md
 
 <!-- Files -->
-[docker/monitoring]: ../docker/monitoring/README.md
+[docker/monitoring]: ../containers/monitoring/README.md
+
+<!-- Upstream -->
+[functions]: https://prometheus.io/docs/prometheus/latest/querying/functions/
+[github]: https://github.com/prometheus/prometheus
+[node exporter guide]: https://prometheus.io/docs/guides/node-exporter/
+[prometheus/node_exporter]: https://github.com/prometheus/node_exporter
+[prometheus/snmp_exporter]: https://github.com/prometheus/snmp_exporter
+[promql]: https://prometheus.io/docs/prometheus/latest/querying/basics/
+[website]: https://prometheus.io/
+[metric_relabel_configs]: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#metric_relabel_configs
 
 <!-- Others -->
+[dropping metrics at scrape time with prometheus]: https://www.robustperception.io/dropping-metrics-at-scrape-time-with-prometheus/
 [getting started with prometheus]: https://opensource.com/article/18/12/introduction-prometheus
 [how i monitor my openwrt router with grafana cloud and prometheus]: https://grafana.com/blog/2021/02/09/how-i-monitor-my-openwrt-router-with-grafana-cloud-and-prometheus/
 [ordaa/boinc_exporter]: https://gitlab.com/ordaa/boinc_exporter
+[scrape selective metrics in prometheus]: https://docs.last9.io/docs/how-to-scrape-only-selective-metrics-in-prometheus
 [snmp monitoring and easing it with prometheus]: https://medium.com/@openmohan/snmp-monitoring-and-easing-it-with-prometheus-b157c0a42c0c
+[how relabeling in prometheus works]: https://grafana.com/blog/2022/03/21/how-relabeling-in-prometheus-works/