Prometheus
Monitoring and alerting system that collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.
Metrics can also be pushed (e.g. through the Pushgateway or other push-capable integrations) in the event hosts are behind a firewall or prohibited from opening ports by security policy.
- TL;DR
- Components
- Installation
- Configuration
- Queries
- Storage
- Write to remote Prometheus servers
- Management API
- Further readings
- Sources
TL;DR
# Start the process.
prometheus
prometheus --web.enable-admin-api
# Reload the configuration file without restarting the process.
kill -s 'SIGHUP' '3969'
pkill --signal 'HUP' 'prometheus'
# Shut down the process *gracefully*.
kill -s 'SIGTERM' '3969'
pkill --signal 'TERM' 'prometheus'
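Hosts that cannot be scraped directly (as mentioned in the introduction) can push their metrics to a Pushgateway, which Prometheus then scrapes. A minimal sketch, assuming a Pushgateway listening on localhost:9091 and a purely illustrative metric and job name:
# Push a single metric value for the 'some_job' job to a Pushgateway.
echo 'some_metric 3.14' | curl --data-binary @- 'http://localhost:9091/metrics/job/some_job'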
Components
Prometheus is composed of its server, the Alertmanager, and its exporters.
Alerting rules can be created within Prometheus, and configured to send custom alerts to Alertmanager.
Alertmanager then processes and handles the alerts, including sending notifications through different mechanisms or
third-party services.
The exporters can be libraries, processes, devices, or anything else exposing metrics so that they can be scraped by
Prometheus.
Such metrics are usually made available at the /metrics endpoint, which allows them to be scraped directly by
Prometheus without the need for an agent.
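An exporter's output can be inspected with any HTTP client; e.g., assuming a node exporter listening on its default port on host.local (the same example target used in the configuration later on):
curl 'http://host.local:9100/metrics'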
Extras
As a welcome addition, Grafana can be configured to use Prometheus as a data source in order to provide data visualization and dashboarding functions on top of the data it collects.
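Grafana can pick up such a data source automatically at startup through its provisioning mechanism. A minimal sketch of a provisioning file (e.g. under /etc/grafana/provisioning/datasources/, assuming Prometheus listens on localhost:9090):
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true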
Installation
brew install 'prometheus'
docker run -p '9090:9090' -v './prometheus.yml:/etc/prometheus/prometheus.yml' --name prometheus -d 'prom/prometheus'
Kubernetes
helm repo add 'prometheus-community' 'https://prometheus-community.github.io/helm-charts'
helm -n 'monitoring' upgrade -i --create-namespace 'prometheus' 'prometheus-community/prometheus'
helm -n 'monitoring' upgrade -i --create-namespace --repo 'https://prometheus-community.github.io/helm-charts' \
'prometheus' 'prometheus'
Access components:
| Component | From within the cluster |
|---|---|
| Prometheus server | prometheus-server.monitoring.svc.cluster.local:80 |
| Alertmanager | prometheus-alertmanager.monitoring.svc.cluster.local:80 |
| Push gateway | prometheus-pushgateway.monitoring.svc.cluster.local:80 |
# Access the prometheus server.
kubectl -n 'monitoring' get pods -l 'app.kubernetes.io/name=prometheus,app.kubernetes.io/instance=prometheus' \
-o jsonpath='{.items[0].metadata.name}' \
| xargs -I {} kubectl -n 'monitoring' port-forward {} 9090
# Access alertmanager.
kubectl -n 'monitoring' get pods -l 'app.kubernetes.io/name=alertmanager,app.kubernetes.io/instance=prometheus' \
-o jsonpath='{.items[0].metadata.name}' \
| xargs -I {} kubectl -n 'monitoring' port-forward {} 9093
# Access the push gateway.
kubectl -n 'monitoring' get pods -l 'app=prometheus-pushgateway,component=pushgateway' \
-o jsonpath='{.items[0].metadata.name}' \
| xargs -I {} kubectl -n 'monitoring' port-forward {} 9091
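Alternatively, port-forward the services themselves; the service names below are the ones listed in the table above:
kubectl -n 'monitoring' port-forward 'svc/prometheus-server' 9090:80
kubectl -n 'monitoring' port-forward 'svc/prometheus-alertmanager' 9093:80
kubectl -n 'monitoring' port-forward 'svc/prometheus-pushgateway' 9091:80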
Configuration
The default configuration file is at /etc/prometheus/prometheus.yml.
Reload the configuration without restarting Prometheus's process by using the SIGHUP signal:
kill -s 'SIGHUP' '3969'
pkill --signal 'HUP' 'prometheus'
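Should the server have been started with the --web.enable-lifecycle option, the configuration can also be reloaded through the HTTP API; a sketch assuming the default listening address:
# Reload the configuration via the lifecycle endpoint.
curl -X 'POST' 'http://localhost:9090/-/reload'
An example configuration file: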
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: [ 'localhost:9090' ]
  - job_name: nodes
    static_configs:
      - targets:
          - fqdn:9100
          - host.local:9100
  - job_name: router
    static_configs:
      - targets: [ 'openwrt.local:9100' ]
    metric_relabel_configs:
      - source_labels: [__name__]
        action: keep
        regex: '(node_cpu)'
Filter metrics
Refer to How relabeling in Prometheus works, Scrape selective metrics in Prometheus, and Dropping metrics at scrape time with Prometheus.
Use metric relabeling configurations to select which series to ingest after scraping:
scrape_configs:
  - job_name: router
    …
+   metric_relabel_configs:
+     - # do *not* record metrics whose name matches the regex
+       # in this case, those whose name starts with 'node_disk_'
+       source_labels: [ __name__ ]
+       action: drop
+       regex: node_disk_.*
  - job_name: hosts
    …
+   metric_relabel_configs:
+     - # *only* record metrics whose name matches the regex
+       # in this case, those whose name starts with 'node_cpu_', with cpu '1' and mode 'user'
+       source_labels:
+         - __name__
+         - cpu
+         - mode
+       regex: node_cpu_.*1.*user.*
+       action: keep
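Before reloading, the resulting file can be validated with promtool (shipped alongside Prometheus); a sketch assuming the configuration sits at its default location:
promtool check config '/etc/prometheus/prometheus.yml'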
Queries
Prometheus' query syntax is PromQL.
All data is stored as time series, each one identified by a metric name, e.g. node_filesystem_avail_bytes for
available filesystem space.
Metric names can be used in expressions to select all of the time series with that name, producing an
instant vector.
Time series can be filtered using selectors and labels (sets of key-value pairs):
node_filesystem_avail_bytes{fstype="ext4"}
node_filesystem_avail_bytes{fstype!="xfs"}
Square brackets allow selecting a range of samples from the current time backwards:
node_memory_MemAvailable_bytes[5m]
When using time ranges, the vector returned will be a range vector.
Functions can be used to build advanced queries:
100 * (1 - avg by(instance)(irate(node_cpu_seconds_total{job='node_exporter',mode='idle'}[5m])))
Labels are used to filter the job and the mode. node_cpu_seconds_total returns a counter, and the irate() function
calculates the per-second rate of change based on the last two data points of the range interval.
To calculate the overall CPU usage, the idle mode of the metric is used. Since a processor's idle percentage is the
opposite of its busy percentage, the irate value is subtracted from 1. To make it a percentage, it is multiplied by 100.
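The same building blocks apply to other metrics; e.g., a sketch of overall memory usage as a percentage, assuming the node exporter's standard memory metrics are available:
100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)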
Storage
Refer to Storage.
Prometheus uses a local on-disk time series database, but can optionally integrate with remote storage systems.
Local storage
Local storage is not clustered nor replicated. This makes it not arbitrarily scalable or durable in the face of
outages.
The use of RAID disks is suggested for storage availability, and snapshots are recommended for backups.
The local storage is not intended to be durable long-term storage and external solutions should be used to achieve extended retention and data durability.
External storage may be used via the remote read/write APIs.
These storage systems vary greatly in durability, performance, and efficiency.
Ingested samples are grouped into blocks of two hours.
Each two-hour block consists of a uniquely named directory. This contains:
- A chunks subdirectory, hosting all the time series samples for that window of time.
  Samples are grouped into one or more segment files of up to 512MB each by default.
- A metadata file.
- An index file, which indexes metric names and labels to time series in the chunks directory.
When series are deleted via the API, deletion records are stored in separate tombstones files and are not deleted
immediately from the chunk segments.
The current block for incoming samples is kept in memory and is not fully persisted.
This is secured against crashes by a write-ahead log (WAL) that can be replayed when the Prometheus server restarts.
Write-ahead log files are stored in the wal directory in segments of 128MB in size.
These files contain raw data that has not yet been compacted.
Prometheus will retain a minimum of three write-ahead log files. Servers may retain more than three WAL files in order
to keep at least two hours of raw data stored.
The server's data directory looks something like the following:
./data
├── 01BKGV7JBM69T2G1BGBGM6KB12
│   └── meta.json
├── 01BKGTZQ1SYQJTR4PB43C8PD98
│   ├── chunks
│   │   └── 000001
│   ├── tombstones
│   ├── index
│   └── meta.json
├── 01BKGTZQ1HHWHV8FBJXW1Y3W0K
│   └── meta.json
├── 01BKGV7JC0RY8A6MACW02A2PJD
│   ├── chunks
│   │   └── 000001
│   ├── tombstones
│   ├── index
│   └── meta.json
├── chunks_head
│   └── 000001
└── wal
    ├── 000000002
    └── checkpoint.00000001
        └── 00000000
The initial two-hour blocks are eventually compacted into longer blocks in the background.
Each block will contain data spanning up to 10% of the retention time or 31 days, whichever is smaller.
The retention time defaults to 15 days.
Expired block cleanup happens in the background. It may take up to two hours to remove expired blocks. Blocks must be
fully expired before they are removed.
Prometheus stores an average of 1-2 bytes per sample.
To plan the capacity of a Prometheus server, one can use the following rough formula:
needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample
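As a rough worked example, assuming the default 15 days of retention, 10,000 ingested samples per second and 2 bytes per sample:
needed_disk_space = (15 * 24 * 3600) s * 10000 samples/s * 2 B/sample ≈ 26 GB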
To lower the rate of ingested samples one can:
- Either reduce the number of time series scraped (fewer targets or fewer series per target)
- Or increase the scrape interval.
Reducing the number of series is likely more effective, due to compression of samples within a series.
If the local storage becomes corrupted for whatever reason, the best strategy is to shut down Prometheus and then remove
the entire storage directory. This means losing all the stored data.
One can alternatively try removing individual block directories or the WAL directory to resolve the problem. Doing so
means losing approximately two hours of data per block directory.
Prometheus does not support non-POSIX-compliant filesystems as local storage.
Unrecoverable corruptions may happen.
NFS filesystems (including AWS's EFS) are not supported: even though NFS could be POSIX-compliant, most of its implementations are not.
It is strongly recommended to use a local filesystem for reliability.
If both time and size retention policies are specified, whichever triggers first will take precedence.
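Both retention policies are set through command line flags; a sketch, with values chosen purely for illustration:
prometheus --storage.tsdb.retention.time='30d' --storage.tsdb.retention.size='50GB'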
External storage
TODO
Backfilling
TODO
Write to remote Prometheus servers
Also see How to set up and experiment with Prometheus remote-write.
The remote server must accept incoming metrics.
One way is to have it start with the --web.enable-remote-write-receiver option.
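On the receiving server, a minimal invocation could look like the following (the configuration file path is just the default mentioned earlier):
prometheus --config.file='/etc/prometheus/prometheus.yml' --web.enable-remote-write-receiver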
Use the remote_write setting to configure the sender to forward metrics to the receiver:
remote_write:
  - url: http://prometheus.receiver.fqdn:9090/api/v1/write
  - url: https://aps-workspaces.eu-east-1.amazonaws.com/workspaces/ws-01234567-abcd-1234-abcd-01234567890a/api/v1/remote_write
    queue_config:
      max_samples_per_send: 1000
      max_shards: 100
      capacity: 1500
    sigv4:
      region: eu-east-1
Management API
Take snapshots of the data
Requires the TSDB admin APIs to be enabled (--web.enable-admin-api).
Use the snapshot API endpoint to create a snapshot of all current data into snapshots/<datetime>-<rand> under the
TSDB's data directory and return that directory name as the response.
It can optionally skip data that is only present in the head block and has not yet been compacted to disk.
POST /api/v1/admin/tsdb/snapshot
PUT /api/v1/admin/tsdb/snapshot
URL query parameters:
skip_head=<bool>: skip data present in the head block. Optional.
Examples:
$ curl -X 'POST' 'http://localhost:9090/api/v1/admin/tsdb/snapshot'
{
  "status": "success",
  "data": {
    "name": "20171210T211224Z-2be650b6d019eb54"
  }
}
The snapshot now exists at <data-dir>/snapshots/20171210T211224Z-2be650b6d019eb54
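To restore it, one common approach is to stop the Prometheus instance and start it again with its storage path pointed at (or its data directory replaced by) the snapshot's contents; a sketch, reusing the snapshot name from the example above:
# Restore from a snapshot (the server must not be running against this directory).
prometheus --storage.tsdb.path='./data/snapshots/20171210T211224Z-2be650b6d019eb54'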
Further readings
- Website
- Github
- Documentation
- Helm chart
- docker/monitoring
- Node exporter
- SNMP exporter
- ordaa/boinc_exporter
- Grafana
Sources
All the references in the further readings section, plus the following:
- Getting started with Prometheus
- Node exporter guide
- SNMP monitoring and easing it with Prometheus
- prometheus/node_exporter
- prometheus/snmp_exporter
- How I monitor my OpenWrt router with Grafana Cloud and Prometheus
- Scrape selective metrics in Prometheus
- Dropping metrics at scrape time with Prometheus
- How relabeling in Prometheus works
- Install Prometheus and Grafana with helm 3 on a local machine VM
- Set up prometheus and ingress on kubernetes
- How to integrate Prometheus and Grafana on Kubernetes using Helm
- node-exporter's helm chart's values
- How to set up and experiment with Prometheus remote-write
