chore(loki): run on ecs

This commit is contained in:
Michele Cereda
2025-01-18 00:34:23 +01:00
parent d416a27382
commit 2646f5a92b
12 changed files with 686 additions and 18 deletions

View File

@@ -2,8 +2,11 @@
1. [TL;DR](#tldr)
1. [Resource constraints](#resource-constraints)
1. [Storage](#storage)
    1. [EBS volumes](#ebs-volumes)
    1. [EFS volumes](#efs-volumes)
    1. [Docker volumes](#docker-volumes)
    1. [Bind mounts](#bind-mounts)
1. [Troubleshooting](#troubleshooting)
    1. [Invalid 'cpu' setting for task](#invalid-cpu-setting-for-task)
1. [Further readings](#further-readings)
@@ -17,7 +20,7 @@ Tasks are a logical construct that model and run one or more containers. Contain
ECS runs tasks using two different launch types:

- On EC2 instances that one owns, manages, and pays for.
- Using Fargate, technically a serverless environment for containers.

Unless otherwise restricted and capped, containers get access to all the CPU and memory capacity available on the host
running them.
@@ -158,9 +161,18 @@ the `memoryReservation` value.<br/>
If specifying `memoryReservation`, that value is guaranteed to the container and subtracted from the available memory
resources for the container instance that the container is placed on. Otherwise, the value of `memory` is used.
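
For instance, a minimal sketch registering a task definition that sets both limits; family, image, and sizes here are
made-up values:

```sh
# 'memoryReservation' is the soft limit guaranteed to the container;
# 'memory' is the hard limit enforced on it.
aws ecs register-task-definition --family 'memoryLimitsExample' \
  --container-definitions '[{
    "name": "example",
    "image": "nginx:alpine",
    "essential": true,
    "memoryReservation": 128,
    "memory": 256
  }]'
```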
## Storage
Refer [Storage options for Amazon ECS tasks].
| Volume type | Launch type support | OS support | Persistence | Use cases |
| ---------------- | ------------------- | -------------- | -------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
| [EBS volumes] | EC2<br/>Fargate | Linux | _Can_ be persisted when used by a standalone task<br/>Ephemeral when attached to tasks maintained by a service | Transactional workloads |
| [EFS volumes] | EC2<br/>Fargate | Linux | Persistent | Data analytics<br/>Media processing<br/>Content management<br/>Web serving |
| [Docker volumes] | EC2 | Linux, Windows | Persistent | Provide a location for data persistence<br/>Sharing data between containers |
| [Bind mounts] | EC2<br/>Fargate | Linux, Windows | Ephemeral | Data analytics<br/>Media processing<br/>Content management<br/>Web serving |
### EBS volumes
Refer [Use Amazon EBS volumes with Amazon ECS].
@@ -183,10 +195,74 @@ termination.
One **cannot** configure EBS volumes for attachment to ECS tasks running on AWS Outposts.
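
A hedged sketch of attaching a new task-scoped volume at launch follows; the cluster, task definition, role, subnet,
and security group are placeholder values, and the `ebsVolume` name must match a volume marked `configuredAtLaunch` in
the task definition:

```sh
aws ecs run-task --cluster 'testCluster' --task-definition 'taskUsingEbs' \
  --launch-type 'FARGATE' \
  --network-configuration 'awsvpcConfiguration={subnets=[subnet-0123456789abcdef0],securityGroups=[sg-0123456789abcdef0]}' \
  --volume-configurations '[{
    "name": "ebsVolume",
    "managedEBSVolume": {
      "roleArn": "arn:aws:iam::012345678901:role/ecsInfrastructureRole",
      "sizeInGiB": 10,
      "volumeType": "gp3",
      "filesystemType": "ext4"
    }
  }]'
```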
### EFS volumes
Refer [Use Amazon EFS volumes with Amazon ECS].
Allows tasks with access to the same EFS volumes to share persistent storage.
Tasks **must**:
- Reference the EFS volumes in the `volumes` attribute of their definition.
- Reference the defined volumes in the `mountPoints` attribute in the containers' specifications.
<details style="padding: 0 0 1em 1em;">
```json
{
"volumes": [{
"name": "myEfsVolume",
"efsVolumeConfiguration": {
"fileSystemId": "fs-1234",
"rootDirectory": "/path/to/my/data",
"transitEncryption": "ENABLED",
"transitEncryptionPort": integer,
"authorizationConfig": {
"accessPointId": "fsap-1234",
"iam": "ENABLED"
}
}
}],
"containerDefinitions": [{
"name": "container-using-efs",
"image": "amazonlinux:2",
"entryPoint": [
"sh",
"-c"
],
"command": [ "ls -la /mount/efs" ],
"mountPoints": [{
"sourceVolume": "myEfsVolume",
"containerPath": "/mount/efs",
"readOnly": true
}]
}]
}
```
</details>
EFS file systems are supported on:

- EC2 nodes using ECS-optimized AMI version 20200319 with container agent version 1.38.0.
- Fargate on platform version 1.4.0 or later (Linux).
**Not** supported on external instances.
### Docker volumes
TODO
### Bind mounts
TODO
## Troubleshooting

### Invalid 'cpu' setting for task
Refer [Troubleshoot Amazon ECS task definition invalid CPU or memory errors] and [Resource constraints].
<details>
<summary>Cause</summary>
@@ -205,14 +281,15 @@ Specify a supported value for the task CPU and memory in your task definition.
</details>
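
E.g., check the values the failing task definition currently requests; the task definition name is a placeholder:

```sh
aws ecs describe-task-definition --task-definition 'taskDefinitionName' \
  --query 'taskDefinition.{cpu: cpu, memory: memory}'
```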
## Further readings

- [Amazon Web Services]
- [Amazon ECS task lifecycle]
- AWS' [CLI]
- [Troubleshoot Amazon ECS deployment issues]
- [Storage options for Amazon ECS tasks]
- [EBS]
- [EFS]
### Sources
@@ -223,7 +300,10 @@ Refer [Troubleshoot Amazon ECS task definition invalid CPU or memory errors] and
- [Use Amazon EBS volumes with Amazon ECS]
- [Attach EBS volume to AWS ECS Fargate]
- [Guide to Using Amazon EBS with Amazon ECS and AWS Fargate]
- [Amazon ECS task definition differences for the Fargate launch type]
- [How Amazon ECS manages CPU and memory resources]
- [Exposing multiple ports for an AWS ECS service]
- [Use Amazon EFS volumes with Amazon ECS]
<!--
Reference
@@ -231,22 +311,32 @@ Refer [Troubleshoot Amazon ECS task definition invalid CPU or memory errors] and
-->

<!-- In-article sections -->
[bind mounts]: #bind-mounts
[docker volumes]: #docker-volumes
[ebs volumes]: #ebs-volumes
[efs volumes]: #efs-volumes
[resource constraints]: #resource-constraints

<!-- Knowledge base -->
[amazon web services]: README.md
[cli]: cli.md
[ebs]: ebs.md
[efs]: efs.md
<!-- Upstream -->
[amazon ecs task definition differences for the fargate launch type]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-tasks-services.html
[amazon ecs task lifecycle]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-lifecycle-explanation.html
[amazon ecs task role]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html
[how amazon ecs manages cpu and memory resources]: https://aws.amazon.com/blogs/containers/how-amazon-ecs-manages-cpu-and-memory-resources/
[how amazon elastic container service works with iam]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/security_iam_service-with-iam.html
[identity and access management for amazon elastic container service]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/security-iam.html
[storage options for amazon ecs tasks]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_data_volumes.html
[troubleshoot amazon ecs deployment issues]: https://docs.aws.amazon.com/codedeploy/latest/userguide/troubleshooting-ecs.html
[troubleshoot amazon ecs task definition invalid cpu or memory errors]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html
[use amazon ebs volumes with amazon ecs]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ebs-volumes.html
[use amazon efs volumes with amazon ecs]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/efs-volumes.html
<!-- Others -->
[attach ebs volume to aws ecs fargate]: https://medium.com/@shujaatsscripts/attach-ebs-volume-to-aws-ecs-fargate-e23fea7bb1a7
[exposing multiple ports for an aws ecs service]: https://medium.com/@faisalsuhail1/exposing-multiple-ports-for-an-aws-ecs-service-64b9821c09e8
[guide to using amazon ebs with amazon ecs and aws fargate]: https://stackpioneers.com/2024/01/12/guide-to-using-amazon-ebs-with-amazon-ecs-and-aws-fargate/

View File

@@ -0,0 +1,60 @@
# Elastic File System
Serverless file storage for sharing files without the need for provisioning or managing storage capacity and
performance.
1. [TL;DR](#tldr)
1. [Further readings](#further-readings)
1. [Sources](#sources)
## TL;DR
Built to scale on demand, growing and shrinking automatically as files are added and removed.<br/>
Accessible across most types of AWS compute instances, including EC2, ECS, EKS, Lambda, and Fargate.
Supports the NFS v4.0 and v4.1 protocols.
Available file system types:
- _Regional_: redundant across **multiple** geographically separated AZs **within the same Region**.
- _One Zone_: data stored within a **single AZ**, with all the limits it implies.
Default modes:
- _General Purpose Performance_: ideal for latency-sensitive applications.<br/>
Examples: web-serving environments, content-management systems, home directories, and general file serving.
- _Elastic Throughput_: designed to automatically scale throughput to match the workload's activity.
Provides file-system-access semantics like strong data consistency and file locking.<br/>
Supports controlling access to file systems through POSIX permissions.<br/>
Supports authentication, authorization, and encryption.
EFS supports encryption in transit and encryption at rest.<br/>
Encryption at rest is enabled when creating a file system. In that case, all data and metadata are encrypted.<br/>
Encryption in transit is enabled when mounting a file system. Client access via NFS to EFS is controlled by both IAM
policies and network security policies (i.e. security groups).
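
A minimal sketch tying the above options together; the AZ name is a placeholder:

```sh
# Regional file system with encryption at rest and the default modes.
aws efs create-file-system --encrypted \
  --performance-mode 'generalPurpose' --throughput-mode 'elastic'

# One Zone file system with encryption at rest.
aws efs create-file-system --encrypted --availability-zone-name 'eu-west-1a'
```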
Windows-based EC2 instances are **not** supported.
## Further readings
- [Amazon Web Services]
### Sources
- [What is Amazon Elastic File System?]
<!--
Reference
═╬═Time══
-->
<!-- In-article sections -->
<!-- Knowledge base -->
[amazon web services]: README.md
<!-- Files -->
<!-- Upstream -->
[what is amazon elastic file system?]: https://docs.aws.amazon.com/efs/latest/ug/whatisefs.html
<!-- Others -->

View File

@@ -24,6 +24,8 @@
Using `-H 'PRIVATE-TOKEN: glpat-m-…'` in API calls is the same as using `-H 'Authorization: bearer glpat-m-…'`.
Use _deploy tokens_ instead of personal access tokens to access repositories in pipelines, as deploy tokens do not expire.
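
A sketch of using one follows; the token value and project path are placeholders, and usernames default to the
`gitlab+deploy-token-{n}` form unless one was set when creating the token:

```sh
git clone "https://gitlab+deploy-token-1:gldt-redacted@gitlab.fqdn/group/project.git"

# Deploy tokens with the 'read_registry' scope also authenticate to the container registry.
docker login -u 'gitlab+deploy-token-1' -p 'gldt-redacted' 'registry.gitlab.fqdn'
```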
```sh
# List the current application settings of the GitLab instance.
curl -H 'PRIVATE-TOKEN: glpat-m-…' 'https://gitlab.fqdn/api/v4/application/settings'

View File

@@ -14,6 +14,12 @@ very cost-effective and easy to operate.
1. [Compactor](#compactor)
1. [Ruler](#ruler)
1. [Clients](#clients)
1. [Labels](#labels)
    1. [Labelling best practices](#labelling-best-practices)
1. [Deployment](#deployment)
    1. [Monolithic mode](#monolithic-mode)
    1. [Simple scalable mode](#simple-scalable-mode)
    1. [Microservices mode](#microservices-mode)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -23,10 +29,10 @@ It indexes **a set of labels** for each log stream instead of the full logs' con
The log data itself is then compressed and stored in chunks in object storage solutions, or locally on the host's
filesystem.

Can be executed in _single binary_ mode, with all its components running simultaneously in one process, or in
_simple scalable deployment_ mode, which groups components into read, write, and backend parts.

Files can be _index_es or _chunk_s.<br/>
Indexes are tables of contents in TSDB format of where to find logs for specific sets of labels.<br/>
Chunks are containers for log entries for specific sets of labels.
@@ -45,6 +51,10 @@ zypper install 'loki'
docker run --name loki -d \
  -p '3100:3100' -v "$(pwd)/config/loki.yml:/etc/loki/config.yml:ro" \
  'grafana/loki:3.3.2' -config.file='/etc/loki/config.yml'

# Run on Kubernetes in microservices mode.
helm --namespace 'loki' upgrade --create-namespace --install --cleanup-on-fail 'loki' \
  --repo 'https://grafana.github.io/helm-charts' 'loki-distributed'
```
Default configuration file for package-based installations is `/etc/loki/config.yml` or `/etc/loki/loki.yaml`.
@@ -65,9 +75,26 @@ Default configuration file for package-based installations is `/etc/loki/config.
<summary>Usage</summary>

```sh
# Verify configuration files
loki -verify-config
loki -config.file='/etc/loki/local-config.yaml' -verify-config
# List available component targets
loki -list-targets
docker run 'docker.io/grafana/loki' -config.file='/etc/loki/local-config.yaml' -list-targets
# Start server components
loki
loki -target='all'
loki -config.file='/etc/loki/config.yaml' -target='read'
# Print the final configuration to stderr and start
loki -print-config-stderr …
# Check the server is working
curl 'http://loki.fqdn:3100/ready'
curl 'http://loki.fqdn:3100/metrics'
curl 'http://loki.fqdn:3100/services'
# Check components in Loki clusters are up and running.
# Such components must run by themselves for this.
@@ -220,6 +247,155 @@ Multiple rulers will use a consistent hash ring to distribute rule groups amongs
Refer [Send log data to Loki].
## Labels
The content of each log line is **not** indexed. Instead, log entries are grouped into streams.<br/>
The streams are then indexed with labels.
Labels are key-value pairs, e.g.:
```plaintext
deployment_environment = development
cloud_region = us-west-1
namespace = grafana-server
```
A set of log messages sharing all the labels above is called a _log stream_.
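
E.g., a minimal sketch pushing one entry to such a stream via the push API; the host follows the examples in this
document, and the line's timestamp is in nanoseconds (GNU `date`):

```sh
curl -X POST -H 'Content-Type: application/json' 'http://loki.fqdn:3100/loki/api/v1/push' \
  --data-raw '{
    "streams": [{
      "stream": { "deployment_environment": "development", "namespace": "grafana-server" },
      "values": [[ "'"$(date +%s%N)"'", "something happened" ]]
    }]
  }'
```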
Loki has a default limit of 15 index labels.
When Loki performs searches, it:
1. Looks for **all** messages in the chosen stream.
1. Iterates through the logs in the stream to perform the query.
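
In LogQL terms, assuming Grafana's `logcli` client and the labels above:

```sh
export LOKI_ADDR='http://loki.fqdn:3100'

# '{namespace="grafana-server"}' selects the streams;
# '|= "error"' is then applied to every line in them.
logcli query '{namespace="grafana-server"} |= "error"'
```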
Labelling affects queries, which in turn affect dashboards.
Loki does **not** parse **nor** process log messages on ingestion.<br/>
However, some labels may automatically be applied to logs by the client that collected them.
Loki automatically tries to populate a default `service_name` label while ingesting logs.<br/>
This label is mainly used to find and explore logs in the `Explore Logs` feature of Grafana.
When receiving data from Grafana Alloy or the OpenTelemetry Collector as clients, Loki automatically assigns some of
the OTel resource attributes as labels.<br/>
By default, some resource attributes are stored as labels, with periods (`.`) replaced by underscores (`_`).<br/>
The remaining attributes are stored as structured metadata with each log entry.
_Cardinality_ is the number of unique combinations of label names and values, i.e. how many values each label can
take. It impacts the number of log streams one creates, and high cardinality can lead to significant performance
degradation.<br/>
Prefer fewer labels with bounded values.
Loki performs very poorly when labels have high cardinality, as it is forced to build a huge index and flush thousands
of tiny chunks to the object store.
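
One way to gauge cardinality, assuming `logcli` again:

```sh
# Summarize labels and their number of unique values in the matching streams.
logcli series --analyze-labels '{namespace=~".+"}'
```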
Loki places the same restrictions on label naming as Prometheus:
- Labels _may_ contain ASCII letters and digits, as well as underscores and colons.<br/>
Each must match the `[a-zA-Z_:][a-zA-Z0-9_:]*` regex.
- Unsupported characters shall be converted to an underscore.<br/>
E.g.: `app.kubernetes.io/name` shall be written as `app_kubernetes_io_name`.
- Do **not** begin **or** end label names with double underscores.<br/>
This naming convention is reserved for internal labels, e.g. `__stream_shard__`.<br/>Internal labels are **hidden** by
default in the label browser, query builder, and autocomplete to avoid creating confusion for users.
Prefer **not** adding labels based on the content of the log message.
Loki supports ingesting out-of-order log entries.<br/>
Out-of-order writes are enabled globally by default and can be disabled/enabled on a cluster or per-tenant basis.
Entries in a given log stream (identified by a given set of label names and values) **must be ingested in order**
within the default two hour time window.<br/>
When trying to send entries that are too old for a given log stream, Loki will respond with the `too far behind` error.
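
A sketch of triggering the rejection on purpose, assuming the stream already received newer entries (GNU `date`,
placeholders as above):

```sh
# Entries outside the stream's accepted window are rejected
# with an HTTP 400 and a 'too far behind' message.
curl -X POST -H 'Content-Type: application/json' 'http://loki.fqdn:3100/loki/api/v1/push' \
  --data-raw '{
    "streams": [{
      "stream": { "namespace": "grafana-server" },
      "values": [[ "'"$(date -d '3 hours ago' +%s%N)"'", "late entry" ]]
    }]
  }'
```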
Use labels to separate such entries into different streams so they can be ingested independently:

- When planning to ingest out-of-order log entries.
- For systems with different ingestion delays or shipping schedules.
### Labelling best practices
- Use labels for things like regions, clusters, servers, applications, namespaces, and environments.
<details>
They will be fixed for given systems/apps and have bounded values.<br/>
Static labels like these make it easier to query logs in a logical sense.
</details>
- Avoid adding labels for something until you know you need it.<br/>
Prefer using filter expressions like `|= "text"` or `|~ "regex"` to brute force logs instead.
- Ensure labels have low cardinality. Ideally, limit it to tens of values.
- Prefer using labels with long-lived values.
- Consider extracting frequently parsed fields from log lines on the client side and attaching them as structured
  metadata.
- Be aware of dynamic labels applied by clients.
## Deployment
### Monolithic mode
Runs all of Loki's microservice components inside a single process as a single binary or Docker image.
Set the `-target` command line parameter to `all`.
Useful for experimentation, or for small read/write volumes of up to approximately 20 GB per day.<br/>
Recommended to use the [Simple scalable mode] when needing to scale the deployment further.
<details>
<summary>Horizontally scale this mode to more instances</summary>
- Use a shared object store.
- Configure the `ring` section of the configuration file to share state between all instances.
</details>
<details>
<summary>Configure high availability</summary>
- Run multiple instances setting up the `memberlist_config` configuration.
- Configure a shared object store.
- Configure the `replication_factor` to `3` or more.

This will route traffic to all the Loki instances in a round-robin fashion.
</details>
Query parallelization is limited by the number of instances.<br/>
Configure the `max_query_parallelism` setting in the configuration file.
### Simple scalable mode
Default configuration installed by Loki's Helm Chart and the easiest way to deploy Loki at scale.
Requires a reverse proxy to be deployed in front of Loki to direct clients' API requests to either the read or write
nodes. The Loki Helm chart deploys a default reverse proxy configuration using [Nginx].
This mode can scale up to a few TBs of logs per day.<br/>
If going over this, recommended to use the [Microservices mode].
Separates execution paths into `read`, `write`, and `backend` targets.<br/>
Targets can be scaled independently.
Execution paths are activated by defining the target at Loki's startup:
- `-target=write`: the `write` target is **stateful** and controlled by a Kubernetes StatefulSet.<br/>
Contains the [distributor] and [ingester] components.
- `-target=read`: the `read` target is **stateless** and _can_ be run as a Kubernetes Deployment.<br/>
In the official helm chart this is currently deployed as a StatefulSet.<br/>
Contains the [query frontend] and [querier] components.
- `-target=backend`: the `backend` target is **stateful** and controlled by a Kubernetes StatefulSet.<br/>
Contains the [compactor], [index gateway], [query scheduler] and [ruler] components.
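
A minimal sketch of the three paths as separate processes sharing one configuration file:

```sh
loki -config.file='/etc/loki/config.yaml' -target='write'
loki -config.file='/etc/loki/config.yaml' -target='read'
loki -config.file='/etc/loki/config.yaml' -target='backend'
```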
### Microservices mode
Runs each Loki component as its own distinct process.<br/>
Each process is invoked specifying its own target.
Designed for Kubernetes deployments and available as the [loki-distributed] community-supported Helm chart.
Only recommended for very large Loki clusters, or when needing more precise control over them.
## Further readings

- [Website]
@@ -232,6 +408,9 @@ Refer [Send log data to Loki].
- [Documentation]
- [HTTP API reference]
- [How to Set Up Grafana, Loki, and Prometheus Locally with Docker Compose: Part 1 of 3]
- [Deploying Grafana, Loki, and Prometheus on AWS ECS with EFS and Cloud Formation (Part 3 of 3)]
- [Storage - AWS deployment (S3 Single Store)]
<!--
Reference
@@ -239,8 +418,20 @@ Refer [Send log data to Loki].
-->

<!-- In-article sections -->
[compactor]: #compactor
[distributor]: #distributor
[index gateway]: #index-gateway
[ingester]: #ingester
[microservices mode]: #microservices-mode
[querier]: #querier
[query frontend]: #query-frontend
[query scheduler]: #query-scheduler
[ruler]: #ruler
[simple scalable mode]: #simple-scalable-mode
<!-- Knowledge base -->
[grafana]: grafana.md
[nginx]: nginx.md
[promtail]: promtail.md

<!-- Files -->
@@ -248,7 +439,11 @@ Refer [Send log data to Loki].
[codebase]: https://github.com/grafana/loki
[documentation]: https://grafana.com/docs/loki/latest/
[http api reference]: https://grafana.com/docs/loki/latest/reference/loki-http-api/
[loki-distributed]: https://github.com/grafana/helm-charts/tree/main/charts/loki-distributed
[send log data to loki]: https://grafana.com/docs/loki/latest/send-data/
[storage - aws deployment (s3 single store)]: https://grafana.com/docs/loki/latest/configure/storage/#aws-deployment-s3-single-store
[website]: https://grafana.com/oss/loki/

<!-- Others -->
[deploying grafana, loki, and prometheus on aws ecs with efs and cloud formation (part 3 of 3)]: https://medium.com/@ahmadbilalch891/deploying-grafana-loki-and-prometheus-on-aws-ecs-with-efs-and-cloud-formation-part-3-of-3-24140ea8ccfb
[how to set up grafana, loki, and prometheus locally with docker compose: part 1 of 3]: https://medium.com/@ahmadbilalch891/how-to-set-up-grafana-loki-and-prometheus-locally-with-docker-compose-part-1-of-3-62fb25e51d92

View File

@@ -25,4 +25,5 @@ AWS_PROFILE='engineer' aws sts get-caller-identity
# Run as Docker container
docker run --rm -ti 'amazon/aws-cli' --version
docker run --rm -ti -v "$HOME/.aws:/root/.aws:ro" 'amazon/aws-cli:2.17.16' autoscaling describe-auto-scaling-groups

View File

@@ -4,6 +4,20 @@ alias aws-caller-info 'aws sts get-caller-identity'
alias aws-ssm 'aws ssm start-session --target'
alias aws-whoami 'aws-caller-info'
function aws-alb-privateDnsName-from-name
aws ec2 describe-network-interfaces --output 'text' \
--query 'NetworkInterfaces[*].PrivateIpAddresses[*].PrivateDnsName' \
--filters Name='description',Values="ELB app/$argv[1]/*"
end
function aws-alb-privateIps-from-name
aws ec2 describe-network-interfaces --output 'text' \
--query 'NetworkInterfaces[*].PrivateIpAddresses[*].PrivateIpAddress' \
--filters Name='description',Values="ELB app/$argv[1]/*"
end
function aws-assume-role-by-name
set current_caller (aws-caller-info --output json | jq -r '.UserId' -)
aws-iam-roleArn-from-name "$argv[1]" \
@@ -20,7 +34,31 @@ function aws-ec2-instanceId-from-nameTag
--query 'Reservations[].Instances[0].InstanceId'
end
function aws-ec2-nameTag-from-instanceId
aws ec2 describe-instances --output 'text' \
--filters "Name=instance-id,Values=$argv[1]" \
--query "Reservations[].Instances[0].Tags[?(@.Key=='Name')].Value"
end
function aws-ec2-tag-from-instanceId
aws ec2 describe-instances --output 'text' \
--filters "Name=instance-id,Values=$argv[1]" \
--query "Reservations[].Instances[0].Tags[?(@.Key=='$argv[2]')].Value"
end
function aws-ec2-tags-from-instanceId
aws ec2 describe-instances --output 'table' \
--filters "Name=instance-id,Values=$argv[1]" \
--query 'Reservations[].Instances[0].Tags[]'
end
function aws-ecs-tasks-from-clusterName-and-serviceName
aws ecs list-tasks --cluster "$argv[1]" --output 'text' --query 'taskArns' \
| xargs aws ecs describe-tasks --cluster "$argv[1]" \
--query "tasks[?group.contains(@, '$argv[2]')]" --tasks
end
function aws-iam-roleArn-from-name
aws iam list-roles --output 'text' \
--query "Roles[?RoleName == '$argv[1]'].Arn"
end
@@ -35,12 +73,6 @@ function aws-iam-user-owning-accessKey
| jq -rs 'flatten|first'
end
alias aws-ec2-running-instanceIds "aws ec2 describe-instances --output 'text' \
--filters 'Name=instance-state-name,Values=running' \

View File

@@ -89,6 +89,10 @@ aws ecs list-tasks --query 'taskArns' --output 'text' --cluster 'testCluster' --
| tee \
| xargs -I{} curl -fs "http://{}:8080"
# Describe tasks given a service name
aws ecs list-tasks --cluster 'testCluster' --output 'text' --query 'taskArns' \
| xargs aws ecs describe-tasks --cluster 'testCluster' --query "tasks[?group.contains(@, 'serviceName')]" --output 'yaml' --tasks
# Show information about services
aws ecs describe-services --cluster 'stg' --services 'grafana'
@@ -186,6 +190,15 @@ aws iam list-users --no-cli-pager --query 'Users[].UserName' --output 'text' \
aws iam --no-cli-pager list-access-keys
aws iam --no-cli-pager list-access-keys --user-name 'mark'
# Change users' console password
# FIXME: check
aws iam update-login-profile --user-name 'logan'
aws iam update-login-profile --user-name 'mike' --password 'newPassword' --password-reset-required
# Change one's own console password
# FIXME: check
basename (aws sts get-caller-identity --query 'Arn' --output 'text') \
| xargs aws iam update-login-profile --user-name
###
# Image Builder # Image Builder

View File

@@ -24,3 +24,6 @@ docker save 'local/image:latest' | ssh -C 'user@remote.host' docker load
docker inspect 'ghcr.io/jqlang/jq:latest'  # image
docker inspect 'host'                      # network
docker inspect 'prometheus-1'              # container
# Install compose directly from package
dnf install 'https://download.docker.com/linux/fedora/41/aarch64/stable/Packages/docker-compose-plugin-2.32.1-1.fc41.aarch64.rpm'

View File

@@ -47,6 +47,7 @@ git reset --soft HEAD~1 # or `git reset --soft HEAD^`
git restore --staged '.lefthook-local.yml' # or `git reset HEAD '.lefthook-local.yml'`
git commit -c ORIG_HEAD

##
# Change the default branch from 'master' to 'main'.
# --------------------------------------
@@ -76,6 +77,12 @@ git format-patch HEAD~1 --stdout
# create patches from specific commits
git format-patch -1 '3918a1d036e74d47a5c830e4bbabba6f507162b1'
###
# Take actions on multiple repositories
# --------------------------------------
###
git-all () {
[[ -n $DEBUG ]] && set -o xtrace
@@ -103,7 +110,12 @@ git-all () {
[[ -n $DEBUG ]] && set +o xtrace
}
###
# Reset forks to their upstream's state
# --------------------------------------
###
git remote add 'upstream' '/url/to/original/repo'
git fetch 'upstream'
git checkout 'master'

View File

@@ -1,5 +1,35 @@
#!/usr/bin/env sh
# Verify configuration files
loki -verify-config
loki -config.file='/etc/loki/local-config.yaml' -verify-config
# List available component targets
loki -list-targets
docker run 'docker.io/grafana/loki' -config.file='/etc/loki/local-config.yaml' -list-targets
# Start server components
loki
loki -target='all'
loki -config.file='/etc/loki/config.yaml' -target='read'
# Run on EKS in microservices mode
helm repo add 'grafana' 'https://grafana.github.io/helm-charts' --force-update
helm search repo --versions 'grafana/loki-distributed'
docker pull '012345678901.dkr.ecr.eu-west-1.amazonaws.com/grafana/loki:2.9.10'
helm --namespace 'loki' diff upgrade --install 'loki' \
--repo 'https://grafana.github.io/helm-charts' 'loki-distributed' --version '0.80.0' \
--values 'values.yml' --set 'loki.image.registry'='012345678901.dkr.ecr.eu-west-1.amazonaws.com'
helm --namespace 'loki' upgrade --create-namespace --install --cleanup-on-fail 'loki' \
--repo 'https://grafana.github.io/helm-charts' 'loki-distributed' --version '0.80.0' \
--values 'values.yml' --set 'loki.image.registry'='012345678901.dkr.ecr.eu-west-1.amazonaws.com' \
--set 'loki.storageConfig.aws.s3'='s3://eu-west-1' --set 'loki.storageConfig.aws.bucketnames'='loki-data' \
--set 'loki.storageConfig.boltdb_shipper.shared_store'='s3'
# Print the final configuration to stderr and start
loki -print-config-stderr …
# Check the server is working
curl 'http://loki.fqdn:3100/ready'
curl 'http://loki.fqdn:3100/metrics'
curl 'http://loki.fqdn:3100/services'

View File

@@ -24,3 +24,8 @@ curl 'http://promtail.fqdn:9080/metrics'
# Connect to the web server
open 'http://promtail.fqdn:9080/'
open 'http://promtail.fqdn:9080/targets'
open 'http://promtail.fqdn:9080/service-discovery'
# Inspect pipeline's stages
cat 'file.log' | promtail --stdin --dry-run --inspect --client.url 'http://loki.fqdn:3100/loki/api/v1/push'

View File

@@ -0,0 +1,225 @@
import * as aws from "@pulumi/aws";
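// Sketch: runs Loki as a Fargate service behind an internal application load
// balancer, published in Route53 as 'loki.example.org'.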
const vpc_output = aws.ec2.getVpcOutput({
filters: [{
name: "tag:Name",
values: "Default",
}],
});
const dnsZone_output = aws.route53.getZoneOutput({ name: "example.org." });
const ecsCluster_output = aws.ecs.getClusterOutput({ clusterName: "someCluster" });
const securityGroup = new aws.ec2.SecurityGroup(
"loki",
{
vpcId: vpc_output.id,
name: "Loki",
description: "Manage access to and from Loki",
tags: {
Name: "Loki",
Application: "Loki",
},
ingress: [
{
description: "HTTP server",
cidrBlocks: [ "0.0.0.0/0" ],
ipv6CidrBlocks: [ "::/0" ],
protocol: "tcp",
fromPort: 3100,
toPort: 3100,
},
{
description: "gRPC server",
cidrBlocks: [ "0.0.0.0/0" ],
ipv6CidrBlocks: [ "::/0" ],
protocol: "tcp",
fromPort: 9095,
toPort: 9095,
},
],
egress: [{
description: "Allow all",
cidrBlocks: [ "0.0.0.0/0" ],
ipv6CidrBlocks: [ "::/0" ],
protocol: "-1",
fromPort: 0,
toPort: 0,
}],
},
);
const taskDefinition = new aws.ecs.TaskDefinition(
"loki",
{
family: "Loki",
tags: { Application: "Loki" },
networkMode: "awsvpc",
requiresCompatibilities: [ "FARGATE" ],
cpu: "256", // Fargate requirement
memory: "512", // Fargate requirement
executionRoleArn: "arn:aws:iam::012345678901:role/ecsTaskExecutionRole", // logging requirement
containerDefinitions: JSON.stringify([{
name: "loki",
image: "grafana/loki:3.3.2",
essential: true,
environment: [], // specified to avoid showing changes on every run
volumesFrom: [], // specified to avoid showing changes on every run
mountPoints: [], // specified to avoid showing changes on every run
systemControls: [], // specified to avoid showing changes on every run
healthCheck: {
command: [
"CMD-SHELL",
"wget -qO- localhost:3100/ready || exit 1",
],
startPeriod: 15,
},
portMappings: [
{
name: "http-server",
appProtocol: "http",
containerPort: 3100,
},
{
name: "grpc-server",
appProtocol: "grpc",
containerPort: 9095,
},
],
logConfiguration: {
logDriver: "awslogs",
options: {
"awslogs-create-group": "true",
"awslogs-group": "/ecs/loki",
"awslogs-region": "eu-west-1",
"awslogs-stream-prefix": "ecs",
},
},
}]),
},
);
// getSubnets (plural) is needed here, as the filter matches multiple subnets.
const privateSubnets_output = aws.ec2.getSubnetsOutput({
filters: [{
name: "tag:Type",
values: [ "Private" ],
}],
});
const targetGroup_http = new aws.alb.TargetGroup(
"loki-http",
{
vpcId: vpc_output.id,
name: "loki-http",
tags: { Application: "Loki" },
targetType: "ip",
ipAddressType: "ipv4",
protocol: "HTTP",
port: 3100,
healthCheck: {
path: "/ready",
},
},
);
const targetGroup_grpc = new aws.alb.TargetGroup(
"loki-grpc",
{
vpcId: vpc_output.id,
name: "loki-grpc",
tags: { Application: "Loki" },
targetType: "ip",
ipAddressType: "ipv4",
protocol: "HTTP",
protocolVersion: "GRPC",
port: 9095,
},
);
const loadBalancer = new aws.alb.LoadBalancer(
"loki",
{
name: "Loki",
tags: { Application: "Loki" },
internal: true,
ipAddressType: "ipv4",
subnets: privateSubnets_output.ids,
securityGroups: [ securityGroup.id ],
accessLogs: {
bucket: "",
},
},
);
new aws.route53.Record(
"loki",
{
zoneId: dnsZone_output.zoneId,
name: "loki.example.org",
type: "A",
aliases: [{
name: loadBalancer.dnsName,
zoneId: loadBalancer.zoneId,
evaluateTargetHealth: true,
}],
},
);
new aws.alb.Listener(
"loki-http",
{
tags: { Application: "Loki" },
loadBalancerArn: loadBalancer.arn,
port: 3100,
protocol: "HTTP",
defaultActions: [{
order: 1,
targetGroupArn: targetGroup_http.arn,
type: "forward",
}],
},
);
// new aws.alb.Listener(
// FIXME: Listener protocol 'HTTP' is not supported with a target group with the protocol-version 'GRPC'
// "loki-grpc",
// {
// tags: { Application: "Loki" },
// loadBalancerArn: loadBalancer.arn,
// port: 9095,
// protocol: "HTTP",
// defaultActions: [{
// order: 1,
// targetGroupArn: targetGroup_grpc.arn,
// type: "forward",
// }],
// },
// );
new aws.ecs.Service(
"loki",
{
name: "Loki",
tags: { Application: "Loki" },
cluster: ecsCluster_output.arn,
taskDefinition: taskDefinition.arn,
desiredCount: 1,
launchType: "FARGATE",
networkConfiguration: {
subnets: privateSubnets_output.ids,
securityGroups: [ securityGroup.id ],
},
loadBalancers: [
{
containerName: "loki",
containerPort: 3100,
targetGroupArn: targetGroup_http.arn,
},
// {
// containerName: "loki",
// containerPort: 9095,
// targetGroupArn: targetGroup_grpc.arn,
// },
],
},
);