chore(loki): run on ecs
@@ -2,8 +2,11 @@

1. [TL;DR](#tldr)
1. [Resource constraints](#resource-constraints)
1. [Storage](#storage)
1. [EBS volumes](#ebs-volumes)
1. [EFS volumes](#efs-volumes)
1. [Docker volumes](#docker-volumes)
1. [Bind mounts](#bind-mounts)
1. [Troubleshooting](#troubleshooting)
1. [Invalid 'cpu' setting for task](#invalid-cpu-setting-for-task)
1. [Further readings](#further-readings)
@@ -17,7 +20,7 @@ Tasks are a logical construct that model and run one or more containers. Contain

ECS runs tasks using two different launch types:

- On EC2 instances that one owns, manages, and pays for.
- Using Fargate, technically a serverless environment for containers.

Unless otherwise restricted and capped, containers get access to all the CPU and memory capacity available on the host running them.
@@ -158,9 +161,18 @@ the `memoryReservation` value.<br/>

If specifying `memoryReservation`, that value is guaranteed to the container and subtracted from the available memory resources of the container instance that the container is placed on. Otherwise, the value of `memory` is used.
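A minimal sketch of the two settings in a container definition (names and values are illustrative): this container is guaranteed 256 MiB and hard-capped at 512 MiB.

```json
{
  "containerDefinitions": [{
    "name": "app",
    "memory": 512,
    "memoryReservation": 256
  }]
}
```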
## Storage

Refer [Storage options for Amazon ECS tasks].
| Volume type      | Launch type support | OS support     | Persistence                                                                                                      | Use cases                                                                    |
| ---------------- | ------------------- | -------------- | ---------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
| [EBS volumes]    | EC2<br/>Fargate     | Linux          | _Can_ be persisted when used by a standalone task<br/>Ephemeral when attached to tasks maintained by a service    | Transactional workloads                                                      |
| [EFS volumes]    | EC2<br/>Fargate     | Linux          | Persistent                                                                                                        | Data analytics<br/>Media processing<br/>Content management<br/>Web serving   |
| [Docker volumes] | EC2                 | Linux, Windows | Persistent                                                                                                        | Provide a location for data persistence<br/>Sharing data between containers  |
| [Bind mounts]    | EC2<br/>Fargate     | Linux, Windows | Ephemeral                                                                                                         | Data analytics<br/>Media processing<br/>Content management<br/>Web serving   |
### EBS volumes

Refer [Use Amazon EBS volumes with Amazon ECS].

@@ -183,10 +195,74 @@ termination.

One **cannot** configure EBS volumes for attachment to ECS tasks running on AWS Outposts.
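As a sketch, assuming the current ECS EBS integration: the task definition only marks the volume as configured at launch, while size, type, and other EBS settings are supplied through the service's or task's `volumeConfigurations` at deployment time (the volume name here is illustrative).

```json
{
  "volumes": [{
    "name": "ebs-volume",
    "configuredAtLaunch": true
  }]
}
```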
### EFS volumes

Refer [Use Amazon EFS volumes with Amazon ECS].

Allows tasks with access to the same EFS volumes to share persistent storage.

Tasks **must**:

- Reference the EFS volumes in the `volumes` attribute of their definition.
- Reference the defined volumes in the `mountPoints` attribute of the containers' specifications.
<details style="padding: 0 0 1em 1em;">

```json
{
  "volumes": [{
    "name": "myEfsVolume",
    "efsVolumeConfiguration": {
      "fileSystemId": "fs-1234",
      "rootDirectory": "/path/to/my/data",
      "transitEncryption": "ENABLED",
      "transitEncryptionPort": integer,
      "authorizationConfig": {
        "accessPointId": "fsap-1234",
        "iam": "ENABLED"
      }
    }
  }],
  "containerDefinitions": [{
    "name": "container-using-efs",
    "image": "amazonlinux:2",
    "entryPoint": [
      "sh",
      "-c"
    ],
    "command": [ "ls -la /mount/efs" ],
    "mountPoints": [{
      "sourceVolume": "myEfsVolume",
      "containerPath": "/mount/efs",
      "readOnly": true
    }]
  }]
}
```

</details>
EFS file systems are supported on:

- EC2 nodes using ECS-optimized AMI version 20200319 with container agent version 1.38.0.
- Fargate platform version 1.4.0 or later (Linux).

**Not** supported on external instances.

### Docker volumes

TODO
### Bind mounts

TODO

## Troubleshooting

### Invalid 'cpu' setting for task

Refer [Troubleshoot Amazon ECS task definition invalid CPU or memory errors] and [Resource constraints].

<details>
<summary>Cause</summary>
@@ -205,14 +281,15 @@ Specify a supported value for the task CPU and memory in your task definition.

</details>
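For reference, a minimal sketch of a task definition using one of the supported Fargate CPU/memory pairings (256 CPU units with 512 MiB):

```json
{
  "requiresCompatibilities": [ "FARGATE" ],
  "cpu": "256",
  "memory": "512"
}
```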
## Further readings

- [Amazon Web Services]
- [Amazon ECS task lifecycle]
- AWS' [CLI]
- [Troubleshoot Amazon ECS deployment issues]
- [Storage options for Amazon ECS tasks]
- [EBS]
- [EFS]

### Sources
@@ -223,7 +300,10 @@ Refer [Troubleshoot Amazon ECS task definition invalid CPU or memory errors] and

- [Use Amazon EBS volumes with Amazon ECS]
- [Attach EBS volume to AWS ECS Fargate]
- [Guide to Using Amazon EBS with Amazon ECS and AWS Fargate]
- [Amazon ECS task definition differences for the Fargate launch type]
- [How Amazon ECS manages CPU and memory resources]
- [Exposing multiple ports for an AWS ECS service]
- [Use Amazon EFS volumes with Amazon ECS]

<!--
  Reference
@@ -231,22 +311,32 @@ Refer [Troubleshoot Amazon ECS task definition invalid CPU or memory errors] and
-->
<!-- In-article sections -->
[bind mounts]: #bind-mounts
[docker volumes]: #docker-volumes
[ebs volumes]: #ebs-volumes
[efs volumes]: #efs-volumes
[resource constraints]: #resource-constraints

<!-- Knowledge base -->
[amazon web services]: README.md
[cli]: cli.md
[ebs]: ebs.md
[efs]: efs.md

<!-- Upstream -->
[amazon ecs task definition differences for the fargate launch type]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-tasks-services.html
[amazon ecs task lifecycle]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-lifecycle-explanation.html
[amazon ecs task role]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html
[how amazon ecs manages cpu and memory resources]: https://aws.amazon.com/blogs/containers/how-amazon-ecs-manages-cpu-and-memory-resources/
[how amazon elastic container service works with iam]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/security_iam_service-with-iam.html
[identity and access management for amazon elastic container service]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/security-iam.html
[storage options for amazon ecs tasks]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_data_volumes.html
[troubleshoot amazon ecs deployment issues]: https://docs.aws.amazon.com/codedeploy/latest/userguide/troubleshooting-ecs.html
[troubleshoot amazon ecs task definition invalid cpu or memory errors]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html
[use amazon ebs volumes with amazon ecs]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ebs-volumes.html
[use amazon efs volumes with amazon ecs]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/efs-volumes.html

<!-- Others -->
[attach ebs volume to aws ecs fargate]: https://medium.com/@shujaatsscripts/attach-ebs-volume-to-aws-ecs-fargate-e23fea7bb1a7
[exposing multiple ports for an aws ecs service]: https://medium.com/@faisalsuhail1/exposing-multiple-ports-for-an-aws-ecs-service-64b9821c09e8
[guide to using amazon ebs with amazon ecs and aws fargate]: https://stackpioneers.com/2024/01/12/guide-to-using-amazon-ebs-with-amazon-ecs-and-aws-fargate/
60  knowledge base/cloud computing/aws/efs.md  Normal file

@@ -0,0 +1,60 @@
# Elastic File System

Serverless file storage for sharing files without needing to provision or manage storage capacity and performance.

1. [TL;DR](#tldr)
1. [Further readings](#further-readings)
1. [Sources](#sources)

## TL;DR
Built to scale on demand, growing and shrinking automatically as files are added and removed.<br/>
Accessible across most types of AWS compute instances, including EC2, ECS, EKS, Lambda, and Fargate.

Supports the NFS v4.0 and v4.1 protocols.

Available file system types:

- _Regional_: redundant across **multiple** geographically separated AZs **within the same Region**.
- _One Zone_: data stored within a **single AZ**, with all the limits this implies.

Default modes:

- _General Purpose Performance_: ideal for latency-sensitive applications.<br/>
  Examples: web-serving environments, content-management systems, home directories, and general file serving.
- _Elastic Throughput_: designed to scale throughput performance automatically to meet the needs of the workloads' activity.

Provides file-system-access semantics like strong data consistency and file locking.<br/>
Supports controlling access to file systems through POSIX permissions.<br/>
Supports authentication, authorization, and encryption.
EFS supports encryption in transit and encryption at rest.<br/>
Encryption at rest is enabled when creating a file system. In that case, all data and metadata is encrypted.<br/>
Encryption in transit is enabled when mounting a file system. Client access via NFS to EFS is controlled by both IAM policies and network security policies (i.e. security groups).

Windows-based EC2 instances are **not** supported.
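As an illustrative sketch of mounting such a file system with encryption in transit: assuming the `amazon-efs-utils` mount helper is installed and `fs-1234` is a placeholder file system ID, an `/etc/fstab` entry could look like the following (`tls` enables in-transit encryption, `iam` enables IAM-authenticated access):

```plaintext
fs-1234:/  /mnt/efs  efs  _netdev,tls,iam  0  0
```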
## Further readings

- [Amazon Web Services]

### Sources

- [What is Amazon Elastic File System?]

<!--
  Reference
-->

<!-- In-article sections -->
<!-- Knowledge base -->
[amazon web services]: README.md

<!-- Files -->
<!-- Upstream -->
[what is amazon elastic file system?]: https://docs.aws.amazon.com/efs/latest/ug/whatisefs.html

<!-- Others -->
@@ -24,6 +24,8 @@

Using `-H 'PRIVATE-TOKEN: glpat-m-…'` in API calls is the same as using `-H 'Authorization: bearer glpat-m-…'`.

Use _deploy tokens_ instead of personal access tokens to access repositories in pipelines, as they do not expire.

```sh
# List the current application settings of the GitLab instance.
curl -H 'PRIVATE-TOKEN: glpat-m-…' 'https://gitlab.fqdn/api/v4/application/settings'
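As an illustrative sketch (not from the original notes): when a project defines a deploy token named `gitlab-deploy-token`, GitLab exposes it to pipelines as the predefined `CI_DEPLOY_USER` and `CI_DEPLOY_PASSWORD` variables, which jobs can use to clone other repositories:

```yaml
clone-other-repo:
  script:
    # Hypothetical job; 'group/project' is a placeholder path.
    - git clone "https://${CI_DEPLOY_USER}:${CI_DEPLOY_PASSWORD}@gitlab.fqdn/group/project.git"
```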
@@ -14,6 +14,12 @@ very cost-effective and easy to operate.

1. [Compactor](#compactor)
1. [Ruler](#ruler)
1. [Clients](#clients)
1. [Labels](#labels)
1. [Labelling best practices](#labelling-best-practices)
1. [Deployment](#deployment)
1. [Monolithic mode](#monolithic-mode)
1. [Simple scalable mode](#simple-scalable-mode)
1. [Microservices mode](#microservices-mode)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -23,10 +29,10 @@ It indexes **a set of labels** for each log stream instead of the full logs' con

The log data itself is then compressed and stored in chunks in object storage solutions, or locally on the host's filesystem.

Can be executed in _single binary_ mode, with all its components running simultaneously in one process, or in _simple scalable deployment_ mode, which groups components into read, write, and backend parts.

Files can be _indexes_ or _chunks_.<br/>
Indexes are tables of contents in TSDB format of where to find logs for specific sets of labels.<br/>
Chunks are containers for log entries for specific sets of labels.
@@ -45,6 +51,10 @@ zypper install 'loki'

docker run --name loki -d \
    -p '3100:3100' -v "$(pwd)/config/loki.yml:/etc/loki/config.yml:ro" \
    'grafana/loki:3.3.2' -config.file='/etc/loki/config.yml'

# Run on Kubernetes in microservices mode.
helm --namespace 'loki' upgrade --create-namespace --install --cleanup-on-fail 'loki' \
    --repo 'https://grafana.github.io/helm-charts' 'loki-distributed'
```

Default configuration file for package-based installations is `/etc/loki/config.yml` or `/etc/loki/loki.yaml`.
@@ -65,9 +75,26 @@ Default configuration file for package-based installations is `/etc/loki/config.

<summary>Usage</summary>

```sh
# Verify configuration files
loki -verify-config
loki -config.file='/etc/loki/local-config.yaml' -verify-config

# List available component targets
loki -list-targets
docker run 'docker.io/grafana/loki' -config.file='/etc/loki/local-config.yaml' -list-targets

# Start server components
loki
loki -target='all'
loki -config.file='/etc/loki/config.yaml' -target='read'

# Print the final configuration to stderr and start
loki -print-config-stderr …

# Check the server is working
curl 'http://loki.fqdn:3100/ready'
curl 'http://loki.fqdn:3100/metrics'
curl 'http://loki.fqdn:3100/services'

# Check components in Loki clusters are up and running.
# Such components must run by themselves for this.
@@ -220,6 +247,155 @@ Multiple rulers will use a consistent hash ring to distribute rule groups amongs

Refer [Send log data to Loki].

## Labels

The content of each log line is **not** indexed. Instead, log entries are grouped into streams.<br/>
The streams are then indexed by labels.

Labels are key-value pairs, e.g.:

```plaintext
deployment_environment = development
cloud_region = us-west-1
namespace = grafana-server
```

A set of log messages sharing all the labels above is called a _log stream_.
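As an illustrative sketch, a LogQL query selecting that stream and brute-forcing its content with a filter expression might look like:

```plaintext
{namespace="grafana-server", deployment_environment="development"} |= "error"
```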
Loki has a default limit of 15 index labels.

When Loki performs searches, it:

1. Looks for **all** messages in the chosen stream.
1. Iterates through the logs in the stream to perform the query.

Labelling affects queries, which in turn affect dashboards.

Loki does **not** parse **nor** process log messages on ingestion.<br/>
However, some labels may be automatically applied to logs by the client that collected them.

Loki automatically tries to populate a default `service_name` label while ingesting logs.<br/>
This label is mainly used to find and explore logs in the `Explore Logs` feature of Grafana.

When receiving data from Grafana Alloy or the OpenTelemetry Collector as client, Loki automatically assigns some of the OTel resource attributes as labels.<br/>
By default, some resource attributes will be stored as labels, with periods (`.`) replaced with underscores (`_`). The remaining attributes are stored as structured metadata with each log entry.

_Cardinality_ is the combination of unique labels and values (how many values each label can have). It impacts the number of log streams one creates and can lead to significant performance degradation.<br/>
Prefer fewer labels with bounded values.
Loki performs very poorly when labels have high cardinality, as it is forced to build a huge index and flush thousands of tiny chunks to the object store.

Loki places the same restrictions on label naming as Prometheus:

- Names _may_ contain ASCII letters and digits, as well as underscores and colons.<br/>
  They must match the `[a-zA-Z_:][a-zA-Z0-9_:]*` regex.
- Unsupported characters shall be converted to an underscore.<br/>
  E.g.: `app.kubernetes.io/name` shall be written as `app_kubernetes_io_name`.
- Do **not** begin **nor** end label names with double underscores.<br/>
  This naming convention is used for internal labels, e.g. `_stream_shard_`.<br/>
  Internal labels are **hidden** by default in the label browser, query builder, and autocomplete to avoid creating confusion for users.

Prefer **not** adding labels based on the content of the log message.
Loki supports ingesting out-of-order log entries.<br/>
Out-of-order writes are enabled globally by default and can be disabled/enabled on a cluster or per-tenant basis.

Entries in a given log stream (identified by a given set of label names and values) **must be ingested in order** within the default two-hour time window.<br/>
When trying to send entries that are too old for a given log stream, Loki will respond with the `too far behind` error.

Use labels to separate streams so they can be ingested separately:

- When planning to ingest out-of-order log entries.
- For systems with different ingestion delays and shipping.
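Assuming current Loki configuration keys, the out-of-order toggle would sit in `limits_config`:

```yaml
limits_config:
  # Accept out-of-order writes; enabled by default.
  unordered_writes: true
```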
### Labelling best practices

- Use labels for things like regions, clusters, servers, applications, namespaces, and environments.

  <details>

  They will be fixed for given systems/apps and have bounded values.<br/>
  Static labels like these make it easier to query logs in a logical sense.

  </details>

- Avoid adding labels for something until you know you need them.<br/>
  Prefer using filter expressions like `|= "text"` or `|~ "regex"` to brute force logs instead.
- Ensure labels have low cardinality. Ideally, limit it to tens of values.
- Prefer using labels with long-lived values.
- Consider extracting often-parsed labels from log lines on the client side, attaching them as structured metadata.
- Be aware of dynamic labels applied by clients.
## Deployment

### Monolithic mode

Runs all of Loki's microservice components inside a single process, as a single binary or Docker image.

Set the `-target` command line parameter to `all`.

Useful for experimentation, or for small read/write volumes of up to approximately 20GB per day.<br/>
Recommended to use the [Simple scalable mode] when needing to scale the deployment further.
<details>
<summary>Horizontally scale this mode to more instances</summary>

- Use a shared object store.
- Configure the `ring` section of the configuration file to share state between all instances.

</details>

<details>
<summary>Configure high availability</summary>

- Run multiple instances setting up the `memberlist_config` configuration.
- Configure a shared object store.
- Configure the `replication_factor` to `3` or more.

This will route traffic to all the Loki instances in a round-robin fashion.

</details>

Query parallelization is limited by the number of instances.<br/>
Configure the `max_query_parallelism` setting in the configuration file.
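A minimal sketch of such an HA setup, assuming current Loki configuration keys (member addresses and bucket names are placeholders):

```yaml
memberlist:
  join_members:
    - loki-1.fqdn:7946
    - loki-2.fqdn:7946
    - loki-3.fqdn:7946
common:
  # Replicate each stream to 3 instances.
  replication_factor: 3
  storage:
    s3:
      bucketnames: loki-data
      region: eu-west-1
```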
### Simple scalable mode

Default configuration installed by Loki's Helm chart, and the easiest way to deploy Loki at scale.

Requires a reverse proxy deployed in front of Loki to direct clients' API requests to either the read or write nodes. The Loki Helm chart deploys a default reverse proxy configuration using [Nginx].

This mode can scale up to a few TBs of logs per day.<br/>
Recommended to use the [Microservices mode] if going over this.
Separates execution paths into `read`, `write`, and `backend` targets.<br/>
Targets can be scaled independently.

Execution paths are activated by defining the target on Loki's startup:

- `-target=write`: the `write` target is **stateful** and controlled by a Kubernetes StatefulSet.<br/>
  Contains the [distributor] and [ingester] components.
- `-target=read`: the `read` target is **stateless** and _can_ be run as a Kubernetes Deployment.<br/>
  In the official Helm chart this is currently deployed as a StatefulSet.<br/>
  Contains the [query frontend] and [querier] components.
- `-target=backend`: the `backend` target is **stateful** and controlled by a Kubernetes StatefulSet.<br/>
  Contains the [compactor], [index gateway], [query scheduler] and [ruler] components.
### Microservices mode

Runs each of Loki's components as its own distinct process.<br/>
Each process is invoked specifying its own target.

Designed for Kubernetes deployments and available as the [loki-distributed] community-supported Helm chart.

Only recommended for very large Loki clusters, or when needing more precise control over them.
## Further readings

- [Website]
@@ -232,6 +408,9 @@ Refer [Send log data to Loki].

- [Documentation]
- [HTTP API reference]
- [How to Set Up Grafana, Loki, and Prometheus Locally with Docker Compose: Part 1 of 3]
- [Deploying Grafana, Loki, and Prometheus on AWS ECS with EFS and Cloud Formation (Part 3 of 3)]
- [Storage - AWS deployment (S3 Single Store)]

<!--
  Reference
@@ -239,8 +418,20 @@ Refer [Send log data to Loki].
-->
<!-- In-article sections -->
[compactor]: #compactor
[distributor]: #distributor
[index gateway]: #index-gateway
[ingester]: #ingester
[microservices mode]: #microservices-mode
[querier]: #querier
[query frontend]: #query-frontend
[query scheduler]: #query-scheduler
[ruler]: #ruler
[simple scalable mode]: #simple-scalable-mode

<!-- Knowledge base -->
[grafana]: grafana.md
[nginx]: nginx.md
[promtail]: promtail.md

<!-- Files -->
@@ -248,7 +439,11 @@ Refer [Send log data to Loki].

[codebase]: https://github.com/grafana/loki
[documentation]: https://grafana.com/docs/loki/latest/
[http api reference]: https://grafana.com/docs/loki/latest/reference/loki-http-api/
[loki-distributed]: https://github.com/grafana/helm-charts/tree/main/charts/loki-distributed
[send log data to loki]: https://grafana.com/docs/loki/latest/send-data/
[storage - aws deployment (s3 single store)]: https://grafana.com/docs/loki/latest/configure/storage/#aws-deployment-s3-single-store
[website]: https://grafana.com/oss/loki/

<!-- Others -->
[deploying grafana, loki, and prometheus on aws ecs with efs and cloud formation (part 3 of 3)]: https://medium.com/@ahmadbilalch891/deploying-grafana-loki-and-prometheus-on-aws-ecs-with-efs-and-cloud-formation-part-3-of-3-24140ea8ccfb
[how to set up grafana, loki, and prometheus locally with docker compose: part 1 of 3]: https://medium.com/@ahmadbilalch891/how-to-set-up-grafana-loki-and-prometheus-locally-with-docker-compose-part-1-of-3-62fb25e51d92
@@ -25,4 +25,5 @@ AWS_PROFILE='engineer' aws sts get-caller-identity

# Run as Docker container
docker run --rm -ti 'amazon/aws-cli' --version
docker run --rm -ti -v "$HOME/.aws:/root/.aws:ro" 'amazon/aws-cli:2.17.16' autoscaling describe-auto-scaling-groups
@@ -4,6 +4,20 @@ alias aws-caller-info 'aws sts get-caller-identity'

alias aws-ssm 'aws ssm start-session --target'
alias aws-whoami 'aws-caller-info'

function aws-alb-privateDnsName-from-name
    aws ec2 describe-network-interfaces --output 'text' \
        --query 'NetworkInterfaces[*].PrivateIpAddresses[*].PrivateDnsName' \
        --filters Name='description',Values="ELB app/$argv[1]/*"
end

function aws-alb-privateIps-from-name
    aws ec2 describe-network-interfaces --output 'text' \
        --query 'NetworkInterfaces[*].PrivateIpAddresses[*].PrivateIpAddress' \
        --filters Name='description',Values="ELB app/$argv[1]/*"
end

function aws-assume-role-by-name
    set current_caller (aws-caller-info --output json | jq -r '.UserId' -)
    aws-iam-role-arn-from-name "$argv[1]" \
@@ -20,7 +34,31 @@ function aws-ec2-instanceId-from-nameTag

        --query 'Reservations[].Instances[0].InstanceId'
end
function aws-ec2-nameTag-from-instanceId
    aws ec2 describe-instances --output 'text' \
        --filters "Name=instance-id,Values=$argv[1]" \
        --query "Reservations[].Instances[0].Tags[?(@.Key=='Name')].Value"
end

function aws-ec2-tag-from-instanceId
    aws ec2 describe-instances --output 'text' \
        --filters "Name=instance-id,Values=$argv[1]" \
        --query "Reservations[].Instances[0].Tags[?(@.Key=='$argv[2]')].Value"
end

function aws-ec2-tags-from-instanceId
    aws ec2 describe-instances --output 'table' \
        --filters "Name=instance-id,Values=$argv[1]" \
        --query 'Reservations[].Instances[0].Tags[]'
end

function aws-ecs-tasks-from-clusterName-and-serviceName
    aws ecs list-tasks --cluster "$argv[1]" --output 'text' --query 'taskArns' \
        | xargs aws ecs describe-tasks --cluster "$argv[1]" \
            --query "tasks[?group.contains(@, '$argv[2]')]" --tasks
end

function aws-iam-roleArn-from-name
    aws iam list-roles --output 'text' \
        --query "Roles[?RoleName == '$argv[1]'].Arn"
end
@@ -35,12 +73,6 @@ function aws-iam-user-owning-accessKey

        | jq -rs 'flatten|first'
end
alias aws-ec2-running-instanceIds "aws ec2 describe-instances --output 'text' \
    --filters 'Name=instance-state-name,Values=running' \
@@ -89,6 +89,10 @@ aws ecs list-tasks --query 'taskArns' --output 'text' --cluster 'testCluster' --

    | tee \
    | xargs -I{} curl -fs "http://{}:8080"

# Describe tasks given a service name
aws ecs list-tasks --cluster 'testCluster' --output 'text' --query 'taskArns' \
    | xargs aws ecs describe-tasks --cluster 'testCluster' --query "tasks[?group.contains(@, 'serviceName')]" --output 'yaml' --tasks

# Show information about services
aws ecs describe-services --cluster 'stg' --services 'grafana'
@@ -186,6 +190,15 @@ aws iam list-users --no-cli-pager --query 'Users[].UserName' --output 'text' \

aws iam --no-cli-pager list-access-keys
aws iam --no-cli-pager list-access-keys --user-name 'mark'

# Change users' console password
# FIXME: check
aws iam update-login-profile --user-name 'logan'
aws iam update-login-profile --user-name 'mike' --password 'newPassword' --password-reset-required

# Change one's own console password
# FIXME: check
basename (aws sts get-caller-identity --query 'Arn' --output 'text') \
    | xargs aws iam update-login-profile --user-name

###
# Image Builder
@@ -24,3 +24,6 @@ docker save 'local/image:latest' | ssh -C 'user@remote.host' docker load

docker inspect 'ghcr.io/jqlang/jq:latest'  # image
docker inspect 'host'  # network
docker inspect 'prometheus-1'  # container

# Install compose directly from package
dnf install 'https://download.docker.com/linux/fedora/41/aarch64/stable/Packages/docker-compose-plugin-2.32.1-1.fc41.aarch64.rpm'
@@ -47,6 +47,7 @@ git reset --soft HEAD~1 # or `git reset --soft HEAD^`

git restore --staged '.lefthook-local.yml' # or `git reset HEAD '.lefthook-local.yml'`
git commit -c ORIG_HEAD

##
# Change the default branch from 'master' to 'main'.
# --------------------------------------
@@ -76,6 +77,12 @@ git format-patch HEAD~1 --stdout

# create patches from specific commits
git format-patch -1 '3918a1d036e74d47a5c830e4bbabba6f507162b1'

###
# Take actions on multiple repositories
# --------------------------------------
###

git-all () {
    [[ -n $DEBUG ]] && set -o xtrace

@@ -103,7 +110,12 @@ git-all () {

    [[ -n $DEBUG ]] && set +o xtrace
}

###
# Reset forks to their upstream's state
# --------------------------------------
###

git remote add 'upstream' '/url/to/original/repo'
git fetch 'upstream'
git checkout 'master'
@@ -1,5 +1,35 @@
#!/usr/bin/env sh

# Verify configuration files
loki -verify-config
loki -config.file='/etc/loki/local-config.yaml' -verify-config

# List available component targets
loki -list-targets
docker run 'docker.io/grafana/loki' -config.file='/etc/loki/local-config.yaml' -list-targets

# Start server components
loki
loki -target='all'
loki -config.file='/etc/loki/config.yaml' -target='read'

# Run on EKS in microservices mode
helm repo add 'grafana' 'https://grafana.github.io/helm-charts' --force-update
helm search repo --versions 'grafana/loki-distributed'
docker pull '012345678901.dkr.ecr.eu-west-1.amazonaws.com/grafana/loki:2.9.10'
helm --namespace 'loki' diff upgrade --install 'loki' \
  --repo 'https://grafana.github.io/helm-charts' 'loki-distributed' --version '0.80.0' \
  --values 'values.yml' --set 'loki.image.registry'='012345678901.dkr.ecr.eu-west-1.amazonaws.com'
helm --namespace 'loki' upgrade --create-namespace --install --cleanup-on-fail 'loki' \
  --repo 'https://grafana.github.io/helm-charts' 'loki-distributed' --version '0.80.0' \
  --values 'values.yml' --set 'loki.image.registry'='012345678901.dkr.ecr.eu-west-1.amazonaws.com' \
  --set 'loki.storageConfig.aws.s3'='s3://eu-west-1' --set 'loki.storageConfig.aws.bucketnames'='loki-data' \
  --set 'loki.storageConfig.boltdb_shipper.shared_store'='s3'

# Print the final configuration to stderr and start
loki -print-config-stderr …

# Check the server is working
curl 'http://loki.fqdn:3100/ready'
curl 'http://loki.fqdn:3100/metrics'
curl 'http://loki.fqdn:3100/services'
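The commands above reference `/etc/loki/local-config.yaml`, which ships with the Loki image. A minimal monolithic-mode configuration along those lines might look like this — a sketch assuming single-instance, local-filesystem storage; paths and the schema start date are illustrative:

```yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9095

common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
```

For S3-backed storage (as in the Helm invocation above), the `common.storage.filesystem` block would be swapped for an `s3` one.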
@@ -24,3 +24,8 @@ curl 'http://promtail.fqdn:9080/metrics'

# Connect to the web server
open 'http://promtail.fqdn:9080/'
open 'http://promtail.fqdn:9080/targets'
open 'http://promtail.fqdn:9080/service-discovery'

# Inspect pipeline's stages
cat 'file.log' | promtail --stdin --dry-run --inspect --client.url 'http://loki.fqdn:3100/loki/api/v1/push'

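The `--inspect` dry run above prints what each pipeline stage does to every line. A minimal scrape config with pipeline stages to feed it might look like this — illustrative only, with a made-up regex and label:

```yaml
scrape_configs:
  - job_name: stdin
    static_configs:
      - targets: [ localhost ]
        labels:
          job: stdin
          __path__: /dev/stdin
    pipeline_stages:
      # Extract a 'level' capture group from lines like "ERROR something broke"
      - regex:
          expression: '^(?P<level>\w+) (?P<message>.*)$'
      # Promote the captured 'level' to a Loki label
      - labels:
          level:
```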
225
snippets/pulumi/aws/run loki in monolithic mode on ecs.ts
Normal file
@@ -0,0 +1,225 @@
import * as aws from "@pulumi/aws";


const vpc_output = aws.ec2.getVpcOutput({
    filters: [{
        name: "tag:Name",
        values: [ "Default" ],  // 'values' takes a list
    }],
});

const dnsZone_output = aws.route53.getZoneOutput({ name: "example.org." });

const ecsCluster_output = aws.ecs.getClusterOutput({ clusterName: "someCluster" });

const securityGroup = new aws.ec2.SecurityGroup(
    "loki",
    {
        vpcId: vpc_output.id,
        name: "Loki",
        description: "Manage access to and from Loki",
        tags: {
            Name: "Loki",
            Application: "Loki",
        },

        ingress: [
            {
                description: "HTTP server",
                cidrBlocks: [ "0.0.0.0/0" ],
                ipv6CidrBlocks: [ "::/0" ],
                protocol: "tcp",
                fromPort: 3100,
                toPort: 3100,
            },
            {
                description: "gRPC server",
                cidrBlocks: [ "0.0.0.0/0" ],
                ipv6CidrBlocks: [ "::/0" ],
                protocol: "tcp",
                fromPort: 9095,
                toPort: 9095,
            },
        ],
        egress: [{
            description: "Allow all",
            cidrBlocks: [ "0.0.0.0/0" ],
            ipv6CidrBlocks: [ "::/0" ],
            protocol: "-1",
            fromPort: 0,
            toPort: 0,
        }],
    },
);

const taskDefinition = new aws.ecs.TaskDefinition(
    "loki",
    {
        family: "Loki",
        tags: { Application: "Loki" },

        networkMode: "awsvpc",
        requiresCompatibilities: [ "FARGATE" ],
        cpu: "256",     // Fargate requirement
        memory: "512",  // Fargate requirement
        executionRoleArn: "arn:aws:iam::012345678901:role/ecsTaskExecutionRole",  // logging requirement
        containerDefinitions: JSON.stringify([{
            name: "loki",
            image: "grafana/loki:3.3.2",
            essential: true,
            environment: [],     // specified to avoid showing changes on every run
            volumesFrom: [],     // specified to avoid showing changes on every run
            mountPoints: [],     // specified to avoid showing changes on every run
            systemControls: [],  // specified to avoid showing changes on every run
            healthCheck: {
                command: [
                    "CMD-SHELL",
                    "wget -qO- localhost:3100/ready || exit 1",
                ],
                startPeriod: 15,
            },
            portMappings: [
                {
                    name: "http-server",
                    appProtocol: "http",
                    containerPort: 3100,
                },
                {
                    name: "grpc-server",
                    appProtocol: "grpc",
                    containerPort: 9095,
                },
            ],
            logConfiguration: {
                logDriver: "awslogs",
                options: {
                    "awslogs-create-group": "true",
                    "awslogs-group": "/ecs/loki",
                    "awslogs-region": "eu-west-1",
                    "awslogs-stream-prefix": "ecs",
                },
            },
        }]),
    },
);

const privateSubnets_output = aws.ec2.getSubnetsOutput({  // plural: returns the IDs of *all* matching subnets
    filters: [{
        name: "tag:Type",
        values: [ "Private" ],
    }],
});
const targetGroup_http = new aws.alb.TargetGroup(
    "loki-http",
    {
        vpcId: vpc_output.id,
        name: "loki-http",
        tags: { Application: "Loki" },

        targetType: "ip",
        ipAddressType: "ipv4",
        protocol: "HTTP",
        port: 3100,
        healthCheck: {
            path: "/ready",
        },
    },
);
const targetGroup_grpc = new aws.alb.TargetGroup(
    "loki-grpc",
    {
        vpcId: vpc_output.id,
        name: "loki-grpc",
        tags: { Application: "Loki" },

        targetType: "ip",
        ipAddressType: "ipv4",
        protocol: "HTTP",
        protocolVersion: "GRPC",
        port: 9095,
    },
);
const loadBalancer = new aws.alb.LoadBalancer(
    "loki",
    {
        name: "Loki",
        tags: { Application: "Loki" },

        internal: true,
        ipAddressType: "ipv4",
        subnets: privateSubnets_output.ids,
        securityGroups: [ securityGroup.id ],
        accessLogs: {
            bucket: "",
        },
    },
);
new aws.route53.Record(
    "loki",
    {
        zoneId: dnsZone_output.zoneId,
        name: "loki.example.org",
        type: "A",
        aliases: [{
            name: loadBalancer.dnsName,
            zoneId: loadBalancer.zoneId,
            evaluateTargetHealth: true,
        }],
    },
);
new aws.alb.Listener(
    "loki-http",
    {
        tags: { Application: "Loki" },
        loadBalancerArn: loadBalancer.arn,
        port: 3100,
        protocol: "HTTP",
        defaultActions: [{
            order: 1,
            targetGroupArn: targetGroup_http.arn,
            type: "forward",
        }],
    },
);
// new aws.alb.Listener(
//     // FIXME: Listener protocol 'HTTP' is not supported with a target group with the protocol-version 'GRPC'
//     "loki-grpc",
//     {
//         tags: { Application: "Loki" },
//         loadBalancerArn: loadBalancer.arn,
//         port: 9095,
//         protocol: "HTTP",
//         defaultActions: [{
//             order: 1,
//             targetGroupArn: targetGroup_grpc.arn,
//             type: "forward",
//         }],
//     },
// );
new aws.ecs.Service(
    "loki",
    {
        name: "Loki",
        tags: { Application: "Loki" },

        cluster: ecsCluster_output.arn,
        taskDefinition: taskDefinition.arn,
        desiredCount: 1,
        launchType: "FARGATE",
        networkConfiguration: {
            subnets: privateSubnets_output.ids,
            securityGroups: [ securityGroup.id ],
        },
        loadBalancers: [
            {
                containerName: "loki",
                containerPort: 3100,
                targetGroupArn: targetGroup_http.arn,
            },
            // {
            //     containerName: "loki",
            //     containerPort: 9095,
            //     targetGroupArn: targetGroup_grpc.arn,
            // },
        ],
    },
);
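The task definition above pins `cpu: "256"` and `memory: "512"` because Fargate only accepts specific CPU/memory pairings. A small standalone sketch of that constraint — the table covers only the smallest documented sizes, for illustration:

```typescript
// Valid Fargate memory values (MiB) per CPU-unit setting.
// Partial table for illustration; larger sizes follow the same pattern.
const fargateMemoryByCpu: Record<string, number[]> = {
    "256": [512, 1024, 2048],
    "512": [1024, 2048, 3072, 4096],
    "1024": [2048, 3072, 4096, 5120, 6144, 7168, 8192],
};

function isValidFargateSize(cpu: string, memoryMiB: number): boolean {
    // Unknown CPU values have no valid memory pairings
    return (fargateMemoryByCpu[cpu] ?? []).includes(memoryMiB);
}

console.log(isValidFargateSize("256", 512));   // the pairing used by the task definition above
console.log(isValidFargateSize("256", 4096));  // invalid: too much memory for 256 CPU units
```

Picking a pairing outside this table is what triggers the "Invalid 'cpu' setting for task" error when registering the task definition.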